As of RELEASE.2023-05-04T21-44-30Z, we extended Batch Replication to support two-way replication between MinIO and AWS S3, and to allow replication jobs to be configured and launched from either the source or the target MinIO deployment (Pull Replication). This is the latest addition to the MinIO Batch Framework, which lets you create, manage, monitor and execute jobs using a YAML job definition file. Batch jobs run directly on the MinIO Server to leverage available server-side compute resources.
This blog post focuses on the newly added functionality. Please see Announcing MinIO Batch Framework – Feature #1: Batch Replication for implementation details and sample YAML for configuration.
Two-Way AWS S3-MinIO Batch Replication
Have you ever wished for an easy way to copy data from S3 to MinIO, or vice versa? Wish no more, S3-MinIO Batch Replication is here.
Any S3 compatible source or target can be used as long as the other endpoint is a MinIO deployment. Replication is efficient and speedy because it is a simple one-way copy of the newest version of an object and its metadata, with the only caveat being that the object version ID and Modification Time cannot be preserved at the target. This is a great way to get data out of an S3 compatible source – including AWS S3 – and into MinIO. Or, simply use S3-MinIO Batch Replication to make a point-in-time copy.
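As a sketch of what such a job looks like, here is a minimal replication.yaml that copies a MinIO bucket out to AWS S3. The bucket names, endpoint and credentials are placeholders, and the exact field names can vary by release, so generate the template for your version with mc batch generate before adapting it:

```yaml
replicate:
  apiVersion: v1
  source:
    type: minio              # the deployment running this job
    bucket: source-bucket
  target:
    type: s3                 # AWS S3 (or any S3-compatible endpoint)
    endpoint: "https://s3.amazonaws.com"
    bucket: target-bucket
    credentials:
      accessKey: EXAMPLE-ACCESS-KEY    # placeholder credentials
      secretKey: EXAMPLE-SECRET-KEY
```

You would then launch the job with mc batch start myminio ./replication.yaml, where myminio is the alias of the deployment that runs the job.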
Pull Batch Replication
Previously, Batch Replication had to be configured and launched from the source MinIO deployment. Now, Batch Replication can be configured and launched from either the source or target MinIO deployment.
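In a pull-style job, the YAML is run on the target deployment and the source section points at the remote deployment instead. A hedged sketch, with hypothetical endpoint and credentials, might look like this:

```yaml
replicate:
  apiVersion: v1
  source:
    type: minio
    endpoint: "https://old-minio.example.net"   # remote (e.g. older-release) deployment
    bucket: legacy-bucket
    credentials:
      accessKey: EXAMPLE-ACCESS-KEY             # placeholder credentials
      secretKey: EXAMPLE-SECRET-KEY
  target:
    type: minio
    bucket: local-bucket     # bucket on the deployment that runs the job
```

Because the job executes on the newer deployment, the remote side only needs to serve S3 API requests; it does not need to understand the Batch Framework at all.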
We did this in order to provide Batch Replication to customers running multiple versions of MinIO. There’s always going to be some reason why something in the enterprise stack hasn’t been upgraded to the latest version. Some customers choose to run older MinIO releases, yet they want Batch Replication to move data between deployments, locations or clouds. This addition enables them to stay on their old release for as long as they want (five-year release support is included with the Enterprise license) and still make full use of new Batch Replication features.
As a refresher, in late 2022, we introduced the Batch Framework and its first feature, Batch Replication, to MinIO. Batch Replication has rapidly become a widely-implemented feature for our customers, and it is used to build data pipelines and to copy or move data between MinIO deployments.
Batch functions run directly on the MinIO Server, removing any potential bottlenecks or possible failure points that could be caused by running these operations from a workstation. Users don’t need special permissions to run batches because they run on the server side. At runtime, batches read in a YAML configuration file, which in the case of replication contains origin bucket, filters/flags and a destination target (bucket and credentials).
Creating and editing the file replication.yaml allows you to set the origin bucket, object filters/flags and a destination bucket (with credentials). MinIO Server can run multiple Batch Replication jobs at the same time; you can list them using mc batch list and check their status using mc batch status.
Batch Replication replicates existing objects that meet the filters specified in the YAML configuration file (if no filters are specified, the job replicates all objects in the source bucket). Batch Replication is an effortless way to fill a new bucket, in a new location, with objects. Any number of objects can be replicated with minimal code in a single job, and notifications are sent when replication is complete.
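Filters and notifications are both expressed in the flags section of the job definition. The following fragment is illustrative only, with placeholder values; the flag names shown (filter.newerThan, notify.endpoint) should be checked against the template that mc batch generate produces for your release:

```yaml
replicate:
  apiVersion: v1
  source:
    type: minio
    bucket: source-bucket
  target:
    type: minio
    endpoint: "https://replica.example.net"     # hypothetical target deployment
    bucket: target-bucket
    credentials:
      accessKey: EXAMPLE-ACCESS-KEY             # placeholder credentials
      secretKey: EXAMPLE-SECRET-KEY
  flags:
    filter:
      newerThan: "7d"                           # only replicate objects newer than 7 days
    notify:
      endpoint: "https://webhook.example.net"   # receives a notification when the job completes
```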
MinIO provides additional replication methods, including Active-Active Replication, to copy objects in the background that were created after the replication rule was configured. MinIO has hundreds of customers that rely on bidirectional Active-Active Replication to keep deployments synchronized. Active-Active Replication is continuous, with no defined start and end time. It runs on MinIO in the background to continuously update the second MinIO deployment with objects that are written to the first. This type of replication is best for HA, geographic load balancing and BC/DR.
Set Your Data Free with Batch Replication
MinIO Batch Replication enables you to build data pipelines that push data wherever you need it and verify that the job completed successfully, and now you can use any version of MinIO or any S3 compatible storage as a replication endpoint. One way this can be used is to ingest data in one location and move it elsewhere for analysis or AI/ML. For example, IoT data can be ingested into MinIO at the edge, replicated to a central data lake built on MinIO in the datacenter and then replicated to an S3-compatible cloud provider to run specific tools, such as Spark and Iceberg on AWS S3.
Batch Replication provides a powerful, efficient mechanism for building data-driven applications across the multi-cloud. The recent addition of pull functionality extends Batch Replication to previous versions of MinIO, while the ability to replicate to and from S3-compatible object storage opens a world of data pipeline possibilities.