Faster Multi-Site Replication and Resync

on Multicloud 23 November 2022


First introduced in late 2021, Multi-Site Active-Active Replication has grown to be one of the most impactful MinIO features. MinIO has long been able to replicate between buckets to synchronize objects, delete operations and metadata changes, but Multi-Site Active-Active Replication goes beyond bucket replication to synchronize all buckets, IAM, security tokens, service accounts and bucket-level configurations.

As businesses gather and analyze greater and greater volumes of data, they may need to replicate data between data centers, clouds and other geographic locations, perhaps due to security regulations or compliance requirements, or to improve data accessibility and decrease latency. Replication must be secure and efficient, and not interfere with other cluster operations, in order to be viable.

To this end, we recently improved resynchronization capabilities, as well as the efficiency, reporting and overall performance of Active-Active Replication.

Replication can be configured using the MinIO Console or the MinIO Client (mc replicate). Replication requires that versioning be enabled on the source and destination sites/buckets, and it preserves the last-modified system metadata property from the source object on the destination object. Replication is resilient and automated: any replication failures are healed automatically, so replication continues unabated. Applications do not require modification to access data on either cluster. For more information, please see Enable Server Side Multi-Site Bucket Replication.

In most cases, MinIO replication is set-and-forget, and resynchronization isn’t necessary. MinIO replication automatically synchronizes objects and settings, healing and copying objects asynchronously in the background. Resynchronization is only used when MinIO clusters fall out of sync, perhaps because of hardware failure. Resynchronization essentially amounts to issuing HEAD requests against both clusters, comparing the version and timestamp metadata, and then replicating only what is needed. The mc replicate resync and mc admin replicate resync commands are needed only in DR-type situations, because actively listing and comparing objects is an inherently costly operation.
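The comparison step described above can be sketched in a few lines of Python. This is an illustration of the idea only, not MinIO's implementation: the HeadInfo type, its fields and the needs_replication and plan_resync helpers are hypothetical names, and each cluster is modeled as a plain dictionary mapping object keys to the metadata a HEAD request might return.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class HeadInfo:
    """Hypothetical subset of the metadata a HEAD request returns."""
    version_id: str
    last_modified: datetime


def needs_replication(source: HeadInfo, target: Optional[HeadInfo]) -> bool:
    """Return True when the target is missing the object or holds a different copy."""
    if target is None:
        return True  # the object does not exist on the target cluster
    # Compare the version and the modification time, as described in the text.
    return (source.version_id != target.version_id
            or source.last_modified != target.last_modified)


def plan_resync(source_index: Dict[str, HeadInfo],
                target_index: Dict[str, HeadInfo]) -> List[str]:
    """List the keys that must be copied from source to target."""
    return sorted(
        key for key, info in source_index.items()
        if needs_replication(info, target_index.get(key))
    )


# Example: one object is stale on the target, one is missing, one is in sync.
t1 = datetime(2022, 11, 1, tzinfo=timezone.utc)
t2 = datetime(2022, 11, 20, tzinfo=timezone.utc)
source = {
    "a.txt": HeadInfo("v2", t2),
    "b.txt": HeadInfo("v1", t1),
    "c.txt": HeadInfo("v1", t1),
}
target = {
    "a.txt": HeadInfo("v1", t1),
    "b.txt": HeadInfo("v1", t1),
}
print(plan_resync(source, target))  # ['a.txt', 'c.txt']
```

Even in this toy form the cost is visible: every object on the source has to be visited and compared, which is why resynchronization is reserved for recovery scenarios.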

Resynchronization and replication figure prominently in the use of multi-site replication for disaster recovery strategies, where one cluster may be active while the other is simply a hot spare for failover. In this architecture, you start with one active cluster, deploy another, then configure and start site replication. This maintains very high availability in the event of a disaster, with a load balancer configured to automatically route traffic to the healthy MinIO deployment. In the unlikely event of a total site failure, you would deploy a new MinIO cluster and then completely resynchronize from the hot spare/failover cluster to the new cluster.

If you are already running bucket replication between clusters and choose to upgrade to site replication, you first run a manual resync, after which MinIO continues to synchronize between the clusters.

Use the mc replicate resync command to completely resynchronize the remote target to the source (documentation). Resynchronization checks all objects in the source cluster against all configured replication rules. For each object that matches a replication rule, the resynchronization process places the object into the replication queue regardless of that object’s current replication status. MinIO skips synchronizing those objects whose remote copy exactly matches the source, including metadata.
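The two-phase behavior described above (queue everything that matches a rule, then skip objects that already match on the remote) can be modeled as follows. This is a simplified sketch under stated assumptions: the Rule type with a single prefix filter, the build_resync_queue and drain helpers, and the dictionary-based stores are all hypothetical and stand in for MinIO's real rule matching and replication workers.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical replication rule that matches keys by prefix."""
    prefix: str
    enabled: bool = True


def matches_any_rule(key: str, rules: Iterable[Rule]) -> bool:
    return any(rule.enabled and key.startswith(rule.prefix) for rule in rules)


def build_resync_queue(statuses: Dict[str, str], rules: List[Rule]) -> Deque[str]:
    """Queue every key that matches a rule, regardless of its replication status."""
    queue: Deque[str] = deque()
    for key in sorted(statuses):
        if matches_any_rule(key, rules):
            queue.append(key)  # the status value is deliberately ignored
    return queue


# An object is modeled as a (data, metadata) pair.
StoredObject = Tuple[bytes, Dict[str, str]]


def drain(queue: Deque[str],
          source: Dict[str, StoredObject],
          remote: Dict[str, StoredObject]) -> List[str]:
    """Copy queued objects, skipping those whose remote copy matches exactly."""
    copied = []
    while queue:
        key = queue.popleft()
        if remote.get(key) == source[key]:
            continue  # data and metadata already match, nothing to do
        remote[key] = source[key]
        copied.append(key)
    return copied


statuses = {"logs/a": "COMPLETED", "logs/b": "FAILED", "tmp/c": "PENDING"}
rules = [Rule(prefix="logs/")]
source = {
    "logs/a": (b"alpha", {"owner": "ops"}),
    "logs/b": (b"beta", {"owner": "ops"}),
    "tmp/c": (b"gamma", {}),
}
remote = {"logs/a": (b"alpha", {"owner": "ops"})}

queue = build_resync_queue(statuses, rules)
print(list(queue))                    # ['logs/a', 'logs/b']
print(drain(queue, source, remote))   # ['logs/b']
```

Note that logs/a is queued even though its status is COMPLETED, and is only skipped later because the remote copy is identical; tmp/c is never queued because no rule matches it.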

We took this opportunity to improve Replication efficiency and usability every way we could. This is a heavily used feature for multi-cloud enterprises so we anticipate each improvement yielding a large impact.

Replication now heals objects more often. Previously, replication failures were only healed when the scanner passed over the namespace prefix where an object resides. Now, separate from replication activities, every time MinIO runs operations such as LIST, GET, or PUT, a heal operation runs so it doesn't have to run during replication. We also improved the efficiency of the replication failure queue. Now replication failure healing is actively managed by the replication system without dependency on the scanner activities.

Finally, we’ve improved reporting of replication status and statistics. MinIO now persists its in-memory replication statistics to disk and reads them back when the cluster restarts, which gives a more accurate picture of the replication backlog and more accurate reporting overall.
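To illustrate why persisting counters matters, here is a minimal sketch of saving in-memory statistics to disk and restoring them after a restart. It is not how MinIO stores its statistics: the JSON file format, the counter names and the save_stats and load_stats helpers are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict

# Hypothetical counters; the real statistics MinIO tracks may differ.
EMPTY_STATS: Dict[str, int] = {"pending": 0, "failed": 0, "replicated_bytes": 0}


def save_stats(stats: Dict[str, int], path: Path) -> None:
    """Write the counters atomically so a crash never leaves a half-written file."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(stats, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_name, path)  # atomic rename over the previous snapshot


def load_stats(path: Path) -> Dict[str, int]:
    """Restore the last snapshot, or start from zero if none was ever saved."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return dict(EMPTY_STATS)
```

Without the snapshot, a restarted process would report an empty backlog until it rediscovered pending work; with it, reporting resumes from the last saved counters.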

Replication Drives the Multi-Cloud

MinIO site replication allows you to synchronize data between deployments, enabling disaster recovery, geographic load-balancing and complex multi-cloud data processing and analytics. Ingest data where it is generated or captured – at the edge, in the cloud or in the datacenter – and replicate it to where it will do you the most good. In the multi-cloud model, you leverage best-of-breed applications across cloud providers seamlessly to realize the greatest value from your data. Replicate to build redundancy for high-availability architectures.

Download MinIO and get started replicating today. Any questions? Reach out to us on Slack.