Active-Active Example Using an Email Provider

AJ on Architect's Guide

Valuable data must be protected against corruption and loss, yet growing and increasingly distributed data volumes make this a daunting task. MinIO includes multiple data protection mechanisms, and this blog post focuses on replication best practices. Replication is a key protection for software-defined object storage and a key enabler for creating and maintaining multi-cloud data lakes, so you can run workloads where they run best, with your organization’s most current data.

Site replication underlies the functionality of a multi-cloud architecture by setting up multiple sites and handling the replication on the server side. This simplifies the management of data, metadata and configuration across multiple sites without any additional overhead. If drives, nodes, or even an entire site fail, all that is required is a change to the endpoint the application uses.

We will show you a brief example of how to deploy to multiple sites/regions and configure replication. The example is also deployed for high availability: each of the multi-site MinIO clusters sits in a separate region to enable a robust business continuity and disaster recovery strategy.

Geographically Distributed Architecture for an Email Service

Email is the ultimate performance-at-scale use case because its data volume generally only goes up. Further, the more data that’s stored, the more valuable the data becomes. MinIO’s multi-site active-active replication is designed to keep the cluster at top performance as that data grows. Configured correctly, it lets you replicate data across multiple datacenters and clouds, and it supports disaster recovery, so that one site going offline does not decrease availability. Everything is handled on the server side by MinIO, so the email application does not need to be modified in any way. Because MinIO exposes an S3-compatible API and provides SDKs, you can switch your storage backend to MinIO and get up and running without much modification.

In a distributed, production-ready, software-defined infrastructure (MinIO, VPC, Unbound, NGINX), we generally recommend deploying three identical MinIO clusters, each in its own region, with active-active replication between all three. The advantage of this design is that if one of the MinIO nodes in a particular site is down, NGINX reroutes email traffic to the healthy nodes in that site. Moreover, if an entire site goes offline, the other two sites can handle reads and writes without any changes to the application, and any data written to them is replicated to the offline site once it comes back online.
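As a rough illustration of the endpoint change on site failure, here is a minimal shell sketch that picks the first site answering MinIO's liveness probe. The function name and the site URLs are hypothetical, and the sketch assumes that each site's NGINX forwards the /minio/health/live path to the MinIO nodes behind it.

```shell
#!/bin/sh
# Print the first endpoint whose MinIO liveness probe succeeds.
# Endpoints are passed as arguments (hypothetical site URLs).
first_healthy_endpoint() {
  for endpoint in "$@"; do
    # /minio/health/live is MinIO's unauthenticated liveness check.
    # --fail makes curl exit non-zero on HTTP error responses.
    if curl --silent --fail --max-time 5 --output /dev/null \
        "$endpoint/minio/health/live"; then
      printf '%s\n' "$endpoint"
      return 0
    fi
  done
  return 1   # no site answered
}
```

An application wrapper could call first_healthy_endpoint with the three site URLs at startup, or after a connection error, and use the printed URL as its storage endpoint.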

How to Set up Active-Active Replication

Now we will actually set up active-active replication on MinIO. This involves launching the VMs using Terraform and then configuring site replication with the MinIO client, mc.

Go into each of the minio-1, minio-2 and minio-3 directories and run terraform apply to launch the infrastructure.

cd minio-1
terraform apply
cd ../minio-2
terraform apply
cd ../minio-3
terraform apply
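The same steps can be sketched as a loop. This is only an illustration: the function name apply_sites is hypothetical, and it assumes that the three directories sit side by side under the current directory and have already been initialized for Terraform.

```shell
#!/bin/sh
# Run `terraform apply` in each site directory passed as an argument.
apply_sites() {
  for site in "$@"; do
    # The subshell keeps the cd local, so the next relative path still resolves.
    ( cd "$site" && terraform apply ) || return 1
  done
}
```

It would be invoked as apply_sites minio-1 minio-2 minio-3, and it stops at the first site whose apply fails.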

Once all three sites are up, the output for each site will show the public IPs of the Unbound and NGINX instances along with the private IPs of the three MinIO nodes, similar to the output below.

hello_minio_aws_instance_nginx = "<public_ip>"
hello_minio_aws_instance_unbound = "<public_ip>"
minio_hostname_ips_map = {
  "server-1.minio.local" = "<private_ip>"
  "server-2.minio.local" = "<private_ip>"
  "server-3.minio.local" = "<private_ip>"
}

Don’t forget to configure the Unbound A records for these hostnames, using the configuration shown earlier when setting up Unbound.
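For reference, one way to generate such A records is sketched below. It assumes the records are expressed as Unbound local-data entries inside the server: clause of unbound.conf, which may differ from the configuration used earlier. The function name and the example IP address are hypothetical.

```shell
#!/bin/sh
# Emit Unbound local-data A records from "hostname=ip" pairs.
unbound_a_records() {
  for pair in "$@"; do
    host=${pair%%=*}   # text before the first "="
    ip=${pair#*=}      # text after the first "="
    printf 'local-data: "%s. IN A %s"\n' "$host" "$ip"
  done
}
```

Calling unbound_a_records server-1.minio.local=10.0.0.11 prints one local-data line for that host, with the private IP taken from the Terraform output.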

Log into one of the NGINX nodes in site 1 to set up an mc alias for each of the three sites. Each alias points at the NGINX public IP of its own site. Ensure there is no data in any of the sites.

mc alias set minio1 http://<nginx_public_ip> minioadmin minioadmin
mc alias set minio2 http://<nginx_public_ip> minioadmin minioadmin
mc alias set minio3 http://<nginx_public_ip> minioadmin minioadmin
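The three alias commands can also be written as a loop that reads credentials from the environment instead of hard-coding the defaults. This is a sketch: the function name and the MINIO_USER and MINIO_PASSWORD variable names are hypothetical, and each URL is expected to be the NGINX address of the matching site.

```shell
#!/bin/sh
# Register one mc alias per site from "alias=url" pairs.
# MINIO_USER and MINIO_PASSWORD are hypothetical variable names for the credentials.
set_site_aliases() {
  for pair in "$@"; do
    name=${pair%%=*}   # alias name, e.g. minio1
    url=${pair#*=}     # endpoint URL for that site
    mc alias set "$name" "$url" "$MINIO_USER" "$MINIO_PASSWORD" || return 1
  done
}
```

A call would pass one pair per site, for example minio1=http://<nginx_public_ip>, with each site's own address filled in.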

Next, configure site replication across all three sites

mc admin replicate add minio1 minio2 minio3
Requested sites were configured for replication successfully.
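Since the sites are expected to be empty before replication is configured, a pre-flight check along these lines could be run before the command above. It is a sketch with a hypothetical function name, and it treats any output from mc ls on an alias as a sign that buckets already exist.

```shell
#!/bin/sh
# Succeed only if none of the given aliases contains any bucket.
sites_are_empty() {
  for site in "$@"; do
    # `mc ls ALIAS` lists the buckets at the top level of a deployment.
    listing=$(mc ls "$site") || return 2   # alias unknown or site unreachable
    if [ -n "$listing" ]; then
      echo "$site is not empty" >&2
      return 1
    fi
  done
}
```

For example, sites_are_empty minio1 minio2 minio3 && mc admin replicate add minio1 minio2 minio3 only attempts the configuration when all three listings come back empty.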

Verify the 3 sites are configured correctly

mc admin replicate info minio1
SiteReplication enabled for:
Deployment ID                        | Site Name | Endpoint
f96a6675-ddc3-4c6e-907d-edccd9eae7a4 | minio1    | http://<nginx_public_ip>
0dfce53f-e85b-48d0-91de-4d7564d5456f | minio2    | http://<nginx_public_ip>
8527896f-0d4b-48fe-bddc-a3203dccd75f | minio3    | http://<nginx_public_ip>

Check to make sure replication is working properly

mc admin replicate status minio1
Bucket replication status:
No Buckets present
Policy replication status:
●  5/5 Policies in sync
User replication status:
No Users present
Group replication status:
No Groups present

Create a bucket in minio1

/opt/minio-binaries/mc mb minio1/testbucket

Add any object into the bucket

/opt/minio-binaries/mc cp my_object minio1/testbucket

List the objects in the other sites, in this case both minio2 and minio3

/opt/minio-binaries/mc ls minio2/testbucket
[2022-12-19 18:52:09 UTC] 3.0KiB STANDARD my_object 
/opt/minio-binaries/mc ls minio3/testbucket
[2022-12-19 18:52:09 UTC] 3.0KiB STANDARD my_object
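When scripting this check, the object may not have arrived at the moment the listing runs, so a small polling helper can be useful. The sketch below uses mc stat, which exits non-zero when the object is missing. The function name, the default attempt count and the delay are hypothetical, and it assumes mc is on the PATH.

```shell
#!/bin/sh
# Poll until an object is visible at the given mc path, or give up.
# Usage: wait_for_object PATH [ATTEMPTS] [DELAY_SECONDS]
wait_for_object() {
  target=$1
  attempts=${2:-10}
  delay=${3:-1}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # mc stat exits non-zero while the object does not exist at this path.
    if mc stat "$target" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1   # object never appeared within the allowed attempts
}
```

For example, wait_for_object minio2/testbucket/my_object returns success as soon as the replicated copy is visible on the second site.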

As you can see, replication to the other sites is almost instantaneous even though they are geographically dispersed. For a couple of objects you can check with mc ls, but for a large number of objects you can use mc diff to see the differences between buckets.
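A comparison with mc diff might be scripted as follows. This is a sketch with a hypothetical function name, and it assumes that mc diff prints one line per differing object and prints nothing when the two buckets match.

```shell
#!/bin/sh
# Compare one bucket on a reference site against the same bucket on other sites.
# Usage: diff_bucket_across_sites BUCKET REFERENCE_ALIAS OTHER_ALIAS...
diff_bucket_across_sites() {
  bucket=$1
  reference=$2
  shift 2
  status=0
  for site in "$@"; do
    # Assumption: empty output from mc diff means the buckets are in sync.
    differences=$(mc diff "$reference/$bucket" "$site/$bucket") || return 2
    if [ -n "$differences" ]; then
      printf '%s differs from %s:\n%s\n' "$site" "$reference" "$differences"
      status=1
    fi
  done
  return "$status"
}
```

Running diff_bucket_across_sites testbucket minio1 minio2 minio3 reports each site whose copy of testbucket differs from minio1 and returns non-zero if any do.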

Final Thoughts

As the volume of email increases, it must not only be protected against corruption and loss but also distributed for load balancing and BC/DR, which is a daunting task. In this blog post, we focused on replication best practices using active-active replication for email data. Site replication underlies MinIO’s multi-cloud architecture by setting up multiple sites and handling the replication on the server side. As a result, the email application does not need to be modified when individual disks or entire sites fail.

What are you waiting for? If you have any questions regarding any of our replication strategies, be sure to reach out to us on Slack!