Migrate from AWS S3 to MinIO on Equinix Metal

AJ on Hybrid Cloud

One of the strong use cases for MinIO is the fact that it can run anywhere and on everything. As the industry slowly shifts towards repatriating data to a colo or data center, more and more companies want the same object storage capabilities that they had in the cloud, with full control of the infrastructure.

Why would you want to have data closer to home? There are a number of reasons, but first and foremost is cost. The public cloud has become very expensive. For example, some time ago I had an ElasticSearch managed cluster running in AWS. I was eager to try out this new managed service, but I was not eager to discuss my surprise $30K bill with my boss. This was a painful, yet familiar wake-up call because at that moment I realized that I had just paid AWS six months of cloud budget to do something that I could have set up myself. The moral of the story is that unless you are very careful and monitor your cloud spend closely, it can get out of control very fast.

There is also the matter of security. No matter where your data is located in the public cloud, it's almost always on a node or storage pool that is shared with someone completely unrelated to you; this is the nature of the cloud because it is how virtualization works. The cloud provides a warm feeling of comfort because now someone else must tackle security challenges, but if a security-related issue does occur, you will have no insight into the issue (if anyone was even able to detect it) or how to resolve it. The feeling of comfort rapidly evaporates when you get stuck securing someone else’s infrastructure in order to protect your data. Many enterprises have enjoyed the return to total control that comes from repatriating to MinIO on hardware they manage.

To make the most of your repatriation efforts, MinIO comes with a number of enterprise-ready features: Bitrot Protection to ensure data integrity, Tiering to siphon off data to a cold storage tier, and Erasure Coding, which saves objects as a collection of data and parity blocks and reconstructs them on the fly without any additional hardware or software. In addition to these, MinIO supports encryption both at rest and in transit. This ensures that data is encrypted at every stage of the transaction, from the moment the call is made until the object is placed in the bucket, where it is then protected with S3-style IAM policies and a built-in or external IDP. See MinIO Best Practices - Security and Access Control for more information.

Repatriation must be planned thoroughly and carefully. If you are dealing with petabytes of data, it is generally more cost-effective to run on your own infrastructure and servers; you can even build a private cloud with your own (or leased) hardware. Doing so also means managing the real estate (colo space), power/UPS and cooling/HVAC, among other components. Don’t be deterred by these: we will show you how to migrate, and the overall ROI is still better than in the public cloud.

A private cloud is like an apartment (as our CEO AB Periasamy likes to say). You are in full control of the costs and expenses associated with it, and you never wake up to an alert about a surprise bill caused by some recursive loop function that ran overnight. There is, of course, some friction in moving when you are trying to make things better. When you expand a highway, for instance, you inevitably have to close some lanes so construction can proceed safely, but once it's done you can drive on both the original lanes and the newly built ones that handle the extra capacity.

Two of the most important cost considerations in the public cloud are the amount of storage space you need and the egress costs of accessing or moving that data. These can be around 39% and 42% higher, respectively, than on your own hardware in your data center or colocation facility. Other cost factors to consider include software, hardware, networking/switches, real estate/rack space/colocation rental and S3 API calls, among others. Learn more about possible savings that result from moving to your own private cloud in The Lifecycle of the Cloud.

Between the public cloud and your data center exists a middle ground where you can have full control over infrastructure hardware, without the high initial cost of investment. Equinix Metal, as the name states, provides bare metal servers with the exact specifications requested by the customer. If you want to use NVMe SSDs, then you can add those disks to the bare metal server. Equinix provides a management API to simplify hardware deployment and operations. To the developer/end user, it's as straightforward as launching an instance in the cloud. In fact, there is even a Terraform provider for Equinix Metal (which we’ll show you later).

Let's get started!

Deploy the Infrastructure

While we can deploy resources manually, the DevOps in me wants to automate at least some of the repetitive portions of this process to save time and effort, especially when we want to do site-to-site replication among other things.

Set Up Equinix Metal Terraform

Equinix is one of the few bare metal providers that have an API to fully automate the infrastructure management process. Using their API, you can automate deploying physical servers, shutting them down and even terminating them. You can do all this without using your own hardware, switches, routers and other resources. This is as close as you can get to public-cloud-level automation while still guaranteeing that no one else is sharing your hardware. Because Equinix Metal supports a myriad of instance types, interconnects such as SAS or SATA, and storage options such as SSD, NVMe SSD or HDD in a variety of sizes, you can configure the hardware that MinIO runs on to your exact specifications, right down to the exact type of drive that houses MinIO partitions.

No one expects you to write Python scripts to talk to the Metal API; Equinix Metal has a Terraform Provider that allows us to connect to it and provide the high-level information needed to deploy cluster resources, while abstracting the internal jugglery required to get the networking, hardware, MinIO and other applications set up.

provider "metal" {
  auth_token = var.auth_token
}

If you don’t have Terraform installed already, you can download it from their downloads page.

Clone the GitHub repo equinix/terraform-metal-distributed-minio to your local workstation.

git clone https://github.com/equinix/terraform-metal-distributed-minio.git

Go into the repo and initialize Terraform so that it can download all the required modules and plugins from upstream.

$ cd terraform-metal-distributed-minio
$ terraform init

This ensures that all the required modules are downloaded automatically. Now, let's make sure a couple of mandatory variables are set. You can either set them as environment variables or use the vars.template file in the repo cloned above, which you can copy with cp vars.template terraform.tfvars.

Ultimately, whichever method you choose, you need to set the following two variables

  • auth_token
  • project_id

You can find more information on these in the API docs.
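
If you go the environment-variable route, a minimal sketch might look like the following. It relies on Terraform's convention of reading any environment variable named TF_VAR_<name> as the input variable <name>. The two values are placeholders, not real credentials, and the loop that checks them is only an illustration.

```shell
# Terraform reads TF_VAR_<name> from the environment as input variable <name>.
# Both values below are placeholders (assumptions), not real credentials.
export TF_VAR_auth_token="your-equinix-metal-api-token"
export TF_VAR_project_id="your-equinix-metal-project-id"

# Fail early if either variable is empty, before running terraform plan.
missing=0
for name in TF_VAR_auth_token TF_VAR_project_id; do
  if [ -z "$(printenv "$name")" ]; then
    echo "missing: $name" >&2
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "terraform variables are set"
```

Setting the token in the environment keeps it out of terraform.tfvars, which is useful if that file could end up in version control.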

There are several other variables that you can modify in terraform.tfvars, and we will modify some of them later when we do site-to-site replication.

Once you have your preferred configuration set, review the Terraform plan. If the plan looks okay, apply it.

$ terraform plan
$ terraform apply --auto-approve

If the resources have been applied properly with the right configuration then the resulting output should look something like this

Apply complete! Resources: 10 added, 0 changed, 0 destroyed.

Outputs:

minio_access_key = Xe245QheQ7Nwi20dxsuF
minio_access_secret = 9g4LKJlXqpe7Us4MIwTPluNyTUJv4A5T9xVwwcZh
minio_endpoints = [
  "minio-storage-node1 minio endpoint is http://147.75.65.29:9000",
  "minio-storage-node2 minio endpoint is http://147.75.39.227:9000",
  "minio-storage-node3 minio endpoint is http://147.75.66.53:9000",
  "minio-storage-node4 minio endpoint is http://147.75.194.101:9000",
]
minio_region_name = us-east-1

This is the whole shebang. When you see this output, not only have your physical servers been provisioned, but also MinIO has been deployed to these nodes, and the nodes have been configured as a cluster of distributed storage.

Access MinIO Cluster

We used Terraform to automate most of the process, so now all that is left is to access the MinIO cluster. Our recommended tool is mc, the MinIO Client. Use the following commands to download the binary and add it to your PATH

curl https://dl.min.io/client/mc/release/linux-amd64/mc \
  --create-dirs \
  -o $HOME/minio-binaries/mc

chmod +x $HOME/minio-binaries/mc
export PATH=$PATH:$HOME/minio-binaries/

Create an alias that points to the MinIO cluster we deployed

mc alias set minio1 $MINIO_ENDPOINT $MINIO_ACCESS_KEY $MINIO_SECRET_KEY

Replace the variables above with the values from the Terraform output for the MinIO cluster, but make sure to set the alias name to minio1. This will make sense later when we show you how to do site-to-site replication.
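
The endpoint entries in the Terraform output are full sentences rather than bare URLs, so a small helper can pull the URL out of one. The sketch below is one way to do that. The function name extract_endpoint is hypothetical, and the sample string is copied from the output shown earlier.

```shell
# extract_endpoint LINE
# Prints the http(s) URL embedded in one minio_endpoints entry.
# extract_endpoint is a hypothetical helper, not part of mc or Terraform.
extract_endpoint() {
  printf '%s\n' "$1" | sed -n 's|.*\(http[s]*://[^" ,]*\).*|\1|p'
}

# Sample entry copied from the Terraform output above.
extract_endpoint '"minio-storage-node1 minio endpoint is http://147.75.65.29:9000",'
# prints http://147.75.65.29:9000

# The result can then be exported for use as $MINIO_ENDPOINT, for example:
# export MINIO_ENDPOINT="$(extract_endpoint "<one line of minio_endpoints>")"
```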

Check to see if you are able to connect successfully by fetching some metadata from the cluster

$ mc admin info minio1 --json | jq .info.backend

{
  "backendType": "Erasure",
  "onlineDisks": 48,
  "rrSCData": 6,
  "rrSCParity": 2,
  "standardSCData": 6,
  "standardSCParity": 2
}

If you see an output similar to the above then you are able to successfully access the MinIO cluster via the mc command. So what’s next? When should we migrate the data from S3?
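
Before moving on, the numbers in the backend output are worth a quick sanity check. The sketch below assumes that each erasure set is made up of standardSCData + standardSCParity drives, which is how we read the fields shown above. The function names are hypothetical.

```shell
# usable_percent DATA PARITY
# Prints the share of raw capacity left for object data, as a whole percent.
usable_percent() {
  echo $(( $1 * 100 / ($1 + $2) ))
}

# erasure_sets ONLINE_DISKS DATA PARITY
# Prints how many erasure sets the online drives form, assuming each set
# holds DATA + PARITY drives.
erasure_sets() {
  echo $(( $1 / ($2 + $3) ))
}

usable_percent 6 2    # prints 75
erasure_sets 48 6 2   # prints 6
```

Under that assumption, the example cluster keeps roughly three quarters of its raw capacity for data, and each set can lose up to two drives (the parity count) and still serve its objects.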

Load Balancing the MinIO Cluster

We can migrate the data from S3, or even add some of our own data, and start using the cluster. But let’s take it one step further. We want to achieve the same level of redundancy as AWS S3, meaning if one site goes down we want to make sure our data is accessible on another site. AWS accomplishes this with regions, but how do we accomplish this with MinIO?

Now, we can see the beauty of the little automation we did with Terraform earlier. Let me show you how simple it is to get another MinIO region up in Equinix Metal.

Let's git clone our source repo again, but this time into a new directory, terraform-metal-distributed-minio-site-2

git clone https://github.com/equinix/terraform-metal-distributed-minio.git terraform-metal-distributed-minio-site-2

Go into the terraform-metal-distributed-minio-site-2 repo and initialize Terraform so that it can download all the required modules and plugins from upstream similar to the original MinIO deployment.

$ cd terraform-metal-distributed-minio-site-2
$ terraform init

Once all the modules have been downloaded, copy the variables file with cp vars.template terraform.tfvars and set the two variables

  • auth_token
  • project_id

So far the process should look very similar to how we launched the first cluster, but this is where things will differ.

Let's set the variables that differentiate the second site from the first site.

First, let's set the facility to sv16 or pick one from this list of facilities. Next, set the minio_region_name to us-west-1 or anything that differentiates it from the other cluster.
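
As a sketch, the two overrides can be appended to terraform.tfvars from the shell. The variable names facility and minio_region_name are taken from the text above, so check them against the repo's variables.tf before relying on them. TFVARS_FILE is a hypothetical helper variable that lets you point at a different file.

```shell
# Append the site-2 overrides to terraform.tfvars. Run this once, inside the
# terraform-metal-distributed-minio-site-2 directory.
# TFVARS_FILE is a hypothetical helper so the target path can be overridden.
TFVARS_FILE="${TFVARS_FILE:-terraform.tfvars}"

cat >> "$TFVARS_FILE" <<'EOF'
facility          = "sv16"
minio_region_name = "us-west-1"
EOF

# Show the values Terraform will pick up for these two settings.
grep -E '^(facility|minio_region_name)' "$TFVARS_FILE"
```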

Run the plan to ensure the changes you made are reflected in the output, then apply it.

$ terraform plan
$ terraform apply --auto-approve

If the resources have been applied properly, with the right configuration, then the resulting output should look something like this

Apply complete! Resources: 10 added, 0 changed, 0 destroyed.

Outputs:

minio_access_key = Xe245QheQ7Nwi20dxsuF
minio_access_secret = 9g4LKJlXqpe7Us4MIwTPluNyTUJv4A5T9xVwwcZh
minio_endpoints = [
  "minio-storage-node1 minio endpoint is http://144.45.65.29:9000",
  "minio-storage-node2 minio endpoint is http://144.45.39.227:9000",
  "minio-storage-node3 minio endpoint is http://144.45.66.53:9000",
  "minio-storage-node4 minio endpoint is http://144.45.194.101:9000",
]
minio_region_name = us-west-1

If you see minio_region_name as us-west-1 then you have successfully launched the second cluster. Let's get that added to mc.

mc alias set minio2 $MINIO_ENDPOINT $MINIO_ACCESS_KEY $MINIO_SECRET_KEY

Set the alias name to minio2 and check to see if you are able to connect successfully by fetching some metadata from the cluster

$ mc admin info minio2 --json | jq .info.backend

{
  "backendType": "Erasure",
  "onlineDisks": 48,
  "rrSCData": 6,
  "rrSCParity": 2,
  "standardSCData": 6,
  "standardSCParity": 2
}

At this point, you should have 2 sites: minio1 and minio2.

Let's set up replication across both clusters

$ mc admin replicate add minio1 minio2
Requested sites were configured for replication successfully.

Verify that both sites are configured correctly

mc admin replicate info minio1

SiteReplication enabled for:

Deployment ID                        | Site Name | Endpoint
f96a6675-ddc3-4c6e-907d-edccd9eae7a4 | minio1    | http://<site1_public_ip>
0dfce53f-e85b-48d0-91de-4d7564d5456f | minio2    | http://<site2_public_ip>


Check to make sure replication is working properly

mc admin replicate status minio1

Bucket replication status:
No Buckets present

Policy replication status:
●  5/5 Policies in sync

User replication status:
No Users present

Group replication status:
No Groups present

Test by creating a bucket in minio1

mc mb minio1/testbucket

Add any object into the bucket

mc cp my_object minio1/testbucket

List the objects on the other site, in this case minio2

mc ls minio2/testbucket
[2023-07-20 18:52:09 UTC] 3.0KiB STANDARD my_object

As you can see it's almost instantaneous to replicate data to other MinIO deployments, even though they are geographically disparate.

Let’s do a quick test to see if this is really as simple as it appears. Remember that MinIO is a drop-in replacement for AWS S3, so everything that is supposed to work with S3 will work with MinIO too. In this case, we will use Terraform to upload an object to a MinIO bucket. In Terraform this is done through the AWS provider, which is essentially a plugin that connects to the AWS API to perform various operations in the AWS ecosystem. Here, we will point the Terraform AWS S3 resource at a MinIO bucket instead.

Create an AWS provider in Terraform like below. Make sure you update the details to match the Equinix Metal minio1 cluster we just deployed.

provider "aws" {
    region = "us-east-1"
    access_key = "Xe245QheQ7Nwi20dxsuF"
    secret_key = "9g4LKJlXqpe7Us4MIwTPluNyTUJv4A5T9xVwwcZh"
    skip_credentials_validation = true
    skip_metadata_api_check = true
    skip_requesting_account_id = true
    s3_force_path_style = true
    endpoints {
        s3 = "http://147.75.65.29:9000"
    }   
}

Upload a file using the terraform aws_s3_bucket_object resource

resource "aws_s3_bucket_object" "object" {
    bucket = "public"
    key = "my_file_name.txt"
    source = "path/to/my_file_name.txt"
    etag = filemd5("path/to/my_file_name.txt")
}

As you can see above, we haven’t used any MinIO-specific Terraform resource; we are using the AWS provider's aws_s3_bucket_object resource. Even though we are using the existing AWS S3 Terraform resource, the object store is completely powered by production enterprise-grade MinIO.

Migrating Data from AWS S3

We now have all the building blocks ready to go for you to have production-grade object storage and total control of the entire infrastructure. Next, we’ll migrate data that’s already in S3.

There are a number of ways you can migrate your data from AWS S3 to MinIO, but the one we recommend is using mc.

mc mirror is a Swiss army knife of data synchronization. It can copy objects from S3 or S3-API-compatible object stores and mirror them to MinIO. One of the more popular use cases of this is mirroring an Amazon S3 bucket to MinIO in order to expose data to non-AWS applications and services.

Create a new IAM user with an access key and secret key, and attach a policy that allows access only to our bucket. Save the generated credentials for the next step.

Next, let’s add an alias for AWS S3 using the credentials we just saved.

mc alias set s3 https://s3.amazonaws.com BKIKJAA5BMMU2RHO6IBB V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12 --api S3v4

Use mc mirror to copy the data from S3 to MinIO

mc mirror s3/mybucket minio1/testbucket

Depending on the amount of data, network speeds and the physical distance from the region where the bucket data is stored, it might take a few minutes or more for you to mirror all the data. You will see a message when mc is done copying all the objects.

Once the data is copied, or while it is still being copied, list the contents of the bucket on the minio2 site. You will notice that some of the data has already been replicated there from minio1.

mc ls minio2/testbucket
[2022-12-19 18:52:09 UTC] 3.0KiB STANDARD all_objects

Ultimately, the laptop where you run mc mirror is a bottleneck, because the data has to pass through the system where the mc mirror command is executed. With several petabytes of data, the migration could take days, if not weeks, depending on the network speed. There is a more efficient method for migrating data from S3 to MinIO called Batch Replication. Please see How to Repatriate from AWS S3 to MinIO to learn more about Batch Replication and other migration best practices.
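
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the link runs at its full rated speed for the whole transfer and ignores protocol overhead and S3 request limits, so real migrations will be slower. The function name transfer_days is hypothetical.

```shell
# transfer_days SIZE_TB LINK_GBPS
# Prints the whole days needed to move SIZE_TB terabytes (10^12 bytes each)
# over a link sustaining LINK_GBPS gigabits per second, rounded down.
transfer_days() {
  size_tb=$1
  link_gbps=$2
  # TB to bits is x 8 * 10^12, Gbps to bits/s is x 10^9, so the ratio is x 8000.
  seconds=$(( size_tb * 8000 / link_gbps ))
  echo $(( seconds / 86400 ))
}

transfer_days 1000 1    # 1 PB over 1 Gbps, prints 92
transfer_days 1000 10   # 1 PB over 10 Gbps, prints 9
```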

Put the Pedal to the Metal

This blog post demonstrated that MinIO running on Equinix Metal NVMe SSDs, in a site-to-site replication configuration, will give you the same level of performance, data durability and resilience as S3, if not better, at a fraction of the cost, while retaining full control over your cloud.

Do you have 100% control of all the infrastructure? Not quite. The switches, routers and other networking gear are managed by Equinix, but the advantages of being on their network exceed the disadvantages. You can get point-to-point, WAN, or other dedicated circuits to connect to virtually any other provider out there. In essence, you can have a private circuit connected directly to AWS (via Equinix Connect) and then move your data 10 times as fast, all the while staying secure because the data does not traverse the open public internet and only your data goes through that circuit.

Moreover, MinIO benchmarks repeatedly show very little (<1%) throughput degradation with encryption turned on. We therefore recommend that all MinIO deployments use encryption at rest and secure network communications using TLS. Congratulations, your data is now on a more secure, yet transparent, system where you have full control and accountability.

If you have any questions on migrating from AWS S3 to MinIO on Equinix Metal, be sure to reach out to us on Slack!