One of the core functionalities that truly makes MinIO shine is its ability to ensure data redundancy and durability without any specialized hardware such as RAID controller cards or redundant SAN clusters. In fact, our engineers recommend installing MinIO on a plain JBOD system with fast drives and no additional hardware RAID. Just let MinIO handle data redundancy, replication and availability using Erasure Coding, which is far more efficient than RAID.
In this post we'll talk about Erasure Coding and Erasure Sets, and then dive deeper into how to use the Erasure Code Calculator when designing deployments, so you can get the most out of MinIO by choosing the right hardware configuration from the get-go.
Erasure Coding and Erasure Sets
In a previous blog post, Erasure Coding 101, we discussed Erasure Coding and dove deeply into its implementation and best practices. In short, drives going bad is one of the most common types of hardware failure, and in the past engineers had to depend on hardware RAID (or software RAID) for data redundancy and availability. RAID can protect data from drive failures on a single server, but in a cluster of separate physical servers, each node has to manage its own RAID parity and redundancy individually. A big drawback of hardware RAID is that rebuilding the data from a failed drive onto a new drive takes a painfully long time. With today's data volumes this might not even be feasible: in most cases it can take a few days to rebuild just a single TB, and at PB scale (which is possible in a single box) a rebuild might never complete. Erasure Coding removes the need for separate hardware redundancy. Instead, it splits each object into data and parity blocks and encodes them so that the original data can be recovered even if some of the encoded blocks are lost. Erasure Coding achieves the same level of fault tolerance as hardware RAID, but with much better efficiency and performance, because it stripes data not only across multiple drives but also across multiple nodes.
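To make the idea of data and parity blocks concrete, here is a toy sketch. It uses a single XOR parity block, the simplest possible erasure code, so it can survive the loss of only one block per stripe; MinIO itself uses Reed-Solomon coding, which supports M parity blocks per stripe. The function names (`encode`, `decode`) are hypothetical and not part of any MinIO API.

```python
from functools import reduce
from typing import List, Optional


def split_into_blocks(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-sized data blocks, zero-padding the tail."""
    block_size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(block_size * k, b"\x00")
    return [padded[i * block_size:(i + 1) * block_size] for i in range(k)]


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int) -> List[Optional[bytes]]:
    """Return k data blocks followed by one XOR parity block (stripe size k + 1)."""
    data_blocks = split_into_blocks(data, k)
    return data_blocks + [xor_blocks(data_blocks)]


def decode(stripe: List[Optional[bytes]], original_length: int) -> bytes:
    """Rebuild the original data when at most one block is missing (None)."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost block")
    if missing:
        # XOR of all surviving blocks equals the lost block (data or parity).
        survivors = [block for block in stripe if block is not None]
        stripe = list(stripe)
        stripe[missing[0]] = xor_blocks(survivors)
    data_blocks = stripe[:-1]
    return b"".join(data_blocks)[:original_length]


payload = b"MinIO erasure coding demo"
stripe = encode(payload, k=4)
stripe[2] = None  # simulate losing the drive that held data block 2
print(decode(stripe, len(payload)) == payload)  # True
```

Reed-Solomon generalizes the same principle: with K data and M parity blocks, any K surviving blocks are enough to reconstruct the object.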
But what is an Erasure Set? An Erasure Set is a set of drives; a cluster is divided into one or more Erasure Sets based on the total number of drives. MinIO distributes the data and parity blocks deterministically and uniformly across all the drives in an Erasure Set, and it automatically calculates the stripe size (between 2 and 16). Setting the parity for a deployment is a balance between availability and total usable storage. Higher parity values increase resiliency to drive or node failure at the cost of usable storage, while lower parity provides maximum storage with reduced tolerance for drive/node failures. Use the MinIO Erasure Code Calculator to explore the effect of parity on your planned cluster deployment. While customizing set size and parity is possible, we almost always recommend using the MinIO defaults.
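As a rough illustration of how a pool of drives might be divided into Erasure Sets, the sketch below picks the largest stripe size between 2 and 16 that evenly divides the total drive count. This "largest divisor" rule is a simplifying assumption, not MinIO's actual selection logic (which also takes the server layout into account), so rely on the calculator for real planning.

```python
from typing import Tuple

MIN_SET_SIZE = 2
MAX_SET_SIZE = 16


def choose_erasure_sets(total_drives: int) -> Tuple[int, int]:
    """Return (set_size, set_count) using a simplified 'largest divisor' rule.

    Assumption: the largest stripe size in [2, 16] that divides the drive
    count evenly is chosen. MinIO's real logic also weighs server symmetry.
    """
    for set_size in range(MAX_SET_SIZE, MIN_SET_SIZE - 1, -1):
        if total_drives % set_size == 0:
            return set_size, total_drives // set_size
    raise ValueError(f"{total_drives} drives cannot be split into sets of 2-16")


print(choose_erasure_sets(32))  # (16, 2)
print(choose_erasure_sets(24))  # (12, 2)
print(choose_erasure_sets(36))  # (12, 3)
```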
Using the Calculator
Let's get started with the Erasure Code Calculator. Simply provide the calculator with possible hardware and software configurations, and it will tell you the resulting capacity and fault tolerance.
Let's look at the parameters you can provide to the calculator, which in turn generates the command-line parameters you need.
Parity can be set with the `mc admin` CLI tool, or passed as configuration variables, either as environment variables or in /etc/default/minio. Keep in mind that new settings apply only to new objects; existing objects retain their original parity until they are re-uploaded or copied server-side.
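Parity is expressed per storage class in the form `EC:N`, for example through the `MINIO_STORAGE_CLASS_STANDARD` environment variable or `mc admin config set ALIAS storage_class standard=EC:N`. The hypothetical helpers below (`parse_parity`, `env_line`) parse such a value and render a line for /etc/default/minio. They assume parity may not exceed half the erasure set size; check the MinIO documentation for the exact limits of your version.

```python
import re

# Storage-class parity is written as "EC:<parity>", e.g. "EC:4".
_EC_PATTERN = re.compile(r"^EC:(\d+)$")


def parse_parity(storage_class: str, set_size: int) -> int:
    """Parse an 'EC:N' string and check N against the erasure set size.

    Assumption: parity must not exceed half the erasure set size.
    """
    match = _EC_PATTERN.match(storage_class.strip())
    if match is None:
        raise ValueError(f"expected 'EC:<parity>', got {storage_class!r}")
    parity = int(match.group(1))
    if parity > set_size // 2:
        raise ValueError(f"parity {parity} exceeds half of set size {set_size}")
    return parity


def env_line(parity: int) -> str:
    """Render the line you might place in /etc/default/minio."""
    return f'MINIO_STORAGE_CLASS_STANDARD="EC:{parity}"'


print(env_line(parse_parity("EC:4", set_size=16)))
# MINIO_STORAGE_CLASS_STANDARD="EC:4"
```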
Number of Racks: For larger deployments, you can specify the number of racks so that the erasure code layout accounts for rack-level parity, with nodes in the same location but spread across different racks.
Number of Servers per Rack: The basic configuration is generally a single rack, and we typically recommend multiple servers in that rack.
Number of Drives per Server: We recommend a bare minimum of 4 drives per server, but that is really at the low end of the performance and capacity spectrum. While it's possible to grow a cluster later by adding Server Pools, this adds complexity to ongoing system maintenance and nuances to how data is distributed and accessed across pools. We recommend populating each server with as many drives as practical from the get-go.
Drive Capacity: Not only the number of drives but also the size of each drive matters. The larger each drive, the fewer drives a server needs to reach a given capacity; alternatively, you can max out MinIO capacity by filling your servers with high-capacity drives to achieve your optimal storage configuration.
Erasure Code Stripe Size: MinIO splits each object into data and parity blocks. The stripe size (S) is the total number of data blocks (K) plus parity blocks (M), i.e., S = K + M.
Erasure Code Parity: Higher parity (M) increases availability and data resiliency in the event of drive or server failure, at the cost of storage efficiency. Lower parity (M) increases storage efficiency at the cost of availability and resiliency.
Usable Capacity: After everything is said and done, this is the capacity actually available for storing objects (roughly the raw capacity multiplied by K / S). The calculator also shows the raw capacity and the storage efficiency, that is, how much of the raw capacity remains usable after space is set aside for parity.
Drive/Server/Rack Failure Tolerances: Based on the selected configuration, the calculator also shows failure tolerances at various levels of the hardware stack. For instance, a server tolerance of 2 in an 8-server deployment means that up to 2 of the 8 servers can go offline at any given time. At maximum parity (M = N / 2, where N is the number of drives in the erasure set), MinIO can tolerate the loss of one fewer than half the drives per erasure set, (N / 2) - 1, and still perform both read and write operations. MinIO defaults to 4 parity blocks per object, which tolerates the loss of 4 drives per erasure set.
CLI Parameter: This is the final configuration string you add at the end of your `minio server` command either in /etc/default/minio or as part of the container environment variables.
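The parameters above combine through fairly simple arithmetic. The sketch below is a hypothetical approximation of what the calculator computes, not its actual source: usable capacity is raw capacity × K / S, drive tolerance per set is M (or M - 1 when M = S / 2, matching the (N / 2) - 1 rule for reads and writes), and the server and rack tolerances assume each erasure set spreads its drives as evenly as possible across servers and racks. The `Layout` type and `calculate` function are illustrative names only.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Layout:
    racks: int
    servers_per_rack: int
    drives_per_server: int
    drive_capacity_tb: float
    stripe_size: int  # S = K + M
    parity: int  # M


def calculate(layout: Layout) -> dict:
    """Rough, hypothetical re-creation of the calculator's outputs."""
    servers = layout.racks * layout.servers_per_rack
    if servers < 1 or layout.drives_per_server < 1:
        raise ValueError("need at least one server and one drive per server")
    drives = servers * layout.drives_per_server
    s, m = layout.stripe_size, layout.parity
    if not 2 <= s <= 16 or drives % s != 0:
        raise ValueError("stripe size must be 2-16 and divide the drive count")
    if not 0 <= m <= s // 2:
        raise ValueError("parity must be between 0 and half the stripe size")
    k = s - m
    raw_tb = drives * layout.drive_capacity_tb
    # When M == S/2, writes need K + 1 drives, so one fewer drive may be lost.
    drive_tolerance = m - 1 if 2 * m == s else m
    # Assume each set spreads its drives evenly, so at most ceil(S / servers)
    # of a set's drives share one server (and ceil(S / racks) share one rack).
    server_tolerance = drive_tolerance // ceil(s / servers)
    rack_tolerance = drive_tolerance // ceil(s / layout.racks)
    return {
        "erasure_sets": drives // s,
        "raw_tb": raw_tb,
        "usable_tb": raw_tb * k / s,
        "efficiency": k / s,
        "drive_tolerance_per_set": drive_tolerance,
        "server_tolerance": server_tolerance,
        "rack_tolerance": rack_tolerance,
        "env": f"MINIO_STORAGE_CLASS_STANDARD=EC:{m}",
    }


result = calculate(Layout(racks=1, servers_per_rack=8, drives_per_server=16,
                          drive_capacity_tb=16.0, stripe_size=16, parity=4))
print(result["usable_tb"], result["server_tolerance"])  # 1536.0 2
```

For the 8-server example in the usage line, each 16-drive erasure set places 2 drives on every server, so EC:4 can absorb the loss of 2 whole servers.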
As you can see, there are a number of hardware parameters your team needs to consider before deploying the right configuration for your application. So it is worth briefly discussing what kind of hardware to procure to match the erasure code settings you chose in the calculator.
Drives: Since the primary use case here is storage, let's start with the drives. There are many types of drives, and they vary vastly in cost, performance and use case. If you are using the MinIO cluster for general object storage in production, you should probably consider SSDs, which offer a good balance between cost and performance. Backups and archived data that are over a year old and rarely queried, but that must remain queryable (if not at lightning speed), are typically tiered off to more affordable media. In these cases, you will most likely use simple spinning SATA HDDs in the archive tier to save on cost. This is the beauty of MinIO: it's very simple, flexible and runs on a myriad of hardware.
NIC: After drives, NIC speed is probably one of the most important factors to consider. Because data is stored across multiple servers, the network between those servers needs to be as fast as possible. The days of 1 Gbps or even 10 Gbps Ethernet NICs are over. If you truly want top performance, you need at least 25 Gbps, preferably with dual NICs for added redundancy and to isolate intra-cluster traffic. For the highest performance, we recommend 100 Gbps NICs.
CPU: MinIO is highly CPU efficient, using less than 20% of a CPU core for CPU-intensive tasks such as Erasure Coding, Encryption, Compression and other operations. That said, base your decision on your application and use case. We recommend a CPU with at least 8 cores for the best performance and general operations.
Memory: Similar to CPU, MinIO is not a memory-intensive application, although during some operations it may burst to a large amount of memory, which it later relinquishes. We generally recommend 128 GB of memory per server in your cluster.
Servers/Racks: Beyond drive-level redundancy, we also recommend using multiple servers to distribute data across physical nodes and racks, as a precaution for maintaining high performance and high availability. Commodity server hardware has come a long way, but servers can still fail. Server failures are much less frequent than drive failures, but they do happen, and it pays to be prepared before they do.
Before finalizing your hardware, be sure to reach out to our engineering team about the optimal configuration for your cluster based on your application requirements.
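To see why NIC speed matters, a back-of-the-envelope check can compare a server's aggregate drive throughput with the NIC speeds mentioned above. The per-drive throughput figures in the usage lines are hypothetical placeholders, not benchmarks; substitute numbers measured on your own drives. The sketch ignores erasure coding overhead and the split between client and intra-cluster traffic, so treat the result as a rough starting point rather than a sizing rule.

```python
from typing import Optional

NIC_SPEEDS_GBPS = (1, 10, 25, 100)  # NIC speeds discussed in this post


def required_gbps(drives_per_server: int, drive_throughput_gbytes: float) -> float:
    """Aggregate drive throughput of one server (GB/s), converted to Gbit/s."""
    return drives_per_server * drive_throughput_gbytes * 8


def smallest_sufficient_nic(drives_per_server: int,
                            drive_throughput_gbytes: float) -> Optional[int]:
    """Smallest listed NIC speed that can keep every local drive busy."""
    needed = required_gbps(drives_per_server, drive_throughput_gbytes)
    for speed in NIC_SPEEDS_GBPS:
        if speed >= needed:
            return speed
    return None  # even a single 100 Gbps NIC would be the bottleneck


# Hypothetical per-drive throughputs, for illustration only.
print(required_gbps(8, 0.25))  # 16.0
print(smallest_sufficient_nic(8, 0.25))  # 25
print(smallest_sufficient_nic(16, 2.0))  # None
```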
Benefits of Using the Erasure Code Calculator
As you can see, the calculator can help tremendously in the early stages of architecting your cluster. At MinIO we always say that the initial configuration of the cluster is the most crucial. If you are not yet a customer, we can help you with design, architecture and implementation, with guidance from our engineering team. If you are already a customer, the MinIO engineering team is a mere click away on SUBNET to help you architect the most efficient plan for your cluster. Our engineers have worked on configuring several clusters over the years and can guide you toward a setup that meets your requirements.
The calculator makes it very easy to visualize the cluster topology, configuration and net usable capacity, so you can purchase the right hardware for your requirements. If you have any questions, be sure to reach out to us on Slack!