Updated NVMe Benchmark: 2.6 Tbps+ for READS
MinIO is a strong believer in transparency and data driven discussions. It is why we publish our benchmarks and challenge the rest of the industry to do so as well.
It is also why we develop tools that allow clean, clear performance measurements that can be easily replicated. We want people to test for themselves.
Further, we run our benchmarks on commodity hardware without tuning. This is fundamentally different from the highly tuned, specialized hardware approaches used by other vendors, which, predictably, have given benchmarks a bad name.
We challenge the rest of the industry to follow suit.
We recently updated our benchmark for primary storage. For our customers, primary storage utilizes NVMe drives due to their price/performance characteristics. We will update our HDD benchmark shortly for those customers looking to understand HDD price/performance.
In this post we will cover the benchmarking environment, the tools, how to replicate this on your own and the detailed results. For those looking for a quick take, the 32 node MinIO cluster results can be summarized as follows:
Instance Type | PUT/Write | GET/Read | Parity | mc CLI ver. | MinIO ver. |
i3en.24xlarge | 165 GiB/sec | 325 GiB/sec | EC:4 | RELEASE.2021-12-29T06-52-55Z | RELEASE.2021-12-29T06-49-06Z |
On an aggregate basis this delivers PUT throughput of 1.32 Tbps and GET throughput of 2.6 Tbps. We believe this to be the fastest in the industry.
Benchmarking Setup
MinIO believes in benchmarking on the same hardware it would recommend to its customers. For primary storage, we recommend NVMe. We have followed this recommendation for over a year now, as our customers have shown us that the price/performance characteristics of NVMe represent the sweet spot for these primary storage workloads.
We used standard AWS bare-metal, storage optimized instances with local NVMe drives and 100 GbE networking for our efforts. These are the same instances that MinIO recommends to its production clients for use in the AWS cloud.
Instance | # Nodes | AWS Instance Type | vCPU | Memory | Storage | Network |
Server | 32 | i3en.24xlarge | 96 | 768 GB | 8 x 7500 GB NVMe | 100 Gbps |
For the software, we used the default Ubuntu 20.04 install on AWS, the latest release of MinIO, and our built-in Speedtest capability.
Property | Value |
Server OS | Ubuntu 20.04 |
mc CLI Version | RELEASE.2021-12-29T06-52-55Z |
MinIO Version | RELEASE.2021-12-29T06-49-06Z |
Benchmark Tool | mc admin speedtest |
Speedtest is built into the MinIO Server and is accessed through the Console UI or the mc admin speedtest command. It requires no special skills or additional software. You can read more about it here.
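For reference, a minimal invocation looks like the sketch below; the alias name, endpoint, and credentials are placeholders you would replace with your own.

# Register the cluster with the mc client (endpoint and credentials are
# placeholders for this sketch), then run the built-in Speedtest against it.
$ mc alias set minio https://minio.example.net ACCESS_KEY SECRET_KEY
$ mc admin speedtest minio/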
Measuring Single Drive Performance
The performance of each drive was measured using the dd command. dd is a Unix tool that performs a block-by-block copy of data from one file or device to another and provides options to control the block size of each read and write.
Here is a sample of a single NVMe drive’s Write Performance with 16MB block-size, O_DIRECT option for a total of 64 copies. Note that we achieve greater than 1.1 GB/sec of write performance for each drive.
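A command along the following lines reproduces that measurement; /mnt/drive1 is a placeholder for wherever a single NVMe drive is mounted.

# Write test: 64 blocks of 16MB, written with O_DIRECT to bypass the page cache.
# The target path is a placeholder mount point for one NVMe drive.
$ dd if=/dev/zero of=/mnt/drive1/dd-testfile bs=16M count=64 oflag=direct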
Here is the output of a single NVMe drive’s Read Performance with 16MB block-size using the O_DIRECT option and a total count of 64. Note that we achieved greater than 2.3 GB/sec of read performance for each drive.
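The corresponding read measurement can be reproduced roughly as follows, again with /mnt/drive1 standing in for the actual mount point.

# Read test: read the same 64 x 16MB file back with O_DIRECT so the page
# cache does not inflate the result.
$ dd if=/mnt/drive1/dd-testfile of=/dev/null bs=16M count=64 iflag=direct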
Measuring JBOD Performance
JBOD performance with O_DIRECT was measured using https://github.com/minio/dperf. dperf is a filesystem benchmark tool that generates load and measures both read and write performance. By default, the dperf command operates with 64 parallel threads, a 4MB block size, and O_DIRECT.
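Running it across all eight drives on a node looks roughly like the sketch below; the /mnt/drive1 through /mnt/drive8 mount points are placeholders.

# Benchmark all eight local NVMe drives with dperf's defaults
# (64 threads, 4MB block size, O_DIRECT). Mount points are placeholders.
$ dperf /mnt/drive{1..8}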
Network Performance
The network hardware on these nodes allows a maximum of 100 Gbit/sec. 100 Gbit/sec equates to 12.5 Gbyte/sec (1 Gbyte = 8 Gbit).
Therefore, the maximum throughput that can be expected from each of these nodes would be 12.5 Gbyte/sec.
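As a sanity check on the cluster-level numbers, the theoretical aggregate network ceiling (assuming every node can drive its NIC at line rate simultaneously) works out as follows.

# Per-node ceiling : 100 Gbit/sec / 8 = 12.5 GByte/sec
# Cluster ceiling  : 32 nodes x 12.5 GByte/sec = 400 GByte/sec (about 3.2 Tbps)
$ awk 'BEGIN { printf "%d GByte/sec aggregate ceiling\n", 32 * 100 / 8 }'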
Running the 32-node Distributed MinIO benchmark
MinIO ran Speedtest in autotune mode. The autotune mode incrementally increases the load to pinpoint maximum aggregate throughput.
$ mc admin speedtest minio/
The test will run and present results on screen. It may take anywhere from a few seconds to several minutes to execute, depending on your MinIO cluster. The -v flag enables verbose mode. The user can determine the appropriate erasure code setting; we recommend EC:4, but include EC:2 and EC:3 below as well (see the sketch after these settings).
MINIO_STORAGE_CLASS_STANDARD=EC:2
MINIO_STORAGE_CLASS_STANDARD=EC:3
MINIO_STORAGE_CLASS_STANDARD=EC:4 (default)
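As a rough sketch (the node and drive paths below are placeholders, not the exact topology used in this benchmark), the parity level is set through an environment variable before the server starts, and the verbose Speedtest is then run from any mc client.

# On each server node: choose the parity level, then start MinIO.
# Hostnames and drive paths are placeholders for this sketch.
$ export MINIO_STORAGE_CLASS_STANDARD=EC:4
$ minio server http://node{1...32}.example.net/mnt/drive{1...8}

# From a client with an alias configured for the cluster:
$ mc admin speedtest -v minio/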
Interpretation of Results
The average network bandwidth utilization during the write phase was 77 Gbit/sec and during the read phase was 84.6 Gbit/sec. This represents client traffic as well as internode traffic. The portion of this bandwidth available to clients is about half for both reads and writes.
The network was almost entirely saturated during these tests. Higher throughput could be expected if a dedicated network were available for inter-node traffic.
Note that the write benchmark is slower than read because benchmark tools do not account for write amplification (traffic from parity data generated during writes). In this case, the 100 Gbit network is the bottleneck as MinIO gets close to hardware performance for both reads and writes.
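As a rough illustration of that amplification (assuming a 16-drive erasure set, a common layout at this drive count and an assumption here rather than a detail from the test): with EC:4, each stripe carries 12 data and 4 parity shards, so every client byte written turns into roughly 1.33 bytes written across the network and drives.

# Illustration only: the 16-drive erasure set is an assumption.
# 12 data + 4 parity shards => amplification factor of 16/12.
$ echo "scale=2; 16/12" | bc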
Conclusion
Based on the results above, we found that MinIO takes complete advantage of the available hardware. Its performance is only constrained by the underlying hardware available to it. This benchmark has been tested with our recommended configuration for performance workloads and can be easily replicated in an hour for less than $350.
You can download a PDF of the Benchmark here. You can download MinIO here. If you have any questions, ping us at hello@min.io or join the Slack community.