Updated NVMe Benchmark: 2.6Tbps+ for READS

MinIO is a strong believer in transparency and data-driven discussions. That is why we publish our benchmarks and challenge the rest of the industry to do so as well.

It is also why we develop tools that provide clean, clear and easily replicated measurements of performance. We want people to test for themselves.

Further, we run our benchmarks on commodity hardware without tuning. This is fundamentally different from the highly tuned, specialized hardware approaches used by other vendors, which, predictably, have given benchmarks a bad name.

We challenge the rest of the industry to follow suit.

We recently updated our benchmark for primary storage. For our customers, primary storage utilizes NVMe drives due to their price/performance characteristics. We will update our HDD benchmark shortly for those customers looking to understand HDD price/performance.

In this post we will cover the benchmarking environment, the tools, how to replicate this on your own and the detailed results. For those looking for a quick take, the 32 node MinIO cluster results can be summarized as follows:

| Instance Type | PUT/Write | GET/Read | Parity | mc CLI ver. | MinIO ver. |
|---------------|-----------|----------|--------|-------------|------------|
| i3en.24xlarge | 165 GiB/sec | 325 GiB/sec | EC:4 | RELEASE.2021-12-29T06-52-55Z | RELEASE.2021-12-29T06-49-06Z |
On an aggregate basis this delivers PUT throughput of 1.32 Tbps and GET throughput of 2.6 Tbps. We believe this to be the fastest in the industry.

Benchmarking Setup

MinIO believes in benchmarking on the same HW it would recommend to its customers. For primary storage, we recommend NVMe. We have followed this recommendation for over a year now as our customers have shown us that the price/performance characteristics of NVMe represent the sweet spot for these primary storage workloads.

We used standard AWS bare-metal, storage optimized instances with local NVMe drives and 100 GbE networking for our efforts. These are the same instances that MinIO recommends to its production clients for use in the AWS cloud.

| Instance | # Nodes | AWS Instance Type | CPU | MEM | Storage | Network |
|----------|---------|-------------------|-----|-----|---------|---------|
| Server | 32 | i3en.24xlarge | 96 vCPUs | 768 GB | 8 x 7,500 GB NVMe | 100 Gbps |

For the software, we used the default Ubuntu 20.04 install on AWS, the latest release of MinIO, and our built-in Speedtest capability.

| Property | Value |
|----------|-------|
| Server OS | Ubuntu 20.04 |
| mc version | RELEASE.2021-12-29T06-52-55Z |
| MinIO version | RELEASE.2021-12-29T06-49-06Z |
| Benchmark tool | mc admin speedtest |

Speedtest is built into the MinIO Server and is accessed through the Console UI or mc admin speedtest command. It requires no special skills or additional software. You can read more about it here.
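
As a quick sketch, the following commands register a cluster alias with mc and launch the built-in Speedtest. The endpoint URL, access key and secret key are placeholders for your own deployment:

$ mc alias set minio https://minio.example.net ACCESS_KEY SECRET_KEY
$ mc admin speedtest minio/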

Measuring Single Drive Performance

The performance of each drive was measured using the dd command. dd is a Unix tool that performs a bit-by-bit copy of data from one file to another. It provides options to control the block size of each read and write.

Here is a sample of a single NVMe drive’s write performance with a 16MB block size, the O_DIRECT option and a total count of 64 copies. Note that we achieved greater than 1.1 GB/sec of write performance for each drive.
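
A write test of this form can be reproduced with a dd invocation along the following lines; the mount point /mnt/drive1 is a placeholder for the NVMe drive under test:

$ dd if=/dev/zero of=/mnt/drive1/testfile bs=16M count=64 oflag=direct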

Here is the output of a single NVMe drive’s read performance with a 16MB block size using the O_DIRECT option and a total count of 64. Note that we achieved greater than 2.3 GB/sec of read performance for each drive.
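
The corresponding read test reads the same file back with O_DIRECT; again, the path is a placeholder:

$ dd if=/mnt/drive1/testfile of=/dev/null bs=16M count=64 iflag=direct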

Measuring JBOD Performance

JBOD performance with O_DIRECT was measured using https://github.com/minio/dperf. dperf is a filesystem benchmark tool that generates load and measures filesystem performance for both reads and writes. By default, dperf operates with 64 parallel threads, a 4MB block size and O_DIRECT.
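
A typical run points dperf at the drive mount paths. The example below assumes the eight NVMe drives are mounted at /mnt/drive1 through /mnt/drive8 (shell brace expansion supplies the individual paths):

$ dperf /mnt/drive{1..8}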

Network Performance

The network hardware on these nodes allows a maximum of 100 Gbit/sec. 100 Gbit/sec equates to 12.5 GByte/sec (1 GByte = 8 Gbit).

Therefore, the maximum throughput that can be expected from each of these nodes is 12.5 GByte/sec.

Running the 32-node Distributed MinIO benchmark

MinIO ran Speedtest in autotune mode. The autotune mode incrementally increases the load to pinpoint maximum aggregate throughput.

$ mc admin speedtest minio/

The test will run and present results on screen. It may take anywhere from a few seconds to several minutes to execute, depending on your MinIO cluster. The -v flag enables verbose mode. The user can determine the appropriate erasure code setting; we recommend EC:4, but include EC:2 and EC:3 below as well (see the example after this list):

MINIO_STORAGE_CLASS_STANDARD=EC:2

MINIO_STORAGE_CLASS_STANDARD=EC:3

MINIO_STORAGE_CLASS_STANDARD=EC:4 (default)
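
As a sketch of how the parity setting and the verbose flag fit together, the storage class is set on the MinIO server nodes before they are started, and the test is then re-run from the client (the alias minio is a placeholder for your deployment):

$ export MINIO_STORAGE_CLASS_STANDARD=EC:2   # on each MinIO server node, before starting minio
$ mc admin speedtest minio/ -v               # verbose run from the client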


Interpretation of Results

The average network bandwidth utilization during the write phase was 77 Gbit/sec, and during the read phase it was 84.6 Gbit/sec. This represents client traffic as well as internode traffic. The portion of this bandwidth available to clients is about half for both reads and writes.

The network was almost entirely saturated during these tests. Higher throughput can be expected if a dedicated network were available for inter-node traffic.

Note that the write benchmark is slower than the read benchmark because benchmark tools do not account for write amplification (the traffic from parity data generated during writes). In this case, the 100 Gbit network is the bottleneck, as MinIO gets close to hardware performance for both reads and writes.

Conclusion

Based on the results above, we found that MinIO takes complete advantage of the available hardware; its performance is constrained only by the underlying hardware available to it. This benchmark was run with our recommended configuration for performance workloads and can be easily replicated in an hour for less than $350.

You can download a PDF of the benchmark here. You can download MinIO here. If you have any questions, ping us at hello@min.io or join the Slack community.