
Supermicro Cloud DC Benchmark


We just completed an extensive round of testing on some slick new boxes from our friends at Supermicro. We really enjoyed putting the Cloud DC through its paces with the help of Muru Ramanathan, Ravi Chintala and Siva Yarramaneni in Supermicro’s San Jose lab.

We are big fans of the work Supermicro is doing. They blend performance, reliability, exacting craftsmanship and exceptional flexibility in the order/configuration process. While we identify our recommended boxes here, the truth is that we could have easily included a product catalog worth of options.

Supermicro Cloud DC is a speedy platform for MinIO

We conducted the testing on Supermicro’s Cloud DC (SYS-620C-TN12R) boxes. These 2U systems are an excellent platform for hyperconverged storage, with up to twelve 3.5" hot-swap NVMe/SATA/SAS drive bays; up to 16 DIMM slots accommodating up to 6TB of DDR4-3200 memory, with support for Intel® Optane™ persistent memory; dual 3rd Gen Intel® Xeon® Scalable processors up to 270W TDP or a single 3rd Gen AMD EPYC™ processor up to 280W TDP; and dual AIOM (superset of OCP 3.0 NIC) slots for up to 200 Gbps networking.

The testing produced outstanding results. We ran a cluster of four Cloud DC servers. As is often the case with MinIO, the network proved to be the bottleneck, even though it was 100 Gbps.

We ran the WARP S3 benchmark to measure READ/GET and WRITE/PUT performance of MinIO on the Supermicro Cloud DC cluster. WARP is an open source S3 performance benchmark tool developed and maintained by MinIO.
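As an illustrative sketch (not the exact invocation used in this benchmark), a distributed GET test with WARP looks something like the following. The hostnames, credentials, object size and concurrency here are placeholders; adjust them to your environment:

```shell
# Hypothetical WARP invocation; hosts, credentials and sizing are placeholders.
# Start "warp client" on each load-generation workstation first, then drive
# the benchmark from a single coordinator.
warp get \
  --warp-client "workstation{1...4}:7761" \
  --host "minio{1...4}:9000" \
  --access-key minioadmin \
  --secret-key minioadmin \
  --obj.size 4MiB \
  --concurrent 64 \
  --duration 5m
```

The analogous `warp put` command measures WRITE/PUT throughput.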

We saw 42.57 GB/s average read (GET) throughput and 24.69 GB/s average write (PUT) throughput in our testing.

This is first-class performance, and it is why MinIO is used to power resource-intensive workloads such as advanced analytics, AI/ML and other modern, cloud-native applications. This hardware would be an excellent choice for VMware’s Data Persistence platform running with vSAN Direct, replacing an HDFS data lake, building an internal object-storage-as-a-service on Red Hat OpenShift to support DevOps teams, or improving Splunk SmartStore performance.

Let’s get into the details.

Architecture

We installed and ran MinIO on four Supermicro Cloud DC servers, each with dual CPUs, 540 GB of memory and ten 3.84TB NVMe drives. We then ran WARP on four workstations. All were connected via 100 Gbps Ethernet.

Measuring Single Drive Performance

In order to fully characterize storage performance, we began by assessing the performance of the individual drives. We do this in almost all of our benchmarks because experience has taught us that drive performance can vary, even between nominally identical devices.

To achieve this we ran dd to perform a raw, block-level copy between devices and files so we could measure read and write performance. dd converts and copies data, printing statistics about the transfer when it completes.

To assess write performance we ran:

dd if=/dev/zero of=/mnt/drive/test bs=16M count=10240 oflag=direct conv=fdatasync

To assess read performance we ran:

dd of=/dev/null if=/mnt/drive/test bs=16M count=10240 iflag=direct

The dd utility reported an average of 3.3 GB/s write and 4.6 GB/s read performance for our NVMe drives, on par with expectations for these devices.

Measuring JBOD Performance

The next step was to measure JBOD performance, which we did using IOzone. IOzone is a filesystem benchmark tool that generates and measures filesystem read and write performance, among other operations. The following is an example of an IOzone command operating with 160 parallel threads, a 4MB block size and the O_DIRECT option.

iozone -s 1g -r 4m -i 0 -i 1 -i 2 -I -t 160 -b $(hostname)-iozone.out -F /mnt/drive1/tmpfile.{1..16} /mnt/drive2/tmpfile.{1..16} /mnt/drive3/tmpfile.{1..16} /mnt/drive4/tmpfile.{1..16} /mnt/drive5/tmpfile.{1..16} /mnt/drive6/tmpfile.{1..16} /mnt/drive7/tmpfile.{1..16} /mnt/drive8/tmpfile.{1..16} /mnt/drive9/tmpfile.{1..16} /mnt/drive10/tmpfile.{1..16}

We saw read throughput of 56.1 GB/sec and write throughput of 30.2 GB/sec on a single node, right in line with the hardware specification.

Network Performance

The network hardware on these nodes supports a maximum of 100 Gbit/sec. Since there are 8 bits per byte, 100 Gbit/sec equates to 12.5 Gbyte/sec.

Therefore, the maximum throughput that can be expected from each of these nodes is 12.5 Gbyte/sec.

There are four nodes being tested, making 50 GB/sec the maximum throughput that our cluster might exhibit. However, no TCP/IP switched Ethernet network is ever 100% efficient, so we expect to see between 40 GB/sec and 45 GB/sec during testing.
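The arithmetic above can be sketched with a quick awk one-liner; the efficiency figure is simply the measured average read throughput (42.57 GB/s) divided by the theoretical ceiling:

```shell
awk 'BEGIN {
  per_node = 100 / 8              # 100 Gbit/s NIC -> 12.5 GB/s per node
  ceiling  = per_node * 4         # four nodes -> 50 GB/s aggregate ceiling
  measured = 42.57                # average GET throughput we measured
  printf "ceiling: %.1f GB/s\n", ceiling
  printf "network efficiency: %.0f%%\n", measured / ceiling * 100
}'
```

At roughly 85% of the theoretical line rate, the cluster lands squarely in the expected 40-45 GB/s band.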

Results

We ran WARP in multiple configurations, with and without encryption and TLS, to measure the READ/GET and WRITE/PUT performance. The results, summarized below, are impressive.  


| Configuration | Avg. Read Throughput (GET) | Avg. Write Throughput (PUT) |
| --- | --- | --- |
| Distributed | 42.57 GB/s | 24.69 GB/s |
| Distributed with Encryption | 42.54 GB/s | 25.96 GB/s |
| Distributed with TLS | 42.35 GB/s | 24.42 GB/s |
| Distributed with TLS and Encryption | 42.41 GB/s | 23.88 GB/s |

The network was almost entirely utilized during testing. As expected, the 100 Gbps network proved to be the bottleneck. In our testing, client and inter-node traffic shared the same network; performance could improve by 2x or more on higher-bandwidth networks that isolate client traffic from inter-node traffic.

Of particular note is how little impact encryption has on the results. This is a point of great pride for our team, and we have written about frictionless encryption in this HackerNews favorite. Astute observers may notice that average write throughput actually improves with encryption enabled; WARP results vary slightly and sporadically between runs, and differences of this size are within that noise.
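To put a number on how small the impact is, here is a quick computation of the relative overhead of TLS plus encryption versus the unencrypted baseline, using the figures from the results table:

```shell
awk 'BEGIN {
  get_base = 42.57; get_tls_enc = 42.41   # GB/s, from the results table
  put_base = 24.69; put_tls_enc = 23.88
  printf "GET overhead: %.1f%%\n", (1 - get_tls_enc / get_base) * 100
  printf "PUT overhead: %.1f%%\n", (1 - put_tls_enc / put_base) * 100
}'
```

Reads lose well under 1% and writes lose only a few percent, which is negligible relative to the security benefit.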

What matters is that you don’t see WRITE/PUT throughput cut by a third or a half when encryption is turned on, as you might with other object storage solutions. With encryption and TLS enabled, read performance is unchanged and write performance remains very high relative to the maximum available network bandwidth. We therefore strongly recommend turning on TLS and encryption for all externally exposed production setups.

Supermicro is Super Fast

Performance is a critical requirement for successful object storage implementations. MinIO is the fastest object store on the planet. Pair our highly scalable, high-performance software with Supermicro’s dense, flexible and high-performance hardware and you’ve got one of the fastest and most operations-friendly object storage environments possible.

Performance results like these make it clear that object storage is more than speedy enough to support demanding use cases in streaming analytics, machine learning and data lake analytics.

The full report contains details of our Supermicro Cloud DC benchmark including specific steps for you to replicate these tests on your own hardware or cloud instances.

If you want to discuss the results in detail or ask questions about benchmarking your environment, please join us on Slack, email us at hello@min.io or click the Ask an expert! button below.
