Low Level Performance Testing for Object Storage

Assessing the performance and scalability of enterprise object storage is a complex and time-consuming undertaking. Yet performance testing is one of the most important activities in the planning, proof-of-concept and ongoing operations stages of a storage project. Without performance testing, you have no way of knowing whether storage systems are operating efficiently or whether they can meet application requirements.

While the ultimate goal of our performance testing is to verify object storage performance for GET and PUT operations, we always begin by assessing the performance of the drives, nodes, OS and network. This methodology helps us characterize performance in its entirety and pinpoint bottlenecks. Once we understand application workloads, we can verify that the storage system has the throughput, latency and I/O profile to support them. Baselining individual storage components first makes it quick to spot potential problem areas and easy to recreate failure and worst-case scenarios.

The natural focus would be on storage media, but a storage system is made up of much more than just SSDs and/or HDDs. The system includes multiple devices and components that can be thought of as layers, each with its own complexities and characteristics. Performance testing helps pinpoint bottlenecks that can occur almost anywhere: motherboards, host bus adapters, storage controllers, CPUs, NICs, network devices such as switches, and the operating system. There are many possible bottlenecks that can be uncovered by methodical and thorough performance tests of the entire system and its individual components.

This blog post shows you how to measure drive and network performance to verify that underlying infrastructure components are capable of supporting MinIO performance. MinIO customers have access to built-in automated performance test tools that simplify the troubleshooting steps described below to a single click.

Measuring Drive Performance with Dperf

Dperf is a drive performance measurement tool that anyone can use to assess drive performance and identify problem drives. This small utility takes multiple file paths as input and, by default, conducts reads and writes on them in parallel, akin to fio or iozone. Serial testing is also possible, akin to the dd tool in Linux. Read and write throughput are displayed on screen with the fastest drives shown first. The tool sets sensible defaults for parameters such as block size and total bytes to read/write, making it quick to find drive bottlenecks.

Dperf is written in Go, and you will need a working Go environment. Once you have Go installed, you only need to run the following:

go install github.com/minio/dperf@latest

A working Go environment is not needed if you download the prebuilt dperf binary.
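
Once installed (go install places the binary in $GOBIN, which defaults to $HOME/go/bin), a quick sanity check is to confirm the binary is on your PATH:

$ dperf --version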

Dperf is configured via command line flags:

$ dperf --help

MinIO drive performance utility
--------------------------------
  dperf measures throughput of each of the drives mounted at PATH...

Usage:
  dperf [flags] PATH...

Examples:

# run dperf on drive mounted at /mnt/drive1
λ dperf /mnt/drive1

# run dperf on drives 1 to 6. Output will be sorted by throughput. Fastest drive is at the top.
λ dperf /mnt/drive{1..6}

# run dperf on drives one-by-one
λ dperf --serial /mnt/drive{1..6}

Flags:
  -b, --blocksize string   read/write block size (default "4MiB")
  -f, --filesize string    amount of data to read/write per drive (default "1GiB")
  -h, --help               help for dperf
      --serial             run tests one by one, instead of all at once.
      --version            version for dperf
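
For example, to run the default parallel test across four hypothetical mount points while reading and writing more data per drive than the 1GiB default, combine the flags above (the paths and sizes here are purely illustrative):

$ dperf --blocksize 4MiB --filesize 4GiB /mnt/drive{1..4}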

You can expect to see around 1 GiB/s combined read/write (R/W) or less for HDDs. For flash storage, you should see somewhere around 2.0 to 4.0 GiB/s R/W for low-end SATA SSDs, 6 to 7 GiB/s R/W for PCIe 4.0 NVMe SSDs, and around 7 to 10 GiB/s R/W for PCIe 5.0 NVMe SSDs. For example, here’s the output from when I ran the utility on the SSD in my laptop:

$ ./dperf /mnt/c
┌───────────┬────────────┐
│ TotalREAD │ TotalWRITE │
│ 1.0 GiB/s │ 1.1 GiB/s  │
└───────────┴────────────┘

And here is sample output from running the utility on a server equipped with enterprise-class NVMe drives:

$ ./dperf /mnt/drive{1..3} --verbose

┌───────────┬────────────┬────────────┐
│ PATH      │ READ       │ WRITE      │
│ /drive3   │ 6.6 GiB/s  │ 3.6 GiB/s  │
│ /drive1   │ 6.4 GiB/s  │ 3.5 GiB/s  │
│ /drive2   │ 6.3 GiB/s  │ 3.6 GiB/s  │
└───────────┴────────────┴────────────┘

┌────────────┬────────────┐
│ TotalREAD  │ TotalWRITE │
│ 19.3 GiB/s │ 10.7 GiB/s │
└────────────┴────────────┘

Flexible and Insightful Testing with FIO

FIO (flexible I/O tester) is used to simulate a given I/O workload. A job file defines the work that FIO is to carry out; it can contain multiple jobs, and a single job can also be configured and launched directly from the command line. FIO does not need to run as root, but it can be when a file or device requires root permissions.

When setting up a FIO I/O workload, the first step is to write a job file describing that workload. A job file may contain any number of threads and/or files – a typical job file contains a global section defining shared parameters and one or more sections describing individual jobs. A job defines the following parameters:

  • I/O type: Defines the I/O pattern issued to the file(s). We may be reading sequentially, writing randomly, or mixing reads and writes, sequentially or randomly. Should we be doing buffered I/O or direct/raw I/O?
  • Block size: How large are the chunks in which we issue I/O? This may be a single value or a range of block sizes.
  • I/O size: How much data are we going to read or write?
  • I/O engine: How do we issue I/O? We could be memory-mapping the file, using regular read/write, using splice, async I/O, or even SG (SCSI generic sg).
  • I/O depth: If the I/O engine is async, how large a queue depth do we want to maintain?
  • Target file/device: How many files are we spreading the workload over?
  • Threads, processes and job synchronization: How many threads or processes should we spread this workload over?

This list is merely the basic parameters needed to define a workload. There are another few dozen options that can be included in a job file or invoked via a command line flag. See the examples/ directory for inspiration on how to write job files. If you get stuck, the --cmdhelp option also lists all options. If used with a command argument, --cmdhelp will detail the given command.
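
To make these parameters concrete, here is a minimal sketch of a job file that exercises most of them. The paths and values are hypothetical placeholders, not tuning recommendations; adjust them to match your drives and the workload you want to simulate:

; sketch.fio - a hypothetical 70/30 random read/write test
[global]
; async I/O engine, direct (unbuffered) I/O, 4k blocks, queue depth 32
ioengine=libaio
direct=1
bs=4k
iodepth=32
; run each job for 60 seconds regardless of data size
runtime=60
time_based

[random-rw]
; mixed random workload: 70% reads, 30% writes
rw=randrw
rwmixread=70
; 1 GiB of data per file, on a hypothetical mount point, spread over four processes
size=1G
directory=/mnt/drive1
numjobs=4

Run it with "fio sketch.fio". Any job option can also be passed on the command line as --option=value (together with a required --name), so the same workload can be launched without a job file at all.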

Measuring HTTP Speed with Hperf

Hperf is a tool that measures the maximum achievable bandwidth between a specified number of peers, reporting receive and transmit bandwidth for each peer. The utility can be installed and run from the command line, or deployed into Kubernetes via Helm or YAML. By default, Hperf uses ports 9999 and 10000, so your firewall must allow traffic on these ports. You may optionally configure a custom port when launching ./hperf using the NPERF_PORT= option.
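
For example, assuming NPERF_PORT is read from the environment as the option name above suggests, a run on a custom port might look like this (the port number is purely illustrative):

NPERF_PORT=9007 ./hperf-linux-amd64 10.0.0.10 10.0.0.209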

After downloading and installing hperf into your local Go environment, simply run it from the command line on each of the servers you want to test, passing the IP addresses of every endpoint for which you want to measure network throughput:

./hperf-linux-amd64 10.0.0.10 10.0.0.209
2023/07/10 16:15:44 Starting HTTP service to skip self.. waiting for 10secs for services to be ready
Bandwidth:  54 MB/s RX   |  71 MB/s TX
Bandwidth:  78 MB/s RX   |  80 MB/s TX
Bandwidth:  82 MB/s RX   |  76 MB/s TX
Bandwidth:  80 MB/s RX   |  85 MB/s TX
Bandwidth:  82 MB/s RX   |  76 MB/s TX
Bandwidth:  84 MB/s RX   |  87 MB/s TX
Bandwidth:  80 MB/s RX   |  84 MB/s TX

The above results are from two nodes on my 1 Gbps home network. Results are in the range of 50%-75% of the total bandwidth available, which is to be expected from an unmanaged home network.

MinIO performance is typically gated by network throughput. When we test performance, we find that the network is almost entirely utilized. At a minimum, we recommend a 25 Gbps network, and we mostly see customers deploy on 100 Gbps networks. A quick and easy way to improve overall performance is to isolate client and inter-node traffic on separate networks. MinIO takes full advantage of the available underlying hardware, so the bottleneck is almost always the network; performance improves when MinIO runs on higher-bandwidth networks.
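
To put those numbers in context, here is a rough back-of-the-envelope conversion from network line rate to the best-case throughput it can carry (protocol overhead is ignored, so real-world figures will be somewhat lower):

1 Gbps   ≈ 0.125 GB/s ≈ 0.12 GiB/s
25 Gbps  ≈ 3.125 GB/s ≈ 2.9 GiB/s
100 Gbps ≈ 12.5 GB/s  ≈ 11.6 GiB/s

Comparing these figures against the per-drive dperf results above makes it easy to see whether the drives or the network will saturate first.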

Bringing it All Together

By evaluating storage performance layer by layer, we are able to tease out and address bottlenecks. First we measured drive performance, then the ability of the drives to execute a representative workload, and finally network performance. Measuring individual drives also lets us track down deficient or failing hardware.

MinIO subscribers gain access to automated, easy-to-use performance test tools that provide a streamlined troubleshooting experience and concise results. MinIO administrators are a mere click away from running a distributed performance assessment of their clusters. Once a Health Check is run, results are uploaded to MinIO SUBNET. The Health Check assesses object storage performance with PUTs, then GETs, incrementally increasing load to pinpoint maximum aggregate throughput; assesses drive performance with dperf; and finally assesses network performance with hperf. Tests can take anywhere from a few seconds to several minutes to execute depending on your MinIO cluster. Results can be viewed in the SUBNET portal under the Performance tab and discussed with our engineers.

Verifying Object Storage Performance

A well-designed object storage product should make it easy to obtain accurate and consistent performance test results without requiring exotic tuning or making you jump through hoops. The goal is for performance tests to be simple and repeatable – endless ad-hoc tweaking is not repeatable. Along the same lines, document the specifics of your entire storage environment so that someone else could rebuild it exactly and entirely if needed.

From here, you may want to start playing with the variables that affect storage performance. You could simulate a drive failure under heavy load and measure the effect on performance. You could add/remove entire nodes, memory, NICs and drives to measure their impact, or even test on the fastest network you can afford to see if the performance improvement has real-world implications.

Are you sizing hardware for a MinIO deployment? Check out our Erasure Code Calculator to help understand your raw and usable capacity across a range of deployment options.