Block is Faster than Object Storage and Other Myths

We’ve been working with a large financial services institution that had been told by a leading graph database vendor that “S3 blob storage is too slow to support any database, you should use faster block storage with a file system instead.”

Setting aside the fact that the statement doesn’t even make sense (it’s an apples-to-oranges comparison between a type of storage access and a type of storage), we felt it needed to be addressed directly.

The database vendor in question didn’t mean it maliciously and didn’t have a particular agenda, but if they don’t know better, chances are others don’t either. This post seeks to clarify how different storage technologies play different roles in architectures, discuss which factors affect performance, and set the record straight on the suitability of object storage for database workloads.

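Let’s start by dealing with the apples and oranges problem. Associating the performance of the storage system with the type of storage access is wrong on multiple levels. The access type (object, file or block) describes the API an application uses to reach its data; performance is a property of the storage system behind that API. Unfortunately, conflating the two is a common misunderstanding and source of confusion in the industry.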

Now that we have set the record straight on access vs. performance, let’s deal with performance.

Storage System Performance

When you think about storage system performance, you generally think along the lines of throughput and IOPS/latency. Traditionally, the industry has considered object storage to be the throughput-optimized storage system, whereas file and block are the IOPS/latency-optimized solutions.

I say traditionally because the lines are much blurrier now. MinIO, for example, has invested heavily in small file optimizations, which in turn lead to major IOPS gains. Further, SAN/NAS solutions slow considerably past a PB, so the advantage diminishes.

The point here is that the size, type, and characteristics of the workload matter tremendously when it comes to storage system performance, and that relying on outdated “maxims” is a poor approach to architecting modern systems.

As for the specific claim above that “object is slow”, that’s just not true. Object storage has come a long way since the early systems that came to market on low-end appliance hardware and HDDs. Having said that, fast object storage isn’t exactly today’s news. Perhaps the vendor was comparing AWS S3 as a service (not the S3 API) to local SSD/NVMe drive performance, but even then both are pretty fast. Snowflake runs on S3 and has from inception. If it weren’t fast, they would have picked something else. They didn’t have to choose S3.

Modern object storage will run at hardware speed. There is a tremendous amount of data from multiple sources to support that assertion. Intel, Western Digital, SuperMicro and Seagate have all run performance tests on MinIO and found this to be the case. Give a modern object storage system like MinIO NVMe drives and we will flood a 100GbE network. Give us HDDs and we will max out the drives before we saturate the network.
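To make the "hardware speed" argument concrete, here is a rough sketch of the bottleneck reasoning: a node can deliver no more than the slower of its drives (in aggregate) and its network link. The function name, drive counts and per-drive throughput figures are hypothetical placeholders chosen for illustration, not measured values.

```python
def node_throughput_gbps(drive_count, per_drive_gbps, nic_gbps):
    """Return (deliverable throughput in Gb/s, name of the limiting component)."""
    drive_total = drive_count * per_drive_gbps
    if drive_total >= nic_gbps:
        # The drives can outrun the link, so the network is the ceiling.
        return nic_gbps, "network"
    # The drives run out first, so the link is never saturated.
    return drive_total, "drives"


# Hypothetical per-drive figures, for illustration only.
nvme = node_throughput_gbps(drive_count=8, per_drive_gbps=24.0, nic_gbps=100.0)
hdd = node_throughput_gbps(drive_count=8, per_drive_gbps=1.6, nic_gbps=100.0)

print(nvme)  # (100.0, 'network')
print(hdd)   # (12.8, 'drives')
```

With these assumed numbers the NVMe node is limited by the 100GbE link, while the HDD node is limited by the drives. Either way, the ceiling is set by the hardware rather than by the access method.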

Architected correctly, modern object storage like MinIO is as fast as the hardware you put it on. That’s just a fact.

A recent benchmark achieved 325 GiB/s (349 GB/s) on GETs and 165 GiB/s (177 GB/s) on PUTs with just 32 nodes of off-the-shelf NVMe SSDs. MinIO more than delivers the performance needed to power demanding workloads like Apache Spark, Starburst Presto/Trino, ClickHouse, and just about any other cloud-native database, analytics or AI/ML workload you can think of.
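For readers who want to check the unit conversion, the short sketch below converts the quoted GiB/s figures to GB/s and derives a simple per-node average from the 32-node count. The per-node numbers are plain arithmetic on the figures above, not separately reported results, and the helper name is made up for this example.

```python
GIB = 2**30   # bytes in one gibibyte
GB = 10**9    # bytes in one gigabyte
NODES = 32    # node count quoted for the benchmark


def gib_to_gb(gib_per_s):
    """Convert a rate in GiB/s to GB/s."""
    return gib_per_s * GIB / GB


get_gb = gib_to_gb(325)
put_gb = gib_to_gb(165)

print(round(get_gb))  # 349
print(round(put_gb))  # 177

# Average GET rate per node, in GB/s and in Gb/s (8 bits per byte).
per_node_gb = get_gb / NODES
print(round(per_node_gb, 1))      # 10.9
print(round(per_node_gb * 8, 1))  # 87.2
```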

In this case, the application vendor may have recommended file system or block-based storage because their product depends on those "storage access" methods, not because of the "storage systems" and their performance characteristics. If so, that is a shortcoming in their system, not a technical argument. Nonetheless, let’s turn our attention to the access issue.

Storage Access

Again, storage access (i.e. file vs. block vs. object) is different from performance. The primary difference is the API. Object uses a RESTful API, which is how the cloud operates. File uses POSIX. Block uses FC/SCSI/iSCSI. The last two predate the cloud and are generally considered legacy in that operating model. There are other considerations too. For example, a file-based interface (e.g. POSIX) may have a lot more options and chattiness than an object-based protocol (e.g. S3), which is inherently simple (and therefore inherently scalable). These are choices to be made separately from performance, even though they have a bearing on it.
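The difference in API shape can be sketched in a few lines. The example below reads the same byte range twice: once through POSIX-style calls (open, seek, read, close) and once as a single HTTP GET with a Range header, which is how an S3-style object read is expressed. The endpoint, bucket and key are hypothetical, and the request is only constructed, never sent; a real S3-compatible endpoint would normally also require a signed Authorization header, which is omitted here.

```python
import os
import tempfile
import urllib.request


def read_range_posix(path, offset, length):
    """File access: several stateful calls against a file descriptor."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)
    finally:
        os.close(fd)


def build_range_get(endpoint, bucket, key, offset, length):
    """Object access: one stateless HTTP GET describing the whole read.

    Unsigned sketch; authentication headers are intentionally left out.
    """
    url = f"{endpoint}/{bucket}/{key}"
    req = urllib.request.Request(url, method="GET")
    # HTTP byte ranges are inclusive on both ends.
    req.add_header("Range", f"bytes={offset}-{offset + length - 1}")
    return req


# POSIX path: write a small temporary file, then read bytes 4..7 from it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    path = tmp.name
try:
    data = read_range_posix(path, 4, 4)
finally:
    os.remove(path)
print(data)  # b'4567'

# Object path: hypothetical endpoint, bucket and key; the request is not sent.
req = build_range_get("https://storage.example.com", "demo-bucket", "data.bin", 4, 4)
print(req.get_method(), req.full_url, req.get_header("Range"))
```

The file read depends on state held by the descriptor between calls, while the object read carries everything the server needs in one request. Neither shape says anything on its own about how fast the underlying system is.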

The access approach is also somewhat (not entirely) hardware dependent. This includes CPU, drive type, memory and so on. Software architecture also plays a role. Key areas on the access side are the number of layers and the efficiency of each layer. If there are multiple layers that must be navigated, such as fronting a file store with an S3 interface, you build in complexity and therefore inhibit performance. Likewise, how each layer organizes, writes, scans and reads data contributes greatly to the overall performance of the storage system.

The best storage access architectures have the fewest, and most efficient, layers, recognizing that everyone eventually writes to and reads from drives through the block layer (while not the focus of this post, “block” from a storage perspective is not “one” layer). Block is an essential layer because it is required to access the underlying physical storage, but block is limited in its ability to manage and present metadata about unstructured data. This is where object storage is required, to rapidly and efficiently conduct metadata-reliant operations such as search.

Eliminating external dependencies such as an external DB (for metadata) will also address potential performance issues as the environment scales (one layer but multiple paths).

Conclusion


Architects need to understand that storage access and storage performance should not be conflated. They are related in some respects, but the choice of one does not automatically dictate the behavior of the other. Block is not always fast. Object is not slow. There is ample data available to make this a moot point.

Anyone telling you differently either doesn’t understand storage or is trying to sell you something.