The Scalability Myth

Lots of object storage companies like to talk about scalability while tossing around terms like exabytes and “infinite.” Unfortunately, many of these claims are grandiose and misleading, and they don’t help enterprises build an effective storage platform. Claims made around the simplest use case - static archival data - don’t translate to the full range of use cases that must be addressed to support the data-driven enterprise.

Realistically speaking, scalability to exabytes and beyond isn’t that hard when the data is headed for cold storage. But that is not what modern enterprises want. They want security, scalability and performance, because they need to interact with data at scale for AI/ML workloads and advanced analytics platforms like Splunk. They want storage proven to perform at scale for the entire software stack regardless of data volume, and to provide low-latency responses across a variety of use cases, many of them in real time.

We see it in the data - this poll from The New Stack is a great example:


Putting aside the marketing spin (with the disclaimer that I am in marketing), what are the attributes needed to satisfy this combination of requirements for cloud object storage? This is our take:

  1. System Scale - To deliver against the scalability requirement, the entire system needs to be scalable. As we have blogged earlier, an example of a system that doesn’t scale linearly is one that uses Cassandra as the metadata database. That limits what you can do with the data: at scale, Cassandra is far better at writes than reads and is very poor at large-scale actions like deletes. The entire system needs to scale elegantly, seamlessly and without issue for all kinds of workloads, from artifactory storage and snapshots to machine learning pipelines. (A minimal sketch of what a large-scale delete looks like through the S3 API appears after this list.)

    It should be noted that for modern workloads at scale, everyone is building on object storage. SAN/NAS is increasingly relegated to legacy applications.

  2. Performance - Performance can be evaluated across multiple dimensions - raw, straight-line performance and performance at scale. The difference is simple - running a benchmark against your object store with a few TBs of data may produce some nice numbers, but the real test is sustaining that performance across multiple PBs for all kinds of access patterns and object sizes. This matters because without scalable performance you can only realistically operate on a fraction of your data.

    The use cases in AI/ML are trending not just toward massive amounts of data; they increasingly look at what is called “dark data” - the data that holds secrets but is generally forgotten or archived for reasons of performance (too big) or cost.

    Modern object stores need to be able to deliver performance across the continuum of scale. Selecting an object store that can do that ensures the organization can unlock all of the value that lies in that data - not some fractional component.

    MinIO has built a reputation as a leader in the performance-at-scale game through its benchmarks. Not all benchmarks are created equal, however. Many vendors try to game the system by setting encryption and bitrot protection to low levels or turning them off entirely. Shameful stuff, really. We invite you to play with Warp, our S3 benchmarking tool - it is quickly becoming the standard. (A simple SDK-level throughput sketch also appears after this list.)

  3. Secure - Security is the overwhelming #1 answer among the respondents to The New Stack survey, but this should not be news to anyone. Storing data includes protecting it. Protecting it from loss. Protecting it from unauthorized access. In the case of ransomware, the two go together - unauthorized access results in loss. In the continuum of bad, a breach is the worst, because once the data is exposed the problem compounds.

    This is why security needs to scale too. That means security cannot carry performance overhead that keeps you from running it all the time. Scalable encryption should also protect data everywhere - in flight (TLS) and at rest (KMS-backed encryption). Security also includes access management (authentication and authorization) and object locking. They all have to scale if you want to deliver comprehensive protection. Taken together, these are monumental requirements that most object stores cannot deliver against. And so enterprises compromise, with predictable results. (A sketch of encryption and object locking at the API level appears after this list.)

  4. Operational Scale - Operational scale is the ability to manage massive infrastructure with just a handful of people (or even just a couple, to cover time zones).

    Some call it maintainability. We like that term too. We are less keen on total cost of ownership. The reason is that you can’t “value engineer” maintainability. You can either put one person in charge of a multi-tenant, petabyte-scale, object-storage-as-a-service instance or you can’t. If that instance needs a team of six to look after security, networking, drives, CPUs, resilience, SLAs, downtime, upgrades and so on, the solution is not “maintainable” in our book. That functionality needs to be manageable, transparent and simple - without sacrificing control or granularity.

    OPEX is orders of magnitude higher than CAPEX over time. The ability to scale is a function of the software you select. Simple, powerful software wins every time because operational scalability is a software problem, not a people problem.

  5. Software Defined - While the appliance vendors will argue this point aggressively, the fact is that properly designed software-defined solutions scale better. When we say properly designed, we mean they run on any commodity hardware, VMs or containers and on popular operating system distributions - not just a couple of tightly defined boxes from a handful of big-name vendors. Yes, AWS controls the hardware in its stack, but there is still massive variation on the hardware side. Almost every Hardware Compatibility List (HCL) is obsolete to begin with: when software is released frequently and hardware refreshes often, it is nearly impossible to keep an HCL validated.

    When you can do that, hardware really does become a commodity. The software handles the heterogeneity across media, models - even brands. Go get your best price. Take advantage of quarter-end blowouts. Design your systems with SSDs and HDDs and tier across them using ILM. Use the public cloud as cold storage. Design a data lifecycle around the data - not the hardware spec. (A minimal lifecycle-configuration sketch appears after this list.)

    Kubernetes is the driver of that software-defined scale. Software shouldn’t have to worry about the underlying infrastructure, be it public cloud or bare-metal private cloud. Let Kubernetes abstract the infrastructure and roll out your object storage as software containers. While we have said it before, it bears mentioning again - you can’t containerize an appliance.
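
To make point 1 concrete, here is a minimal sketch of what a large-scale delete looks like through the S3 API, written against the MinIO Go SDK (minio-go). The endpoint, credentials, bucket and prefix are placeholders. The point is not the code itself but the shape of the operation: a listing streamed into a bulk delete, which is exactly the kind of workload that exposes a metadata layer that doesn’t scale.

```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	ctx := context.Background()

	// Placeholder endpoint and credentials - point these at your own deployment.
	client, err := minio.New("minio.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatalln(err)
	}

	// Stream every object under a prefix into a bulk delete. At billions of
	// objects, both the listing and the deletes hammer the metadata layer -
	// which is why metadata has to scale along with the data itself.
	objectsCh := make(chan minio.ObjectInfo)
	go func() {
		defer close(objectsCh)
		for object := range client.ListObjects(ctx, "logs-archive", minio.ListObjectsOptions{
			Prefix:    "2019/",
			Recursive: true,
		}) {
			if object.Err != nil {
				log.Fatalln(object.Err)
			}
			objectsCh <- object
		}
	}()

	// Report anything the bulk delete could not remove.
	for rErr := range client.RemoveObjects(ctx, "logs-archive", objectsCh, minio.RemoveObjectsOptions{}) {
		log.Printf("failed to delete %s: %v", rErr.ObjectName, rErr.Err)
	}
}
```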
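
For point 2, Warp is the tool to use for real benchmarking. Purely as an illustration of what “raw, straight-line” throughput means, the sketch below times sequential PUTs and GETs of fixed-size objects with the MinIO Go SDK; the endpoint, bucket, object size and count are all assumptions, and a serious test would vary object sizes, concurrency and total data volume.

```go
package main

import (
	"bytes"
	"context"
	"crypto/rand"
	"fmt"
	"io"
	"log"
	"time"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
)

func main() {
	ctx := context.Background()

	// Placeholder endpoint and credentials; the "bench" bucket is assumed to exist.
	client, err := minio.New("minio.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatalln(err)
	}

	const (
		bucket  = "bench"
		objSize = 8 << 20 // 8 MiB objects
		count   = 64
	)

	payload := make([]byte, objSize)
	if _, err := rand.Read(payload); err != nil {
		log.Fatalln(err)
	}

	// Write phase: sequential PUTs of identically sized objects.
	start := time.Now()
	for i := 0; i < count; i++ {
		name := fmt.Sprintf("bench/obj-%d", i)
		if _, err := client.PutObject(ctx, bucket, name, bytes.NewReader(payload),
			objSize, minio.PutObjectOptions{}); err != nil {
			log.Fatalln(err)
		}
	}
	putMBps := float64(count*objSize) / (1 << 20) / time.Since(start).Seconds()

	// Read phase: stream every object back and discard the bytes.
	start = time.Now()
	for i := 0; i < count; i++ {
		obj, err := client.GetObject(ctx, bucket, fmt.Sprintf("bench/obj-%d", i), minio.GetObjectOptions{})
		if err != nil {
			log.Fatalln(err)
		}
		if _, err := io.Copy(io.Discard, obj); err != nil {
			log.Fatalln(err)
		}
		obj.Close()
	}
	getMBps := float64(count*objSize) / (1 << 20) / time.Since(start).Seconds()

	fmt.Printf("PUT: %.1f MiB/s, GET: %.1f MiB/s\n", putMBps, getMBps)
}
```

A run like this on a few GBs will flatter almost any system; sustaining those numbers as the dataset grows from TBs to PBs is what separates performance at scale from a nice benchmark run.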
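
For point 3, this sketch shows what “security that scales” has to cover at the API level: TLS in flight, KMS-backed server-side encryption at rest, and object locking for immutability. It assumes the MinIO Go SDK; the endpoint, credentials, bucket, KMS key name and retention period are placeholders.

```go
package main

import (
	"bytes"
	"context"
	"log"
	"time"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
	"github.com/minio/minio-go/v7/pkg/encrypt"
)

func main() {
	ctx := context.Background()

	// Placeholder endpoint and credentials; Secure: true keeps traffic on TLS.
	client, err := minio.New("minio.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatalln(err)
	}

	// Object locking must be enabled when the bucket is created.
	if err := client.MakeBucket(ctx, "audit-logs", minio.MakeBucketOptions{ObjectLocking: true}); err != nil {
		log.Println("bucket may already exist:", err)
	}

	// KMS-backed server-side encryption; "my-key" is a placeholder key name.
	sse, err := encrypt.NewSSEKMS("my-key", nil)
	if err != nil {
		log.Fatalln(err)
	}

	data := []byte(`{"event":"login","user":"alice"}`)

	// Encrypted at rest and locked (WORM) in compliance mode for 90 days.
	_, err = client.PutObject(ctx, "audit-logs", "2019/event-0001.json",
		bytes.NewReader(data), int64(len(data)), minio.PutObjectOptions{
			ServerSideEncryption: sse,
			Mode:                 minio.Compliance,
			RetainUntilDate:      time.Now().AddDate(0, 0, 90),
		})
	if err != nil {
		log.Fatalln(err)
	}
}
```

The point of the argument above is that every option in that PutObject call has to stay on, at full strength, without dragging throughput down.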
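
Finally, for point 5, here is a hedged sketch of the ILM idea: a bucket lifecycle rule that transitions objects to a colder tier after 30 days and expires them after a year. It assumes the MinIO Go SDK; the endpoint, bucket, prefix and the “COLD-TIER” name (a remote tier you would configure separately, such as an HDD pool or public cloud target) are placeholders.

```go
package main

import (
	"context"
	"log"

	"github.com/minio/minio-go/v7"
	"github.com/minio/minio-go/v7/pkg/credentials"
	"github.com/minio/minio-go/v7/pkg/lifecycle"
)

func main() {
	ctx := context.Background()

	// Placeholder endpoint and credentials.
	client, err := minio.New("minio.example.com:9000", &minio.Options{
		Creds:  credentials.NewStaticV4("ACCESS_KEY", "SECRET_KEY", ""),
		Secure: true,
	})
	if err != nil {
		log.Fatalln(err)
	}

	// One rule: move objects under "logs/" to a colder tier after 30 days,
	// then expire them after 365 days. The lifecycle follows the data, not
	// the hardware spec underneath it.
	cfg := lifecycle.NewConfiguration()
	cfg.Rules = []lifecycle.Rule{
		{
			ID:         "tier-then-expire-logs",
			Status:     "Enabled",
			RuleFilter: lifecycle.Filter{Prefix: "logs/"},
			Transition: lifecycle.Transition{
				Days:         lifecycle.TransitionDays(30),
				StorageClass: "COLD-TIER",
			},
			Expiration: lifecycle.Expiration{
				Days: lifecycle.ExpirationDays(365),
			},
		},
	}

	if err := client.SetBucketLifecycle(ctx, "analytics-data", cfg); err != nil {
		log.Fatalln(err)
	}
}
```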



Summary
Scalability is a multi-dimensional problem. It doesn’t get the attention it deserves because very few vendors want to discuss it outside of their specific, narrowly defined success criteria. This is bad for the overall industry because it ignores the things that really matter - security, performance and maintainability. We invite you to consider a more comprehensive list in the hope that it will result in better questions of your current vendors and better system design going forward.
