Kubernetes Storage Solutions: A Practical Guide for Diverse Workloads, Including AI/ML

We were talking with a well-respected industry analyst the other day and he challenged us to articulate why Kubernetes is so important to Object Storage. It got us thinking that this was a topic worthy of our time, and yours.

Why Kubernetes Demands More Than Just Stateless Apps

At the most basic level, the value of Kubernetes lies in its ability to treat infrastructure as code, delivering full-scale automation to both stateful and stateless components of the software stack. Deriving the maximum value requires treating as many components as possible as code and orchestrating them. That means you put EVERYTHING into the container, including applications, infrastructure and data.

The Case for Running Object Storage Inside the Container

In the modern world, applications are stateless and containerized. Still, that state has to be held somewhere. That somewhere is object storage (not legacy block and file) and that object storage needs to run IN the container. When storage runs inside the container, Kubernetes can fully automate both stateful and stateless infrastructure across multiple Kubernetes clusters. If the object store is left to bare metal or public cloud storage services, the benefits of Kubernetes-based infrastructure orchestration are considerably diminished.

Another way to think about it is through a VMware analogy. VMware created the concept of the software defined datacenter. This was a predecessor to Kubernetes (which is why they claim it as their birthright). To get the true value of SDDC, you have to virtualize the entire datacenter. If some of the applications are left behind to run on bare metal, SDDC benefits are lost. The same is true for Kubernetes. If you only use Kubernetes for the applications, you are only tapping a fractional amount of the value. Let’s explore this a little deeper.

Persistent Volumes and the Role of Storage in Kubernetes Automation

First off, in the modern model, CPU, Network and Storage are physical layers to be abstracted by Kubernetes. They have to be abstracted so that applications and data stores can run as containers anywhere. In particular, the data stores include all persistent services (databases, message queues, object stores and so on).

From the Kubernetes perspective, object stores are not different from any other key value stores or databases. The storage layer is reduced to physical or virtual drives underneath. The need to run persistent data stores as containers arises from hybrid cloud portability. Offloading critical services like object stores or databases to external systems undermines Kubernetes automation, especially when those services rely on persistent volumes to maintain state across workloads.
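To make the idea of a data store as "just another containerized workload" concrete, here is a minimal sketch in Python that builds a StatefulSet manifest as a plain dictionary and prints it as JSON, a format Kubernetes accepts alongside YAML. The workload name, the image `example.com/object-store:latest`, the replica count and the drive size are all hypothetical placeholders and do not describe any particular product's deployment.

```python
import json


def object_store_statefulset(name: str, replicas: int, drive_size: str) -> dict:
    """Build a StatefulSet manifest in which every pod claims its own persistent volume."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            # Hypothetical image, used for illustration only.
                            "image": "example.com/object-store:latest",
                            "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                        }
                    ]
                },
            },
            # One PersistentVolumeClaim is created per pod from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": drive_size}},
                    },
                }
            ],
        },
    }


manifest = object_store_statefulset("object-store", replicas=4, drive_size="1Ti")
print(json.dumps(manifest, indent=2))
```

Because the whole definition is data, it can be versioned, reviewed and applied to any cluster in the same way as the application manifests that depend on it.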

Modern applications built for Kubernetes are designed to handle availability, replication, scaling, and encryption independently within their pods. In turn, storage needs to run IN the container in order to deliver Observability, Data Placement, Maintenance Operations, and Failure Handling.

From POSIX Limitations to Cloud-Native Object Storage

This was not always the case. Traditionally, applications relied on databases to store and work with structured data, and storage, such as local drives or distributed file systems, to house all of their unstructured and even semi-structured data. However, the rapid rise in unstructured data challenged this model. As developers quickly learned, POSIX was too chatty and carried too much overhead to let the application perform at scale, and it was confined to the data center because it was never meant to provide access across regions and continents. This led them to object storage, which is designed for RESTful APIs (as pioneered by AWS S3). Now applications were free of any burden to handle local storage, making them effectively stateless (as the state is with the remote storage system).

Nothing echoes our case that "object storage is primary storage for AI" more than the world's largest cloud provider bringing out a new service designed to meet the needs of data-intensive AI/ML applications. It's even built to work best with large numbers of small objects, and that's a common workload profile for AI/ML. ML training at scale must rely on object storage because it runs in parallel across hundreds of compute nodes, many times relying on expensive GPUs for computations.

We can be close to certain that all major cloud providers will bring similar high-performance object storage options to market, priced similarly. This is a great upsell opportunity for them to add a more expensive storage option. It probably won’t stop the trend towards data repatriation, a cost-savings phenomenon that also enables greater AI/ML performance and control over data, but it is a calculated attempt to slow it.

Modern applications are built from the ground up with this expectation. Well-designed modern applications that deal with some kind of data (logs, metadata, blobs, etc.) conform to the cloud-native (RESTful API) design principle by saving their state to a relevant storage system.
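As a rough illustration of what "saving the state to a relevant storage system" means for the application, the Python sketch below keeps no state in the worker itself. `ObjectStore`, `InMemoryStore` and `count_visit` are hypothetical names invented for this example, and the in-memory class is only a stand-in for a remote object store reached over a RESTful API.

```python
from typing import Dict, Optional, Protocol


class ObjectStore(Protocol):
    """The only storage contract the application depends on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> Optional[bytes]: ...


class InMemoryStore:
    """Stand-in for a remote object store, so the sketch runs without a network."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)


def count_visit(store: ObjectStore, user: str) -> int:
    """Handle one request: read state from the store, update it, write it back."""
    key = f"visits/{user}"
    raw = store.get(key)
    count = int(raw.decode()) if raw is not None else 0
    count += 1
    store.put(key, str(count).encode())
    return count


store = InMemoryStore()
first = count_visit(store, "alice")   # imagine this handled by one worker
second = count_visit(store, "alice")  # and this by a replacement worker on another node
print(first, second)
```

Since nothing survives inside the worker between calls, any replica can serve any request, which is what lets the orchestrator add, remove or reschedule workers freely. A real service would also need to guard the read-modify-write step against concurrent updates, which this sketch leaves out.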

As a quick side note, REST APIs only address application-storage communication challenges such as PUT and GET or READ/WRITE data, and tracking metadata and version data, but not container orchestration and automation. That requires Kubernetes.
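The scope of that application-to-storage conversation can be sketched in a few lines of Python: put an object with metadata, get the latest version, or get a specific earlier version. `VersionedBucket` is a hypothetical in-memory model written for this example, and its sequential version numbers are a simplification, since real object stores typically return opaque version identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ObjectVersion:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


class VersionedBucket:
    """Toy bucket that keeps every version of every object."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ObjectVersion]] = {}

    def put(self, key: str, data: bytes, metadata: Optional[Dict[str, str]] = None) -> int:
        """Store a new version and return its 1-based version number."""
        versions = self._versions.setdefault(key, [])
        versions.append(ObjectVersion(data, dict(metadata or {})))
        return len(versions)

    def get(self, key: str, version: Optional[int] = None) -> ObjectVersion:
        """Return the latest version, or a specific one when a number is given."""
        versions = self._versions[key]
        return versions[-1] if version is None else versions[version - 1]


bucket = VersionedBucket()
v1 = bucket.put("reports/q1.csv", b"draft", {"content-type": "text/csv"})
v2 = bucket.put("reports/q1.csv", b"final", {"content-type": "text/csv"})
latest = bucket.get("reports/q1.csv")
original = bucket.get("reports/q1.csv", version=v1)
print(latest.data, original.data)
```

Nothing in this interface says where the bucket runs, how many replicas it has or what happens when a node fails, and that gap is the part left to the orchestrator.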

SAN and NAS can also make application containers stateless, but POSIX-based file and block storage are hopelessly inflexible in a containerized environment, where application workers need to grow and shrink based on inbound load, move to a new node as soon as the current node goes down, and so on. This is why object storage has replaced them as the primary storage class - as evidenced by public cloud’s reliance on object storage (and pricing of block and file).

This is not to say that storage applications, e.g. databases, object stores, key value stores, must be stateless. On the contrary, they need to be stateful - they just shouldn’t have the effect of making the application stateful in the process.

Why Kubernetes (K8s) Operator for Storage?

Kubernetes native storage applications are designed to leverage the flexibility containers bring. Agile and DevOps best practices dictate that applications and CI/CD processes be simple and straightforward, independent of underlying infrastructure and consistent in how they access it. Simply put, containerized apps need to behave consistently across development, testing, and production environments to ensure true portability. Combining that with variable hardware infrastructures, it makes sense for Kubernetes to be the point of contact between all the disaggregated infrastructures, applications and data stores.

Therefore, storage applications cannot make assumptions about the environment in which they are deployed. For example, some use an internal erasure coding mechanism to ensure there is adequate redundancy in the system, across varying hardware and cloud infrastructures, to allow up to half of the drives to fail. Others also manage data integrity and security using their own hashing and server-side encryption.
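The arithmetic behind "up to half of the drives can fail" can be sketched as follows, assuming a Reed-Solomon-style code in which each object is split into data shards plus parity shards and any set of shards equal in number to the data shards is enough to rebuild it. The 16-drive layout is an illustrative assumption, and SHA-256 from Python's standard library stands in for whatever hashing a given storage system actually uses.

```python
import hashlib


def erasure_profile(total_drives: int, parity_shards: int) -> dict:
    """Summarize fault tolerance and usable capacity for one erasure-coded stripe."""
    if not 0 < parity_shards < total_drives:
        raise ValueError("parity_shards must be between 1 and total_drives - 1")
    data_shards = total_drives - parity_shards
    return {
        "data_shards": data_shards,
        "parity_shards": parity_shards,
        # Assumption: the code can rebuild from any `data_shards` surviving shards,
        # so it tolerates losing as many drives as there are parity shards.
        "tolerated_drive_failures": parity_shards,
        "usable_fraction": data_shards / total_drives,
    }


def checksum(data: bytes) -> str:
    """Content hash recorded at write time."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, expected: str) -> bool:
    """Recompute the hash at read time to detect silent corruption."""
    return checksum(data) == expected


profile = erasure_profile(total_drives=16, parity_shards=8)
print(profile)

stored = b"training-batch-0001"
digest = checksum(stored)
print(is_intact(stored, digest))
print(is_intact(b"training-batch-0002", digest))
```

With half of the shards devoted to parity, half of the drives can be lost at the cost of half of the raw capacity. Choosing fewer parity shards trades some of that tolerance for more usable space.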

No application should have to do any of that for itself anymore.

In the Kubernetes world, functions are simplified and abstracted: applications do application things and storage does storage things. The application doesn’t have to think about it - it just happens, all inside a container that can be expanded, moved or wiped out.

This is the cloud-native way.

Why Developers Are Moving Beyond CSI for Persistent Storage

There are certainly non-cloud-native ways. For example, you could solve this problem with the Container Storage Interface (CSI), but sophisticated architects and developers don’t because it adds needless complexity and scalability challenges. This is because CSI-based PVs bring their own management and redundancy layers, which generally compete with the stateful application’s design.

Take the following example of how cloud-native platforms work with storage and state. Apache Spark, in the cloud-native world, runs in a stateless manner on Kubernetes and ships state to other systems while Spark containers themselves are running completely stateless. Other major enterprise players in the big data analytics space like Vertica, Teradata, Greenplum are also moving to a disaggregated model of compute and storage.

Similarly, all the other major analytics platforms, from Presto and TensorFlow to R and Jupyter notebooks, follow such patterns. Offloading state to remote cloud storage systems makes your application much easier to scale and manage. The rise of open source platforms across the data and AI stack has accelerated this shift, empowering developers with modular, interoperable tools that align naturally with Kubernetes-based architectures. Additionally, it helps keep the application portable to different environments.

AIStor has always thought of storage in this context. A majority of our workloads (523M Docker pulls as of this morning) run in containers (64%) and almost half are managed by Kubernetes (42%). That is why VMware picked us as a design partner for the launch of their Data Persistence platform (DPp). We are the standard for this type of deployment.

We continue to refine our approach. For example, our widely adopted Helm chart approach was not enough to cross the chasm from our DevOps audience to the mainstream IT administrator audience. Our previous implementation effectively dealt with a single tenant. Multi-tenancy and other DevOps tasks like provisioning, scaling, upgrades/updates, monitoring and encryption services required custom code.

Our new Kubernetes Operator helps our clients cross the chasm. Previously, building a multi-tenant, self-service object storage infrastructure on top of AIStor required significant skill and custom code development.

With the introduction of the Operator, such tasks are automated and API / Web driven. Now AIStor is a full blown multi-tenant, self-service cloud storage on top of Kubernetes. The Operator and Console put the power of Kubernetes-native, object-storage-as-a-service into the hands of IT - without requiring CLI or scripting skills.
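The general pattern an operator automates is a reconcile loop: compare the declared state with the observed state and emit whatever actions close the gap. The Python sketch below is a toy model of that pattern only. The tenant names, server counts and the `reconcile` and `apply_actions` functions are hypothetical and do not represent the AIStor Operator's actual API.

```python
from typing import Dict, List, Tuple

# (verb, tenant name, target server count)
Action = Tuple[str, str, int]


def reconcile(desired: Dict[str, int], observed: Dict[str, int]) -> List[Action]:
    """Return the actions needed to move the observed tenants to the desired state."""
    actions: List[Action] = []
    for tenant, servers in sorted(desired.items()):
        if tenant not in observed:
            actions.append(("create", tenant, servers))
        elif observed[tenant] != servers:
            actions.append(("scale", tenant, servers))
    for tenant in sorted(observed.keys() - desired.keys()):
        actions.append(("delete", tenant, 0))
    return actions


def apply_actions(actions: List[Action], observed: Dict[str, int]) -> Dict[str, int]:
    """Simulate carrying out the actions and return the new observed state."""
    state = dict(observed)
    for verb, tenant, servers in actions:
        if verb == "delete":
            state.pop(tenant)
        else:
            state[tenant] = servers
    return state


desired = {"analytics": 4, "ml-training": 8}
observed = {"analytics": 2, "legacy": 4}

actions = reconcile(desired, observed)
print(actions)

observed = apply_actions(actions, observed)
print(reconcile(desired, observed))
```

Once the observed state matches the declaration, a further pass produces no actions, which is why the loop can run continuously without an administrator scripting each provisioning or scaling step.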

Want to dive deeper into how object storage fits natively into Kubernetes environments? Explore our Kubernetes-native storage architecture to see how MinIO is redefining what’s possible for persistent storage, scalability, and automation.

AIStor Everywhere

When we started talking about the concept of #aistoreverywhere it was to illustrate our integrations with the cloud-native elite. Now, however, #aistoreverywhere speaks to the fact that AIStor, in conjunction with Kubernetes, runs everywhere.

This can be lost on some given its nuance. Because of key economic and technical hurdles among the public cloud providers, it is increasingly attractive to use AIStor/Kubernetes across all infrastructures.

For example, public clouds are not interchangeable. AWS S3 does not equal Blob (Azure) and certainly does not equal GCP (marginally S3 compatible). Also, in the public cloud, bandwidth is more expensive than storage and latency is high. Smoothing these differences is a very expensive proposition.

Enterprises are adopting AIStor as a core part of their software stack (applications AND storage) because they can roll it anywhere. AWS, GCP, Azure, Tanzu, OpenShift - the list goes on. Because AIStor is Kubernetes native and runs IN the container, it works out of the box in any Kubernetes environment - from a car or 5G POP to the public cloud. That is why you find 7.7M IPs running AIStor in AWS, GCP and Azure.

All Together Now

There is a lot here so let’s summarize quickly. Kubernetes' value lies in its ability to treat infrastructure as code, delivering full scale automation to both stateful and stateless components of the software stack.

To realize the full value of Kubernetes, it’s essential to containerize as many components as possible, including the persistent storage systems that manage critical application state.

AIStor is built for this - it easily fits in containers (~45MB), it is designed for RESTful APIs and continues to evolve its approach (see AIStor Operator) to deliver the most native Kubernetes experience when it comes to storage.

When you are native to Kubernetes you can run anywhere it does - and today, that is everywhere you care about running - public cloud, private cloud, Kubernetes distributions and edge.

Don’t take our word for it. See for yourself. You can pull the AIStor Operator for Kubernetes. Questions? Join the conversation on our Slack channel, or email us at hello@min.io