The Architect’s Guide to Edge Storage

The current generation of edge computing is still in its infancy. This is a remarkable statement given its size today ($6 billion) and expectations for growth ($61 billion by 2028). Nonetheless, with 5G just taking hold and the Internet of Things (IoT) economy exploding, the truth is we don’t really know how big it may become, or how quickly.

This installment in MinIO’s Architect Series focuses on storage at the edge. Handling data properly at the edge can ensure a scalable, cost-effective and secure infrastructure. On the other hand, failing to set up the right architecture can lead to data loss, security vulnerabilities and sky-high costs related to the bandwidth needed to transfer data repeatedly to and from the public cloud.

We aim to provide guidance to achieve the former.

Why Edge?

Edge computing places content, data and processing closer to the applications, things and users that consume and interact with them. The primary challenge of edge computing is to achieve a bandwidth-optimized architecture that delivers performance, resilience and security without massive investments in infrastructure.

Bandwidth is a key consideration from an architecture perspective, and the reason is clear: per GB, it is roughly 4X more expensive than storage ($0.09 vs. $0.023 on AWS, for example).
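To make the arithmetic concrete, here is a minimal sketch that uses only the two per-GB figures quoted above. The workload sizes and the function name are hypothetical, and real pricing varies by region, tier and volume, so treat the numbers as illustrative rather than as a quote.

```python
# Per-GB figures quoted in the text (USD); real pricing varies.
STORAGE_PER_GB_MONTH = 0.023  # storing one GB for a month
EGRESS_PER_GB = 0.09          # transferring one GB out


def monthly_cost(stored_gb: float, transferred_gb: float) -> dict:
    """Split a monthly bill into storage and bandwidth components."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    bandwidth = transferred_gb * EGRESS_PER_GB
    return {"storage": storage, "bandwidth": bandwidth, "total": storage + bandwidth}


# Hypothetical workload: 10 TB stored, and the same 10 TB moved once a month.
bill = monthly_cost(stored_gb=10_000, transferred_gb=10_000)
ratio = EGRESS_PER_GB / STORAGE_PER_GB_MONTH
print(f"storage ${bill['storage']:.2f}, bandwidth ${bill['bandwidth']:.2f}, ratio {ratio:.1f}x")
```

Under these assumptions, moving the data once costs almost four times as much as keeping it for the month, which is why the models that follow are organized around reducing transfers.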

The rise of the edge is a function of where data is growing. While much attention is paid to the datacenter, many analysts expect that within a year or two the data created by enterprises outside those walls will surpass the data created within them.

We will cover the two models for storage at the edge (edge storage and edge caching) as well as the requirements for success. Let’s start with the models.

Edge Storage

The edge storage model is employed when the goal is to conduct processing and analytics at the edge, filtering out the noise and retaining or sending up just the insights and the data associated with them. In this model, the application, compute and storage all live at the edge and are designed to store and process data in situ. The goal is not to accumulate many PBs of data at the edge. Rather, this model envisions hundreds of GBs up to a few PBs. Visually it looks like this:


At the most remote edge you have the data-producing devices, coupled with storage and compute/analytics. The compute/analytics can range from something like a Splunk DSP to deep neural network model development, but the key point here is that there is ETL, processing and insight generation at the remote edge. These instances are containerized and managed with Kubernetes as data pipelines.
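As a rough illustration of filtering out the noise at the remote edge, the sketch below reduces a batch of raw sensor readings to a small summary plus the few raw points worth sending upstream. The function name, the threshold rule and the sample data are all assumptions made for this example; a real pipeline would apply whatever analytics the use case calls for.

```python
from statistics import mean


def summarize_at_edge(readings, threshold):
    """Reduce raw readings to a compact summary plus the outliers worth keeping.

    `readings` is a list of (timestamp, value) tuples; `threshold` is a
    hypothetical cutoff above which a reading counts as interesting.
    """
    values = [value for _, value in readings]
    anomalies = [(ts, value) for ts, value in readings if value > threshold]
    return {
        "count": len(values),
        "mean": mean(values) if values else None,
        "max": max(values) if values else None,
        "anomalies": anomalies,  # only these raw points are sent upstream
    }


# Made-up temperature samples: (seconds since start, degrees Celsius).
raw = [(0, 70.1), (1, 70.4), (2, 95.2), (3, 70.0), (4, 69.8)]
insight = summarize_at_edge(raw, threshold=90.0)
print(insight["count"], insight["anomalies"])
```

Only the summary and the single anomalous reading would leave the site in this example, while the rest of the raw data stays in local object storage or is discarded.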

Kubernetes, a key enabler of edge architecture, effectively imposes the requirement that storage be disaggregated and object based.

To complete the architecture, one would add a load balancer, another layer of Kubernetes, and then have an origin object storage server and the application layer (training the models, doing large scale analytics etc.) in a more centralized location.

This model is employed by restaurants like Chick-fil-A. It is used for facial recognition systems. It is the default design for manufacturing use cases, as well as 5G use cases.

In each case, there is enough storage and computation onsite or in an economically proximate location to learn from the data.

Let’s use an example of a car producing data from sensors — an area in which MinIO has considerable deployment expertise. The purpose of collecting data is to build and train machine learning models. Cars don’t have the compute resources internally to do the training, which is the most GPU-intensive part. In this case, the data is sent to an edge data center to build and train the machine learning models. Once the model is trained, it can be sent back to the car and used to make decisions and draw conclusions from new data coming in from sensors.

It makes sense to distribute the training and processing geographically, to be as close to the devices as possible.

Eventually, that data will end up in the cloud (public or private). It will accumulate quickly and in the case of autonomous vehicles it will be multiple PBs in no time. As a result, you will need the same storage on each end — at the edge and in the cloud.

Object storage is the storage of choice in the cloud. As a result, object storage is the storage of choice for the edge. We get into what attributes the storage at the edge needs to have below. But for now it is important to note that legacy SAN/NAS systems are inflexible and often incompatible with these use cases, since data processing applications are adopting the S3 API natively.

While some conflate the edge with the private cloud, this is a mistake. The definitions of public, private and edge are fundamentally blurred at this point. They each draw from the same set of practices: containerization, orchestration, RESTful APIs, automation and microservices. The “classification” is a function of what the architect is optimizing for (performance, economics, security, resilience, and scale).

Edge Caching

Recalling the first rule of the edge, to treat bandwidth as the highest cost component (4X on AWS), we come to our second core edge use case: edge caching. Edge caching is not a new concept, as content delivery networks (CDNs) are decades old, but it is also one where object storage has again changed the rules. CDNs need to be tightly integrated into the object storage system in order to maintain the security and consistency model of the objects.

In this model, the edge serves as a gateway cache, creating an intermediary between the application and the public cloud. In this scenario, the gateways are backed by servers with a number of hard drives or flash drives, and are deployed in edge data centers around the world. It looks like this:


All access to the public cloud goes through these caches (write-through cache), so data is uploaded to the public cloud with a strict consistency guarantee. Subsequent reads are served from the cache based on an ETag match or the cache control headers. This architecture reduces costs by decreasing the bandwidth needed to transfer data, improves performance by keeping data cached closer to the application, and also reduces operational cost: the data is still kept in the public cloud and only cached at the edge, so it is still there if the edge data center burns to the ground.
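The write-through and ETag-matching behavior described above can be sketched in a few lines. This is a toy model, not MinIO's implementation: `OriginStore` is an in-memory stand-in for the public cloud, the ETag is simply a SHA-256 content hash chosen for the example, and cache-control headers are left out.

```python
import hashlib


class OriginStore:
    """In-memory stand-in for the public cloud object store (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        etag = hashlib.sha256(data).hexdigest()  # content hash used as the ETag
        self._objects[key] = (etag, data)
        return etag

    def head(self, key):
        """Return only the ETag, which is cheap compared with a full read."""
        return self._objects[key][0]

    def get(self, key):
        return self._objects[key]


class WriteThroughCache:
    """Edge cache: writes go to the origin first, reads revalidate by ETag."""

    def __init__(self, origin):
        self.origin = origin
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def put(self, key, data):
        # Write-through: the origin accepts the object before the cache is updated.
        etag = self.origin.put(key, data)
        self._cache[key] = (etag, data)
        return etag

    def get(self, key):
        current = self.origin.head(key)
        cached = self._cache.get(key)
        if cached and cached[0] == current:
            self.hits += 1  # ETag match: serve the bytes from the edge
            return cached[1]
        self.misses += 1  # missing or stale: refetch and refresh the cache
        etag, data = self.origin.get(key)
        self._cache[key] = (etag, data)
        return data


origin = OriginStore()
edge = WriteThroughCache(origin)
edge.put("sensor/1.json", b'{"temp": 70}')
first = edge.get("sensor/1.json")               # served from the edge cache
origin.put("sensor/1.json", b'{"temp": 71}')    # object changed at the origin
second = edge.get("sensor/1.json")              # ETag mismatch, refetched
print(edge.hits, edge.misses)
```

The bandwidth saving comes from the hit path: only the small ETag check crosses the wide-area link, while the object body is read locally.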

With MinIO’s object storage gateway, one is also able to employ a shared nothing architecture with zero administration. You deploy it once and forget it. Adding a node, two nodes, two thousand nodes — it does not matter, they are architecturally independent of one another and totally stateless. Just keep scaling. If a node dies, let it go.

The Attributes of the Edge

Regardless of which type of architecture you’re building at the edge, there are certain attributes that need to be built into any edge storage system, whether in an edge processing center, in IoT devices themselves, or as part of an edge caching system. The first, as noted, is that the storage needs to be object based and accessible over HTTPS. Object is the default pattern for the edge. File and block protocols cannot be extended beyond the local network.

There are, however, additional requirements for that object storage and they are as follows:

Resilience

Resilience is essential for storage at the edge. It is harder for skilled engineers to physically access and maintain IoT devices or edge data centers. At the same time, the drives in IoT devices — and even in edge data centers — are subject to harsher physical conditions than drives in a traditional data center.

These architectures, and in particular the storage component, need to be able to fail in place. Drive failures will happen. Without the right architecture, drive failures can lead to data loss. In addition to losing data, replacing drives can be an operational nightmare, because it requires experienced technical staff to visit geographically distributed data centers and/or attend to thousands of edge devices.

It’s crucial for the storage architecture to use self-healing and automation to ensure that data is safe even when drives fail, and to automatically fail over to other data centers if all of the drives in a particular edge location fail.
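One way to picture automatic failover between locations is a read path that walks an ordered list of sites and returns the first successful answer. The sketch below is a simplified illustration under that assumption; the exception type, the function names and the two fake locations are hypothetical, and a production system would add health checks, timeouts and background healing.

```python
class LocationUnavailable(Exception):
    """Raised by a location when none of its drives can serve the request."""


def read_with_failover(key, locations):
    """Try each location in order and return (name, data) from the first that answers.

    `locations` is a list of (name, fetch) pairs, where `fetch` is any callable
    taking a key. Both are hypothetical stand-ins for real storage clients.
    """
    errors = []
    for name, fetch in locations:
        try:
            return name, fetch(key)
        except LocationUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise LocationUnavailable("; ".join(errors))


def failed_edge(key):
    # Simulates an edge site where every drive has failed.
    raise LocationUnavailable("all drives offline")


def regional_dc(key):
    # Simulates a healthy data center holding a copy of the object.
    return b"payload for " + key.encode()


served_by, data = read_with_failover(
    "video/42", [("edge-1", failed_edge), ("regional", regional_dc)]
)
print(served_by)
```

The application sees a successful read either way, which is the property that lets failed hardware stay in place until a maintenance visit is convenient.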

Software Defined + Container Friendly + Open Source

Software defined storage solutions provide a measure of flexibility that does not exist with traditional systems. They can run on a variety of hardware platforms with equal ease and can be easily maintained from afar.

Further, software defined storage solutions are superior for containerization and orchestration. As you may recall, it is impossible to containerize a hardware appliance. Given the need to spin up/down and grow/shrink edge solutions, a Kubernetes-friendly solution is a requirement.

Third, solutions need to be open source. This is a given for telcos, who have long seen the value in open source (see O-RAN), but it is also important to other industries where freedom from lock-in, freedom to inspect and freedom to innovate are all key to the selection process. Another underappreciated value proposition of open source is ease of adoption: because the software runs in a highly heterogeneous range of configurations, it is hardened in ways that proprietary software can never be.

Stateless

Edge storage systems need to be made up of completely disposable physical infrastructure. If they catch fire, there should be no data loss. If there is an accident, there should be no data loss. The critical state should be stored in the public cloud so that individual hardware elements can be disposable.

It’s impossible to treat drives at the edge as pets. They will almost invariably be subject to tougher physical conditions, which leaves them at risk for not just failure but also corruption.

Speed

The faster you can process data, the faster you can make business decisions. Speed is one of the primary reasons for moving data processing away from traditional data centers and the public cloud. The ability to speed up data processing and data transfer is essential for getting the most out of edge computing.

Latency is tricky to solve, even when dealing with data centers located at the edge. Successful architects attack latency wherever they can. One widely used technique is to process and analyze data in memory to remove the latency introduced by disk. Achieving speed at the edge also requires removing any dependency on high-latency networking.

Lightweight

Edge devices are small. For a storage system to be viable at the edge, it must provide speed, resilience and security with very few compute and storage resources. The ability to run on low-resource devices (a Raspberry Pi, for example) is essential to building a storage system that might run on a single solid state drive in an IoT device. The key, however, is that a single solid state drive must look and act like a full-blown server from an application and API perspective.

Security

There is no way to completely ensure the physical security of either edge data centers or IoT devices. Ensuring encryption at rest and in transit is critical, because placing the same physical security measures around an edge data center as would be in place around a traditional data center is impractical — and it’s impossible in the case of IoT devices. The physical vulnerability of drives at the edge makes encryption essential, so that even if data is accessed it can’t be read or tampered with.

Summary

Designing the appropriate architecture is critical in the edge world. This post presents two models, one where data is gathered at the edge and another where the data is pushed to the edge. While the models are fundamentally different, the storage choice is the same: object.