Strict Consistency is a Hard Requirement for Primary Storage

Enterprises rely on data to make decisions. Effective decision-making hinges on the accuracy, timeliness, availability, and security of data. Data consistency, an important factor that cannot be ignored when purchasing storage, involves ensuring that all relevant parties can immediately access the results of a database transaction once it has been finalized, either through commitment or rollback. This guarantees that everyone accessing the data sees the same information simultaneously and that it is reliable, useful and not corrupt. As a result, data consistency is essential in a transaction-based environment.

Enter Object Storage

Recent years have brought about an onslaught of structured, semi-structured and unstructured data. We've also seen the rise of cloud-native, non-relational databases and data lakehouses (Apache Iceberg, Apache Hudi, Delta Lake) as an alternative to legacy relational databases. Combined with object storage, these technologies offer a simplified approach to storing and accessing data that relaxes many of the constraints created by proprietary RDBMS. Industries have rushed to embrace the architecture of object storage and non-relational databases because together they provide a range of features that relational databases were never optimally designed to provide, such as exceptional performance at large scale, high availability due to containerization and Kubernetes, and high extensibility as data is available to every application that speaks the S3 API.

As architectures adapted to these modern methods of working with data, object storage replaced legacy file and block storage because of its performance, scale and durability. The enterprise world raced to put object storage to work because of the improvements in response times, scalability, availability, and fault tolerance that it brought. The simplicity of the S3 API and the flat structure of objects saved to the system are attractive to enterprises because of the improvements in developer productivity and code portability.  

However, the widespread adoption of object storage has created fresh challenges, particularly concerning data consistency, a concern that relational database vendors addressed decades ago. It is critical that CIOs, CISOs and CTOs understand the implications of opting for eventual consistency or strong consistency when it comes to data in enterprise data systems.

What is Strict or Strong Consistency? 

In any high-concurrency cloud-based system, multiple copies of any data are likely to exist across physical locations, public cloud and on-premises, in various microservices and containers. Let's take a moment to dig into consistency using an online retail example. 

Imagine a web retailer that has built an inventory management system using object storage. What happens when one employee checks out an item to ship it and at the same time another employee logs that a new box of that item was received from the manufacturer? They will attempt to concurrently update the same object (same key) in S3. Then what happens when a query is run? 

In a strict consistency system, the inventory will be accurate at all times. In an eventual consistency system, reading stale data is likely to lead to overselling or underselling.
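
To make the scenario concrete, here is a minimal sketch of the read-after-write question in Python, assuming a hypothetical "inventory" bucket, a "sku-1234.json" key, and placeholder endpoint and credentials for the MinIO Python SDK (any S3-compatible client would look much the same). The comments spell out what each consistency model guarantees for the final read.

    import io
    import json

    from minio import Minio  # pip install minio

    # Hypothetical endpoint, credentials, bucket and key; adjust for your deployment.
    client = Minio("play.min.io", access_key="YOUR-ACCESS-KEY",
                   secret_key="YOUR-SECRET-KEY", secure=True)
    BUCKET, KEY = "inventory", "sku-1234.json"

    def write_count(count: int) -> None:
        """Overwrite the inventory record for one SKU (last write wins)."""
        body = json.dumps({"sku": "1234", "count": count}).encode()
        client.put_object(BUCKET, KEY, io.BytesIO(body), len(body),
                          content_type="application/json")

    def read_count() -> int:
        """Read the inventory record back from object storage."""
        resp = client.get_object(BUCKET, KEY)
        try:
            return json.loads(resp.read())["count"]
        finally:
            resp.close()
            resp.release_conn()

    # One employee ships an item, another logs a delivery; both overwrite the same key.
    write_count(41)
    write_count(53)

    # Under strict consistency this read is guaranteed to return 53.
    # Under eventual consistency it may still return 41, a stale count that
    # leads to overselling or underselling.
    print(read_count())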

When software engineers develop applications, they must make crucial decisions regarding the level of consistency required, and they must take into account the need to present consistent information across various platforms. The only way they can ensure that the inventory remains identical everywhere is to build the application on a foundation of robust data security, durability and consistency.

Strong data consistency guarantees that data is consistently accurate, and that's pretty important in the enterprise today. However, there are trade-offs to consider, particularly in terms of system performance and user experience: unless a system is engineered correctly, strong consistency can lead to a decline in overall system performance, adversely affecting the user experience. For instance, when a customer tries to buy an item using a strongly consistent system, they may not be able to execute the transaction immediately, even if the inventory was updated by a transaction five seconds ago and is highly likely to be accurate. In an eventually consistent system, by contrast, the customer could buy the item immediately, perhaps resulting in an oversell. Strong consistency isn't "highly likely", it is "absolutely certain".

In the past, it was easier to architect a system that balanced performance and consistency: systems ran on-premises, so scale and geography were of little consequence. For an RDBMS running on a single node or a distributed database with two to four nodes, this is relatively straightforward, but in a multi-cloud computing environment with dozens of data nodes around the world, the complexity increases significantly. The issue becomes particularly critical in cloud architectures where network latency and reliability can further impact performance, severely affecting response times, scalability, and availability.

Being Lax with Eventual Consistency

With strong consistency being difficult to attain, storage and database developers chose to "solve" its performance and usability issues by taking a step backward and developing for eventual consistency. Having given up on immediate strong consistency, they settled for a theoretical assurance: as long as no new updates are made to a piece of data, all future reads of that data will eventually yield the most recently updated value.

In an eventual consistency framework, copies of data may not always be identical, but they are designed to eventually converge and become consistent once all ongoing operations have been processed. For example, the Domain Name System (DNS) operates on an eventual consistency model.

DNS servers are notorious for not being up to date. An individual DNS server may find it close to impossible to always serve up the most current address information; instead, values are cached and distributed across numerous layered directories around the internet. It takes some time to propagate modified values to all DNS clients and servers – a best practice is not to expect your changes to propagate across the world in less than three days. However, the DNS system is immensely successful and a foundational infrastructure component of the internet. The DNS system is remarkable for its high availability and global scalability, serving name lookups for over a hundred million devices throughout the entire internet.

This is in stark contrast to the earlier online retail inventory example. In an eventual consistency data model, the shopping app would display the most recently cached inventory for an item on a mobile device, as long as it is programmatically judged to be up-to-date and reasonably reliable, even when the inventory application and data are currently unavailable for querying. Eventual consistency is not how I would want to shop, because I would like some assurance that what I'm buying is in stock and available. Eventual consistency is fine for Facebook statuses, but not for anything that even slightly resembles a transaction. Why?

The Great Consistency Debate

The debate between advocates of strong consistency and advocates of eventual consistency is almost as old as computing itself. The choice between adopting one or the other is closely tied to the CAP Theorem, which establishes that a distributed data store cannot simultaneously ensure all three of the following guarantees:

  1. Consistency
  2. Availability
  3. Partition tolerance

Let's dig in with an example. In a multi-cloud environment built on distributed data and cloud infrastructure, network partitions occur rather frequently. In these cases, the solution architect faces a decision regarding whether to prioritize data availability or data consistency. The two consistency models each emphasize one of these aspects over the other. It's important to note that eventual consistency doesn't eliminate consistency altogether. Instead, it temporarily relaxes data consistency in favor of ensuring data availability. It's also important to note that the "eventual" in eventual consistency is never defined by the object storage vendors that rely on it – is eventual consistency a few milliseconds, a few seconds or a few days?

Making the Consistency Decision

Cloud architects cannot afford to choose the wrong data consistency model. Both strong consistency and eventual consistency models are valuable tools, but it is up to every architect to carefully assess and select the most suitable one on a case-by-case basis. It's essential to recognize that eventual consistency isn't a replacement for strong consistency but rather a distinct approach to address different business use cases.

Eventual consistency might be preferred for scenarios involving a vast number of data records, as in the case of Facebook statuses. It is well-suited for cases where query results yield a substantial quantity of data and, perhaps more importantly, the user experience does not significantly depend on the specific inclusion or exclusion of individual entities. Conversely, scenarios featuring a limited number of entities within a narrow context indicate a need for strong consistency. In such cases, the user experience must align with the context of the data and use case.

If data undergoes frequent changes, eventual consistency is likely not a suitable choice. Users would have to wait for changes to be disseminated to all replicas across the multi-cloud before queries can return the most up-to-date version. For this reason, eventual consistency tends to be more fitting for traditional object storage use cases where data remains relatively stable, such as backups, archives, video and audio files, virtual machine (VM) images and artifacts.

Why Isn't All Object Storage Strictly Consistent?

It turns out that it's no easy task to build a massively scalable distributed object storage system that provides strong consistency. Just ask AWS how hard it was to improve on the original, eventually consistent S3.

Taking a look at that blog post provides some insight into why other object storage vendors have such a hard time matching the strict consistency of solutions like AWS S3 and MinIO.

The author spells it out pretty clearly: "we wanted strong consistency with no additional cost, applied to every new and existing object, and with no performance or availability tradeoffs."

AWS didn't want to follow the example of "other providers [that] make compromises, such as making strong consistency an opt-in setting for a bucket or account rather than for all storage." They knew that, as the cloud object storage leader, they couldn't get away with support for consistency in name only. Other vendors have not been as forthright. 

For AWS to add strong consistency to S3, they realized that they would first have to make the metadata cache strongly consistent, and this would be a "tall order at S3's scale". They had recently released new replication logic, and this became a core piece of their cache coherency protocol. They also added a "witness" component to prevent reads until the system can verify that an object in cache is not stale. If the cached object is not stale, the cached version is served; if it is stale, the cache is invalidated and the object is served from the persistence tier.
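
To picture the read path being described, here is a purely illustrative sketch in Python. It is not AWS's code, and every class and function name in it is invented for this post; it only shows the general shape of a witness-guarded cache: the witness tracks the latest committed version of each key, a read is served from cache only after the witness confirms the cached copy is not stale, and stale entries are replaced from the persistence tier.

    from dataclasses import dataclass
    from typing import Dict, Optional

    @dataclass
    class CachedObject:
        data: bytes
        version: int

    class Witness:
        """Tracks the latest committed version of every key (illustrative only)."""
        def __init__(self) -> None:
            self._latest: Dict[str, int] = {}

        def record_write(self, key: str, version: int) -> None:
            self._latest[key] = version

        def is_stale(self, key: str, cached_version: int) -> bool:
            return self._latest.get(key, cached_version) != cached_version

    def read(key: str,
             cache: Dict[str, CachedObject],
             witness: Witness,
             persistence: Dict[str, CachedObject]) -> bytes:
        """Serve from cache only when the witness confirms the entry is fresh."""
        cached: Optional[CachedObject] = cache.get(key)
        if cached is not None and not witness.is_stale(key, cached.version):
            return cached.data            # verified fresh: serve the cached version
        fresh = persistence[key]          # stale or missing: go to the persistence tier
        cache[key] = fresh                # invalidate the stale entry by replacing it
        return fresh.data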

The team knew that there was no room for error in the cache coherency protocol and that they needed to do this without compromising performance and availability across S3.

The team also spent more cycles on verification than the actual implementation itself. "It is important that strong consistency is implemented correctly so that there aren't edge cases that break consistency." Yet, S3 is a massive distributed system with many users and edge cases were going to occur. "Even if something happens only once in a billion requests, that means it happens multiple times per day within S3."

Building Strong Consistency with MinIO

MinIO follows a strict read-after-write and list-after-write consistency model for all I/O operations in both distributed and standalone modes. This means that after a successful write of a new object, or an overwrite or delete of an existing object, all subsequent read requests will immediately return the most recent version of the object. This consistency model is guaranteed when you use disk filesystems such as XFS, ZFS or BTRFS for a distributed setup. Avoid deploying MinIO on EXT4 and NFS.

MinIO ensures that once a write operation is acknowledged as successful, applications can perform operations on that object such as LIST. This strong consistency guarantee is vital for applications, such as analytics and ML, that require data accuracy and reliability. Without this built-in strong consistency, your team would have to spend a lot of effort building the applications they actually need while also building tooling just to test for consistency.
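
In practice, that guarantee means a write can be followed immediately by a read or a LIST, with no settling period and no retry loop. Here is a minimal sketch, assuming a MinIO deployment at a placeholder endpoint, placeholder credentials and a pre-existing "analytics" bucket:

    import io

    from minio import Minio  # pip install minio

    # Placeholder endpoint, credentials and bucket; adjust for your deployment.
    client = Minio("minio.example.com:9000", access_key="YOUR-ACCESS-KEY",
                   secret_key="YOUR-SECRET-KEY", secure=True)

    data = b"feature,label\n0.42,1\n"
    client.put_object("analytics", "datasets/train.csv",
                      io.BytesIO(data), len(data), content_type="text/csv")

    # Read-after-write: the GET that follows a successful PUT returns the new bytes.
    resp = client.get_object("analytics", "datasets/train.csv")
    assert resp.read() == data
    resp.close()
    resp.release_conn()

    # List-after-write: the new object appears immediately in a LIST of its prefix.
    names = [obj.object_name
             for obj in client.list_objects("analytics", prefix="datasets/", recursive=True)]
    assert "datasets/train.csv" in names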

MinIO is built to provide strong data consistency in a distributed environment. Here's how MinIO achieves this:

ACID Transactions: MinIO employs ACID (Atomicity, Consistency, Isolation, Durability) transactions to ensure strong consistency. When multiple clients or applications perform concurrent read and write operations, MinIO ensures that these operations are carried out in a way that maintains data consistency.

Locking Mechanisms: MinIO uses locking mechanisms to control access to objects during write operations. When one client is writing to an object, other clients are prevented from accessing it until the write operation is complete. This ensures that updates to the object are not visible to other clients until the write is successful.

Versioning: MinIO supports versioning of objects, which means that every update to an object creates a new version of that object. This allows for point-in-time access to objects and maintains a complete history of changes, which can be helpful for auditing and data integrity purposes. A short example of working with versions follows below.

Distributed Architecture: MinIO is designed for distributed and highly available architectures. It can be deployed across multiple nodes or data centers, and it employs mechanisms like quorum-based replication to maintain data consistency even in the face of network partitions and hardware failures.
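
Of these mechanisms, versioning is the easiest to observe from the client side. The sketch below, again with a placeholder endpoint, placeholder credentials and a hypothetical "inventory" bucket, enables versioning, overwrites an object twice and then walks its version history:

    import io

    from minio import Minio  # pip install minio
    from minio.commonconfig import ENABLED
    from minio.versioningconfig import VersioningConfig

    # Placeholder endpoint, credentials and bucket; adjust for your deployment.
    client = Minio("minio.example.com:9000", access_key="YOUR-ACCESS-KEY",
                   secret_key="YOUR-SECRET-KEY", secure=True)

    client.set_bucket_versioning("inventory", VersioningConfig(ENABLED))

    # Each overwrite of the same key creates a new, independently addressable version.
    for body in (b'{"sku": "1234", "count": 41}', b'{"sku": "1234", "count": 53}'):
        client.put_object("inventory", "sku-1234.json", io.BytesIO(body), len(body),
                          content_type="application/json")

    # Walk the complete history of the object for auditing or point-in-time access.
    for obj in client.list_objects("inventory", prefix="sku-1234.json", include_version=True):
        print(obj.object_name, obj.version_id, obj.is_latest)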

Primary Storage is Strictly Consistent

Object storage, a distributed system that spreads data out over many nodes, used to rely on eventual consistency in order to overcome performance challenges. This design choice makes legacy object storage suitable for archiving large files, saving backups and storing video and audio files, but it is not acceptable when deploying object storage against demanding modern analytics and AI/ML workloads. Not knowing whether or not an application is working with the most up-to-date and valid data can create some real headaches, the kind of headaches that go with lost transactions and corrupted customer data.

MinIO ensures that data updates are reflected consistently across all replicas and that once a write operation is confirmed, all subsequent reads will return the updated data, making it an excellent choice for applications that demand strong data consistency, especially in cases where object storage is used as primary storage.

You can download MinIO here. If you have any questions, ping us on hello@min.io or join the Slack community.
