Leading the Way: MinIO's Conditional Write Feature for Modern Data Workloads

When AWS S3 speaks, people listen. Last week they announced the conditional write feature. But this isn't breaking news for us at MinIO—we initially merged support for conditional writes back in February 2023, and many of our customers have been using it ever since. Given AWS’s recent announcement and the heightened interest it brings, we thought it would be a good time to revisit this powerful feature and explain how it can benefit you.

Being Specific: What is MinIO’s Conditional Write Feature?

MinIO’s conditional write feature provides optimistic concurrency control through the If-Match and If-None-Match HTTP headers on S3-compatible uploads. When a write targets a key that may already hold an object, the client can require that one of two conditions is met:

  • If-Match: the object exists and its current ETag matches the ETag the client supplies.
  • If-None-Match: the object's current ETag does not match the ETag the client supplies or, when the header value is *, no object exists at that key.

The If-Match header indicates that a PUT should succeed only if the ETag supplied in the header matches the ETag of the existing object (for simple uploads, the ETag is typically the MD5 checksum of the content). Conversely, the If-None-Match header indicates that a PUT should succeed only if the supplied ETag differs from the ETag of the existing object. At the time of its announcement, AWS supported only the If-None-Match condition, whereas MinIO supports both If-None-Match and If-Match, providing a more complete implementation of RFC 7232. This additional support gives MinIO users greater control over how object updates are handled in distributed environments.
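To make the two header semantics concrete, here is a minimal in-memory sketch of the precondition check as RFC 7232 describes it. It is a simplified model for illustration, not MinIO's server code: it handles a single ETag per header, ignores weak validators, and the function names precondition_ok and _normalize are hypothetical.

```python
from typing import Optional


def _normalize(etag: str) -> str:
    """Strip surrounding whitespace and double quotes from an ETag value."""
    return etag.strip().strip('"')


def precondition_ok(
    current_etag: Optional[str],
    if_match: Optional[str] = None,
    if_none_match: Optional[str] = None,
) -> bool:
    """Return True if the PUT may proceed, False if it should get a 412.

    current_etag is the stored object's ETag, or None when no object exists.
    """
    if if_match is not None:
        if current_etag is None:
            return False  # If-Match needs an existing object to match against
        if if_match.strip() != "*" and _normalize(if_match) != _normalize(current_etag):
            return False  # stored object changed since the client read it
    if if_none_match is not None and current_etag is not None:
        if if_none_match.strip() == "*":
            return False  # "*" means: write only if no object exists
        if _normalize(if_none_match) == _normalize(current_etag):
            return False  # stored object is the one the client named
    return True
```

For example, precondition_ok(None, if_none_match="*") allows the write because the key is empty, while the same header against any existing object is rejected.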

The Process:

  1. Retrieve the Object: First, you retrieve an object from MinIO. The response includes an HTTP ETag Header value that uniquely identifies the current version of the object.
  2. Prepare for Update: When preparing to upload or update the object, include the ETag value you received in the If-Match conditional header of your upload request. MinIO will compare this ETag value with the current ETag of the object.
  3. Check for Changes: If MinIO detects that the object's current ETag value is different from the ETag specified in the If-Match header, it will not perform the upload. Instead, it will return an HTTP status code 412 (Precondition Failed). This response indicates that another process has modified the object since you last retrieved it, suggesting that you should fetch the object again to get the latest version.
  4. Proceed with Update: If the current ETag value of the object matches the ETag in the If-Match header, MinIO proceeds with the upload, updating the object and its ETag value. Because the write is accepted only when the object is unchanged since you read it, the object's state remains consistent and unintended overwrites are prevented.
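The four steps above can be sketched as a read-modify-write loop that retries on a 412. This is a sketch under assumptions rather than a definitive client: s3 is assumed to be a boto3-style S3 client pointed at a MinIO endpoint and recent enough to accept the IfMatch parameter on put_object, and the helper names update_with_if_match and is_precondition_failed are hypothetical. The 412 check inspects the response attribute that botocore's ClientError carries, so the block itself needs no third-party import.

```python
from typing import Any, Callable


def is_precondition_failed(exc: Exception) -> bool:
    """True if the exception carries an HTTP 412 (botocore ClientError shape)."""
    response = getattr(exc, "response", None) or {}
    return response.get("ResponseMetadata", {}).get("HTTPStatusCode") == 412


def update_with_if_match(
    s3: Any,
    bucket: str,
    key: str,
    transform: Callable[[bytes], bytes],
    max_attempts: int = 5,
) -> str:
    """Apply transform to an object, writing only if it is unchanged since the read.

    Returns the ETag of the newly written object.
    """
    for _ in range(max_attempts):
        # Step 1: retrieve the object and remember its ETag.
        current = s3.get_object(Bucket=bucket, Key=key)
        etag = current["ETag"]
        new_body = transform(current["Body"].read())
        try:
            # Step 2: send the ETag back in the If-Match header.
            result = s3.put_object(
                Bucket=bucket, Key=key, Body=new_body, IfMatch=etag
            )
        except Exception as exc:
            # Step 3: a 412 means another writer got there first; re-read and retry.
            if is_precondition_failed(exc):
                continue
            raise
        # Step 4: the write was accepted and the object has a new ETag.
        return result["ETag"]
    raise RuntimeError(f"gave up updating {key!r} after {max_attempts} attempts")
```

Bounding the number of attempts keeps a heavily contended key from turning into an unbounded retry loop; a production client would likely add backoff between attempts as well.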

By utilizing the If-Match and If-None-Match headers, MinIO provides reliable and safe handling of concurrent uploads. This feature ensures that only consistent and expected updates are applied to objects, effectively managing the challenges of concurrency in distributed systems.

Use Cases for Conditional Writes

In the era of AI and data lakehouses, multiple clients or processes frequently attempt to update the same object simultaneously. MinIO’s conditional write feature is particularly advantageous in these high-concurrency environments. AI model training and machine learning workflows depend on data consistency and accuracy to deliver reliable results, so writes must behave as expected, and multiple layers of control and protection are invaluable. While all three open table formats (Apache Iceberg, Apache Hudi, and Delta Lake) offer concurrency control, MinIO’s conditional write feature adds another critical layer. It helps data pipelines remain reliable by preventing unintended modifications that could compromise training outcomes. Here is what that could look like in practice:

If-None-Match for Preventing Lost Updates: A common use of If-None-Match is writing an object that may or may not already exist while guaranteeing that you do not overwrite an object another process uploaded first. This directly addresses the lost update problem, where a client overwrites changes made by another client without realizing those changes were made. With If-None-Match, if another client uploads an object with the same name before your write arrives, your upload fails and the earlier upload remains intact. This is especially useful when multiple processes upload or write simultaneously.
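A create-only write of this kind might look like the following sketch. The assumptions are the same as for any boto3-style client: s3 is assumed to be an S3 client configured for a MinIO endpoint that accepts the IfNoneMatch parameter on put_object, and create_if_absent is a hypothetical helper name. Sending the value "*" asks the server to accept the write only if no object exists at the key.

```python
from typing import Any


def create_if_absent(s3: Any, bucket: str, key: str, body: bytes) -> bool:
    """Write the object only if the key is empty.

    Returns True if this call created the object, False if another
    writer had already created it (the server answered 412).
    """
    try:
        s3.put_object(Bucket=bucket, Key=key, Body=body, IfNoneMatch="*")
    except Exception as exc:
        # botocore's ClientError exposes the HTTP status under .response.
        response = getattr(exc, "response", None) or {}
        status = response.get("ResponseMetadata", {}).get("HTTPStatusCode")
        if status == 412:
            return False  # someone else won the race; their data is untouched
        raise
    return True
```

A caller can treat the False result as "another process already claimed this name" and either pick a new key or read the existing object, instead of silently replacing it.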

If-Match for Metadata-Only Updates: The If-Match header is invaluable for metadata-only updates to an object, because it lets you guarantee that the underlying data has not changed. For example, when you want to update an object’s metadata (e.g., adding or updating tags) without modifying the object itself, If-Match preserves data integrity by proceeding with the update only if the object’s ETag matches the one you specify. This prevents accidental overwrites when other processes might have updated the data in the meantime.

MinIO’s early implementation of conditional writes showcases our commitment to innovation and customer needs. As object storage increasingly becomes the primary choice for all types of data workloads, MinIO continues to lead the way, providing cutting-edge functionality that ensures reliability, scalability, and performance. Let us know what you’re building with MinIO’s features at hello@min.io or on our Slack channel.