Data Authenticity and Integrity in MinIO
Organizations ingest, process and analyze data at enormous scale and speed. In a typical scenario, data gathered from a multitude of sources is pushed to a data store where it is analyzed, used for machine learning training, enriched with additional source data and more. Data may be altered in any number of ways as people and apps work with it, but it is critical that every party remains confident that they are working with a true dataset. This can only be accomplished when data authenticity and integrity are maintained throughout the entire data lifecycle.
Version control is a critical component in assuring data authenticity and integrity. As teams work with data, multiple versions of data files are created. Each version may be important, because it lets teams step back to the state before a particular change was made. Versions can be used simultaneously for different purposes or replace each other. Different versions may contain different data due to data cleaning, transformation, enrichment or the addition of data for a specific project.
For data to have business value, teams must track its versions and understand what each one contains so they know it is useful. They must also prevent unauthorized modification or deletion so they know it is authentic.
Data authenticity and integrity are provided through a combination of mechanisms:
- Erasure Coding
- BitRot Protection
- Encryption
- Data Immutability
- Versioning
Let’s take a look at each of these to understand how MinIO protects and versions data as objects.
Erasure Coding
MinIO uses erasure coding at the object level to protect data from loss and corruption, two serious risks to data authenticity and integrity. Erasure coding breaks objects into data and parity blocks, where parity blocks support reconstruction of missing or corrupted data blocks. Data and parity blocks are distributed across nodes and drives within a MinIO cluster. With MinIO’s highest level of protection (8 parity or EC:8), you may lose up to half of the total drives and still recover data.
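The idea of rebuilding a lost block from parity can be sketched with a toy example. The code below is a simplification and not MinIO's implementation: it uses a single XOR parity block, which can repair only one missing shard, whereas MinIO uses Reed-Solomon coding with configurable parity. The function names and the choice of four data blocks are assumptions made for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data: bytes, k: int):
    """Split data into k equal data blocks (zero-padded) plus one XOR parity block."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return blocks + [xor_blocks(blocks)]


def reconstruct(shards):
    """Rebuild the single shard marked None from the surviving shards."""
    missing = [i for i, shard in enumerate(shards) if shard is None]
    if len(missing) != 1:
        raise ValueError("single-parity XOR can rebuild exactly one missing shard")
    survivors = [shard for shard in shards if shard is not None]
    repaired = list(shards)
    repaired[missing[0]] = xor_blocks(survivors)
    return repaired


original = b"object payload stored in MinIO"
shards = encode(original, k=4)      # 4 data blocks + 1 parity block
shards[2] = None                    # simulate a failed drive
repaired = reconstruct(shards)
recovered = b"".join(repaired[:4])[:len(original)]  # drop parity and padding
```

Because the XOR of all five shards is zero, any one of them equals the XOR of the other four. Reed-Solomon generalizes this so that several parity blocks can repair several losses at once.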
Erasure coding is very well suited for the typical object storage usage pattern of writing an immutable object once that is read many times. MinIO uses Intel AVX512 instructions to fully leverage host CPU resources across multiple nodes for fast erasure coding reads and writes. Please see my earlier blog post, Object Storage Erasure Coding vs. Block Storage RAID, for a deeper explanation of how MinIO implements erasure coding.
When combined with BitRot protection, erasure coding maintains the integrity and authenticity of objects stored on drives within MinIO. If any version of any object were to become corrupted, MinIO would heal it to continue to provide applications and users with data that can be trusted.
BitRot Protection
BitRot, or silent data corruption, can be a serious threat to data authenticity. BitRot can be caused by a variety of factors such as power current spikes, drive firmware bugs or other drive errors. These errors can create a serious problem because they occur without the user’s knowledge. By the time it’s determined that data was compromised, it may be too late to fix it or regenerate it.
MinIO captures and heals corrupted objects on the fly using an optimized implementation of the HighwayHash algorithm. A hash is computed when the application writes an object and verified when the object is read, so corrupted data is detected before it reaches the application, removing the risk of BitRot to data authenticity.
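The checksum-and-verify pattern can be sketched in a few lines. This is an illustration under stated assumptions, not MinIO's code: BLAKE2b from the Python standard library stands in for HighwayHash, and the in-memory `store` dictionary and the `write_shard` and `read_shard` helpers are hypothetical names.

```python
import hashlib


def checksum(shard: bytes) -> bytes:
    # BLAKE2b is a stand-in here; MinIO uses HighwayHash.
    return hashlib.blake2b(shard, digest_size=32).digest()


def write_shard(store: dict, name: str, shard: bytes) -> None:
    """Store the shard together with the checksum computed at write time."""
    store[name] = (checksum(shard), shard)


def read_shard(store: dict, name: str) -> bytes:
    """Recompute the checksum at read time and refuse to return corrupted data."""
    expected, shard = store[name]
    if checksum(shard) != expected:
        raise IOError(f"bitrot detected in {name}")
    return shard


store = {}
write_shard(store, "part.1", b"sensor readings")
clean = read_shard(store, "part.1")

# Simulate silent corruption: flip one bit on "disk" without touching the checksum.
digest, shard = store["part.1"]
store["part.1"] = (digest, bytes([shard[0] ^ 0x01]) + shard[1:])

try:
    read_shard(store, "part.1")
    corruption_detected = False
except IOError:
    corruption_detected = True
```

In a real system, a failed check would trigger a repair from the remaining data and parity blocks rather than simply raising an error.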
Encryption
Encryption is a mainstay in the data authenticity arsenal. MinIO takes a multi-layered approach to encrypt data when it is transmitted over the network and when it is stored on drives. Encrypting data as it travels across the network maintains confidentiality, authenticity and integrity of data as it is sent between external applications and MinIO, as well as between nodes within the MinIO cluster. MinIO supports the ubiquitous Transport Layer Security (TLS) 1.2+ to encrypt all network traffic, maintaining end-to-end security.
When writing and reading objects to and from drives, MinIO uses authenticated encryption with associated data (AEAD) to maintain the confidentiality and authenticity of data. AEAD encrypts and authenticates plaintext data to produce ciphertext and an authentication code. If an unauthorized party were to alter the stored data, even by changing a single bit, the decryption and verification routine would detect the modification using the authentication code.
MinIO AEAD encryption supports industry-standard ciphers such as AES-256-GCM and ChaCha20-Poly1305 to secure object data. Organizations can enable automatic bucket-level encryption to encrypt and decrypt objects as they are written to or read from object storage. Combined with TLS, AEAD encryption maintains data authenticity between external applications and MinIO, as well as within and between MinIO clusters and when written to drives.
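The verification step can be illustrated with the standard library alone. The sketch below covers only the authentication half of AEAD: HMAC-SHA256 stands in for the authentication code, the payload is treated as already-encrypted bytes, and no encryption is performed. Using an object path as associated data is an assumption chosen for the example and does not describe what MinIO binds to its ciphertext.

```python
import hashlib
import hmac
import os


def seal(key: bytes, associated_data: bytes, ciphertext: bytes) -> bytes:
    """Compute a keyed tag over the associated data and the ciphertext."""
    # Length-prefix the associated data so its boundary with the ciphertext is unambiguous.
    message = len(associated_data).to_bytes(8, "big") + associated_data + ciphertext
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, associated_data: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(seal(key, associated_data, ciphertext), tag)


key = os.urandom(32)
aad = b"bucket/reports/q1.csv"      # hypothetical associated data
payload = os.urandom(64)            # stands in for ciphertext
tag = seal(key, aad, payload)

tampered = bytes([payload[0] ^ 0x01]) + payload[1:]  # flip a single bit

ok_original = verify(key, aad, payload, tag)
ok_tampered = verify(key, aad, tampered, tag)
ok_moved = verify(key, b"bucket/reports/q2.csv", payload, tag)
```

Only the untouched payload with its original associated data passes. A real AEAD cipher such as AES-256-GCM performs the encryption and this check in one operation and releases no plaintext when the check fails.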
Immutability and Tamper-Proofing
The next step in maintaining data authenticity is to protect data saved to MinIO from deletion or modification using a combination of object locking, retention, legal holds, governance and compliance. Object locking is combined with versioning (below) to ensure data immutability and eliminate the risk of data tampering completely. These features do more than protect data integrity. They work together to establish an audit trail that proves data authenticity.
Object storage retention rules guarantee that an object is WORM protected for a defined period of time. Object storage retention policy can be set on individual objects or inherited via a bucket default setting. A duration is set in days or years that defines the length of time for which object versions and their associated metadata are protected against deletion.
In addition, objects and buckets are subject to additional controls that combine to ensure data authenticity. Governance mode protects objects from being deleted by standard users and requires elevated permissions to change retention policy or delete an object. Compliance mode is more restrictive and ensures that no one, including the root user, can delete an object during its retention period. Finally, Legal Hold protects objects using WORM protection for an indefinite period and can only be removed by an authorized user.
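The way these controls interact can be summarized as a small decision function. This is a hypothetical model of the rules described above, not MinIO code: the `ObjectVersion` class, the `can_delete` function and the `bypass_governance` flag (standing in for the elevated permission) are names invented for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Mode(Enum):
    GOVERNANCE = "GOVERNANCE"
    COMPLIANCE = "COMPLIANCE"


@dataclass
class ObjectVersion:
    mode: Optional[Mode] = None
    retain_until: Optional[datetime] = None
    legal_hold: bool = False


def can_delete(version: ObjectVersion, now: datetime, bypass_governance: bool = False) -> bool:
    """Return True if this object version may be deleted at time `now`."""
    if version.legal_hold:
        return False  # a legal hold has no expiry; it must be lifted first
    if version.mode is None or version.retain_until is None or now >= version.retain_until:
        return True   # no retention set, or the retention period has ended
    if version.mode is Mode.GOVERNANCE:
        return bypass_governance  # only callers with elevated permission
    return False      # compliance mode: nobody, including root


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
until = now + timedelta(days=30)

governed = ObjectVersion(Mode.GOVERNANCE, until)
compliant = ObjectVersion(Mode.COMPLIANCE, until)
held = ObjectVersion(legal_hold=True)
```

For example, `can_delete(governed, now)` is false for a standard user but true with `bypass_governance=True`, while `can_delete(compliant, now, bypass_governance=True)` stays false until the retention date passes.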
MinIO's object storage retention and data immutability have earned a positive assessment from Cohasset Associates, specifically regarding SEC Rule 17a-4(f), FINRA Rule 4511, and CFTC Regulation 1.31. Rule 17a-4 has specific requirements for electronic data storage, including many aspects of record management, such as the duration, format, quality, availability, and accountability of broker-dealer record retention.
Versioning
Object-level versioning is the final component that MinIO uses to ensure data authenticity. Versioning provides data protection because objects are versioned independently, following the structure and implementation of Amazon S3 versioning. MinIO tracks versions in metadata, using a unique ID for each version of a given object. By specifying a version ID, applications can access the point-in-time snapshot of a given object. In an earlier blog post, I explained how MinIO provides continuous data protection using versioning.
Versioning retains multiple variants of an object in the same bucket and provides a mechanism to preserve, read and restore every version of every object stored in a bucket. This protects against unintentional overwrites or deletions, while assuring that data authenticity is maintained throughout the entire lifecycle of the object.
MinIO’s deep versioning features enable multiple users and applications to work with data objects without disrupting other users and applications. As data is transformed, enriched and modified, each version remains intact and authenticity can be demonstrated.
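The versioning behavior described above can be modeled with a small in-memory class. This is a sketch under stated assumptions rather than the MinIO or S3 API: `VersionedBucket` and its methods are hypothetical, version IDs are random UUIDs, and an unversioned delete is recorded as a delete marker, as S3-style versioning does.

```python
import uuid
from typing import Dict, List, Optional, Tuple


class VersionedBucket:
    """Toy bucket that keeps every version of every key."""

    def __init__(self) -> None:
        # key -> list of (version_id, data); data is None for a delete marker
        self._versions: Dict[str, List[Tuple[str, Optional[bytes]]]] = {}

    def _append(self, key: str, data: Optional[bytes]) -> str:
        version_id = str(uuid.uuid4())
        self._versions.setdefault(key, []).append((version_id, data))
        return version_id

    def put(self, key: str, data: bytes) -> str:
        """Write a new version and return its ID; older versions stay intact."""
        return self._append(key, data)

    def delete(self, key: str) -> str:
        """Record a delete marker instead of destroying any data."""
        return self._append(key, None)

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        history = self._versions.get(key, [])
        if version_id is None:
            if not history or history[-1][1] is None:
                raise KeyError(key)  # missing, or latest version is a delete marker
            return history[-1][1]
        for vid, data in history:
            if vid == version_id and data is not None:
                return data
        raise KeyError((key, version_id))


bucket = VersionedBucket()
v1 = bucket.put("data.csv", b"raw")
v2 = bucket.put("data.csv", b"cleaned")
bucket.delete("data.csv")

try:
    bucket.get("data.csv")
    latest_visible = True
except KeyError:
    latest_visible = False

raw_snapshot = bucket.get("data.csv", v1)            # point-in-time read
bucket.put("data.csv", bucket.get("data.csv", v2))   # restore the cleaned version
restored = bucket.get("data.csv")
```

After the delete, the key looks gone to a plain read, yet both earlier versions remain readable by ID and either one can be restored by writing it back as the newest version.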
In Data We Trust
Organizations that rely on data for informed decision making must be able to trust that it is accurate and timely. Data authenticity and versioning ensure that each data set is genuine and has a traceable lineage. MinIO relies on a combination of features to ensure that data teams can trust the data they’re working with and feeding into their models.
Download MinIO and start building your object storage cloud. It’s straightforward, software defined for greatest flexibility, and S3 API compatible so it’s ready for your workloads. Any questions? Join our Slack Channel or drop us a note at hello@min.io.