Continuous Data Protection with MinIO Versioning and Rewind
MinIO includes multiple mechanisms that continuously protect data. In an earlier blog post, I discussed how MinIO applies erasure coding and BitRot protection at the object level to protect data from loss and corruption. With MinIO’s highest parity setting (EC:8), you can lose up to half of the total drives and still recover data.
However, corruption isn’t the only risk to data: data can also be accidentally or maliciously modified, overwritten or deleted. Every one of us can recall opening a file, modifying it and mistakenly overwriting the original. The risk is magnified in networked settings, where many users and applications may be working with the same file or group of files.
MinIO takes this burden off users. Like all object storage, MinIO treats objects as immutable once written. All operations on an object are atomic, and with versioning enabled, no object can be lost. A typical workflow involves a user or application GET-ting an object from a MinIO server, modifying it locally and then uploading the new version back to the server. The original object becomes part of the system of record because it cannot be modified, and because each new version is saved, you can roll back changes and work with previous versions. Rolling back leaves the newer versions intact as well, even as still newer versions are added.
MinIO versions each object independently, following the Amazon S3 versioning model. Each new version of an object is assigned a unique version ID when it is written, and applications can specify a version ID to access that object as it was at a specific time. MinIO retains multiple versions of an object in the same bucket and provides a mechanism (see the tutorial below) to preserve, retrieve and restore every version of every object stored in a bucket.
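To make the version ID idea concrete, here is a toy, in-memory Python model of per-object versioning. It is a sketch for illustration only, not MinIO’s implementation or API: the ToyVersionedBucket class and its methods are hypothetical, and version IDs are simply random UUIDs.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Version:
    version_id: str
    data: bytes


@dataclass
class ToyVersionedBucket:
    """Hypothetical in-memory model of S3-style per-object versioning (not MinIO code)."""

    # object key -> its versions, oldest first
    objects: Dict[str, List[Version]] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> str:
        # Every write appends a new version with a fresh ID; nothing is overwritten.
        version = Version(version_id=str(uuid.uuid4()), data=data)
        self.objects.setdefault(key, []).append(version)
        return version.version_id

    def get(self, key: str, version_id: Optional[str] = None) -> bytes:
        versions = self.objects.get(key, [])
        if not versions:
            raise KeyError(key)
        if version_id is None:
            return versions[-1].data  # no version ID given: return the latest version
        for version in versions:
            if version.version_id == version_id:
                return version.data
        raise KeyError(f"{key} (version {version_id})")

    def list_versions(self, key: str) -> List[str]:
        return [v.version_id for v in self.objects.get(key, [])]


bucket = ToyVersionedBucket()
v1 = bucket.put("kitten.jpg", b"original")
v2 = bucket.put("kitten.jpg", b"edited")
print(bucket.get("kitten.jpg"))                 # b'edited'
print(bucket.get("kitten.jpg", version_id=v1))  # b'original'
```

The point of the model is that a PUT never replaces data in place: asking for the object without a version ID returns the newest version, while any older version stays addressable by its ID.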
MinIO versioning does not rely on volume-level snapshots. Snapshots were a great approach to data protection when data volumes were small, but per-object versioning is more granular and more efficient. When you make complete point-in-time copies of entire volumes, rolling back a single object requires reading in an entire volume snapshot just to get to that object. Some storage systems chain snapshots together, so they have to be restored sequentially. It’s like searching through multiple copies of an entire library just to find a single article: slow and painful.
Continuous data protection, in the form of versioning, is built into MinIO. Versioning is enabled at the bucket level, and, as described above, MinIO automatically creates a unique version ID for each version of an object. MinIO protects against accidental deletion with delete markers. When a versioned object is deleted, it is not removed from the system. Instead, a delete marker is created and becomes the current version of the object. When that object is requested, MinIO returns a 404 Not Found error. Removing the delete marker makes the object visible again. Similarly, when a new version of an object is written, both the old and new versions exist, each with its own unique identifier. Deleted objects can therefore be restored quickly and easily, simply by removing their delete markers.
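The delete marker behavior can be sketched as a small in-memory Python model. In this hypothetical ToyBucket (not MinIO’s code), a version whose data is None plays the role of a delete marker, and the NotFound exception stands in for the 404 Not Found response a client would receive.

```python
import uuid


class NotFound(Exception):
    """Stands in for the HTTP 404 Not Found response a client would receive."""


class ToyBucket:
    """Hypothetical model of versioned deletes with delete markers (not MinIO code)."""

    def __init__(self):
        # key -> list of (version_id, data), oldest first; data=None marks a delete marker
        self.versions = {}

    def _append(self, key, data):
        version_id = str(uuid.uuid4())
        self.versions.setdefault(key, []).append((version_id, data))
        return version_id

    def put(self, key, data):
        return self._append(key, data)

    def delete(self, key):
        # A plain delete only adds a delete marker; earlier versions stay intact.
        return self._append(key, None)

    def get(self, key):
        history = self.versions.get(key, [])
        if not history or history[-1][1] is None:
            raise NotFound(key)  # current version is a delete marker, or nothing exists
        return history[-1][1]

    def remove_version(self, key, version_id):
        # Permanently removes one version by ID; removing the delete marker "undeletes".
        self.versions[key] = [v for v in self.versions[key] if v[0] != version_id]


bucket = ToyBucket()
bucket.put("kitten1.jpg", b"meow")
marker_id = bucket.delete("kitten1.jpg")
try:
    bucket.get("kitten1.jpg")
except NotFound:
    print("404 Not Found")
bucket.remove_version("kitten1.jpg", marker_id)
print(bucket.get("kitten1.jpg"))  # b'meow'
```

Because the delete only appended a marker, the original data never left the bucket, which is why removing a single version restores the object.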
How to Continuously Protect Data with MinIO
When versioning is enabled, MinIO tracks every operation and never overwrites an object. You can use the MinIO Console, the MinIO Client (mc) from the command line, or an SDK to apply versioning and work with different versions of objects.
This powerful feature is easy to use, but there are a few things to keep in mind. Only admins and users with the appropriate permissions can change a bucket’s versioning configuration. Once versioning is enabled for a bucket, it cannot be disabled; it can only be suspended.
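The enable/suspend rule amounts to a small state machine. The Python sketch below models only the transitions described here (with mc, the corresponding commands are mc version enable and mc version suspend); the VersioningState and transition names are hypothetical, and real S3 semantics include details this ignores.

```python
from enum import Enum


class VersioningState(Enum):
    UNVERSIONED = "Unversioned"
    ENABLED = "Enabled"
    SUSPENDED = "Suspended"


# Allowed transitions: once enabled, a bucket can be suspended (and re-enabled),
# but it can never return to the unversioned state.
ALLOWED = {
    VersioningState.UNVERSIONED: {VersioningState.ENABLED},
    VersioningState.ENABLED: {VersioningState.SUSPENDED},
    VersioningState.SUSPENDED: {VersioningState.ENABLED},
}


def transition(current: VersioningState, target: VersioningState) -> VersioningState:
    """Return the new state, or raise ValueError for a disallowed change."""
    if target == current:
        return current
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


state = VersioningState.UNVERSIONED
state = transition(state, VersioningState.ENABLED)
state = transition(state, VersioningState.SUSPENDED)
try:
    transition(state, VersioningState.UNVERSIONED)
except ValueError as err:
    print(err)  # cannot go from Suspended to Unversioned
```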
Versioning has a tradeoff: while it protects data from unintended actions, it increases bucket size, because buckets hold multiple versions of each object. You can mitigate this with lifecycle management, removing versions of objects that are no longer required. MinIO’s lifecycle management tools take a policy-based approach to determining how long data stays on disk before it is removed, but that’s a topic for another blog post.
For this tutorial, I’m going to use the MinIO Client, or mc, to show you how to view a bucket or object at any point in time and roll back operations such as PUT and DELETE with a single command.
Start by downloading and installing the latest version of MinIO (for Kubernetes, Linux, macOS or Docker, or as source code) along with the MinIO Client. If you’re just trying it out, you can use our play environment by simply downloading the MinIO Client.
I’m going to start with a directory of files on my local machine. Please note that I run Ubuntu 20.04 on Windows Subsystem for Linux 2 (WSL 2) on Windows 10. These are just a few photos of kittens that I’m using as an example. You don’t have to use kitten photos; feel free to substitute files that are more relevant to your use case, like product photos for an ecommerce catalog or log data from network devices. It’s all objects to MinIO.
To create a bucket in the MinIO play test environment, type:
./mc mb play/msarrel
Enable versioning by entering:
./mc version enable play/msarrel
mc confirms that versioning has been turned on for the bucket.
Now I’ll copy my kitten photos to the versioned bucket on play:
./mc cp /mnt/c/Documents\ and\ Settings/msarr/Downloads/kittens/*.* play/msarrel
The files are copied to my bucket on play.
When I list the bucket, I can see that the local files are now objects saved in MinIO:
./mc ls play/msarrel
For the purposes of this tutorial, I’ll download a kitten photo from my bucket, modify it slightly on my local machine and then copy it to the same bucket again.
./mc cp /mnt/c/Documents\ and\ Settings/msarr/Downloads/kittens/PXL_20210619_183244637.jpg play/msarrel
The modified version of the object is now the current version, yet the original version remains. When I enter the command
./mc ls --versions play/msarrel
I can see that there are two versions of the object PXL_20210619_183244637.jpg: v1, created at the same time as the other objects, and v2, created about an hour later. Each object version also has its own unique identifier, making it possible to work with specific object versions directly.
Now comes the fun part: what if my new version of the object is not what I want? On a typical file system, I would be stuck with it, because saving the new version would have overwritten the original. That’s exactly what happened on my local file system. Save me, MinIO!
MC Rewind - No Object Left Behind
MinIO includes a rewind feature that enables you to list, examine, retrieve or roll back objects as they were at an earlier point in time. MC Rewind is a higher-level function that can be applied to most of the mc command set to work with different versions of an object. The --rewind option is available for commands such as ls, cp, rm and stat, so you can work with buckets and objects at different points in time without overwriting anything.
The --rewind flag can be used in several ways to help you find the previous version of an object that you’re looking for. It can be followed by a time interval (for example, 3d) or a specific time (for example, 2020.03.24T10:00) to work with the version of an object that was current at that time.
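The selection logic behind a rewind can be sketched in a few lines of Python: turn the rewind spec into a point in time, then pick the newest version written at or before it. This is a hypothetical, simplified model, not mc’s implementation. The parser accepts only whole-day or whole-hour intervals such as 3d and timestamps in the 2020.03.24T10:00 form (mc itself accepts more formats), it ignores delete markers, and the timestamps in the history are made up.

```python
import re
from datetime import datetime, timedelta

# Hypothetical, simplified parser: "<N>d" / "<N>h" intervals and "YYYY.MM.DDTHH:MM" times only.
_UNITS = {"d": "days", "h": "hours"}


def rewind_point(spec, now):
    """Turn a rewind spec into an absolute point in time."""
    match = re.fullmatch(r"(\d+)([dh])", spec)
    if match:
        amount, unit = match.groups()
        return now - timedelta(**{_UNITS[unit]: int(amount)})
    return datetime.strptime(spec, "%Y.%m.%dT%H:%M")


def version_at(versions, point):
    """Return the newest (written_at, data) version written at or before `point`, or None."""
    earlier = [v for v in versions if v[0] <= point]
    return max(earlier, key=lambda v: v[0]) if earlier else None


# Made-up history for one object: the original upload, then an overwrite.
versions = [
    (datetime(2021, 9, 24, 9, 0), b"original"),
    (datetime(2021, 9, 25, 10, 30), b"edited"),
]
now = datetime(2021, 9, 25, 11, 0)
print(version_at(versions, rewind_point("1d", now))[1])                # b'original'
print(version_at(versions, rewind_point("2021.09.25T10:15", now))[1])  # b'original'
print(version_at(versions, now)[1])                                    # b'edited'
```

Both forms reduce to the same question, "which version was current at this instant?", which is why an interval and an absolute time can be used interchangeably.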
Continuing with our example, I need to roll back my object PXL_20210619_183244637.jpg to the previous version. From the output of ./mc ls --versions, I know that the version I want was uploaded to MinIO before 2021.09.25T10:15 and is one version older than the current version.
I confirm this by running
./mc ls --rewind 1d play/msarrel
which shows what my bucket looked like yesterday, before I overwrote PXL_20210619_183244637.jpg.
To revert the most recent PUT operation on my object, I can run:
./mc undo play/msarrel/PXL_20210619_183244637.jpg --last 1
If I’m not certain that’s what I want to do, I can add the --dry-run flag to preview what would happen to the object without changing anything.
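Conceptually, undoing a PUT on a versioned object means removing its newest version so the previous one becomes current again, and undoing a DELETE means removing the delete marker. The hypothetical undo function below models that idea for a single object, including a dry-run mode that reports what would change without changing it. It is an illustration under those assumptions, not how mc undo is implemented.

```python
def undo(history, last=1, dry_run=False):
    """Hypothetical model of undoing the `last` most recent operations on one object.

    `history` lists (version_id, is_delete_marker) tuples, oldest first. Undoing a
    PUT drops the newest version, so the previous one becomes current again;
    undoing a DELETE drops the delete marker. With dry_run=True nothing changes.
    Returns a description of each operation undone (or that would be), newest first.
    """
    if last < 1:
        raise ValueError("last must be at least 1")
    undone = history[-last:]  # slice copy, unaffected by the deletion below
    if not dry_run:
        del history[-last:]
    return [("DELETE " if is_marker else "PUT ") + vid for vid, is_marker in reversed(undone)]


history = [("v1", False), ("v2", False)]
print(undo(history, last=1, dry_run=True))  # ['PUT v2']
print(len(history))                         # 2
print(undo(history, last=1))                # ['PUT v2']
print(history)                              # [('v1', False)]
```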
Let’s look at another simple example of rolling back one object. Say a user tells me they deleted a file accidentally and asks me to restore it. I can do this in under a minute.
Sure enough, kitten1.jpg is not present in the current version of our bucket.
The deleted object is still stored on MinIO, but its current version is a delete marker. All I have to do is remove that delete marker, and I can target specific versions with the --vid flag, which most mc commands accept. First, I’ll list all object versions so I can note the version ID of the delete marker on the accidentally deleted object. Then I’ll remove that version, and finally I’ll list the contents of the bucket so you can see that the object was restored.
./mc ls --versions play/msarrel
./mc rm play/msarrel/kitten1.jpg --vid e7fc7cf4-f4bb-443a-8db9-25c3dd5fa8d1
./mc ls play/msarrel
I can also use the mc cp command with --rewind to copy specific versions of objects to another location. In this case, I’m going to copy the old version of PXL_20210619_183244637.jpg to my local file system so I can open it and work with it. I enter
./mc cp --rewind 1d play/msarrel/PXL_20210619_183244637.jpg new.jpg
to copy yesterday’s version of the file to my local directory as new.jpg.
MC Rewind can also be used from the Object Browser in the MinIO Console. I navigate to my bucket, click the Rewind icon at the top right, and select the date and time to rewind to. I’m going to select a time after the initial files were written and before I overwrote my test object. Once I rewind the bucket, I can restore, copy or download the earlier version of my object, among other actions.
If you really want to be sure that no version is removed or tampered with, then you can create the bucket with object locking enabled with
./mc mb -l or add it later using
Retention and locking are important if you’re ever faced with an audit after an attack. Let’s say that one day you notice unauthorized access to your bucket. With object locking enabled, object versions under retention cannot be deleted: they’re immutable and read-only, so they can’t be damaged or deleted in an attack. Once you know the auditor’s legal requirements, you can use the mc retention command to set governance-mode retention on a bucket so that its objects cannot be modified until the audit has been completed.
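The difference between retention modes can be captured in a small Python check. This sketch follows the simplified S3 Object Lock rules: until a version’s retain-until date passes, governance-mode retention can be bypassed only by specially privileged users, while compliance-mode retention cannot be bypassed at all. The can_delete_version function and the 30-day audit hold are hypothetical, chosen for illustration.

```python
from datetime import datetime, timedelta


def can_delete_version(retain_until, now, mode="GOVERNANCE", bypass_governance=False):
    """Hypothetical check for deleting a locked object version (simplified S3 semantics)."""
    if retain_until is None or now >= retain_until:
        return True  # no lock, or the retention period has expired
    # Before expiry, GOVERNANCE retention can be bypassed by specially privileged users;
    # COMPLIANCE retention cannot be bypassed by anyone.
    return mode == "GOVERNANCE" and bypass_governance


now = datetime(2021, 9, 25)
audit_hold = now + timedelta(days=30)
print(can_delete_version(audit_hold, now))                          # False
print(can_delete_version(audit_hold, now, bypass_governance=True))  # True
print(can_delete_version(audit_hold, now, "COMPLIANCE", True))      # False
print(can_delete_version(audit_hold, now + timedelta(days=31)))     # True
```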
Alternatively, if you want to save storage space, you can purge object versions based on date and time. For example, the command
./mc rm play/msarrel/ --recursive --versions --rewind 365d
removes all versions of all objects older than 365 days.
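The effect of that purge can be modeled as a filter over every version in the bucket. The Python sketch below is a hypothetical illustration of "remove all versions older than N days", not mc’s implementation; the purge_older_than function and the sample timestamps are made up.

```python
from datetime import datetime, timedelta


def purge_older_than(versions_by_key, now, days):
    """Hypothetical model of purging every object version older than `days` days.

    `versions_by_key` maps object keys to lists of (written_at, version_id) tuples.
    Objects left with no versions are dropped. Returns the number of versions removed.
    """
    cutoff = now - timedelta(days=days)
    removed = 0
    for key in list(versions_by_key):
        kept = [v for v in versions_by_key[key] if v[0] >= cutoff]
        removed += len(versions_by_key[key]) - len(kept)
        if kept:
            versions_by_key[key] = kept
        else:
            del versions_by_key[key]
    return removed


bucket = {
    "kitten1.jpg": [(datetime(2020, 1, 1), "a"), (datetime(2021, 9, 1), "b")],
    "kitten2.jpg": [(datetime(2019, 6, 1), "c")],
}
print(purge_older_than(bucket, now=datetime(2021, 9, 25), days=365))  # 2
print(sorted(bucket))                                                 # ['kitten1.jpg']
```

Note that an object whose only version is older than the cutoff disappears entirely, so a purge like this is worth previewing before running it against real data.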
MinIO protects against data loss and corruption through multiple mechanisms. When versioning is enabled for a bucket, objects are protected from accidental or malicious overwrite and deletion. Unlike volume-based approaches, MinIO can restore an overwritten or deleted object immediately, with a single command, so users can get back to work quickly after what would otherwise be a catastrophic and time-consuming error. While this approach eliminates the need for snapshots, we recognize that many customers will continue to take them anyway for belt-and-suspenders peace of mind.
Download MinIO today and see how easy it is to run your own data time machine. If you have any specific questions, drop us a note at email@example.com or join the conversation on Slack. We are here to help.