MinIO Batch Framework Adds Support for Expiry

You can now perform S3 Delete operations using the MinIO Batch Framework to remove large numbers of objects with a single API request. The MinIO Batch Framework lets you quickly and easily perform repetitive or bulk actions like Batch Replication and Batch Key-Rotate across your MinIO deployment. The framework handles the work that you would otherwise do manually, including managing retries and reporting progress.

Batch Expiry provides high-performance expiry and permanent deletion based on specified criteria. This is a distributed server-side delete operation to conduct bulk delete in parallel. 

The MinIO Batch Framework eliminates the need for user intervention and can be scheduled to run during off-hours or low system utilization. Object Lambda Notifications are issued when batch processing is complete. Batch jobs are defined using YAML and then run periodically.  

Rationale for Adding Batch Delete Functionality

Information Lifecycle Management (ILM) was designed to manage objects so that they are stored efficiently during their lifecycle. This includes both expiring and tiering objects based on a set of filters (conditions). For expiry, ILM is most effective when used with expiry schedules in the range of 3 months to a year. However, using ILM when the time to expiry is very short is an anti-pattern.

ILM runs on a scanner that runs as a background process. The scanner is throttled up and down automatically based on load so it does not interfere with typical S3 API calls like PUT and GET. MinIO prioritizes time-sensitive operations (like responding to applications) over the scanner, so scanner-driven expiry may lag when the deployment is busy.

Batch Expiry performs parallel delete operations that are guaranteed to complete, quickly and efficiently. Objects are selected using conditions that filter on object attributes, tags and metadata.

Getting Started with Batch Expiry

Download and install MinIO. Record the access key and secret key. 

Download and install MinIO Client. Optionally, create an alias to simplify access to MinIO Server. 

Create a bucket and enable versioning. 

mc mb myminio/test
mc version enable myminio/test

Copy some files into the bucket that you just created. The contents of the files are unimportant; we're simply learning how to use Batch Delete.
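If you don't have files handy, a minimal sketch like the following creates a few throwaway ones. The directory name and file names are arbitrary choices for illustration, and the upload command in the closing comment assumes the myminio alias and test bucket used in this tutorial.

```shell
#!/bin/sh
# Create a few small throwaway files to upload; names and contents are arbitrary.
SAMPLE_DIR="${SAMPLE_DIR:-./batch-expiry-samples}"
mkdir -p "$SAMPLE_DIR"

for name in Cat.txt Car.log dog.txt; do
  printf 'sample data for %s\n' "$name" > "$SAMPLE_DIR/$name"
done

# Show what was created.
ls "$SAMPLE_DIR"

# Once the "myminio" alias is configured, upload the directory with:
#   mc cp --recursive "$SAMPLE_DIR/" myminio/test/
```

Mixing names that start with different letters makes it easier to see which objects a name filter selects later on.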

Create and define the Batch Delete (replace "test" with the name of your bucket):

mc batch generate myminio/test expire

This prints a YAML template to your terminal. Save it as expiry.yaml, then edit it to configure the expiry job.

This file (shown below) contains the bucket name, prefix, rules (filters and flags), and the notification and retry configurations. You can set rules to expire objects by type, name (including wildcards), age, size and more.

expire:
  apiVersion: v1
  bucket: mybucket # Bucket where this job will expire matching objects from
  prefix: myprefix # (Optional) Prefix under which this job will expire objects matching the rules below.
  rules:
    - type: object  # objects with zero or more older versions
      name: NAME # match object names that satisfy the wildcard expression.
      olderThan: 70h # match objects older than this value
      createdBefore: "2006-01-02T15:04:05.00Z" # match objects created before "date"
      tags:
        - key: name
          value: pick* # match objects with tag 'name', all values starting with 'pick'
      metadata:
        - key: content-type
          value: image/* # match objects with 'content-type', all values starting with 'image/'
      size:
        lessThan: 10MiB # match objects with size less than this value (e.g. 10MiB)
        greaterThan: 1MiB # match objects with size greater than this value (e.g. 1MiB)
      purge:
          # retainVersions: 0 # (default) delete all versions of the object. This option is the fastest.
          # retainVersions: 5 # keep the latest 5 versions of the object.

    - type: deleted # objects with delete marker as their latest version
      name: NAME # match object names that satisfy the wildcard expression.
      olderThan: 10h # match objects older than this value (e.g. 7d10h31s)
      createdBefore: "2006-01-02T15:04:05.00Z" # match objects created before "date"
      purge:
          # retainVersions: 0 # (default) delete all versions of the object. This option is the fastest.
          # retainVersions: 5 # keep the latest 5 versions of the object including delete markers.

  notify:
    endpoint: https://notify.endpoint # notification endpoint to receive job completion status
    token: Bearer xxxxx # optional authentication token for the notification endpoint

  retry:
    attempts: 10 # number of retries for the job before giving up
    delay: 500ms # least amount of delay between each retry

As you can see, the parameters listed in the file enable a wide variety of use cases. Each rule defines expiry criteria and purge operations, and the comments make the file largely self-explanatory. Copy the YAML from your terminal to a text editor and customize it. You'll start the job in a later step.

For example, to delete all objects in a bucket that are older than a week:

...
rules:
    - type: object  # objects with zero or more older versions
      name: NAME # match object names that satisfy the wildcard expression.
      olderThan: 7d # match objects older than this value
...

Note that the notification endpoint has not been configured for this tutorial. When configured, notifications will be available at that endpoint when Batch Expiry completes.
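For reference, a complete job file for the one-week example might look like the sketch below, written out with a shell heredoc. It uses only keys that appear in the generated template. The bucket name test comes from the earlier steps. Leaving out the name filter so that the rule applies to every object is an assumption, so check it against the documentation before running it on real data. The notify section is omitted, as noted above.

```shell
#!/bin/sh
# Write a minimal expiry job definition to expiry.yaml.
# Assumption: filters that are left out (name, tags, size, ...) do not restrict the match.
cat > expiry.yaml <<'EOF'
expire:
  apiVersion: v1
  bucket: test            # bucket created earlier in this tutorial
  rules:
    - type: object
      olderThan: 7d       # match objects older than one week
      purge:
        retainVersions: 0 # delete all versions (the template's default)
  retry:
    attempts: 10
    delay: 500ms
EOF

# Print the file so it can be reviewed before the job is started.
cat expiry.yaml
```

Reviewing the file before starting the job is worthwhile, since expiry permanently deletes the matching objects.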

To expire objects older than a week whose names begin with the letter C and contain a period (for example, before a file extension):

...
rules:
    - type: object  # objects with zero or more older versions
      name: C*.* # match object names that satisfy the wildcard expression.
      olderThan: 7d # match objects older than this value
...
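To get a feel for which names an expression such as C*.* selects, you can try it against a few sample names locally. The sketch below uses the shell's own pattern matching as a stand-in. It is not MinIO's matcher, the matches helper is a hypothetical name, and shell patterns support extras such as bracket expressions that the server may not.

```shell
#!/bin/sh
# Illustration only: approximates wildcard matching with shell patterns.
matches() {
  # $1 = object name, $2 = wildcard expression
  case "$1" in
    $2) return 0 ;;  # unquoted so that $2 is treated as a pattern
    *)  return 1 ;;
  esac
}

for name in Cat.jpg Cargo.tar.gz Cat dog.txt; do
  if matches "$name" 'C*.*'; then
    echo "$name: matches"
  else
    echo "$name: does not match"
  fi
done
```

Under these semantics, Cat is not selected because the expression requires a period. A pattern of C* would be the one to use if names without a period should match as well.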

You can create and run multiple Batch jobs at the same time; there are no predefined limits.

Start Batch Expiry with the following:

mc batch start myminio/ ./expiry.yaml
Successfully started 'expire' job `TEu3LMDvdAhAFZKQ3QtSHr:-1` on '2023-12-28 22:50:05.540096697 +0000 UTC'

The output confirms that the expiry job started successfully and shows the job ID and start time.

You can also check the status of a batch job using its job ID. The job in this example finished quickly, so it is reported from the list of previously run jobs:

mc batch status myminio/ TEu3LMDvdAhAFZKQ3QtSHr:-1
mc: Unable to find an active job, attempting to list from previously run jobs
✔ ✔ ✔
JobType:        expire
Objects:        0
FailedObjects:  0
CurrObjName:

When Batch Expiry is complete, you can list bucket contents to verify it was successful. 

mc ls myminio/test

For more information, please see the documentation.

MinIO Batch Expiry 

We continue to build out the MinIO Batch Framework with Batch Expiry. Batches are a powerful method of automating operations. Automation is a key enabler of scale. Customer feedback on Batch Expiry, Batch Replication and Batch Key-Rotate tells us that everyone loves automation – and wants more of it.

Download MinIO today and see the MinIO Batch Framework in action. Any questions? Reach out to us on Slack.