Managing AI workloads with Tagging and Policies

AJ on Modern Data Lakes

Tags are a valuable way to categorize objects saved to MinIO. Each tag is a key-value pair. You can assign tags to an object when it is saved to MinIO, or you can add them to existing objects. 

You might think that organizing by bucket makes sense, and sometimes it does, but it leaves you only the bucket and its prefixes to organize your data. Object key name prefixes do let you sort and categorize data, but only along one dimension. Consider the following example:

projects/project1/plan.pdf
projects/project2/estimate.pdf
scans/diagram1.jpg

These key names have the prefixes projects/project1/, projects/project2/ and scans/. Each prefix defines a single category, which is one-dimensional and inflexible, just like a directory in a filesystem. There is no way to include diagram1.jpg in either project without moving or copying it.

Object tags give you greater power. Because each object can carry up to ten tags, you can categorize it along up to ten dimensions. To add the diagram to a project, all you have to do is tag it appropriately.

You could tag objects related to each project with the project name, for example:

Project=Project1
Project=Project2

You can add multiple tags to an object, allowing you to categorize and organize your data so thoroughly that Marie Kondo would be jealous.

Project=Project1
Classification=Confidential
DocType=Estimate
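To make the contrast with prefixes concrete, here is a small in-memory Python sketch. It does not talk to MinIO: the catalog dict merely stands in for objects and their tags, and the DocType values Plan and Diagram are hypothetical additions to the tags shown above. Selecting by prefix can only ever return one "directory", while selecting by tag pulls scans/diagram1.jpg into Project1 without moving it.

```python
# Hypothetical catalog standing in for objects stored in MinIO:
# object key -> tag set. Real tags are stored with each object.
catalog = {
    "projects/project1/plan.pdf": {"Project": "Project1", "DocType": "Plan"},
    "projects/project2/estimate.pdf": {
        "Project": "Project2",
        "Classification": "Confidential",
        "DocType": "Estimate",
    },
    "scans/diagram1.jpg": {"Project": "Project1", "DocType": "Diagram"},
}


def keys_with_prefix(prefix):
    """One-dimensional selection: only the key name is consulted."""
    return sorted(k for k in catalog if k.startswith(prefix))


def keys_with_tags(**wanted):
    """Select keys whose tags match every requested key=value pair.

    Comparison is exact, so it is case-sensitive like real tag keys and values.
    """
    return sorted(
        key
        for key, tags in catalog.items()
        if all(tags.get(k) == v for k, v in wanted.items())
    )


print(keys_with_prefix("projects/project1/"))
# ['projects/project1/plan.pdf']
print(keys_with_tags(Project="Project1"))
# ['projects/project1/plan.pdf', 'scans/diagram1.jpg']
print(keys_with_tags(Project="Project2", Classification="Confidential"))
# ['projects/project2/estimate.pdf']
```

Each extra tag key is another independent axis to filter on, which is exactly what a single prefix hierarchy cannot offer.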

Tagging does more than just help you categorize content. You can also use tags for fine-grained access control and lifecycle management. As the example above shows, you can use tags to label objects containing confidential information. You could get even more fine-grained and tag objects based on the type of confidential information they contain, such as Classification=PII.

Tagging and Object Lifecycle Management

MinIO Object Lifecycle Management is used to create rules for object transition and object expiry. Rules are configured on a per-bucket basis, and you can specify a filter, for example by prefix or by tag, to select the subset of objects a rule applies to.

Let's say that you have a web service that stores photos and allows users to edit them. You tag each photo with one of the following:

type=raw
type=editing
type=finished

You can have one lifecycle rule whose filter transitions raw photos to a warm tier on less expensive storage media, and another rule that transitions photos once their tag has been updated to type=finished.
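The two rules could be expressed as an S3-style lifecycle configuration. The Python sketch below only builds and prints that document; the tier name WARM-TIER and the 30 and 90 day thresholds are assumptions for illustration, and the tier would need to exist on your deployment already. The Tag filter restricts each rule to objects carrying that tag, and for MinIO the StorageClass field of a transition is understood to name the target tier. A dict shaped like this is what S3 SDKs such as boto3 accept for put_bucket_lifecycle_configuration.

```python
import json

# Assumption: a remote tier called "WARM-TIER" is already configured.
WARM_TIER = "WARM-TIER"


def tag_transition_rule(rule_id, tag_key, tag_value, days, tier):
    """Build one S3-style lifecycle rule that transitions only the
    objects carrying tag_key=tag_value, `days` after they were created."""
    return {
        "ID": rule_id,
        "Status": "Enabled",
        "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
        "Transitions": [{"Days": days, "StorageClass": tier}],
    }


lifecycle = {
    "Rules": [
        # Illustrative thresholds only.
        tag_transition_rule("raw-to-warm", "type", "raw", 30, WARM_TIER),
        tag_transition_rule("finished-to-warm", "type", "finished", 90, WARM_TIER),
    ]
}

print(json.dumps(lifecycle, indent=2))
```

Note that Days counts from the object's creation, not from the moment its tag changed, so a photo tagged type=finished late in its life may transition soon after retagging.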

Please see mc ilm rule add — MinIO Object Storage for Linux for more details regarding Lifecycle Management rules. 

Tagging and Access Control Policies

MinIO uses AWS-style IAM and policy-based access control (PBAC) to regulate access to buckets and objects. Object tags enable fine-grained access control: you can grant conditional permissions based on object tags, although a tag-based condition cannot be used to grant or deny a user permission to delete or overwrite an object. You can also use condition keys to restrict which tag keys and values are allowed.

You can allow a user to read only the objects that carry a specific tag key and value. As a reminder, mc admin policy is the command to create and manage policies. MinIO supports tag-based policy conditions for specific actions. For example, to limit a user to reading only the objects in a bucket tagged deployment=production, use the s3:ExistingObjectTag/<key> condition key (here, s3:ExistingObjectTag/deployment) in the Condition element of the policy. Please see Access Management in the MinIO Object Storage for Linux documentation for more details.
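As a sketch of such a policy, the Python below builds the JSON document for the deployment=production example. The bucket name mybucket and the policy and file names mentioned afterwards are hypothetical. The single statement allows s3:GetObject only when the object's existing deployment tag equals production, via a StringEquals condition on s3:ExistingObjectTag/deployment.

```python
import json

BUCKET = "mybucket"  # hypothetical bucket name


def read_by_tag_policy(bucket, tag_key, tag_value):
    """Return an IAM-style policy that allows s3:GetObject only on objects
    in `bucket` whose existing tag `tag_key` equals `tag_value`."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
                "Condition": {
                    "StringEquals": {f"s3:ExistingObjectTag/{tag_key}": tag_value}
                },
            }
        ],
    }


policy = read_by_tag_policy(BUCKET, "deployment", "production")
print(json.dumps(policy, indent=2))
```

Saved to a file such as read-production.json, the policy could then be loaded with mc admin policy (in recent mc releases, along the lines of mc admin policy create ALIAS read-production read-production.json) and attached to the user.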

Object Tagging Event Notifications

You can set up MinIO event notifications to monitor object tags, so that you are notified whenever a tag is added to, updated on, or removed from an object. This is helpful for tracking object status. In the photo example above, it might be helpful to know when a photo goes from type=editing to type=finished.
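Here is a minimal Python sketch of the consumer side, assuming the standard S3-style notification JSON (a Records list whose entries carry eventName plus s3.bucket.name and s3.object.key). The set of event names treated as tag changes is an assumption noted in the code, and the sample payload is made up. Because we don't assume the notification carries the new tags, a real consumer would follow up by reading the object's current tags to see whether the photo moved to type=finished.

```python
from urllib.parse import unquote_plus

# Assumption: event names for tag changes. MinIO is understood to use
# s3:ObjectCreated:PutTagging / s3:ObjectCreated:DeleteTagging, while AWS
# uses s3:ObjectTagging:Put / s3:ObjectTagging:Delete; both are accepted.
TAG_EVENTS = {
    "s3:ObjectCreated:PutTagging",
    "s3:ObjectCreated:DeleteTagging",
    "s3:ObjectTagging:Put",
    "s3:ObjectTagging:Delete",
}


def tag_changes(payload):
    """Return (bucket, key, event_name) for every tag-change record in an
    S3-style event notification payload; all other records are skipped."""
    changes = []
    for record in payload.get("Records", []):
        name = record.get("eventName", "")
        if name not in TAG_EVENTS:
            continue
        s3 = record["s3"]
        # Keys are URL-encoded in S3-style notifications ("+" for spaces).
        key = unquote_plus(s3["object"]["key"])
        changes.append((s3["bucket"]["name"], key, name))
    return changes


sample = {
    "Records": [
        {
            "eventName": "s3:ObjectCreated:PutTagging",
            "s3": {"bucket": {"name": "photos"}, "object": {"key": "edits/cat+photo.jpg"}},
        },
        {
            "eventName": "s3:ObjectCreated:Put",
            "s3": {"bucket": {"name": "photos"}, "object": {"key": "raw/new.jpg"}},
        },
    ]
}
print(tag_changes(sample))
# [('photos', 'edits/cat photo.jpg', 's3:ObjectCreated:PutTagging')]
```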

You could use tags to filter Object Lambda triggers, so that a Lambda function is called only when objects with certain tags are saved to a bucket. This gives you greater specificity than writing Lambdas against an entire bucket. For example, you could build a workflow like the one in Orchestrate Complex Workflows Using Apache Kafka and MinIO and use tags to resize some photos to one size and others to a different size.
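The tag-based decision in such a workflow can be very small. The Python sketch below assumes a worker has already received an event (for example from Kafka) and fetched the photo's tags; the resize tag key, its values and the pixel widths are hypothetical. Only photos tagged type=finished are resized, and the resize tag picks the size.

```python
# Hypothetical tag key, values and widths; none of these are MinIO settings.
RESIZE_WIDTHS = {"thumbnail": 150, "web": 1024}


def resize_width(tags):
    """Decide how wide (in pixels) to resize a photo, given its tags.

    Returns None when the photo should be left alone: it is not finished
    yet, or it carries no recognised "resize" tag.
    """
    if tags.get("type") != "finished":
        return None
    return RESIZE_WIDTHS.get(tags.get("resize"))


print(resize_width({"type": "finished", "resize": "thumbnail"}))  # 150
print(resize_width({"type": "editing", "resize": "web"}))  # None
```

Keeping the mapping in one table means adding a new size is a data change rather than a new function.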

How to Tag

Each object can have up to ten tags, with each tag having a unique tag key. Remember that tag keys and values are case-sensitive.  
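These rules are easy to check on the client before a request is sent. The Python sketch below validates a list of key-value pairs; the ten-tag limit comes from the text, while the 128-character key and 256-character value limits are assumed to follow the S3 tagging limits. Because keys are compared exactly, Project and project count as two distinct keys.

```python
MAX_TAGS = 10        # per-object limit stated above
MAX_KEY_LEN = 128    # assumption: S3 tag key length limit
MAX_VALUE_LEN = 256  # assumption: S3 tag value length limit


def validate_tags(pairs):
    """Validate a list of (key, value) pairs and return them as a dict.

    Raises ValueError on too many tags, empty or over-long keys,
    over-long values, or duplicate keys. Keys are compared
    case-sensitively, so "Project" and "project" are different keys.
    """
    if len(pairs) > MAX_TAGS:
        raise ValueError(f"{len(pairs)} tags given; at most {MAX_TAGS} allowed")
    tags = {}
    for key, value in pairs:
        if not key or len(key) > MAX_KEY_LEN:
            raise ValueError(f"invalid tag key: {key!r}")
        if len(value) > MAX_VALUE_LEN:
            raise ValueError(f"value too long for key {key!r}")
        if key in tags:
            raise ValueError(f"duplicate tag key: {key!r}")
        tags[key] = value
    return tags
```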

Managing tags in the MinIO Console is simply a matter of browsing to a bucket, selecting an object, and then selecting Edit Tags under Actions. After you click Save, you will immediately see the tag on the selected object(s).

You can also apply tags using the MinIO Client. The mc tag set command sets one or more tags on a bucket or object, and configuring tags from the command line is pretty straightforward, as is everything we build at MinIO. Tags are passed as an ampersand-separated list of key-value pairs. Optionally, you can apply tags recursively, to all versions or to specific versions of an object, or use Rewind (--rewind) to tag only the object versions that existed at a specified time.

# example mc tag set ALIAS/PATH "TAGS"
mc tag set myminio/mydata "tag1=value1&tag2=value2"
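When tags are generated by a program, the TAGS string can be built rather than typed by hand. This Python sketch is a hypothetical helper, not part of mc: it joins a dict into the ampersand-separated form above and percent-encodes reserved characters, which is the URL query encoding the S3 x-amz-tagging header expects. Whether a given mc version decodes percent-encoded values the same way is an assumption worth verifying before relying on it.

```python
from urllib.parse import parse_qsl, quote, urlencode


def encode_tags(tags):
    """Join a dict of tags into an ampersand-separated key=value string.

    Reserved characters (spaces, '&', '=', ...) are percent-encoded so a
    value such as "R&D" cannot split the list apart.
    """
    return urlencode(tags, quote_via=quote)


def decode_tags(encoded):
    """Parse an encoded tag string back into a dict."""
    return dict(parse_qsl(encoded, keep_blank_values=True))


print(encode_tags({"tag1": "value1", "tag2": "value2"}))  # tag1=value1&tag2=value2
print(encode_tags({"Team": "R&D", "Project": "Project 1"}))  # Team=R%26D&Project=Project%201
```

The round trip through decode_tags is a cheap way to confirm that no key or value was mangled before the string is handed to mc tag set.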

You can also work with tags through the MinIO SDKs, which let you get, set and delete tags on objects and buckets.

Tags – An Advantage of Rich Metadata

Using tags to work with data is a direct result of using object storage instead of file or block storage. The ability to create a rich set of tags and use them to filter with no additional latency demonstrates the value of MinIO's integrated metadata architecture. MinIO saves metadata with data, removing the need to query an additional database in order to work with tags. 

Tags are a valuable tool to categorize and work with buckets and objects. Filtering by tags is much more flexible, descriptive and specific than filtering by bucket or path. Cloud-native object storage enables applications to work with data at scale, filtered by tags. 

MinIO's industry-leading S3 compatibility gives your developers the confidence to use features like tags and know that the applications they build will run on MinIO correctly and consistently. The S3 API is the de facto standard for cloud-native applications to address storage. Any AWS S3 alternative must speak the API fluently in order to be effective as object storage in public/private clouds, on-premises and at the edge. 

Tag, you're it! Download MinIO and start tagging today. Join our Slack community, ask questions and learn how to use MinIO as the foundation for your cloud-native application stack.