Regulatory Compliance with MinIO Object Lambdas

Object Lambdas are a feature in MinIO that enables on-the-fly customization of requested data, making it perfect for scenarios like redacting sensitive information, enriching data, or altering data formats without modifying the original stored data. This approach can be more cost-effective than creating and maintaining multiple up-to-date copies of the same dataset and simplifies data processing within MinIO buckets. 

For a real-life example, consider a situation where a MinIO bucket contains a dataset created from a commerce platform with personally identifiable information (PII), such as credit card numbers. To ensure compliance, object lambdas empower MinIO system administrators to control data as it's retrieved from MinIO. This blog provides a tutorial for this scenario.

Differences between Object Lambdas and MinIO Bucket Notifications

MinIO Object Lambdas are designed for real-time content transformation within objects. Activated by a GET request, the Lambda handler transforms the object data, relaying the modified version back to MinIO without any alterations to the original—a crucial feature for maintaining an immutable record.

MinIO Bucket Notifications empower administrators to dispatch event-driven notifications to external services based on specific object or bucket events, mirroring the functionality of S3 Event Notifications. The selection between asynchronous (swift but with potential event loss) and synchronous (more deliberate but reliable) notifications for remote targets hinges on specific priorities. 

Object Lambdas excel in scenarios requiring dynamic content transformation, while bucket notifications take the lead in orchestrating event-driven alerts for external services. The decision ultimately revolves around the specific use case — whether it pertains to instantaneous object modification or the orchestrated notification of events within the MinIO environment.

Prerequisites

You can clone the repo for this project here.

Create Object Lambda Handler

Install the prerequisites using the requirements.txt file.

pip install -r requirements.txt

Save the following script as app.py, then start it with flask run.

from flask import Flask, request, abort, make_response
import requests
import json

app = Flask(__name__)

@app.route('/', methods=['POST'])
def get_webhook():
    if request.method == 'POST':
        # obtain the request event from the 'POST' call
        event = request.json

        object_context = event["getObjectContext"]

        # Get the presigned URL to fetch the requested
        # original object from MinIO
        s3_url = object_context["inputS3Url"]

        # Extract the route and request token from the input context
        request_route = object_context["outputRoute"]
        request_token = object_context["outputToken"]

        # Get the original S3 object using the presigned URL
        r = requests.get(s3_url)
        original_object = r.content.decode('utf-8')

        # Transform the JSON object by anonymizing the credit card number
        transformed_object = anonymize_credit_card(original_object)

        # Write object back to S3 Object Lambda
        # response sends the transformed data
        # back to MinIO and then to the user
        resp = make_response(transformed_object, 200)
        resp.headers['x-amz-request-route'] = request_route
        resp.headers['x-amz-request-token'] = request_token
        return resp

    else:
        abort(400)

def anonymize_credit_card(original_object):
    # Assume the original_object is a JSON string
    data = json.loads(original_object)

    # Check if the JSON is a list of transactions
    if isinstance(data, list):
        # Anonymize the credit card number in each transaction by keeping only the last four digits
        for transaction in data:
            if 'credit_card_number' in transaction:
                transaction['cc_last_four_digits'] = transaction.pop('credit_card_number')[-4:]

    # Convert the updated data back to JSON
    transformed_object = json.dumps(data)

    return transformed_object

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

An exhaustive breakdown of the commands in this handler script can be found in the documentation and other blog posts. The key part is getObjectContext, which carries the input and output details for the connection to MinIO. Here is a breakdown of its attributes:

  • inputS3Url: A presigned URL provided to the Lambda function for downloading the original object. This eliminates the need for the Lambda function to have MinIO credentials, streamlining the focus on object transformation without the burden of credential management.
  • outputRoute: A routing token incorporated into the response headers when the Lambda function yields the transformed object. MinIO utilizes this token for additional verification of the incoming response's validity.
  • outputToken: A token appended to the response headers upon the Lambda function's return of the transformed object. MinIO employs this token to authenticate and validate the integrity of the incoming response.
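To make the three attributes concrete, here is a minimal sketch of how a handler might pull them out of the event and build the response headers. The sample event values and the helper names parse_object_context and build_response_headers are illustrative assumptions; only the key names and header names come from the handler script above.

```python
def parse_object_context(event):
    """Return (input_url, route, token) from an Object Lambda event."""
    ctx = event["getObjectContext"]
    return ctx["inputS3Url"], ctx["outputRoute"], ctx["outputToken"]


def build_response_headers(route, token):
    """Headers the handler echoes back so MinIO can match the response."""
    return {
        "x-amz-request-route": route,
        "x-amz-request-token": token,
    }


# Hypothetical event with placeholder values (not real MinIO output)
sample_event = {
    "getObjectContext": {
        "inputS3Url": "http://127.0.0.1:9000/anon-commerce-data/"
                      "fake_pos_transactions.json?X-Amz-Signature=placeholder",
        "outputRoute": "placeholder-route",
        "outputToken": "placeholder-token",
    }
}

input_url, route, token = parse_object_context(sample_event)
headers = build_response_headers(route, token)
print(headers)
```

Keeping the parsing separate from the transformation makes it easier to test each piece without a running MinIO server.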

The core of the handler is the anonymize_credit_card(original_object) function. It parses the object as JSON and, if the result is a list of transactions, checks each one for a credit_card_number field. For each match, it removes the original credit_card_number key with the .pop() method and stores only the last four digits under a new key, cc_last_four_digits.
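As a quick check of that behavior, the function can be exercised on its own, without Flask or MinIO. The snippet below repeats the function so it runs standalone; the sample record and its dummy card number are made up for illustration.

```python
import json


def anonymize_credit_card(original_object):
    # Same logic as the handler's function, repeated so this snippet runs alone
    data = json.loads(original_object)
    if isinstance(data, list):
        for transaction in data:
            if 'credit_card_number' in transaction:
                transaction['cc_last_four_digits'] = transaction.pop('credit_card_number')[-4:]
    return json.dumps(data)


# Made-up sample record; the card number is a dummy value
sample = json.dumps([
    {"transaction_id": 1, "amount": 9.99, "credit_card_number": "4111111111111111"}
])

result = json.loads(anonymize_credit_card(sample))
print(result)
# [{'transaction_id': 1, 'amount': 9.99, 'cc_last_four_digits': '1111'}]
```

Note that input that is not a JSON list passes through unchanged, so a single transaction stored as a bare JSON object would not be redacted by this function.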

You should get a message like this one indicating the Flask app is up and running.

/Users/brennabuuck/PycharmProjects/objectlambdas/venv/bin/python -m flask run 
 * Serving Flask app 'app.py'
 * Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit

Start Up MinIO

Run the following command to start up MinIO.

MINIO_LAMBDA_WEBHOOK_ENABLE_function=on \
MINIO_LAMBDA_WEBHOOK_ENDPOINT_function=http://127.0.0.1:5000 \
minio server ~/data

I’ve named the lambda function target `function`, but you can choose any name. The name is the suffix of each variable, for example MINIO_LAMBDA_WEBHOOK_ENABLE_<my_cool_function_name>, and likewise for the other _LAMBDA_ variables.
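The naming rule can be sketched as a small helper that derives the two environment variables for a given target name. The helper name lambda_webhook_config and the target name redact_pii are hypothetical, and the ARN pattern is inferred from the startup output shown in this post rather than taken from a specification.

```python
def lambda_webhook_config(target_name, endpoint):
    """Build the environment variables and expected ARN for a webhook target."""
    env = {
        f"MINIO_LAMBDA_WEBHOOK_ENABLE_{target_name}": "on",
        f"MINIO_LAMBDA_WEBHOOK_ENDPOINT_{target_name}": endpoint,
    }
    # ARN pattern inferred from this post's startup output (an assumption)
    arn = f"arn:minio:s3-object-lambda::{target_name}:webhook"
    return env, arn


# "redact_pii" is a hypothetical target name
env, arn = lambda_webhook_config("redact_pii", "http://127.0.0.1:5000")
for key, value in env.items():
    print(f"{key}={value}")
print(arn)
```

Whatever name you pick, use the same suffix on every variable so that MinIO treats them as one target.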

After the command is executed, you should see the Object Lambda ARNs in your terminal. 

MinIO Object Storage Server
Copyright: 2015-2023 MinIO, Inc.
License: GNU AGPLv3 <https://www.gnu.org/licenses/agpl-3.0.html>
Version: RELEASE.2023-11-06T22-26-08Z (go1.21.3 darwin/arm64)
Status:         1 Online, 0 Offline. 
S3-API: http://192.168.4.68:9000  http://127.0.0.1:9000                         
RootUser: minioadmin 
RootPass: minioadmin 
Console: http://192.168.4.68:50149 http://127.0.0.1:50149             
RootUser: minioadmin 
RootPass: minioadmin 
Object Lambda ARNs: arn:minio:s3-object-lambda::function:webhook

Generate Data 

Run this script to generate a JSON file called fake_pos_transactions.json with fake point of sale (POS) data, create a MinIO bucket called anon-commerce-data if it doesn’t already exist, and then upload that fake data into the anon-commerce-data bucket.

After running the script your terminal should let you know that the file was generated and uploaded.

Bucket 'anon-commerce-data' created successfully.
File 'fake_pos_transactions.json' uploaded successfully to 'anon-commerce-data' bucket.
minio-object-lambda-data-generator-1 exited with code 0
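If you want to see roughly what such a generator involves, the sketch below is a stand-in and not the script from the repo. It assumes the record shape of the sample shown in this post, a local MinIO server at 127.0.0.1:9000, and the default minioadmin credentials; the function names are hypothetical. The upload call is left commented out so the data generation part runs without a server.

```python
import io
import json
import random


def generate_transactions(count, seed=None):
    """Create fake POS records shaped like the sample in this post."""
    rng = random.Random(seed)
    return [
        {
            "transaction_id": rng.randint(1000, 9999),
            "amount": round(rng.uniform(1.0, 500.0), 2),
            "currency": "USD",
            # 16 random digits; not a valid card number
            "credit_card_number": "".join(rng.choice("0123456789") for _ in range(16)),
            "timestamp": "2023-11-08T12:00:00",
        }
        for _ in range(count)
    ]


def upload_transactions(transactions, bucket="anon-commerce-data",
                        object_name="fake_pos_transactions.json"):
    """Upload records to a local MinIO server (assumes default credentials)."""
    from minio import Minio  # imported lazily so generation works without the SDK

    client = Minio("127.0.0.1:9000", access_key="minioadmin",
                   secret_key="minioadmin", secure=False)
    if not client.bucket_exists(bucket):
        client.make_bucket(bucket)
    payload = json.dumps(transactions, indent=2).encode("utf-8")
    client.put_object(bucket, object_name, io.BytesIO(payload), len(payload),
                      content_type="application/json")


transactions = generate_transactions(5, seed=42)
print(json.dumps(transactions[0], indent=2))
# upload_transactions(transactions)  # uncomment with MinIO running
```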

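You can navigate to the MinIO Console to verify that the bucket and its data arrived safely. The Console address is printed when the server starts: http://127.0.0.1:9001 if you pinned the console port, or a randomly assigned port like the one in the startup output above.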

The username and password for MinIO default to minioadmin and minioadmin, respectively.

Invoke Lambda transformation

Run the Python script below to generate a presigned URL that names the Lambda target, then use that URL to retrieve the transformed data; MinIO invokes the Object Lambda when the URL is fetched. In production, rather than printing the result, you might upload it to a bucket for further processing by your microservice.

from minio import Minio
from datetime import timedelta
import requests

# Set your Minio server information
minio_endpoint = '127.0.0.1:9000'
minio_access_key = 'minioadmin'
minio_secret_key = 'minioadmin'

# Initialize a Minio client
s3Client = Minio(minio_endpoint, access_key=minio_access_key, secret_key=minio_secret_key, secure=False)

# Set lambda function target via `lambdaArn`
lambda_arn = 'arn:minio:s3-object-lambda::function:webhook'

# Generate presigned GET URL with lambda function
bucket_name = 'anon-commerce-data'
object_name = 'fake_pos_transactions.json'
expiration = timedelta(seconds=1000)  # Expiration time in seconds

req_params = {'lambdaArn': lambda_arn}
presigned_url = s3Client.presigned_get_object(bucket_name, object_name, expires=expiration, response_headers=req_params)
print(presigned_url)

# Use the presigned URL to retrieve the data using requests
response = requests.get(presigned_url)

if response.status_code == 200:
    content = response.content
    print("Transformed data:\n", content)
else:
    print("Failed to download the data. Status code:", response.status_code, "Reason:", response.reason)

Look at the Data

Given a JSON input that looks like this:

[ 
   {
        "transaction_id": 7247,
        "amount": 63.59,
        "currency": "USD",
        "credit_card_number": "3506372205690474",
        "timestamp": "2023-11-08T12:00:00"
    }
]

The output of the invocation script will look like this:

[ 
   {
        "transaction_id": 7247,
        "amount": 63.59,
        "currency": "USD",
        "cc_last_four_digits": "0474",
        "timestamp": "2023-11-08T12:00:00"
    }
]
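For compliance work it can help to check the transformed output automatically rather than by eye. The sketch below is one possible check, not part of the original project: the function name find_unredacted and the sample records are hypothetical, and it only looks at the two field names used in this tutorial.

```python
import json


def find_unredacted(transformed_json):
    """Return transaction IDs of records that still expose a full card number."""
    leaks = []
    for record in json.loads(transformed_json):
        # A redacted record has no credit_card_number and at most 4 digits kept
        too_long = len(record.get("cc_last_four_digits", "")) > 4
        if "credit_card_number" in record or too_long:
            leaks.append(record.get("transaction_id"))
    return leaks


# Made-up records: one redacted, one still carrying the full number
redacted = json.dumps([{"transaction_id": 1, "cc_last_four_digits": "1111"}])
raw = json.dumps([{"transaction_id": 2, "credit_card_number": "4111111111111111"}])

print(find_unredacted(redacted))  # []
print(find_unredacted(raw))       # [2]
```

A check like this could run against the response body in the invocation script before the data is handed to downstream services.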

Conclusion

By following these instructions, you've established a robust foundation for incorporating MinIO Object Lambdas into your data processing pipeline, ensuring data compliance and security in a cost-effective and efficient manner. 

This tutorial shows how easy it is to transform data as it is being requested while showcasing the versatility and user-friendly nature of MinIO's Object Lambdas. Those qualities, along with MinIO’s high performance and scalability, position your data processing infrastructure for optimal functionality and future growth. MinIO's commitment to delivering best-in-class object storage ensures that your regulatory compliance needs can be met seamlessly, regardless of the scale or complexity of your operations.

For questions or comments, visit our Slack or drop us an email at hello@min.io. We would love to hear from you.