MinIO Event Notification with Apache Nifi

Apache Nifi is one of the most popular open source data flow engines available today. Nifi supports almost all the major enterprise data systems and allows users to create effective, fast, and scalable information flow systems. Creating data flow systems is simple with Nifi and there is a clear path to add support for systems not already available as Nifi Processors. All this has propelled large scale adoption of Nifi.

Several MinIO customers use Nifi alongside MinIO. They run MinIO as a high-performance data lake, often combining multiple data sets, and use Nifi to route that data to the relevant end consumers.

One common pattern is using MinIO object metadata in Nifi to create custom flows.

Specific use cases differ. One user may want to treat csv and json files uploaded to MinIO differently, another may want to segregate jpg, png and pdf files, and someone else may want to convert only the json files to parquet and store them back in MinIO.
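As a rough illustration of that kind of routing decision, here is a minimal Python sketch that picks a route from an object key's extension. The route names are hypothetical; in Nifi they would correspond to relationships or downstream processors you define yourself.

```python
from pathlib import PurePosixPath

# Hypothetical route names, chosen only for this sketch.
ROUTES = {
    ".csv": "tabular",
    ".json": "convert_to_parquet",
    ".jpg": "images",
    ".png": "images",
    ".pdf": "documents",
}


def route_for(object_key: str) -> str:
    """Return a route name for an object key, or 'unmatched' if none applies."""
    # Object keys use '/' as a separator, so treat them as POSIX-style paths.
    suffix = PurePosixPath(object_key).suffix.lower()
    return ROUTES.get(suffix, "unmatched")


print(route_for("reports/2020/sales.CSV"))  # prints "tabular"
print(route_for("notes.txt"))               # prints "unmatched"
```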

In this post, I'll explain how to set up Nifi to listen for MinIO Event Notifications. Then we'll see how to parse the MinIO event json with a Nifi processor and check it for a user-defined metadata header. Finally, we'll see how the flow can branch depending on whether that header is present in the event json.

Prerequisites

Start Nifi Processor for Webhook

We'll use MinIO's Webhook event notification with Apache Nifi as the event target. To do this, first create a ListenHTTP Processor in the Nifi GUI, then configure it to listen on a port (8086 in this example). Refer to the processor details below:

ListenHTTP Processor Properties

Configure MinIO Event Notification

Once the Processor is created, configure MinIO event notification for the webhook server we just created.

mc mb myminio/source

mc admin config set myminio notify_webhook:nifi endpoint=http://localhost:8086/contentListener

mc admin service restart myminio

mc event add myminio/source arn:minio:sqs::nifi:webhook --event put

Here, we configured MinIO to send notifications to http://localhost:8086/contentListener whenever there is a put event on the bucket source.
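To make the later parsing steps easier to follow, here is an abbreviated sketch of what such a notification payload may look like, written as a Python dict. The nesting follows the S3-style event structure, but every value is made up and many fields are omitted, so treat it as an assumption and compare it against a real event captured from your own deployment.

```python
import json

# Abbreviated, illustrative event. All values are invented for this sketch.
sample_event = {
    "EventName": "s3:ObjectCreated:Put",
    "Key": "source/data.json",
    "Records": [
        {
            "eventSource": "minio:s3",
            "eventName": "s3:ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "source"},
                "object": {
                    "key": "data.json",
                    "size": 42,
                    "contentType": "application/json",
                    # User-defined metadata travels with the object record.
                    "userMetadata": {
                        "X-Amz-Meta-Key1": "value1",
                        "content-type": "application/json",
                    },
                },
            },
        }
    ],
}

record = sample_event["Records"][0]
print(record["s3"]["bucket"]["name"])   # prints "source"
print(record["s3"]["object"]["key"])    # prints "data.json"
print(json.dumps(record["s3"]["object"]["userMetadata"], indent=2))
```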

The Nifi ListenHTTP processor is waiting for events at http://localhost:8086/contentListener, as configured in the previous step.
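If you want to check the listener before uploading anything to MinIO, one option is to POST a hand-written event to it. The sketch below only builds the request with Python's standard library; the helper name `build_webhook_request` and the stub payload are assumptions for illustration, and the send itself is left commented out because it needs the ListenHTTP processor to be running.

```python
import json
import urllib.request


def build_webhook_request(endpoint: str, event: dict) -> urllib.request.Request:
    """Build a JSON POST request that mimics a webhook delivery."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_webhook_request(
    "http://localhost:8086/contentListener",
    {"EventName": "s3:ObjectCreated:Put", "Records": []},  # stub payload
)
print(request.get_method(), request.full_url)

# To actually send it (requires the ListenHTTP processor to be running):
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status)
```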

Add EvaluateJsonPath Processor

Now that communication is established between MinIO and Nifi, the next step is to add the EvaluateJsonPath Nifi Processor. We use it to parse the MinIO event notification json payload and check whether it contains a certain user-defined metadata header.

If the header X-Amz-Meta-key1 is present, we proceed to the next step; otherwise we drop the flow here. We also extract the object and bucket names so they can be passed on to the next step.

This is the critical step in the data flow.

In my example I look for header X-Amz-Meta-key1. You can tweak the metadata field(s) to suit your use case here:

EvaluateJsonPath Processor Properties
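For readers who prefer code to processor properties, the following Python sketch mirrors the same check outside Nifi. It is not the processor's implementation. It assumes the header sits under `Records[0].s3.object.userMetadata` in the event json, and the function name `extract_fields` is hypothetical. Header names are compared case-insensitively, since the casing in the event may differ from the casing used at upload time.

```python
from typing import Optional


def extract_fields(event: dict, header: str = "X-Amz-Meta-key1") -> Optional[dict]:
    """Return bucket, key and header value, or None when the header is absent."""
    records = event.get("Records") or []
    if not records:
        return None
    # Assumed payload layout: Records[0].s3.{bucket, object}
    s3 = records[0].get("s3", {})
    obj = s3.get("object", {})
    metadata = obj.get("userMetadata") or {}
    wanted = header.lower()
    for name, value in metadata.items():
        if name.lower() == wanted:
            return {
                "bucket": s3.get("bucket", {}).get("name"),
                "key": obj.get("key"),
                "value": value,
            }
    return None  # header not found: the flow would be dropped here


event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "source"},
                "object": {
                    "key": "data.json",
                    "userMetadata": {"X-Amz-Meta-Key1": "value1"},
                },
            }
        }
    ]
}
print(extract_fields(event))
print(extract_fields({"Records": [{"s3": {"object": {"key": "plain.txt"}}}]}))
```

The first call returns the bucket name, object key and header value; the second returns `None`, which corresponds to dropping the flow.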

Final Steps

If the EvaluateJsonPath Processor finds the header we are looking for, we move to the next step. In this example, I chose to fetch the object from MinIO, using the FetchS3Object Processor.

You can of course use other Processors here based on your exact use case.

FetchS3Object Processor Properties

For completeness, we save the file to the local drive if the object fetch succeeds; otherwise we log an error. The complete flow looks like this:

Complete Data Flow
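The same end-to-end logic can be sketched in Python to make the branching explicit. This is a stand-in for the flow, not a replacement for it. The function `handle_event` is hypothetical, and `fetch_object` is an injected callable standing in for the FetchS3Object step; in practice it would wrap whichever S3 client you use. The payload layout is the same assumption as before: `Records[0].s3.object.userMetadata`.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger("minio_flow")


def handle_event(
    event: dict,
    fetch_object: Callable[[str, str], bytes],  # hypothetical fetch step
    dest_dir: Path,
    header: str = "X-Amz-Meta-key1",
) -> Optional[Path]:
    """Save the object locally if the header is present; return the saved path."""
    records = event.get("Records") or []
    if not records:
        return None
    s3 = records[0].get("s3", {})
    bucket = s3.get("bucket", {}).get("name")
    obj = s3.get("object", {})
    key = obj.get("key")
    metadata = obj.get("userMetadata") or {}
    if not bucket or not key:
        return None
    if header.lower() not in {name.lower() for name in metadata}:
        return None  # header missing: drop the flow
    try:
        data = fetch_object(bucket, key)
    except Exception:
        # Fetch failed: log the error instead of saving anything.
        logger.exception("fetch failed for %s/%s", bucket, key)
        return None
    target = Path(dest_dir) / Path(key).name
    target.write_bytes(data)
    return target


def fake_fetch(bucket: str, key: str) -> bytes:
    """Stand-in for a real S3 client call, used only in this example."""
    return b'{"hello": "world"}'


event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "source"},
                "object": {
                    "key": "data.json",
                    "userMetadata": {"X-Amz-Meta-Key1": "value1"},
                },
            }
        }
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    saved = handle_event(event, fake_fetch, Path(tmp))
    print(saved.name if saved else "dropped")
```

Injecting the fetch step keeps the sketch runnable without a MinIO server and makes the success and failure branches easy to exercise separately.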

Conclusion

Customers inevitably face data flow challenges, and Apache Nifi has emerged as the popular choice to address such challenges. We at MinIO are increasingly seeing use cases in the field where Nifi is used as the data flow orchestrator to build fast, scalable and effective pipelines.

In this post we saw how you can build a data flow system based on MinIO and Nifi, using event notifications and built-in Nifi Processors. We saw how the data flow can inspect the event notification payload and check whether a certain header is present. Based on the outcome of that check, we added another step to fetch the object from MinIO and save it locally.

Give it a try on your own. If you don't have MinIO already you can download it here. If you need a little help, check out our documentation or our public Slack channel.