MinIO Event Notification with Apache Nifi
Apache Nifi is one of the most popular open source data flow engines available today. Nifi supports almost all of the major enterprise data systems and lets users build effective, fast, and scalable data flow systems. Building such flows is simple with Nifi, and systems that don't yet have a Nifi Processor can be supported by writing a custom one. All this has propelled large-scale adoption of Nifi.
Several MinIO customers use Nifi. They rely on MinIO as a high-performance data lake, often combining several different data sets, and use Nifi to route this data to the relevant end consumers.
One common pattern is using MinIO object metadata in Nifi to build custom flows.
Specific use cases vary. For example, someone may want to treat csv and json files uploaded to MinIO differently, others may want to separate jpg, png and pdf files, and someone else may want to convert only the json files to parquet and store them back in MinIO.
In this post, I'll explain how to set up Nifi to listen for MinIO event notifications. Then we'll see how to parse the MinIO event json with a Nifi processor and extract a user-defined metadata header from it. Finally, we'll see how to take the next steps depending on whether that header exists in the event json.
Prerequisites
- A running MinIO server, with mc configured to manage it (the commands below use the alias myminio to set up event notifications).
- A running Apache Nifi instance, with access to the Nifi GUI.
Start a Nifi Processor for the Webhook
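We'll use MinIO's webhook event notification target, with Apache Nifi as the endpoint that receives the events. To do this, first create a ListenHTTP Processor in the Nifi GUI, then configure it to listen on a specific port (8086 in this example; together with the processor's default base path, contentListener, this forms the listener URL). Refer to the processor details below: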
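Before wiring up MinIO, it can help to confirm that the ListenHTTP Processor accepts requests at all. The following is a minimal Python sketch (standard library only) that builds and sends a JSON POST to the listener. The port 8086 and the contentListener base path match the setup in this post; the helper names build_test_request and send, and the test payload, are hypothetical.

```python
import json
import urllib.request

# Port and base path taken from this post's ListenHTTP setup; adjust if yours differ.
LISTENER_URL = "http://localhost:8086/contentListener"


def build_test_request(url: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request shaped like a webhook delivery (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises URLError if nothing is listening, and HTTPError on
    non-2xx responses, so a returned value means the listener accepted it.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

With Nifi running, send(build_test_request(LISTENER_URL, {"test": "ping"})) should return the processor's configured return code (200 by default), and a FlowFile with the test payload should appear on the processor's success queue.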
Configure MinIO Event Notification
Once the Processor is created and running, configure a MinIO event notification that targets the webhook endpoint it exposes.
mc mb myminio/source
mc admin config set myminio notify_webhook:nifi endpoint=http://localhost:8086/contentListener
mc admin service restart myminio
mc event add myminio/source arn:minio:sqs::nifi:webhook --event put
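Here, we created the bucket source and configured MinIO to send a notification to http://localhost:8086/contentListener whenever there is a put event on that bucket. The identifier nifi in notify_webhook:nifi reappears in the target ARN arn:minio:sqs::nifi:webhook, which is how mc event add refers to this webhook, and the service restart ensures the new webhook configuration takes effect. Because the MinIO server itself sends the notification, localhost here refers to the MinIO host, so this setup assumes MinIO and Nifi run on the same machine.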
The Nifi ListenHTTP processor is waiting for events at http://localhost:8086/contentListener, as configured in the previous step.
Add EvaluateJsonPath Processor
Now that MinIO and Nifi can communicate, the next step is the EvaluateJsonPath Nifi Processor. We use it to parse the json payload of the MinIO event notification and check whether it contains a specific user-defined metadata header.
If the header X-Amz-Meta-key1 is present, we proceed to the next step; otherwise, we drop the flow here. We also extract the object and bucket names so they can be passed on to the next step.
This is the critical step in the data flow.
In my example, I look for the header X-Amz-Meta-key1. You can change the metadata field(s) to suit your use case here:
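To make the JSON paths in this step concrete, its logic can be sketched in plain Python. This is not Nifi code: it assumes MinIO's S3-compatible event layout (Records[0].s3.bucket.name, Records[0].s3.object.key and the userMetadata map), uses an abbreviated, hypothetical sample payload, and matches the header name case-insensitively because the casing stored in userMetadata (for example X-Amz-Meta-Key1) may differ from the name you search for. A JSONPath expression, by contrast, matches keys exactly.

```python
import json
from typing import Optional

# Assumed JSONPath expressions for EvaluateJsonPath dynamic properties
# (attribute names on the left are hypothetical):
#   s3.bucket -> $.Records[0].s3.bucket.name
#   s3.key    -> $.Records[0].s3.object.key
#   meta.key1 -> $.Records[0].s3.object.userMetadata['X-Amz-Meta-key1']

# Abbreviated, illustrative event; real MinIO payloads carry many more fields.
SAMPLE_EVENT = json.dumps({
    "EventName": "s3:ObjectCreated:Put",
    "Key": "source/report.json",
    "Records": [{
        "eventName": "s3:ObjectCreated:Put",
        "s3": {
            "bucket": {"name": "source"},
            "object": {
                "key": "report.json",
                "userMetadata": {
                    "X-Amz-Meta-Key1": "value1",
                    "content-type": "application/json",
                },
            },
        },
    }],
})


def extract_attributes(event_json: str, header: str = "X-Amz-Meta-key1") -> Optional[dict]:
    """Return bucket, key and header value, or None if the header is absent."""
    event = json.loads(event_json)
    records = event.get("Records") or []
    if not records:
        return None
    s3 = records[0].get("s3", {})
    obj = s3.get("object", {})
    metadata = obj.get("userMetadata") or {}
    # Case-insensitive lookup, since the stored header casing may differ.
    wanted = header.lower()
    value = next((v for k, v in metadata.items() if k.lower() == wanted), None)
    if value is None:
        return None  # the point where this post's flow stops
    return {"bucket": s3.get("bucket", {}).get("name"), "key": obj.get("key"), header: value}
```

In Nifi, each commented expression would become a dynamic property on EvaluateJsonPath. Per the processor's documentation, when Destination is flowfile-attribute, a path that matches nothing does not by itself route the FlowFile to unmatched, so an extra attribute check (for example with RouteOnAttribute) may be needed to stop FlowFiles that lack the header.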
Final Steps
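If the EvaluateJsonPath Nifi Processor finds the header we are looking for, we move to the next step. In this example, I chose to fetch the object from MinIO using the FetchS3Object Processor, which can be pointed at the MinIO server through its Endpoint Override URL property, with the bucket and object key taken from the attributes extracted in the previous step.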
You can of course use other Processors here based on your exact use case.
For completeness, we save the file to the local drive if the object fetch succeeds; otherwise, we log an error. The complete flow looks like this:
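The final branching can be sketched the same way. In this Python sketch, the fetch callable is a hypothetical stand-in for FetchS3Object (it returns the object's bytes or raises), a successful fetch writes the object under a local directory, and a failure is logged instead, mirroring the two outcomes described above. The function name fetch_and_store and the bucket/key directory layout are illustrative assumptions.

```python
import logging
from pathlib import Path
from typing import Callable

log = logging.getLogger("minio-nifi-sketch")

# Hypothetical stand-in for FetchS3Object: takes (bucket, key), returns bytes,
# raises on failure.
Fetcher = Callable[[str, str], bytes]


def fetch_and_store(bucket: str, key: str, fetch: Fetcher, out_dir: Path) -> bool:
    """Save the fetched object locally on success; log an error on failure."""
    try:
        data = fetch(bucket, key)
    except Exception as exc:  # mirrors the failure branch
        log.error("fetch failed for %s/%s: %s", bucket, key, exc)
        return False
    # Assumes trusted object keys; keys containing ".." would escape out_dir.
    target = out_dir / bucket / key
    target.parent.mkdir(parents=True, exist_ok=True)  # keys may contain "/"
    target.write_bytes(data)
    return True
```

In Nifi itself, these branches would typically be separate Processors connected to FetchS3Object's success and failure relationships, for example PutFile and LogMessage.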
Conclusion
Customers inevitably face data flow challenges, and Apache Nifi has emerged as a popular choice for addressing them. At MinIO, we increasingly see use cases in the field where Nifi serves as the data flow orchestrator for fast, scalable and effective pipelines.
In this post, we saw how to build a data flow system based on MinIO and Nifi, using event notifications and built-in Nifi Processors. We saw how the data flow can inspect the event notification payload and check whether a certain header is present. Based on the outcome of this filter, we added a step that fetches the object from MinIO and saves it locally.
Give it a try yourself. If you don't have MinIO already, you can download it here. If you need a little help, check out our documentation or our public Slack channel.