Apache Nifi is one of the most popular open-source data flow engines available today. Nifi supports almost all the major enterprise data systems and allows users to create effective, fast, and scalable information flow systems. Creating data flow systems is simple with Nifi, and there is a clear path to add support for systems not already available as Nifi Processors. All this has propelled large-scale adoption of Nifi.
Several MinIO customers use Nifi in their workflows. These customers rely on MinIO as a high-performance data lake, often combining several different data sets, and Nifi lets them route this data to the relevant end consumers.
One common pattern is using MinIO object metadata in Nifi to build custom flows. Specific use cases differ. For example, someone may want to treat JSON files uploaded to MinIO differently from other objects, while others may want to convert JSON files to Parquet and store the results back in MinIO.
In this post, I'll explain how to set up Nifi to listen for MinIO Event Notifications. Then we'll see how to parse the MinIO event JSON with a Nifi Processor. Next, we'll filter a user-defined metadata header from the event JSON. Finally, we'll see how to take the next steps depending on whether the header exists in the event JSON.
Prerequisites
- MinIO Server running, with mc configured so you can set up event notifications (the commands below use the alias myminio).
- Apache Nifi running, with access to the Nifi GUI.
Start Nifi Processor for Webhook
We'll use MinIO's webhook event notification to send events to Apache Nifi. To do this, first create a ListenHTTP Processor in the Nifi GUI, then configure it to listen on a specific port. This example uses port 8086 and the processor's default base path, contentListener. Refer to the processor details below:
Configure MinIO Event Notification
Once the Processor is created, configure a MinIO event notification that targets the webhook endpoint it exposes.
mc mb myminio/source
mc admin config set myminio notify_webhook:nifi endpoint=http://localhost:8086/contentListener
mc admin service restart myminio
mc event add myminio/source arn:minio:sqs::nifi:webhook --event put
Here, we configured MinIO to send notifications to http://localhost:8086/contentListener whenever there is a put event on the source bucket.
The ListenHTTP processor is waiting for events at http://localhost:8086/contentListener, as configured in the previous step.
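It can help to see the raw notification MinIO posts before wiring up the rest of the flow. The sketch below is a hypothetical test helper, not part of Nifi or MinIO. It uses only Python's standard library, accepts POSTs on the same /contentListener path, and keeps every body it receives. Because it binds the same port as ListenHTTP, run it only while the Nifi processor is stopped.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the ListenHTTP Processor, for inspecting payloads.
BASE_PATH = "/contentListener"  # same base path the MinIO webhook targets
received = []  # event bodies in arrival order (parsed JSON, or raw bytes)


class ContentListener(BaseHTTPRequestHandler):
    def do_HEAD(self):
        # Answer HEAD requests too, in case a client checks reachability first
        self.send_response(200)
        self.end_headers()

    def do_POST(self):
        if self.path.rstrip("/") != BASE_PATH:
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received.append(json.loads(body))
        except json.JSONDecodeError:
            received.append(body)  # keep non-JSON bodies as raw bytes
        self.send_response(200)
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def start_listener(port=8086):
    """Serve in a background thread; call .shutdown() on the result to stop."""
    server = HTTPServer(("127.0.0.1", port), ContentListener)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

For example, call start_listener(), upload an object to the source bucket, and then print received to see the event JSON that EvaluateJsonPath will parse.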
Add EvaluateJsonPath Processor
Now that communication is established between MinIO and Nifi, the next step is to use the EvaluateJsonPath Nifi Processor. We use it to parse the JSON payload of the MinIO event notification and check whether it contains a specific user-defined metadata header.
If the header X-Amz-Meta-key1 is present, we proceed to the next step; otherwise, we drop the flow here. We also extract the object and bucket names so they can be passed on to the next step.
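The filtering logic of this step can be approximated in plain Python. This is a minimal sketch, not what EvaluateJsonPath does internally. It assumes the usual MinIO event layout (Records[0].s3.bucket.name, Records[0].s3.object.key, Records[0].s3.object.userMetadata), and the output names bucket, object, and key1 are hypothetical. Header names are matched case-insensitively here because the casing MinIO reports may differ from the one you typed, so check a real payload before copying the name into a JsonPath expression.

```python
import json
from urllib.parse import unquote_plus

# The metadata header the flow filters on; change it to suit your use case.
METADATA_HEADER = "X-Amz-Meta-key1"


def evaluate_event(payload, header=METADATA_HEADER):
    """Return bucket, object, and header value if the header is present, else None.

    Roughly mirrors EvaluateJsonPath expressions such as:
      $.Records[0].s3.bucket.name
      $.Records[0].s3.object.key
      $.Records[0].s3.object.userMetadata['X-Amz-Meta-key1']
    """
    event = json.loads(payload)
    records = event.get("Records") or []
    if not records:
        return None
    s3 = records[0].get("s3", {})
    obj = s3.get("object", {})
    user_metadata = obj.get("userMetadata") or {}
    # Case-insensitive match, since header casing in the payload can vary
    value = next(
        (v for k, v in user_metadata.items() if k.lower() == header.lower()),
        None,
    )
    if value is None:
        return None  # header missing: drop the event, as the flow does
    return {
        "bucket": s3.get("bucket", {}).get("name"),
        # Object keys in S3-style events are typically URL-encoded; decode them
        "object": unquote_plus(obj.get("key", "")),
        "key1": value,
    }
```

Returning None for a missing header plays the role of dropping the FlowFile. The returned values are what the flow passes on as attributes.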
This is the critical step in the data flow. In my example, I look for the header X-Amz-Meta-key1; you can tweak the metadata field(s) to suit your use case here:
If the EvaluateJsonPath Nifi Processor finds the header we are looking for, we move to the next step. In this example, I chose to fetch the object from MinIO, using the FetchS3Object Processor.
You can of course use other Processors here based on your exact use case.
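FetchS3Object can pick up the bucket and object names extracted earlier through Nifi Expression Language references such as ${bucket}. The sketch below shows that wiring under stated assumptions. The property names are FetchS3Object's own. The attribute names bucket and object are hypothetical names for the EvaluateJsonPath outputs, and the endpoint assumes MinIO on localhost:9000. Credentials (Access Key ID and Secret Access Key) are left out. The resolve helper is a hypothetical, greatly simplified substitute for Expression Language that handles only plain ${name} references.

```python
import re

# Hypothetical FetchS3Object settings for this flow. Property names are real;
# the ${...} attribute names and the endpoint URL are assumptions.
FETCH_S3_OBJECT_PROPERTIES = {
    "Bucket": "${bucket}",
    "Object Key": "${object}",
    "Endpoint Override URL": "http://localhost:9000",  # assumed MinIO address
}

_ATTRIBUTE_REF = re.compile(r"\$\{([^}]+)\}")


def resolve(properties, attributes):
    """Replace each ${name} with the matching FlowFile attribute value.

    Only plain attribute references are handled; missing attributes
    become empty strings.
    """
    return {
        name: _ATTRIBUTE_REF.sub(lambda m: attributes.get(m.group(1), ""), value)
        for name, value in properties.items()
    }
```

For example, with the attributes {"bucket": "source", "object": "my file.json"}, Bucket resolves to "source" and Object Key resolves to "my file.json". Pointing the endpoint override at MinIO is what lets an S3 processor talk to a MinIO server instead of AWS.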
For completeness, we save the file to the local drive if the object fetch succeeds; otherwise, we log an error. The complete flow looks like this:
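The save-or-log branch at the end of the flow can be sketched in plain Python as well. This is a hypothetical stand-in for the Nifi Processors that handle FetchS3Object's success and failure paths. A content value of None stands in for the failure path, which is an assumption of this sketch. Only the object's base name is used as the file name, so nested keys cannot write outside the output directory.

```python
import logging
from pathlib import Path

log = logging.getLogger("minio-nifi-sketch")


def save_or_log(content, object_key, out_dir):
    """Write fetched bytes under out_dir, or log an error if the fetch failed.

    content is the fetched object body, or None when the fetch failed
    (a hypothetical convention standing in for the failure relationship).
    """
    if content is None:
        log.error("fetch failed for object %r", object_key)
        return None
    # Keep only the base name: "nested/data.json" is saved as "data.json"
    target = Path(out_dir) / Path(object_key).name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target
```
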
Customers inevitably face data flow challenges, and Apache Nifi has emerged as a popular choice for addressing them. At MinIO, we increasingly see use cases in the field where Nifi serves as the data flow orchestrator for fast, scalable, and effective pipelines.
In this post, we saw how to build a data flow system based on MinIO and Nifi, using event notifications and built-in Nifi Processors. We saw how the data flow can inspect the event notification payload and check whether a certain header is present. Based on the outcome of this filter, we added a step to fetch the object from MinIO and save it locally.