Visualize usage patterns in MinIO using Elasticsearch and Beats

MinIO is frequently used to store Elasticsearch snapshots because it makes a safe home for Elasticsearch backups. It is more efficient when used with storage tiering, which decreases the total cost of ownership for Elasticsearch, and data written to MinIO gains the added benefits of being immutable, versioned and protected by erasure coding. In addition, saving Elasticsearch snapshots to MinIO object storage makes them available to other cloud native machine learning and analytics applications. In a previous blog post we went over the details of how to snapshot and restore from Elasticsearch.

Since then, both MinIO and Elasticsearch have grown their feature sets, and now we can do more. In this post we will:

Send MinIO journalctl logs to Elasticsearch.
Send logs from a MinIO bucket to Elasticsearch.

MinIO is the perfect companion for Elasticsearch because of its industry-leading performance and scalability. MinIO’s combination of scalability and high performance puts every data-intensive workload, not just Elasticsearch, within reach. MinIO has created a comprehensive blueprint for data infrastructure to support exascale AI and other large-scale data lake workloads, called the MinIO DataPod, because exascale data is common in today's enterprise. By sending MinIO service logs to Elasticsearch, we gain visibility into the operations of MinIO. You can view patterns in Kibana's graphical interface, run further analysis, and even set up alerting based on certain thresholds. For example, you might want to check for trends or bottlenecks and try to identify patterns by workload type or time of day. In this blog post we’ll show you how to visualize these patterns in a consumable way that facilitates insight.

Installing ELK Stack

We’ll go through the most basic way to install Elasticsearch-Logstash-Kibana. We’ll install it on the same node for the sake of simplicity and to ensure we don’t have to worry about opening ports between the nodes.

In production, you should architect these on separate nodes so you can scale the individual components.

Add the Elastic apt repository and its signing key. The same repository also provides Logstash and Kibana, which we install in later steps.
Install the Elasticsearch package.

root@aj-test-1:~# curl -fsSL https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

root@aj-test-1:~# echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list

root@aj-test-1:~# apt-get update

root@aj-test-1:~# apt-get -y install elasticsearch

Start Elasticsearch and verify that it is working. Even when the service status shows running, it might take a minute or two for Elasticsearch’s API to respond, so if the request times out right after you start the service, try again after a few minutes.

root@aj-test-1:~# systemctl enable elasticsearch

root@aj-test-1:~# systemctl start elasticsearch

root@aj-test-1:~# curl -X GET "localhost:9200"
{
"name" : "aj-test-1",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "b-ivUmYnRWyXBiwMuljO9w",
"version" : {
"number" : "7.17.6",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "f65e9d338dc1d07b642e14a27f338990148ee5b6",
"build_date" : "2022-08-23T11:08:48.893373482Z",
"build_snapshot" : false,
"lucene_version" : "8.11.1",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
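If you'd rather not retry by hand, the waiting can be scripted. The sketch below is one possible approach and not part of the original setup: retry_until is a hypothetical helper name, and the attempt count and delay are arbitrary choices.

```shell
# Retry a command until it succeeds or the attempt budget is used up.
# Usage: retry_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# (retry_until is a hypothetical helper, not something shipped with Elasticsearch.)
retry_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the command quietly; success ends the loop immediately.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep only when another attempt is still to come.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Possible call on the node from this walkthrough (left commented out):
# retry_until 30 5 curl -fsS localhost:9200
```

With 30 attempts and a 5 second delay, this gives Elasticsearch roughly two and a half minutes to start answering before giving up.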

We’ll use Kibana to visualize our logs. We could also use the Elasticsearch API to read the indices, but the graphical interface makes them easier to understand.

Install and start Kibana.

root@aj-test-1:~# apt-get -y install kibana

root@aj-test-1:~# systemctl enable kibana

root@aj-test-1:~# systemctl start kibana

Go to http://localhost:5601. In these examples I have pointed kibana.min.io to localhost in /etc/hosts for better visibility, but you can simply use localhost.

There are no indices at this time, but we’ll add them in the next few steps. In order to process logs to load our indices, we need to install Logstash.

Think of Logstash as the log parser. It doesn’t store anything; it has inputs and outputs and, in between, a variety of filters. It parses the input data, filters and transforms it, then sends it to one or more outputs.

Install Logstash

root@aj-test-1:~# apt-get -y install logstash

Configure Logstash to output to Elasticsearch in the following file.

root@aj-test-1:~# vi /etc/logstash/conf.d/01-example.conf

Use the contents below to send events to a default index.

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "default-%{+YYYY.MM.dd}"
  }
}

NOTE: Before starting Logstash, run it in debug mode to ensure everything is working as expected. In my case it wasn’t able to find pipelines.yml so I had to manually fix that by symlinking it to the original location. So please verify before proceeding.

root@aj-test-1:~# /usr/share/logstash/bin/logstash --debug
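As a lighter-weight check than a full debug run, Logstash can also parse the configuration and exit. The sketch below assumes the default locations used by the .deb package (/usr/share/logstash for the binary, /etc/logstash for settings, including pipelines.yml); check_logstash_config is a hypothetical wrapper name.

```shell
# Validate the Logstash configuration without starting the pipeline.
# Usage: check_logstash_config [LOGSTASH_BINARY] [SETTINGS_DIR]
# (check_logstash_config is a hypothetical helper; both defaults are assumptions
# based on the .deb package layout.)
check_logstash_config() {
  logstash_bin=${1:-/usr/share/logstash/bin/logstash}
  settings_dir=${2:-/etc/logstash}
  if [ ! -x "$logstash_bin" ]; then
    echo "logstash binary not found at $logstash_bin" >&2
    return 1
  fi
  # --path.settings points Logstash at the directory that holds pipelines.yml;
  # --config.test_and_exit parses the configuration and then exits.
  "$logstash_bin" --path.settings "$settings_dir" --config.test_and_exit
}

# Possible call on the node from this walkthrough (left commented out):
# check_logstash_config
```

Passing the settings directory explicitly is also one way to help Logstash find pipelines.yml when it is launched by hand rather than through systemd.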

If everything looks good, enable and start the Logstash service.

root@aj-test-1:~# systemctl enable logstash

root@aj-test-1:~# systemctl start logstash

Logstash is technically running at this point but we haven’t configured any input to consume data, only the output, so let’s do that now by installing MinIO and collecting the service logs.

Installing MinIO

In a previous blog post we discussed how to configure MinIO as a systemd service. We’ll use the same principles here, except that instead of a standalone binary MinIO will be installed as an OS package.

Install the MinIO .deb package. If you are using another OS family, other packages are available from MinIO's download page.

root@aj-test-1:~# wget https://dl.min.io/server/minio/release/linux-amd64/archive/minio_20220825071705.0.0_amd64.deb -O minio.deb

root@aj-test-1:~# dpkg -i minio.deb

Create a group and a user, both named minio-user.

root@aj-test-1:~# groupadd -r minio-user
root@aj-test-1:~# useradd -M -r -g minio-user minio-user

Create the data directory for MinIO and set the permissions with the user and group created in the previous step

root@aj-test-1:~# mkdir /opt/minio

root@aj-test-1:~# chown minio-user:minio-user /opt/minio
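The packaged systemd unit needs to know where the data directory is. The steps here do not show that part, so the following is a sketch under the assumption that the unit reads its settings from /etc/default/minio, as MinIO's documented unit file does. write_minio_env is a hypothetical helper name, and the two values simply mirror the data directory and console port used in this post.

```shell
# Write a minimal environment file for the packaged minio.service.
# Usage: write_minio_env [TARGET_FILE]   (default: /etc/default/minio)
# (write_minio_env is a hypothetical helper; the default path is an assumption
# about where the packaged unit looks for its settings.)
write_minio_env() {
  target=${1:-/etc/default/minio}
  cat > "$target" <<'EOF'
# Data directory created earlier in this walkthrough
MINIO_VOLUMES="/opt/minio"
# Serve the MinIO console on the port used in this post
MINIO_OPTS="--console-address :9001"
EOF
}

# Possible call on the node from this walkthrough (left commented out):
# write_minio_env
```

If your package already shipped or generated this file, inspect it before overwriting anything.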

Enable and start the MinIO service.

root@aj-test-1:~# systemctl enable minio

root@aj-test-1:~# systemctl start minio

You can verify MinIO is running either through the console by going to http://localhost:9001 or through mc admin

root@aj-test-1:~# wget https://dl.min.io/client/mc/release/linux-amd64/mc

root@aj-test-1:~# chmod +x mc

root@aj-test-1:~# mv mc /usr/local/bin/mc

root@aj-test-1:~# mc alias set local http://127.0.0.1:9000 minioadmin minioadmin

root@aj-test-1:~# mc admin info local
● 127.0.0.1:9000
Uptime: 5 minutes
Version: 2022-08-25T07:17:05Z
Network: 1/1 OK
Drives: 1/1 OK
Pool: 1

Pools:
1st, Erasure sets: 1, Disks per erasure set: 1

1 drive online, 0 drives offline

If you see messages similar to these, you can be assured that MinIO has started. Later we’ll create a bucket and add some objects to further test MinIO.

Send Journalctl Logs using Journalbeat

Elasticsearch uses Beats to collect data from various sources, and there are different types of Beats. Journalbeat is one of them; as the name suggests, it reads journalctl logs. We’ll read MinIO’s journalctl logs in this example.

Install the Journalbeat package. There are different packages for the most common operating systems, but here we’ll install the .deb package on this Ubuntu-based machine.

root@aj-test-1:~# curl -L -O https://artifacts.elastic.co/downloads/beats/journalbeat/journalbeat-7.15.2-amd64.deb

root@aj-test-1:~# dpkg -i journalbeat-7.15.2-amd64.deb

Update 01-example.conf with the following configuration. This allows Logstash to listen on port 5044 for various Beats, in this case Journalbeat.

root@aj-test-1:~# vi /etc/logstash/conf.d/01-example.conf

input {
beats {
port => 5044
}
}

output {
elasticsearch {
hosts => ["localhost:9200"]
index => "minio-journalctl-logs-%{+YYYY.MM.dd}"
}
}

Restart Logstash for the setting to take effect.

root@aj-test-1:~# systemctl restart logstash

Modify journalbeat.yml to add minio.service explicitly. You could also configure broader inputs that collect journalctl logs from all services.

root@aj-test-1:~# vi /etc/journalbeat/journalbeat.yml

journalbeat.inputs:
  # Paths that should be crawled and fetched. Possible values files and directories.
  # When setting a directory, all journals under it are merged.
  # When empty starts to read from local journal.
- id: minio.service
  paths: []
  include_matches:
    - _SYSTEMD_UNIT=minio.service

In the same journalbeat.yml file, modify the output to send to our Logstash Beats port 5044 we configured earlier in 01-example.conf.
Comment out output.elasticsearch because we want the logs to be parsed through Logstash.

# ----------------------- Elasticsearch Output ------------------------
#output.elasticsearch:          # comment this line out
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]    # comment this line out

[...TRUNCATED...]

# ------------------------- Logstash Output ---------------------------
output.logstash:                # uncomment this line
  # The Logstash hosts
  hosts: ["localhost:5044"]     # uncomment this line

Enable and start Journalbeat.

root@aj-test-1:~# systemctl enable journalbeat

root@aj-test-1:~# systemctl start journalbeat

Use the _cat/indices endpoint to see the new indices. You should see something like the output below.

root@aj-test-1:~# curl -s localhost:9200/_cat/indices | grep -i minio

yellow open minio-journalctl-logs-2022.08.26 J72DiduZQqWZfzt_Ml7Rvw 1 1 24 0 314.8kb 314.8kb
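When there are many indices, it can help to reduce that output to just the index name and document count. A small sketch, assuming the default column order of _cat/indices (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size); index_doc_counts is a hypothetical helper name.

```shell
# Print "index docs.count" for each line of default _cat/indices output on stdin.
# (index_doc_counts is a hypothetical helper; it relies on the default column
# order, where the index name is column 3 and docs.count is column 7.)
index_doc_counts() {
  awk '{ print $3, $7 }'
}

# Possible call on the node from this walkthrough (left commented out):
# curl -s localhost:9200/_cat/indices | grep -i minio | index_doc_counts
```

For the sample line shown here, this would print the index name followed by 24. If you want Elasticsearch to print the column headers so you can confirm the order, append ?v to the _cat/indices URL.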

There should already be some logs. If you don’t see them, you can do one of two things:
Widen the time range in the Kibana dashboard. Between starting MinIO and installing Journalbeat, the logs may have become older than Kibana's default range, which is usually 15 minutes, so widening the range should reveal the older logs.
If you need some fresh journalctl logs immediately, just restart MinIO as shown below.

root@aj-test-1:~# systemctl restart minio

If you go to the Kibana dashboard, you should almost immediately see new logs under the index pattern minio-journalctl-logs-*.

Once those entries appear, the MinIO journalctl logs are in Elasticsearch.

Read Apache Logs from MinIO using Filebeat

Let's say you already have logs in a MinIO bucket and you want to load them into Elasticsearch. One possible use case for this is archival logs. Sometimes you only want to restore a specific range of logs for analysis, and once you are done analyzing them, you delete those indices from the cluster. This saves cost by avoiding storing old logs that are not used on a daily basis.

We’ll use Filebeat’s S3 input to read from a MinIO bucket.

Create a bucket in MinIO to store Apache server logs.

root@aj-test-1:~# mc mb local/apachelogs
Bucket created successfully `local/apachelogs`.

Copy the Apache logs to the bucket

root@aj-test-1:~# mc cp /var/log/apache2/access.log local/apachelogs
/var/log/apache2/access.log: 1.55 KiB / 1.55 KiB ┃▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓┃ 117.96 KiB/s 0s

root@aj-test-1:~# mc ls local/apachelogs
[2022-08-29 18:37:31 UTC] 1.5KiB STANDARD access.log

Download and Install Filebeat

root@aj-test-1:~# curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.4.0-amd64.deb

root@aj-test-1:~# dpkg -i filebeat-8.4.0-amd64.deb

Configure Filebeat with the following in filebeat.yml:
Input from an S3 source, in this case the MinIO apachelogs bucket.
Output to our Logstash processor

root@aj-test-1:~# vi /etc/filebeat/filebeat.yml

filebeat.inputs:
- type: aws-s3
  non_aws_bucket_name: apachelogs
  number_of_workers: 5
  bucket_list_interval: 60s
  access_key_id: minioadmin
  secret_access_key: minioadmin
  endpoint: http://localhost:9000
  expand_event_list_from_field: Records

[...TRUNCATED...]

# ------------------------ Elasticsearch Output -----------------------
#output.elasticsearch:          # comment this line out
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]    # comment this line out

[...TRUNCATED...]

# -------------------------- Logstash Output --------------------------
output.logstash:                # uncomment this line
  # The Logstash hosts
  hosts: ["localhost:5044"]     # uncomment this line

Before we start Filebeat and send logs to Elasticsearch through Logstash, let's update our trusty 01-example.conf Logstash configuration to send these to a new index by changing the index setting in the elasticsearch output.

root@aj-test-1:~# vi /etc/logstash/conf.d/01-example.conf

index => "apache-%{+YYYY.MM.dd}"

Restart Logstash for the changes to take effect.

root@aj-test-1:~# systemctl restart logstash
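Note that with a single output block, changing the index name sends everything arriving on port 5044 to apache-*, including any Journalbeat events. If you want both indices at once, one option is to branch on the [@metadata][beat] field that the Beats input sets on each event. The sketch below is an alternative to the edit above rather than what this walkthrough uses; write_logstash_conf is a hypothetical helper name, and the function overwrites the target file.

```shell
# Write a Logstash pipeline that routes events to an index based on the sending Beat.
# Usage: write_logstash_conf [TARGET_FILE]
# (write_logstash_conf is a hypothetical helper; the default path is the file
# edited throughout this walkthrough.)
write_logstash_conf() {
  target=${1:-/etc/logstash/conf.d/01-example.conf}
  # The quoted 'EOF' keeps the shell from expanding anything inside the config.
  cat > "$target" <<'EOF'
input {
  beats {
    port => 5044
  }
}

output {
  if [@metadata][beat] == "journalbeat" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "minio-journalctl-logs-%{+YYYY.MM.dd}"
    }
  } else if [@metadata][beat] == "filebeat" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "apache-%{+YYYY.MM.dd}"
    }
  }
}
EOF
}

# Possible call on the node from this walkthrough (left commented out):
# write_logstash_conf && systemctl restart logstash
```

Events from any other Beat would be dropped by this output, so add a final else branch if you expect more sources.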

If you check Kibana now under apache-* you should see some Apache logs.
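The steps above stop short of showing the Filebeat start itself; presumably it mirrors Journalbeat (systemctl enable filebeat, then systemctl start filebeat). Once it is running, you can also confirm from the shell that documents arrived by asking Elasticsearch's _count API. The sketch below pulls the number out of the JSON response; count_from_json is a hypothetical helper name, and the sed pattern is a simplification that assumes a single "count" field.

```shell
# Extract the "count" value from an Elasticsearch _count API response on stdin.
# (count_from_json is a hypothetical helper; it is a simple text match, not a
# full JSON parser.)
count_from_json() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Possible calls on the node from this walkthrough (left commented out):
# systemctl enable filebeat && systemctl start filebeat
# curl -s 'localhost:9200/apache-*/_count' | count_from_json
```

A count that stays at zero after the bucket list interval has passed is a hint to check the Filebeat and Logstash service logs.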

Final Thoughts

Previously, we showed you how Elasticsearch indices can be snapshotted, backed up, and restored using MinIO.

In this installment we showed you how to:

Send MinIO Journalctl logs to Elasticsearch
Send Apache logs from a MinIO bucket

You could take this one step further and integrate Filebeat with Kafka notifications. This allows you to leverage MinIO’s bucket notification feature to kick off Filebeat to read the bucket when a new object is added, rather than polling it on a fixed interval.

Now you have better insight into the operations MinIO performs in a pretty Kibana dashboard. Not only that, but instead of storing massive amounts of data in an Elasticsearch cluster, which can get quite expensive, we also showed you how you can load logs from a MinIO bucket ad hoc to an index of your choice. This way once you are done analyzing them you can discard the index. MinIO fits seamlessly into existing DevOps practices and toolchains, making it possible to integrate with a variety of Elasticsearch workflows.

Got questions? Want to get started? Reach out to us on Slack.