Metrics with MinIO using OpenTelemetry, Flask, and Prometheus

AJ AJ on Observability |
Metrics with MinIO using OpenTelemetry, Flask, and Prometheus

Observability is all about gathering information (traces, logs, metrics) with the goal of improving performance, reliability, and availability. Seldom it's just one of these that would pinpoint the root cause of an event, more often than not it's when we correlate this information to form a narrative is when we’ll have a better understanding.

In the last few blogs on observability, we went through how to implement tracing in MinIO and sending MinIO logs to ElasticSearch. In this blog post we’ll talk about metrics.

From the start MinIO has not only focused efforts on performance and scalability, but also on observability. MinIO has a built-in endpoint /minio/v2/metrics/cluster that Prometheus can scrape and gather metrics from. You can also publish events to Kafka and trigger alerts and other processes to run that are dependent on action performed within MinIO. Finally you can bring all of these metrics together in a pretty dashboard on Grafana and even kick off alerts.

Cloud-native MinIO integrates seamlessly with cloud-native observability tools like OpenTelemetry. We’ll use the same OpenTelemetry API that we used for traces for metrics, just with different sets of exporters and functions.

To give you a feel for the observability process and the depth of information available, we’re going to build a small flask Python application and visualize it in Grafana

  1. We’ll launch MinIO, Prometheus and Grafana containers.
  2. Write a sample Python app using OpenTelemetry to gather various types of metrics such as counters and histograms.
  3. Perform operations on MinIO using MinIO SDK.
  4. Build a Docker image for the sample application we wrote.
  5. Visualize the metrics scraped by Prometheus in our Grafana dashboard.

We’ll walk you through the entire process step by step.

Launching infrastructure

Similar to the last blog, we’ll use a Python application that we write from scratch, except this time we’ll send metrics instead of sending traces.

MinIO

There are several ways to install MinIO in various environments. In this blog post we’ll launch MinIO with Docker, but in production please be sure to install in a distributed setup on Kubernetes.

Create a directory on your local machine where MinIO will persist data

$ mkdir -p /minio/data

Launch the MinIO container using Docker

$ docker run -d \
  -p 9000:9000 \
  -p 9001:9001 \
  --name minio \

   --hostname minio \
  -v /minio/data:/data \
  -e "MINIO_ROOT_USER=minio" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  quay.io/minio/minio server /data --console-address ":9001"

Note: Keep a copy of the credentials used above; you will need them later to access MinIO.

Verify you can access MinIO by logging in using a browser through http://localhost:9001/ with the credentials used to launch the container above.

Prometheus

The metrics we collect from our application need to be stored somewhere. Prometheus is that store in this case, but OpenTelemetry supports other exporters as well.

Start by creating a file called prometheus.yml with the following contents

scrape_configs:
  - job_name: 'metrics-app'
    scrape_interval: 2s
    static_configs:
      - targets: ['metrics-app:8000']

Make a note of the - targets: ['metrics-app:8000']; later when we launch the app in a Docker container, we’ll set the hostname to metrics-app to match the target.

Launch the Prometheus container

docker run -d \
    -p 9090:9090 \
    --name prom \
    --hostname prom \
    -v $(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml \
    prom/prometheus

Make a note of the hostname and ports above; you’ll need those later when configuring the data source in Grafana.

Grafana

Did you know that Grafana originally started as a fork of Kibana? In this example we're going to use Grafana to provide the visualization for the metrics we collected and stored with MinIO above. By visualizing in Grafana, we can see the patterns emerge that we would otherwise not see by looking at just numbers or logs.

Launch the Grafana container

docker run -d \
  -p 3000:3000 \
  --name grafana \
  --hostname grafana \
  grafana/grafana:latest

Access it via browser http://localhost:3000 and follow the instructions below to setup the Prometheus datasource.

Flask application

Previously we wrote a Python application to send traces but this time we’ll write a Flask web app in Python to have a better understanding of the integrations. Flask is a Python web framework that helps us easily build simple web facing or API applications. In this example we are just showing a stripped down version so that we can demonstrate the techniques succinctly.

Instrumenting metrics

We’ll go ahead and paste the entire Flask application here and then dissect it to delve deeper into the code base on how we instrument the metrics.

Create a file called app.py and paste the following code

from flask import Flask, request, g

from prometheus_client import start_http_server

from minio import Minio


from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.instrumentation.flask import FlaskInstrumentor

import time

# Service name is required for most backends
resource = Resource(attributes={
    SERVICE_NAME: "minio-service"
})

# Start Prometheus client
start_http_server(port=8000, addr="0.0.0.0")
reader = PrometheusMetricReader()
provider = MeterProvider(resource=resource, metric_readers=[reader])
metrics.set_meter_provider(provider)
meter = metrics.get_meter(__name__, True)

minio_app_visits_counter = meter.create_counter(
    name="minio_app_visits", description="count the number of visit to various routes", unit="1",
)

minio_request_duration = meter.create_histogram(
    name="minio_request_duration",
    description="measures the duration of the inbound MinIO App HTTP request",
    unit="milliseconds")

config = {
  "minio_endpoint": "minio:9000",
  "minio_username": "minio",
  "minio_password": "minioadmin",
}

minio_client = Minio(config["minio_endpoint"],
              secure=False,
              access_key=config["minio_username"],
              secret_key=config["minio_password"]
              )

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

@app.before_request
def before_request_func():
    g.start_time = round(time.time()*1000)
    minio_app_visits_counter.add(1, attributes={"request_path": request.path})

@app.after_request
def after_request_func(response):

    total_request_time = round(time.time()*1000) - g.start_time
    minio_request_duration.record(total_request_time, {"request_path": request.path})

    return response

@app.route('/')
def hello_world():
    return '<h1>Hello World MinIO!</h1>'

@app.route('/makebucket/<name>')
def make_bucket(name):
    if not minio_client.bucket_exists(name):
 
        minio_client.make_bucket(name)

    return f"""
        <html>
        <h3>Make Bucket {name} Successful</h3>
        </html>
        """

@app.route('/removebucket/<name>')
def remove_bucket(name):
    if minio_client.bucket_exists(name):
 
        minio_client.remove_bucket(name)

    return f"""
        <html>
        <h3>Removed Bucket {name} Successful</h3>
        </html>
        """

if __name__ == "__main__":
    app.run(debug=True)

Flask is the web framework that serves our route

from flask import Flask, request, g

In order for the server to scrape our metrics, we need to start a Prometheus client. For now just import the package.

from prometheus_client import start_http_server

For doing the MinIO operations, import the minio package as well.

from minio import Minio

OpenTelemetry is the framework that standardized the metrics we send to Prometheus. We'll import those packages as well.

from opentelemetry import metrics
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
from opentelemetry.instrumentation.flask import FlaskInstrumentor

The Prometheus client is what exposes the metrics we collect to the Prometheus scraper.

start_http_server(port=8000, addr="0.0.0.0")

As a reminder, the port should match the configuration in prometheus.yml.

Create a metric type counter that will allow us to send whole number values. Here we are counting the number of visits to the app. Counters can count things like

  • The number of API visits
  • The number of page faults with each process

minio_app_visits_counter = meter.create_counter(
    name="minio_app_visits", description="count the number of visit to various routes", unit="1",
)

In addition to page visits, we would also like to know how long it took for the request; for that we’ll create a metric type histogram. In this instance, we are measuring the inbound request time in milliseconds, but we can display other metrics such as

  • Object size being uploaded to a specific MinIO bucket
  • The duration it took for the object to be uploaded

minio_request_duration = meter.create_histogram(
    name="minio_request_duration",
    description="measures the duration of the inbound MinIO App HTTP request",
    unit="milliseconds")

Initialize the MinIO client

config = {

  "minio_endpoint": "minio:9000",

  "minio_username": "minio",

  "minio_password": "minioadmin",

}


minio_client = Minio(config["minio_endpoint"],
              secure=False,
              access_key=config["minio_username"],
              secret_key=config["minio_password"]
              )

Initialize Flask App and OpenTelemetry metrics instrumentation

app = Flask(__name__)
FlaskInstrumentor().instrument_app(app)

In def before_request_func(): we are doing two things

  • A counter function to increment by one every time someone visits any of the routes
  • A start time to check the load times in milliseconds

g.start_time = round(time.time()*1000)

minio_app_visits_counter.add(1, attributes={"request_path": request.path})

In def after_request_func(): we are capturing the end time of the request and passing the delta to record the histogram

total_request_time = round(time.time()*1000) - g.start_time
minio_request_duration.record(total_request_time, {"request_path": request.path})

These metrics are encapsulated within these before/after decorators so that we keep the code DRY. You can add any number of additional routes and they will automatically have our custom metrics instrumented.

@app.before_request
def before_request_func():
    ...

@app.after_request
def after_request_func(response):
  ...

    return response

Last but not least, some sample routes doing some operations using the MinIO Python SDK. The cURL calls for these would look something like this

  • curl http://<host>:<port>/makebucket/aj-test
  • curl http://<host>:<port>/removebucket/aj-test

@app.route('/')
def hello_world():
    return '<h1>Hello World MinIO!</h1>'

@app.route('/makebucket/<name>')
def make_bucket(name):
    if not minio_client.bucket_exists(name):
 
        minio_client.make_bucket(name)

    return f"""
        <html>
        <h3>Make Bucket {name} Successful</h3>
        </html>
        """

@app.route('/removebucket/<name>')
def remove_bucket(name):
    if minio_client.bucket_exists(name):
 
        minio_client.remove_bucket(name)

    return f"""
        <html>
        <h3>Removed Bucket {name} Successful</h3>
        </html>
        """

As you can see, our app is very simple, and that is the whole point of it. We want to give you an idea of the foundation but you can take this and run with it and add your own instrumentation.

Build and deploy app

We wrote the app, now we need to build the app as a Docker image so we can deploy it alongside our other infrastructure so everything can talk to each other.

Create a file called Dockerfile and add the following contents

# syntax=docker/dockerfile:1

FROM python:3.8-slim-buster

WORKDIR /metrics_app

COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt

COPY app.py app.py

CMD [ "flask", "run", "--host", "0.0.0.0"]

We need to install all the packages we are importing in our app. Create a file called requirements.txt with the following contents

minio
flask
prometheus_client
opentelemetry-api
opentelemetry-sdk
opentelemetry-exporter-prometheus
opentelemetry-instrumentation-flask

Build the App with the name ot-prom-metrics-app which we’ll use later as the image name to launch the container with.

docker build --tag ot-prom-metrics-app .

Launch the app container with the image we just created on port 5000.

docker run -d \
    -p 5000:5000 \
    --name metrics-app \
    --hostname metrics-app \
    ot-prom-metrics-app

Test by making a bucket using the following curl command.

$ curl http://127.0.0.1:5000/makebucket/aj-test

        <html>
        <h3>Make Bucket aj-test Successful</h3>
        </html>

Now that we have all the pieces deployed, it's time to observe these in Grafana.

Observing metrics

We have an app deployed that can generate some metrics, a data store (prometheus) to keep these metrics, and Grafana for historical views and context.

Grafana queries

Login to Grafana and follow the instructions below

You’ll notice none of our metrics are in there. That is because either you have not made any call against the flask app routes or it was made more than five mins ago. Technically you should see the curl call register that you made in the previous section if you expand the time frame of the visibility, but no matter, let’s make a test call again and then check.

$ curl http://127.0.0.1:5000/makebucket/aj-test

        <html>
        <h3>Make Bucket aj-test Successful</h3>
        </html>

Refresh the Grafana page and follow the instructions below

Well wasn’t that fun? Let’s do some more fun stuff, by doing multiple calls but staggering them. Each subsequent call will take a second longer. In other words, the first call will take one second, the second call will take two seconds, the third call will take three seconds and so on. Go ahead and run the command below.

for i in {1..10}; do curl http://127.0.0.1:5000/makebucket/aj-test && sleep $i; done;

In Grafana click Run query once the above command is done executing. The result should look something like this, notice on the far right? That is our 11 requests or however many you made.


While that was fun and now we know the amount of total visits to the application we don’t know at what rate those visits are being made: Were they one per second? A few every second? This is where the range functions come in handy. Follow the instructions below

Now you have a simple application which you can use to build upon further instrumentation as you see fit. Some of the other common use cases are

  • Monitor the size of the objects uploaded
  • Monitor the rate at which number of objects are being fetched
  • Monitor the rate at which objects are being downloaded

These are just a few examples; the more you instrument, the more observability you’ll have.

Final Thoughts

Observability is multi-faceted; you will often have to look at a combination of traces, metrics, and logs to determine the root cause. You can use Chaos Engineering tools such as Gremlin, ChaosMonkey and the like to break your system and observe the pattern in the metrics.

For example, perhaps you are collecting your HTTP request status and generally you see 200s all the time but suddenly you see a spike of 500s. You go take a look at the logs and you notice a deployment took place recently or a database went down for maintenance. Or, if you are monitoring object storage performance metrics you can correlate any server side issues with this data. It's these types of events that often cause the most pain and having visibility in these cases is paramount.

If you would like to learn more about Chaos Engineering or have a cool implementation of metrics in your MinIO application, reach out to us on Slack! We’d love to see what you have.