Distributed Tracing with MinIO using OpenTelemetry and Jaeger

Several years ago, when you had a monolithic application, it was fairly easy to debug and diagnose since there was probably only one service with a couple of users. Nowadays systems are broken up into smaller microservices deployed in containers on top of Kubernetes, in multiple clusters across different cloud environments. In these kinds of distributed environments you need to observe it all: both the overall picture and, when necessary, a more granular level.

Observability can be roughly divided into three sub-categories: logging, metrics, and tracing. In this blog post we’ll show you how simple it is to get set up with tracing in your new or existing MinIO application. We’ll build a small MinIO app that does a few basic requests; this will be our base application to which we’ll add tracing to gain a better picture of how system components and functions interact.

Tracing is the term used to describe the activity of recording and observing requests made by an application, and how those requests propagate through a system. When systems are set up in a distributed fashion, we call this distributed tracing, as it involves observing the application and its interactions across multiple systems. For example, as a developer your code probably includes multiple functions, but you are more interested in how long the MinIO functions take to execute and how those functions depend on one another as they are used in the application. Tracing will give the necessary insights by:

  • Identifying performance and latency bottlenecks
  • Performing root cause analysis after major incidents
  • Determining dependencies in multi-service architecture
  • Monitoring transactions within your application

MinIO

We’ll get started with a small MinIO Python application that will show a couple of simple operations. We’ll then later add the code for tracing to measure the time it takes for our code to execute.

Installing MinIO

There are several ways to install MinIO in various environments. In this blog post we’ll launch MinIO with Docker, but in production please be sure to install in a distributed setup.

  • Create a directory on your local machine where MinIO will persist data

$ mkdir -p /minio/data

  • Launch the MinIO container using Docker

$ docker run -d \
  -p 9000:9000 \
  -p 9001:9001 \
  --name minio \
  -v /minio/data:/data \
  -e "MINIO_ROOT_USER=minio" \
  -e "MINIO_ROOT_PASSWORD=minioadmin" \
  quay.io/minio/minio server /data --console-address ":9001"

Note: Keep a copy of the credentials used above; you will need them later to access MinIO.

  • Verify you can access MinIO by logging in using a browser through http://localhost:9001/ with the credentials used to launch the container above.
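If you would rather check from a script than from a browser, MinIO also serves a liveness endpoint at /minio/health/live on the API port (9000 in the docker run command above). The sketch below polls it using only the Python standard library. The function name minio_is_live and the two-second timeout are choices made here for illustration, not part of the MinIO SDK.

```python
import urllib.error
import urllib.request


def minio_is_live(base_url="http://localhost:9000", timeout=2.0):
    """Return True if the MinIO liveness endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response
        return False
```

Calling minio_is_live() with no arguments checks the container launched above; pass a different base URL if you mapped the API port elsewhere.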

MinIO SDKs

There are several SDKs that are supported for you to integrate your app with the MinIO API. In this example we’ll use the Python SDK.

  • Install the MinIO Python SDK

$ pip install minio

  • Copy and paste the entire Python script with the included MinIO functions into a local text editor and save it as minio_app.py. You can refer to it as we describe what it does below.

from minio import Minio

# Convenient dict for basic config
config = {
  "dest_bucket":    "processed", # This will be auto created
  "minio_endpoint": "localhost:9000",
  "minio_username": "minio",
  "minio_password": "minioadmin",
}

# Initialize MinIO client
minio_client = Minio(config["minio_endpoint"],
              secure=False,
              access_key=config["minio_username"],
              secret_key=config["minio_password"]
              )

# Create destination bucket if it does not exist
if not minio_client.bucket_exists(config["dest_bucket"]):
  minio_client.make_bucket(config["dest_bucket"])
  print("Destination Bucket '%s' has been created" % (config["dest_bucket"]))

# Create a test object
file_path = "test_object.txt"
f = open(file_path, "w")
f.write("created test object")
f.close()

# Put an object inside the bucket
minio_client.fput_object(config["dest_bucket"], file_path, file_path)

# Get the object from the bucket
minio_client.fget_object(config["dest_bucket"], file_path, config["dest_bucket"] + "/" + file_path)

# Get list of objects
for obj in minio_client.list_objects(config["dest_bucket"]):
  print(obj)
  print("Some objects here")

Let’s walk through the above script. We are invoking some basic operations with the MinIO container we launched in the previous step.

At the very top we’re importing the MinIO Python SDK that we installed earlier and initializing it with default values for

  • MinIO endpoint
  • MinIO username
  • MinIO password
  • Destination bucket name in MinIO

from minio import Minio

# Convenient dict for basic config
config = {
  "dest_bucket":    "processed", # This will be auto created
  "minio_endpoint": "localhost:9000",
  "minio_username": "minio",
  "minio_password": "minioadmin",
}

# Initialize MinIO client
minio_client = Minio(config["minio_endpoint"],
              secure=False,
              access_key=config["minio_username"],
              secret_key=config["minio_password"]
              )
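The values above are hardcoded to keep the tutorial short. Outside of a tutorial you would probably not want credentials in the script itself, so here is a minimal sketch that builds the same dict from environment variables and falls back to the tutorial defaults. The variable names (MINIO_ENDPOINT and so on) and the load_config helper are made up for this example; MinIO's Python SDK does not read them on its own.

```python
import os


def load_config(env=os.environ):
    """Build the config dict, preferring environment variables over defaults."""
    return {
        # Environment variable names below are illustrative, not an SDK convention
        "dest_bucket":    env.get("MINIO_DEST_BUCKET", "processed"),
        "minio_endpoint": env.get("MINIO_ENDPOINT", "localhost:9000"),
        "minio_username": env.get("MINIO_USERNAME", "minio"),
        "minio_password": env.get("MINIO_PASSWORD", "minioadmin"),
    }


config = load_config()
```

The rest of the script can keep using config["minio_endpoint"] and friends unchanged.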

  • Check if the specific destination bucket exists; if not create it.

# Create destination bucket if it does not exist
if not minio_client.bucket_exists(config["dest_bucket"]):
  minio_client.make_bucket(config["dest_bucket"])
  print("Destination Bucket '%s' has been created" % (config["dest_bucket"]))

  • Create a test object to do basic operations with. Here we’re creating a file with some text

# Create a test object
file_path = "test_object.txt"
f = open(file_path, "w")
f.write("created test object")
f.close()

  • Put the test object in the bucket we created earlier

minio_client.fput_object(config["dest_bucket"], file_path, file_path)

  • Get the test object we added in the previous step. On the machine where you are running the script, we’re placing the file at <bucket_name>/<file_path> so it doesn’t conflict with the original

minio_client.fget_object(config["dest_bucket"], file_path, config["dest_bucket"] + "/" + file_path)

  • Get the list of objects in the bucket to confirm our file made it there

for obj in minio_client.list_objects(config["dest_bucket"]):
  print(obj)
  print("Some objects here")

  • Run the script using the command below. You should see the new object in the bucket.

$ python minio_app.py

  • The output should look something like the following. The file we added is shown as a Python object, which confirms that the object is there.

<minio.datatypes.Object object at 0x109b1d3d0>
Some objects here

  • Now that we have the object on the machine where you ran the script, let's delete it from our MinIO bucket and verify that no other objects are there.

minio_client.remove_object(config["dest_bucket"], file_path)

for obj in minio_client.list_objects(config["dest_bucket"]):
  print(obj)
  print("No objects here")
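One thing to keep in mind about the loop above: the message is printed once per object, so an empty bucket prints nothing at all. If you want the empty case to be explicit, a small helper along these lines would do it. The name print_objects is hypothetical; it accepts any iterable, including the iterator returned by list_objects.

```python
def print_objects(objects):
    """Print each object, or an explicit message when there are none.

    Returns the number of objects seen.
    """
    count = 0
    for obj in objects:
        print(obj)
        count += 1
    if count == 0:
        print("No objects here")
    return count
```

In the script it would be called as print_objects(minio_client.list_objects(config["dest_bucket"])).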

  • Since that was the only object, now we can delete the bucket we created earlier

if minio_client.bucket_exists(config["dest_bucket"]):
  minio_client.remove_bucket(config["dest_bucket"])
  print("Destination Bucket '%s' has been removed" % (config["dest_bucket"]))

Remember that we’re using a very simple app built on the MinIO SDK for this tutorial. From this, you can easily see how to include tracing in your own app. While it's ideal to add tracing when building your app, it's never too late to add it and take advantage of the insight that it provides.

OpenTelemetry

OpenTelemetry is a framework that lets you collect traces, metrics, and logs from your app in a standardized format, so that exporters can then send them to a number of backends, such as Jaeger.

Installing OpenTelemetry

Like MinIO, OpenTelemetry supports a number of SDKs. Some are more feature rich than others, and the Python SDK is one of the more complete ones. There are two Python packages we need to install

$ pip install opentelemetry-api
$ pip install opentelemetry-sdk

Initialize OpenTelemetry

Once the required packages are installed, import them into the minio_app.py script we started in the previous section.

  • Import the following packages

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

  • Set the service name in the resource attributes so traces are easy to find when searching for them. We named the service my-minio but you can name it anything you want. This code accomplishes that and initializes the tracer

resource = Resource(attributes={
  SERVICE_NAME: "my-minio"
})

provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(ConsoleSpanExporter())
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

We’ve created all the building blocks; now let's create a span that we can observe. But what is a span? In simple terms, a span represents a single operation, such as one request made by a function, and records when it started and when it ended. Spans can have parent and child spans; together these form a trace.

Creating a span

One of the first requests we make is to check whether the bucket exists. Let’s create a span for it

with tracer.start_as_current_span("check if bucket exists"):

At this point the script should look something like this

from minio import Minio
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

# Convenient dict for basic config
config = {
  "dest_bucket":    "processed", # This will be auto created
  "minio_endpoint": "localhost:9000",
  "minio_username": "minio",
  "minio_password": "minioadmin",
}

# Initialize MinIO client
minio_client = Minio(config["minio_endpoint"],
              secure=False,
              access_key=config["minio_username"],
              secret_key=config["minio_password"]
              )

# Initialize OpenTelemetry provider
resource = Resource(attributes={
  SERVICE_NAME: "my-minio"
})

provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(ConsoleSpanExporter())
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Create destination bucket if it does not exist
with tracer.start_as_current_span("check if bucket exists"):
  if not minio_client.bucket_exists(config["dest_bucket"]):
    minio_client.make_bucket(config["dest_bucket"])
    print("Destination Bucket '%s' has been created" % (config["dest_bucket"]))



...TRUNCATED...

Spans can be sent to multiple exporters, but to start we’ll just print our traces to the console. If you run the script above you should see JSON output at the end, like below. This is your trace.

$ python3 minio_app.py
Destination Bucket 'processed' has been created
<minio.datatypes.Object object at 0x103f36eb0>
Some objects here
Destination Bucket 'processed' has been removed
{
    "name": "check if bucket exists",
    "context": {
        "trace_id": "0xef41e07cf082045a2fc4eea70fd1a6de",
        "span_id": "0x867c14fe1fd97590",
        "trace_state": "[]"
    },
    "kind": "SpanKind.INTERNAL",
    "parent_id": null,
    "start_time": "2022-09-14T20:49:15.569511Z",
    "end_time": "2022-09-14T20:49:15.599886Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {},
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "service.name": "my-minio"
        },
        "schema_url": ""
    }
}
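The start_time and end_time fields are what tell you how long the operation took. As a quick sketch, the duration can be computed from the two timestamps in the output above with the standard library. The helper name span_duration_ms is made up for this example, and the dict is just the relevant fields copied from the JSON.

```python
from datetime import datetime, timedelta

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def span_duration_ms(span):
    """Return the span's duration in milliseconds from its start/end timestamps."""
    start = datetime.strptime(span["start_time"], TIMESTAMP_FORMAT)
    end = datetime.strptime(span["end_time"], TIMESTAMP_FORMAT)
    return (end - start) / timedelta(milliseconds=1)


span = {
    "name": "check if bucket exists",
    "start_time": "2022-09-14T20:49:15.569511Z",
    "end_time": "2022-09-14T20:49:15.599886Z",
}
print(f'{span["name"]}: {span_duration_ms(span):.3f} ms')
```

For the span shown above this works out to roughly 30 milliseconds for the bucket check.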

Each span can be customized in a number of ways by adding additional attributes. Let’s add an attribute function.name with value CHECK_BUCKET

with tracer.start_as_current_span("check if bucket exists"):
  current_span = trace.get_current_span()
  current_span.set_attribute("function.name", "CHECK_BUCKET")

If you rerun the script, you’ll notice a new attribute in the output

...TRUNCATED...

    },
    "attributes": {
        "function.name": "CHECK_BUCKET"
    },
    "events": [],

...TRUNCATED...

You can also add events to further enrich the trace

  current_span.add_event("Checking if bucket exists.")
  if not minio_client.bucket_exists(config["dest_bucket"]):
    current_span.add_event("Bucket does not exist, going to create it.")

    minio_client.make_bucket(config["dest_bucket"])

Run the script again and you’ll notice two new events in the JSON output

...TRUNCATED...

    "events": [
        {
            "name": "Checking if bucket exists.",
            "timestamp": "2022-09-14T21:09:48.505709Z",
            "attributes": {}
        },
        {
            "name": "Bucket does not exist, going to create it.",
            "timestamp": "2022-09-14T21:09:48.514541Z",
            "attributes": {}
        }
    ],


...TRUNCATED...

Adding more spans

So far we’ve only added one span. To make this more useful, let’s add some more spans along with additional attributes and events. Generally, adding more spans will not noticeably degrade the performance of the app, since the BatchSpanProcessor queues finished spans and exports them in batches in the background rather than during each request.
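To picture why extra spans are cheap, here is a toy batching processor built only on the standard library. It is a simplified illustration of the idea, not the actual BatchSpanProcessor code: the class and method names are invented here, and it leaves out details such as time-based flushing. The application thread only puts a finished span on a queue; a worker thread does the exporting.

```python
import queue
import threading


class ToyBatchProcessor:
    """Buffers finished spans and exports them in batches on a worker thread."""

    _STOP = object()  # sentinel that tells the worker to flush and exit

    def __init__(self, export, max_batch_size=3):
        self._export = export
        self._max_batch_size = max_batch_size
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_end(self, span):
        # Called from application code: only enqueues, never waits on export
        self._queue.put(span)

    def _run(self):
        batch = []
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            batch.append(item)
            if len(batch) >= self._max_batch_size:
                self._export(batch)
                batch = []
        if batch:
            self._export(batch)  # flush whatever is left at shutdown

    def shutdown(self):
        self._queue.put(self._STOP)
        self._worker.join()


exported_batches = []
processor = ToyBatchProcessor(exported_batches.append, max_batch_size=2)
for name in ["check bucket", "create object", "put object"]:
    processor.on_end(name)
processor.shutdown()
print(exported_batches)
```

Three spans with a batch size of two are exported as one full batch plus a final flush of the leftover span at shutdown.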

With that in mind, here is the script updated with a few additional spans

from minio import Minio
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

# Convenient dict for basic config
config = {
  "dest_bucket":    "processed", # This will be auto created
  "minio_endpoint": "localhost:9000",
  "minio_username": "minio",
  "minio_password": "minioadmin",
}

# Initialize MinIO client
minio_client = Minio(config["minio_endpoint"],
              secure=False,
              access_key=config["minio_username"],
              secret_key=config["minio_password"]
              )

resource = Resource(attributes={
  SERVICE_NAME: "my-minio"
})


provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(ConsoleSpanExporter())
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)


tracer = trace.get_tracer(__name__)


# Create destination bucket if it does not exist
with tracer.start_as_current_span("check if bucket exists"):
  current_span = trace.get_current_span()
  current_span.set_attribute("function.name", "CHECK_BUCKET")

  current_span.add_event("Checking if bucket exists.")
  if not minio_client.bucket_exists(config["dest_bucket"]):
    current_span.add_event("Bucket does not exist, going to create it.")

    with tracer.start_as_current_span("create bucket"):
      minio_client.make_bucket(config["dest_bucket"])
      current_span.add_event("Bucket has been created.")
      print("Destination Bucket '%s' has been created" % (config["dest_bucket"]))

with tracer.start_as_current_span("create object to add"):
  current_span = trace.get_current_span()
  current_span.set_attribute("function.name", "CREATE_OBJECT")

  # Create a test object
  file_path = "test_object.txt"
  f = open(file_path, "w")
  f.write("created test object")
  f.close()
  current_span.add_event("Test object has been created.")

# Put an object inside the bucket
with tracer.start_as_current_span("add created object to bucket"):
  current_span = trace.get_current_span()
  current_span.set_attribute("function.name", "ADD_OBJECT")

  minio_client.fput_object(config["dest_bucket"], file_path, file_path)
  current_span.add_event("Test object has been placed in bucket.")

# Get the object from the bucket
with tracer.start_as_current_span("fetch object from bucket"):
  current_span = trace.get_current_span()
  current_span.set_attribute("function.name", "FETCH_OBJECT")

  minio_client.fget_object(config["dest_bucket"], file_path, config["dest_bucket"] + "/" + file_path)
  current_span.add_event("Test object has been fetched from bucket.")

# Get list of objects
for obj in minio_client.list_objects(config["dest_bucket"]):
  print(obj)
  print("Some objects here")


# Remove the object from bucket
with tracer.start_as_current_span("remove object from bucket"):
  current_span = trace.get_current_span()
  current_span.set_attribute("function.name", "REMOVE_OBJECT")

  minio_client.remove_object(config["dest_bucket"], file_path)
  current_span.add_event("Test object has been removed from bucket.")

# Get list of objects
for obj in minio_client.list_objects(config["dest_bucket"]):
  print(obj)
  print("No objects here")

# Remove destination bucket if it does exist
with tracer.start_as_current_span("check if bucket exists"):
  current_span = trace.get_current_span()
  current_span.set_attribute("function.name", "REMOVE_BUCKET")

  current_span.add_event("Checking if bucket exists.")
  if minio_client.bucket_exists(config["dest_bucket"]):
    current_span.add_event("Bucket exists, going to remove it.")

    with tracer.start_as_current_span("delete bucket"):
      minio_client.remove_bucket(config["dest_bucket"])
      current_span.add_event("Bucket has been removed.")
      print("Destination Bucket '%s' has been removed" % (config["dest_bucket"]))
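Every span in the script repeats the same two setup lines: get the current span, then set function.name. If that repetition bothers you, one option is a small context manager that does both. This is only a sketch; traced_step is a hypothetical helper, not part of OpenTelemetry, and it expects the tracer object created earlier with trace.get_tracer(__name__).

```python
from contextlib import contextmanager


@contextmanager
def traced_step(tracer, span_name, function_name):
    """Open a span, tag it with function.name, and hand it to the caller."""
    with tracer.start_as_current_span(span_name) as span:
        span.set_attribute("function.name", function_name)
        yield span


# Usage inside minio_app.py, with the tracer and client defined earlier:
#
# with traced_step(tracer, "remove object from bucket", "REMOVE_OBJECT") as span:
#     minio_client.remove_object(config["dest_bucket"], file_path)
#     span.add_event("Test object has been removed from bucket.")
```

Because the helper yields the span itself, events are always added to the span that was just opened, which is easy to get wrong by hand when spans are nested.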

These are just a few examples; you can go as detailed as you want to make your traces helpful to your team. You can measure

  • The performance of database calls
  • Processing time and performance for AI/ML jobs
  • Latency when connecting to external services

Jaeger

But there is an issue: if you run the script now, it will work, but you will be hit with a wall of JSON text that is far longer and harder to read (and therefore less helpful) than when there was only a single span. To make sense of these traces we need a tool that will collect and process them.

There are many tools you can use, but Jaeger is one of the most popular. Like MinIO, it is very easy to get up and running, and, also like MinIO, it is feature rich, helping you with root cause analysis and service dependency analysis, among other things.

Installing Jaeger

We’ll deploy Jaeger as a Docker container and expose the necessary ports

  • Launch the Jaeger container

$ docker run -d --name jaeger -p 16686:16686 -p 6831:6831/udp jaegertracing/all-in-one

  • The above will expose two ports on localhost

6831/udp: the Jaeger agent port, which is the inbound port for accepting traces in Thrift format

16686: the Jaeger UI, where you can visualize the traces

Configuring Jaeger

At the moment our traces are printed to the console. Now we’ll slightly modify the Python script to send them to the Jaeger container we just created.

  • Install OpenTelemetry’s Jaeger exporter

$ pip install opentelemetry-exporter-jaeger

  • Import the package in Python by replacing the following line

from opentelemetry.sdk.trace.export import ConsoleSpanExporter

with

from opentelemetry.exporter.jaeger.thrift import JaegerExporter

  • Add the Jaeger exporter host information

jaeger_exporter = JaegerExporter(
    agent_host_name="localhost",
    agent_port=6831,
)

  • Replace the following line

processor = BatchSpanProcessor(ConsoleSpanExporter())

with

processor = BatchSpanProcessor(jaeger_exporter)

  • The end result would look something like this

...TRUNCATED...

from opentelemetry.exporter.jaeger.thrift import JaegerExporter

...TRUNCATED...

jaeger_exporter = JaegerExporter(
    agent_host_name="localhost",
    agent_port=6831,
)

provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(jaeger_exporter)
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)


...TRUNCATED...

Using Jaeger

Rerun the script. Instead of seeing a JSON blob emitted (remember our wall of text), go to the Jaeger UI and on the left you will see the my-minio service. Select it and then click Find Traces.

So far we’ve only run the script once, so there should be a handful of traces, one for each top-level span we created.

Click on one of the traces that shows 2 Spans; let's choose the one that says “check if bucket exists”. You can see all the details in a much more consumable way than just a JSON blob, forever banishing the wall of text and reclaiming our efficiency.

After running the script five to six times, we can start to see patterns emerge. The durations shown in the UI are the time it took for the different spans to execute. We are seeing two spans here because, if you recall, we nested a child span inside its parent. This is not only visually more appealing than a monster JSON blob, but it is also more helpful.

You can then send this data from Jaeger to Grafana to get historical graphs, and you can even set up alerts based on certain thresholds. For example, you could alert when an AI/ML job takes longer than ten milliseconds to execute its function. It doesn’t matter which environment your apps are running in; you can keep watch over all of them through a single pane of glass.

Once you have the traces, you will want to build a library of historical data that you can look back at to see trends and correlations. This is where metrics come in handy. OpenTelemetry supports a whole range of metrics and logging frameworks that are worth checking out.

Distributed tracing speeds troubleshooting

Tracing is just one of the pieces in the path towards observability. Generally when incidents happen it's not just one trace or one log or one metric that we use to determine and resolve the issue. Often it is understanding the combination of these things that is required to get to the root cause.

Observability opens the door for automation. We are big fans of automation, but in order to operate efficiently at cloud scale, you need a solid foundation and visibility into your logs and the performance of your applications.

If you need any help with MinIO’s Python SDK or with implementing tracing in your MinIO app, don’t hesitate to reach out to us on Slack.