MinIO and Quickwit

MinIO is frequently used to store logs, metrics and traces from Elasticsearch, OpenTelemetry, OpenSearch, OpenObserve and any of the other dozen or so great monitoring solutions. Combined with storage tiering, MinIO lowers the total cost of ownership for the data stored, and data written to MinIO is immutable, versioned and protected by erasure coding. In addition, saving data to MinIO object storage makes it available to other cloud native machine learning and analytics applications.

Quickwit and MinIO share many of the same principles. Quickwit is designed for sub-second search directly on object storage, which allows compute and storage to be truly decoupled. This means you can store your data on cheap commodity hardware while MinIO handles the replication and integrity of the data. As your requirements change, you scale out your cluster as needed. Like MinIO, Quickwit has a concept of tenants that are easily isolated and can manage their individual usage.

In today’s post we’ll show you how to set up MinIO and Quickwit, with a specific focus on:

  • Configuring MinIO as a storage provider for Quickwit
  • Setting up MinIO as a metadata store for Quickwit

Installing MinIO

In a previous blog we discussed how to configure MinIO as a systemd service. We’ll use the same principles here, except that instead of a standalone binary MinIO will be installed as an OS package.

  • Install the MinIO .deb package. If you are using another OS family you can find other packages here

root@aj-test-1:~# wget -O minio.deb

root@aj-test-1:~# dpkg -i minio.deb

  • Create a group and a user, both named minio-user

root@aj-test-1:~# groupadd -r minio-user
root@aj-test-1:~# useradd -M -r -g minio-user minio-user

  • Create the data directory for MinIO and set the permissions with the user and group created in the previous step

root@aj-test-1:~# mkdir /opt/minio

root@aj-test-1:~# chown minio-user:minio-user /opt/minio
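The packaged service still has to be told to use the directory created above. A minimal sketch, assuming the systemd unit shipped in the .deb reads its settings from /etc/default/minio (as MinIO's package documentation describes) and that the console should listen on port 9001, the port used later in this post. The staging file name minio.env is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stage the environment file locally so it can be reviewed before installing.
# ENV_FILE is a hypothetical staging path, not something the package creates.
ENV_FILE="${ENV_FILE:-./minio.env}"

cat > "$ENV_FILE" <<'EOF'
# Data directory created in the previous step
MINIO_VOLUMES="/opt/minio"
# Serve the web console on the port used later in this post
MINIO_OPTS="--console-address :9001"
# Default credentials, matching the mc alias in this post; change them outside a test box
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin
EOF

echo "Wrote $ENV_FILE"
# As root, install it where the packaged unit is assumed to look for it:
#   install -m 640 -o root -g minio-user "$ENV_FILE" /etc/default/minio
```

Staging the file first keeps the sketch safe to run as an unprivileged user; only the final install step needs root.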

  • Enable and start the MinIO service

root@aj-test-1:~# systemctl enable minio

root@aj-test-1:~# systemctl start minio

  • You can verify MinIO is running either through the console by going to http://localhost:9001 or through mc admin

root@aj-test-1:~# wget

root@aj-test-1:~# chmod +x mc

root@aj-test-1:~# mv mc /usr/local/bin/mc

root@aj-test-1:~# mc alias set local minioadmin minioadmin

root@aj-test-1:~# mc admin info local
  Uptime: 5 minutes
  Version: 2023-11-25T07:17:05Z
  Network: 1/1 OK
  Drives: 1/1 OK
  Pool: 1

  1st, Erasure sets: 1, Disks per erasure set: 1

1 drive online, 0 drives offline

If you see messages similar to these, you can be assured that MinIO has started. Now we’ll create a bucket and later some objects using Quickwit.

root@aj-test-1:~# mc mb local/quickwit

Bucket created successfully `local/quickwit`.
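This post uses the root credentials throughout for brevity. As an optional alternative, here is a sketch of creating a dedicated MinIO user for Quickwit, assuming a recent mc release that provides `mc admin policy attach`. The user name and secret in the example are hypothetical, and the built-in readwrite policy grants access to every bucket, so a narrower custom policy may be preferable in production.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Create a MinIO user and attach the built-in readwrite policy to it.
# Arguments: mc alias, access key (3+ chars), secret key (8+ chars).
create_quickwit_user() {
  local alias_name="$1" access_key="$2" secret_key="$3"
  mc admin user add "$alias_name" "$access_key" "$secret_key"
  mc admin policy attach "$alias_name" readwrite --user "$access_key"
}

# Example (hypothetical credentials, run against the alias created earlier):
#   create_quickwit_user local quickwit 'change-me-please'
```

If you take this route, use the new access key and secret in place of minioadmin in the Quickwit configuration.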

Now we are ready to install Quickwit and configure it with MinIO as the backend.

Configure Quickwit

The Quickwit installer automatically picks the correct binary archive for your environment, then downloads and unpacks it in your working directory. Since we are running Ubuntu here, it will fetch the build for that OS, but it supports all the popular distributions.

curl -L | sh

cd ./quickwit-v*/
./quickwit --version

Download the sample configuration file with curl so we can modify it to add the MinIO bits.

curl -o quickwit.yaml

Open the YAML file and first add the credentials to configure MinIO

    flavor: minio
    access_key_id: minioadmin
    secret_access_key: minioadmin

Next we’ll add the Storage and Metadata store configurations

default_index_root_uri: s3://quickwit/indexes

metastore_uri: s3://quickwit/indexes
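Put together, the MinIO-related settings might look like the fragment generated below. This is a sketch under stated assumptions: that the credential lines sit under `storage.s3` in Quickwit's node configuration, and that an `endpoint` key pointing at MinIO's S3 API on its default port 9000 is needed (the settings shown in this post do not include one). The fragment file name is hypothetical; merge its contents into quickwit.yaml by hand.

```shell
#!/usr/bin/env bash
set -euo pipefail

# FRAGMENT is a hypothetical file name used only for this illustration.
FRAGMENT="${FRAGMENT:-./quickwit-minio-fragment.yaml}"

cat > "$FRAGMENT" <<'EOF'
storage:
  s3:
    flavor: minio
    # Assumed endpoint: MinIO's S3 API on its default port
    endpoint: http://127.0.0.1:9000
    access_key_id: minioadmin
    secret_access_key: minioadmin

default_index_root_uri: s3://quickwit/indexes
metastore_uri: s3://quickwit/indexes
EOF

# Print the fragment so it can be reviewed before merging
cat "$FRAGMENT"
```

Both URIs point at the same prefix in the quickwit bucket created earlier, so index data and metadata live side by side in MinIO.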

Once the above configurations are set in the YAML, save and close the file. To use it, point the QW_CONFIG environment variable at the file and run the service

export QW_CONFIG=./quickwit.yaml

./quickwit run

We can check if it's working by browsing the UI at http://localhost:7280 or doing a GET

curl http://localhost:7280/api/v1/version
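When these steps are scripted, the next command may run before Quickwit has finished starting. A small polling helper, sketched here with a hypothetical function name, retries the same version endpoint until it answers:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Poll a URL until it answers with a successful HTTP status or attempts run out.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 1).
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-1}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl --silent --fail --output /dev/null "$url"; then
      echo "ready after $i attempt(s): $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "gave up after $attempts attempt(s): $url" >&2
  return 1
}

# Example, using the version endpoint from this post:
#   wait_for_http http://localhost:7280/api/v1/version
```

The function returns non-zero when the service never responds, so a calling script can stop instead of attempting to create an index against a node that is not up.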

Let's create an index configured to receive Stack Overflow posts. An index is created from a YAML config that maps your input documents to index fields and specifies whether these fields should be stored and indexed.

curl -o stackoverflow-index-config.yaml

Once the index config is downloaded, create the index

./quickwit index create --index-config ./stackoverflow-index-config.yaml

To hydrate the index we just created, we’ll download a sample of the first 10,000 Stack Overflow posts and then feed this data into Quickwit, which will store it on MinIO in the backend.

curl -O

./quickwit index ingest --index stackoverflow --input-path stackoverflow.posts.transformed-10000.json --force
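To confirm that the ingested data really landed in MinIO, you can list the bucket with mc and count the split objects. This is a sketch that assumes Quickwit stores each split as an object whose name ends in `.split` under the index root; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count lines on stdin that end in ".split", i.e. split objects in an mc listing.
count_split_files() {
  # grep -c exits non-zero when nothing matches, so tolerate that under `set -e`
  grep -c '\.split$' || true
}

# Example, using the alias and bucket created earlier:
#   mc ls --recursive local/quickwit/indexes | count_split_files
```

A count of zero after a successful ingest would suggest Quickwit is writing somewhere other than the MinIO bucket, for example to local disk because the storage settings were not picked up.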

As soon as the ingest command finishes you can start querying data by using the search command

./quickwit index search --index stackoverflow --query "search AND engine"

You can use more advanced features such as aggregations like the following query to find the most popular tags used on the questions in this dataset

curl -XPOST "http://localhost:7280/api/v1/stackoverflow/search" -H 'Content-Type: application/json' -d '{
    "query": "type:question",
    "max_hits": 0,
    "aggs": {
        "foo": {
            "terms": {
                "field": "tags",
                "size": 10
            }
        }
    }
}'

Final Thoughts

MinIO is the right choice for Quickwit because of its industry-leading performance and scalability, a combination that puts every data-intensive workload, not just Quickwit, within reach. A recent benchmark achieved 325 GiB/s (349 GB/s) on GETs and 165 GiB/s (177 GB/s) on PUTs with just 32 nodes of off-the-shelf NVMe SSDs. This makes Quickwit on MinIO well suited to log management, distributed tracing, and immutable data such as conversational data and event-based analytics, among others.

With the data stored in MinIO, Quickwit can be used as a Grafana data source for fast visibility into the operations of your application. You can see patterns and set alerts in Grafana's graphical interface, which lets you run historical analysis and act on anomalies based on certain thresholds. For example, you might want to check for trends or bottlenecks and try to identify patterns in workload type during a specific time of the day.

Got questions? Want to get started? Reach out to us on Slack.
