MinIO and Quickwit
MinIO is frequently used to store logs, metrics and traces, whether they come from Elasticsearch, OpenTelemetry, OpenSearch, OpenObserve or any of the other dozen or so great monitoring solutions. MinIO is more efficient when used with storage tiering, which decreases total cost of ownership for the data stored, and data written to MinIO is immutable, versioned and protected by erasure coding. In addition, saving data to MinIO object storage makes it available to other cloud native machine learning and analytics applications.
Quickwit and MinIO share a lot of the same principles. Quickwit is designed for sub-second search straight from object storage, allowing truly decoupled compute and storage. This means you can store your data on cheap commodity hardware, while MinIO handles the replication and integrity of the data. As your needs and requirements change, you scale out your cluster as needed. Quickwit has a concept of tenants, similar to MinIO’s, that are easily isolated and can manage their individual usage.
In today’s post we’ll show you how to set up MinIO and Quickwit, with a specific focus on:
- Configuring MinIO as a storage provider for Quickwit
- Setting up MinIO as a metadata store for Quickwit
Installing MinIO
In a previous blog we discussed how to configure MinIO as a systemd service. We’ll use the same principles here, except that instead of a standalone binary MinIO will be installed as an OS package.
- Install the MinIO .deb package. If you are using another OS family you can find other packages here
- Create a user and group, minio-user and minio-user, respectively
- Create the data directory for MinIO and set the permissions with the user and group created in the previous step
- Enable and start the MinIO service
- You can verify MinIO is running either through the console by going to http://localhost:9001 or through mc admin
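The steps above can be sketched as a small script. This is a rough outline rather than the exact commands from the earlier post: the package file name `./minio.deb` and the data directory `/mnt/data` are assumptions, and the script only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the install steps; prints commands by default.
# Set DRY_RUN=0 to really execute them (needs sudo on a Debian/Ubuntu host).
set -eu

DRY_RUN="${DRY_RUN:-1}"
MINIO_DEB="${MINIO_DEB:-./minio.deb}"          # assumption: package already downloaded here
MINIO_DATA_DIR="${MINIO_DATA_DIR:-/mnt/data}"  # assumption: data directory path

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_minio() {
  run sudo dpkg -i "$MINIO_DEB"                           # install the .deb package
  run sudo groupadd -r minio-user                         # system group
  run sudo useradd -M -r -g minio-user minio-user         # system user, no home directory
  run sudo mkdir -p "$MINIO_DATA_DIR"                     # data directory
  run sudo chown minio-user:minio-user "$MINIO_DATA_DIR"  # hand it to the service user
  run sudo systemctl enable minio.service
  run sudo systemctl start minio.service
}

install_minio
```

Reviewing the printed commands first and then rerunning with `DRY_RUN=0` keeps the privileged steps explicit.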
If you see messages similar to these, you can be assured that MinIO has started. Now we’ll create a bucket and later some objects using Quickwit.
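Creating the bucket with the MinIO client might look like the sketch below. The alias name `local`, the API port 9000 and the default `minioadmin` credentials are assumptions; the bucket name `quickwit` is taken from the `s3://quickwit/indexes` URIs used in the Quickwit configuration. Commands are printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: register the MinIO server with mc, check it, and create the bucket.
DRY_RUN="${DRY_RUN:-1}"
MINIO_ALIAS="${MINIO_ALIAS:-local}"                      # assumption: alias name
MINIO_URL="${MINIO_URL:-http://localhost:9000}"          # assumption: API endpoint
MINIO_ROOT_USER="${MINIO_ROOT_USER:-minioadmin}"         # assumption: default credentials
MINIO_ROOT_PASSWORD="${MINIO_ROOT_PASSWORD:-minioadmin}"
BUCKET="${BUCKET:-quickwit}"                             # must match the bucket in the s3:// URIs

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

create_bucket() {
  run mc alias set "$MINIO_ALIAS" "$MINIO_URL" "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD"
  run mc admin info "$MINIO_ALIAS"                 # server status, as in the verification step
  run mc mb --ignore-existing "$MINIO_ALIAS/$BUCKET"
}

create_bucket
```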
Now we are ready to install Quickwit and configure it with MinIO as the backend.
Configure Quickwit
The Quickwit installer automatically picks the correct binary archive for your environment, then downloads and unpacks it in your working directory. In this case, since we are running Ubuntu, it will install the build for that OS, but it supports all the popular distributions.
Download the sample configuration file with curl so we can modify it to add the MinIO bits.
curl -o quickwit.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/main/config/quickwit.yaml
Open the YAML file and first add the credentials to configure MinIO
Next we’ll add the storage and metadata store configurations
default_index_root_uri: s3://quickwit/indexes
metastore_uri: s3://quickwit/indexes
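As an alternative to editing credentials into the file, Quickwit can also pick up S3 settings from the environment. The sketch below assumes MinIO's API is on `localhost:9000` with the default `minioadmin` credentials and uses `us-east-1` as a placeholder region; `bucket_of` is a hypothetical helper that only checks both URIs point at the same bucket.

```shell
#!/bin/sh
# Assumptions: local MinIO endpoint and default root credentials.
# Use a dedicated access key in a real deployment.
export AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID:-minioadmin}"
export AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY:-minioadmin}"
export AWS_REGION="${AWS_REGION:-us-east-1}"
export QW_S3_ENDPOINT="${QW_S3_ENDPOINT:-http://localhost:9000}"

# The two URIs from the configuration above.
DEFAULT_INDEX_ROOT_URI="s3://quickwit/indexes"
METASTORE_URI="s3://quickwit/indexes"

# Hypothetical helper: extract the bucket name from an s3:// URI.
bucket_of() {
  rest="${1#s3://}"     # drop the scheme
  echo "${rest%%/*}"    # keep everything before the first slash
}

if [ "$(bucket_of "$DEFAULT_INDEX_ROOT_URI")" = "$(bucket_of "$METASTORE_URI")" ]; then
  echo "index data and metastore share bucket: $(bucket_of "$METASTORE_URI")"
else
  echo "warning: index data and metastore use different buckets" >&2
fi
```

Depending on the Quickwit version, path-style access may also need to be enabled for MinIO; check the storage documentation for the release you are running.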
Once the above configurations are set in the YAML, save and close the file. To use it, point the QW_CONFIG environment variable at the file and run the service
export QW_CONFIG=./quickwit.yaml
./quickwit run
We can check if it’s working by browsing the UI at http://localhost:7280 or doing a GET
curl http://localhost:7280/api/v1/version
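When scripting the setup, it can help to wait until the version endpoint answers before moving on. `wait_for_quickwit` below is a hypothetical helper, not part of Quickwit; it simply polls the same URL with curl.

```shell
#!/bin/sh
# Hypothetical helper: poll the version endpoint until it responds.
# Usage: wait_for_quickwit BASE_URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_quickwit() {
  url="$1"
  attempts="${2:-30}"
  delay="${3:-1}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors as well as connection failures
    if curl -fsS --max-time 2 "$url/api/v1/version" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_quickwit http://localhost:7280 && echo "Quickwit is up"` blocks for up to roughly 30 seconds with the defaults.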
Let's create an index configured to receive Stack Overflow posts. An index is defined by a YAML config that maps your input documents to index fields and specifies whether those fields should be stored and indexed.
curl -o stackoverflow-index-config.yaml https://raw.githubusercontent.com/quickwit-oss/quickwit/v0.6.4/config/tutorials/stackoverflow/index-config.yaml
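For orientation, an index config of this kind has roughly the shape shown below. This is an abridged illustration written to a scratch file (the path is an assumption); the downloaded file remains the authoritative version and is the one used in the following steps.

```shell
#!/bin/sh
# Illustration only: write an abridged index config to a scratch file.
EXAMPLE_CONFIG="${TMPDIR:-/tmp}/example-index-config.yaml"   # assumption: scratch location

cat > "$EXAMPLE_CONFIG" <<'EOF'
version: 0.6
index_id: stackoverflow
doc_mapping:
  field_mappings:
    - name: title
      type: text
      tokenizer: default
      record: position   # keep term positions so phrase queries work
      stored: true       # return the original value with search hits
    - name: body
      type: text
      tokenizer: default
      record: position
      stored: true
    - name: creationDate
      type: datetime
      fast: true         # columnar storage, required for the timestamp field
      input_formats:
        - rfc3339
  timestamp_field: creationDate
search_settings:
  default_search_fields: [title, body]
EOF

# List the mapped fields as a quick sanity check.
grep "name:" "$EXAMPLE_CONFIG"
```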
Once the index config is downloaded, create the index
./quickwit index create --index-config ./stackoverflow-index-config.yaml
To hydrate the index we just created, we’ll download a sample of the first 10,000 Stack Overflow posts and then feed this data into Quickwit, which will store it on MinIO in the backend.
curl -O https://quickwit-datasets-public.s3.amazonaws.com/stackoverflow.posts.transformed-10000.json
./quickwit index ingest --index stackoverflow --input-path stackoverflow.posts.transformed-10000.json --force
As soon as the ingest command finishes, you can start querying data by using the search command
./quickwit index search --index stackoverflow --query "search AND engine"
You can use more advanced features such as aggregations like the following query to find the most popular tags used on the questions in this dataset
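A terms aggregation over the REST API could look like the sketch below. It assumes Quickwit is listening on `localhost:7280` and that the index exposes a `tags` field that can be aggregated on; the aggregation name `popular_tags` is arbitrary.

```shell
#!/bin/sh
# Assumptions: REST API on localhost:7280 and an aggregatable "tags" field.
QW_URL="${QW_URL:-http://localhost:7280}"

# Terms aggregation: bucket matching documents by tag and keep the top 10.
# max_hits is 0 because only the aggregation result is of interest.
AGG_QUERY='{
  "query": "search AND engine",
  "max_hits": 0,
  "aggs": {
    "popular_tags": {
      "terms": { "field": "tags", "size": 10 }
    }
  }
}'

curl -sS --max-time 10 -X POST "$QW_URL/api/v1/stackoverflow/search" \
  -H 'Content-Type: application/json' \
  -d "$AGG_QUERY" \
  || echo "Quickwit is not reachable at $QW_URL" >&2
```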
Final Thoughts
MinIO is the right choice for Quickwit because of its industry-leading performance and scalability. MinIO’s combination of scalability and high performance puts every data-intensive workload, not just Quickwit, within reach. MinIO is capable of tremendous performance: a recent benchmark achieved 325 GiB/s (349 GB/s) on GETs and 165 GiB/s (177 GB/s) on PUTs with just 32 nodes of off-the-shelf NVMe SSDs. This makes managing Quickwit with MinIO seamless for log management, distributed tracing, and immutable data such as conversational data and event-based analytics, among others.
By storing the data in MinIO, Quickwit can be used as a Grafana datasource for achieving fast visibility into the operations of your application. You can see patterns and set alerts in Grafana's graphical interface that would allow you to run historical analysis and act on anomalies based on certain thresholds. For example, you might want to check for trends or bottlenecks and try to identify patterns in workload type during a specific time of the day.
Got questions? Want to get started? Reach out to us on Slack.