Connect Dremio to MinIO with Self-Signed TLS
Dremio is an open-source, distributed analytics engine that provides a simple, self-service interface for data exploration, transformation, and collaboration. Dremio's architecture is built on top of Apache Arrow, a high-performance columnar memory format, and leverages the Parquet file format for efficient storage. For more on Dremio, please see Getting Started with Dremio.
MinIO is a high-performance, distributed object storage system designed for cloud-native applications. Its combination of scalability and high performance puts every workload, no matter how demanding, within reach. A recent benchmark achieved 325 GiB/s (349 GB/s) on GETs and 165 GiB/s (177 GB/s) on PUTs with just 32 nodes of off-the-shelf NVMe SSDs.
In this tutorial, we’ll show you how to configure Dremio to connect to a MinIO deployment that uses self-signed TLS certificates. This is a common use case, and customers on SUBNET have asked us time and again how to set it up.
MinIO and Dremio
Let's start by creating a kind cluster.
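A minimal sketch of this step, assuming a cluster named `dremio-minio` with one control-plane node and four workers. The cluster name, the file name, and the node count are illustrative assumptions; size the cluster to suit the tenant you plan to run.

```shell
# Write a kind cluster definition: one control-plane node plus four workers.
# The node count is an assumption, chosen so a multi-server tenant can spread across workers.
cat > kind-config.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
  - role: worker
EOF

# Create the cluster only when the kind CLI is available.
if command -v kind >/dev/null 2>&1; then
  kind create cluster --name dremio-minio --config kind-config.yaml
else
  echo "kind is not installed; skipping cluster creation" >&2
fi
```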
Deploy the MinIO operator to the kind cluster we created above.
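One way to do this is with the operator's kustomize manifests. The sketch below assumes they are served from the root of the `minio/operator` repository and that the operator installs into the `minio-operator` namespace; pin a release with `?ref=<tag>` if you need a specific version.

```shell
# Kustomize target for the MinIO Operator manifests (assumed location: repository root).
OPERATOR_KUSTOMIZE="github.com/minio/operator"
# Namespace the operator is assumed to install into.
OPERATOR_NS="minio-operator"

# Only proceed when kubectl can reach a cluster.
if kubectl cluster-info >/dev/null 2>&1; then
  kubectl apply -k "$OPERATOR_KUSTOMIZE"
  # List the operator pods to confirm they start.
  kubectl -n "$OPERATOR_NS" get pods
else
  echo "kubectl cannot reach a cluster; skipping operator install" >&2
fi
```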
Create a MinIO tenant so that we can create a bucket for Dremio.
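As a sketch, the operator repository ships example tenants that can be applied with kustomize. We assume the `tenant-lite` example here; any tenant with TLS enabled would serve the same purpose.

```shell
# Kustomize target for an example tenant (assumed: the tenant-lite example).
TENANT_KUSTOMIZE="github.com/minio/operator/examples/kustomization/tenant-lite"

# Only proceed when kubectl can reach a cluster.
if kubectl cluster-info >/dev/null 2>&1; then
  kubectl apply -k "$TENANT_KUSTOMIZE"
  # Show every tenant and the namespace it landed in.
  kubectl get tenants --all-namespaces
else
  echo "kubectl cannot reach a cluster; skipping tenant creation" >&2
fi
```

Make a note of the namespace reported by the last command, since the later steps need it.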
Fetch the MinIO tenant credentials and make a note of them.
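The sketch below shows one way to read the credentials, assuming the tenant lives in the `tenant-lite` namespace and keeps its root credentials in a secret named `storage-configuration` under the key `config.env`. All three names are assumptions; adjust them to match your tenant. The helper `extract_root_credentials` is a hypothetical name introduced here for illustration.

```shell
TENANT_NS="tenant-lite"                 # assumed tenant namespace
CONFIG_SECRET="storage-configuration"   # assumed secret that holds config.env

# Read config.env text on stdin and print "<root user> <root password>".
extract_root_credentials() {
  awk '
    /^export MINIO_ROOT_USER=/     { sub(/^[^=]*=/, ""); gsub(/"/, ""); user = $0 }
    /^export MINIO_ROOT_PASSWORD=/ { sub(/^[^=]*=/, ""); gsub(/"/, ""); pass = $0 }
    END { print user, pass }
  '
}

# Only proceed when kubectl can reach a cluster.
if kubectl cluster-info >/dev/null 2>&1; then
  kubectl -n "$TENANT_NS" get secret "$CONFIG_SECRET" \
    -o jsonpath='{.data.config\.env}' | base64 -d | extract_root_credentials
else
  echo "kubectl cannot reach a cluster; skipping credential lookup" >&2
fi
```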
Port forward to the tenant's minio service so we can access it using mc in the next steps.
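A sketch of the port forward, assuming the tenant runs in the `tenant-lite` namespace, that its `minio` service listens on HTTPS port 443, and that local port 9000 is free. The namespace and both ports are assumptions.

```shell
TENANT_NS="tenant-lite"   # assumed tenant namespace
LOCAL_PORT=9000           # assumed free local port
MINIO_URL="https://localhost:${LOCAL_PORT}"

# Only proceed when kubectl can reach a cluster.
if kubectl cluster-info >/dev/null 2>&1; then
  # Forward the local port to the service's HTTPS port in the background.
  kubectl -n "$TENANT_NS" port-forward svc/minio "${LOCAL_PORT}:443" &
  echo "MinIO should be reachable at ${MINIO_URL}"
else
  echo "kubectl cannot reach a cluster; skipping port forward" >&2
fi
```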
Create an alias for the tenant and create a sample bucket for testing with Dremio.
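For example, assuming the tenant is forwarded to `https://localhost:9000` and using a hypothetical alias name `myminio`. Because the certificate is self-signed, each `mc` call passes `--insecure` to skip certificate verification. The credential values below are placeholders for the ones you noted earlier.

```shell
MINIO_ALIAS="myminio"                  # hypothetical alias name
MINIO_URL="https://localhost:9000"     # assumed port-forwarded endpoint
BUCKET="openlake"                      # bucket Dremio will use
: "${MINIO_ROOT_USER:=CHANGE_ME}"      # placeholder; export the real value first
: "${MINIO_ROOT_PASSWORD:=CHANGE_ME}"  # placeholder; export the real value first

# Only proceed when the mc client is installed.
if command -v mc >/dev/null 2>&1; then
  # Register the tenant under the alias, then create the bucket.
  mc alias set "$MINIO_ALIAS" "$MINIO_URL" "$MINIO_ROOT_USER" "$MINIO_ROOT_PASSWORD" --insecure
  mc mb "$MINIO_ALIAS/$BUCKET" --insecure
else
  echo "mc is not installed; skipping alias and bucket creation" >&2
fi
```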
Clone the openlake and dremio GitHub repos.
Copy the MinIO Helm values YAML and update it to match your tenant.
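To give a sense of what the MinIO-specific settings can look like, the sketch below writes only a `distStorage` section to a hypothetical file named `values.minio.yaml`. It is not a copy of the file in the repo. It assumes the Dremio chart's `distStorage.aws` keys, an in-cluster endpoint of `minio.tenant-lite.svc.cluster.local`, and a `/dremio` path inside the bucket. Treat all of these as assumptions and substitute your own values.

```shell
BUCKET="openlake"
MINIO_ENDPOINT="minio.tenant-lite.svc.cluster.local"  # assumed in-cluster DNS name
: "${MINIO_ROOT_USER:=CHANGE_ME}"                     # placeholder credential
: "${MINIO_ROOT_PASSWORD:=CHANGE_ME}"                 # placeholder credential

# Write an S3-compatible distributed storage section for the Dremio chart.
cat > values.minio.yaml <<EOF
distStorage:
  type: "aws"
  aws:
    bucketName: "${BUCKET}"
    path: "/dremio"
    authentication: "accessKeySecret"
    credentials:
      accessKey: "${MINIO_ROOT_USER}"
      secret: "${MINIO_ROOT_PASSWORD}"
    extraProperties: |
      <property>
        <name>fs.s3a.endpoint</name>
        <value>${MINIO_ENDPOINT}</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
      <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
      </property>
EOF
```

Path-style access and the S3 compatibility flag are what let Dremio talk to a non-AWS endpoint such as MinIO.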
Update the Dremio Helm templates to disable certificate checking. Please note that this needs to be changed in multiple files:
dremio_v2/templates/dremio-coordinator.yaml
dremio_v2/templates/dremio-executor.yaml
dremio_v2/templates/dremio-master.yaml
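With three files to touch, it is easy to miss one. The sketch below checks each template for a JVM flag and lists the files that still lack it. We assume the flag in question is the AWS SDK for Java system property `com.amazonaws.sdk.disableCertChecking`; the helper name `templates_missing_flag` is hypothetical, and the script only reports, it does not edit the templates.

```shell
# JVM flag assumed to disable certificate checking in the AWS SDK for Java.
FLAG='-Dcom.amazonaws.sdk.disableCertChecking=true'

# Print each given file that does not contain the flag; return 1 if any is missing it.
templates_missing_flag() {
  missing=0
  for f in "$@"; do
    if ! grep -qF -- "$FLAG" "$f" 2>/dev/null; then
      echo "$f"
      missing=1
    fi
  done
  return "$missing"
}

templates_missing_flag \
  dremio_v2/templates/dremio-coordinator.yaml \
  dremio_v2/templates/dremio-executor.yaml \
  dremio_v2/templates/dremio-master.yaml \
  || echo "Add $FLAG to the Java options in the files listed above" >&2
```

Disabling certificate checks is reasonable for a test cluster with self-signed certificates, but it should not be carried into production.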
Once all the configs have been updated, install Dremio using the Helm chart.
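A sketch of the install, assuming the chart directory `dremio_v2` is in the current directory, the values file is called `values.minio.yaml`, and both the release and the namespace are named `dremio`. These names are assumptions.

```shell
RELEASE="dremio"                 # assumed release name
NAMESPACE="dremio"               # assumed namespace
CHART_DIR="dremio_v2"            # chart directory from the cloned repo
VALUES_FILE="values.minio.yaml"  # assumed name of the updated values file

# Only proceed when helm is installed and the chart directory exists.
if command -v helm >/dev/null 2>&1 && [ -d "$CHART_DIR" ]; then
  helm install "$RELEASE" "$CHART_DIR" -f "$VALUES_FILE" \
    --namespace "$NAMESPACE" --create-namespace
else
  echo "helm or $CHART_DIR not found; skipping install" >&2
fi
```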
You may need to wait a few minutes for all the Dremio pods to reach the Running state.
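Rather than polling by hand, `kubectl wait` can block until the pods are ready. The sketch assumes Dremio was installed into a namespace named `dremio` and uses an arbitrary ten-minute timeout.

```shell
NAMESPACE="dremio"   # assumed namespace
TIMEOUT="600s"       # arbitrary upper bound on the wait

# Only proceed when kubectl can reach a cluster.
if kubectl cluster-info >/dev/null 2>&1; then
  # Block until every pod in the namespace reports Ready, or the timeout expires.
  kubectl -n "$NAMESPACE" wait --for=condition=Ready pod --all --timeout="$TIMEOUT"
  kubectl -n "$NAMESPACE" get pods
else
  echo "kubectl cannot reach a cluster; skipping wait" >&2
fi
```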
Once Dremio is up, verify the new prefixes created in the openlake bucket.
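For instance, listing the bucket with `mc` should show the prefixes Dremio wrote. The sketch assumes an `mc` alias named `myminio` (a hypothetical name) already points at the tenant.

```shell
MINIO_ALIAS="myminio"   # hypothetical alias, assumed to be configured already
BUCKET="openlake"

# Only proceed when the mc client is installed.
if command -v mc >/dev/null 2>&1; then
  # List the top level of the bucket; --insecure skips verification of the self-signed cert.
  mc ls "$MINIO_ALIAS/$BUCKET/" --insecure
else
  echo "mc is not installed; skipping bucket listing" >&2
fi
```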
Port forward the dremio-client to access the Dremio console at http://localhost:9047.
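A sketch of that port forward, assuming `dremio-client` is a service in a namespace named `dremio` and that it serves the console on port 9047.

```shell
NAMESPACE="dremio"   # assumed namespace
UI_PORT=9047         # console port, as in the URL above

# Only proceed when kubectl can reach a cluster.
if kubectl cluster-info >/dev/null 2>&1; then
  # Forward the console port in the background.
  kubectl -n "$NAMESPACE" port-forward svc/dremio-client "${UI_PORT}:${UI_PORT}" &
  echo "Dremio console: http://localhost:${UI_PORT}"
else
  echo "kubectl cannot reach a cluster; skipping port forward" >&2
fi
```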
In the Dremio portal, create a user, load a sample file, and run a query to verify the setup, following the steps below.
Create a new user.
Add a new job.
Set the format.
Run test queries.
Verify the sample CSV file uploaded to the bucket.
It's as simple as that.
Final Thoughts
MinIO is built to power Modern Datalakes as well as the data analytics and AI/ML workloads that run on top of them. MinIO includes a number of optimizations for working with large datasets consisting of many small files, a common occurrence within the Modern Datalake.
Perhaps more importantly for data lakes, MinIO guarantees durability and immutability. In addition, MinIO encrypts data in transit and on drives, and regulates access to data using IAM and policy-based access controls (PBAC).
If you would like to configure MinIO with Dremio or have any questions, be sure to reach out to us on Slack!