Connect Dremio to MinIO with Self-Signed TLS
Dremio is an open-source, distributed analytics engine that provides a simple, self-service interface for data exploration, transformation, and collaboration. Dremio's architecture is built on top of Apache Arrow, a high-performance columnar memory format, and leverages the Parquet file format for efficient storage. For more on Dremio, please see Getting Started with Dremio.
MinIO is a high-performance, distributed object storage system designed for cloud-native applications. Its combination of scalability and high performance puts every workload, no matter how demanding, within reach. A recent benchmark achieved 325 GiB/s (349 GB/s) on GETs and 165 GiB/s (177 GB/s) on PUTs with just 32 nodes of off-the-shelf NVMe SSDs.
In this tutorial, we’ll show you how to configure Dremio to connect to a MinIO deployment that uses self-signed TLS certificates. This is a common use case, and SUBNET customers have asked us time and time again how to configure it.
MinIO and Dremio
Let's create a kind cluster with the following configuration, saved as kind-config.yml:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
- role: worker

kind create cluster --config kind-config.yml
Deploy the MinIO operator to the kind cluster we created above.
kubectl minio init
Create a MinIO tenant so that we can create a bucket for Dremio.
kubectl create ns tenant-ns
kubectl minio tenant create tenant-1 --servers 4 --volumes 4 --capacity 4Gi --namespace tenant-ns
Fetch the MinIO tenant credentials and make a note of them.
kubectl get secrets/tenant-1-user-1 -n tenant-ns -oyaml | yq '.data."CONSOLE_ACCESS_KEY"' | base64 -d
kubectl get secrets/tenant-1-user-1 -n tenant-ns -oyaml | yq '.data."CONSOLE_SECRET_KEY"' | base64 -d
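The credential lookup can also be wrapped in a small helper so the values land in shell variables instead of being copied by hand. This is only a sketch under stated assumptions: the function name get_tenant_secret and the variable names in the usage comments are hypothetical, and it uses kubectl's built-in jsonpath output rather than yq.

```shell
# Hypothetical helper: print one decoded field of the tenant-1-user-1 secret.
# Usage: get_tenant_secret <field-name>
get_tenant_secret() {
  # Kubernetes stores secret values base64-encoded, so decode after extracting.
  kubectl get secret tenant-1-user-1 -n tenant-ns \
    -o jsonpath="{.data.$1}" | base64 -d
}

# Example usage (assumed variable names):
#   MINIO_ACCESS_KEY="$(get_tenant_secret CONSOLE_ACCESS_KEY)"
#   MINIO_SECRET_KEY="$(get_tenant_secret CONSOLE_SECRET_KEY)"
```

Keeping the keys in variables makes it harder to paste one tenant's credentials into a later step by mistake.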
Port forward to the tenant's minio service so we can access it using mc in the next steps.
kubectl port-forward svc/minio -n tenant-ns 9443:443
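Create an alias for the tenant, using the credentials fetched above, and create a sample bucket for testing with Dremio. The --insecure flag is needed because the tenant's certificate is self-signed.
mc alias set myminio https://localhost:9443/ WZaBqLMGYViJ0Sba XMPAlfUUM4rnaAnGTxPKzeYYcBiRlUVr --insecure
mc mb myminio/openlake --insecure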
Clone the openlake and dremio-cloud-tools GitHub repos.
git clone https://github.com/minio/openlake
git clone https://github.com/dremio/dremio-cloud-tools
Copy the MinIO Helm values YAML into the Dremio chart directory, then update it as shown below, replacing the access key and secret with your own tenant's credentials.
cp ~/openlake/dremio/charts/values.minio.yaml ~/dremio-cloud-tools/charts/dremio_v2/

distStorage:
  type: "aws"
  aws:
    bucketName: "openlake"
    path: "/dremio"
    authentication: "accessKeySecret"
    credentials:
      accessKey: "9RW081BM1STLAWQHXS07"
      secret: "L2GCeGRpHUbaQwrCEcW7tnmExuhmUkYN4c2ly49E"
    extraProperties: |
      <property>
        <name>fs.s3a.endpoint</name>
        <value>minio.tenant-ns.svc.cluster.local</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
      <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
      </property>
Update the Dremio Helm templates to disable certificate checking by appending -Dcom.amazonaws.sdk.disableCertChecking=true to DREMIO_JAVA_SERVER_EXTRA_OPTS. Please note that this needs to be updated in multiple files.
dremio_v2/templates/dremio-coordinator.yaml
- name: DREMIO_JAVA_SERVER_EXTRA_OPTS
  value: >-
    {{- include "dremio.coordinator.extraStartParams" $ | nindent 12 -}}
    -Dzookeeper=zk-hs:2181
    -Dservices.coordinator.enabled=true
    -Dservices.coordinator.master.enabled=false
    -Dservices.coordinator.master.embedded-zookeeper.enabled=false
    -Dservices.executor.enabled=false
    -Dservices.conduit.port=45679
    -Dcom.amazonaws.sdk.disableCertChecking=true
dremio_v2/templates/dremio-executor.yaml
- name: DREMIO_JAVA_SERVER_EXTRA_OPTS
  value: >-
    {{- include "dremio.executor.extraStartParams" (list $ $engineName) | nindent 12 -}}
    -Dzookeeper=zk-hs:2181
    -Dservices.coordinator.enabled=false
    -Dservices.coordinator.master.enabled=false
    -Dservices.coordinator.master.embedded-zookeeper.enabled=false
    -Dservices.executor.enabled=true
    -Dservices.conduit.port=45679
    -Dservices.node-tag={{ $engineName }}
    -Dcom.amazonaws.sdk.disableCertChecking=true
dremio_v2/templates/dremio-master.yaml
- name: DREMIO_JAVA_SERVER_EXTRA_OPTS
  value: >-
    {{- include "dremio.coordinator.extraStartParams" $ | nindent 12 -}}
    -Dzookeeper=zk-hs:2181
    -Dservices.coordinator.enabled=true
    -Dservices.coordinator.master.enabled=true
    -Dservices.coordinator.master.embedded-zookeeper.enabled=false
    -Dservices.executor.enabled=false
    -Dservices.conduit.port=45679
    -Dcom.amazonaws.sdk.disableCertChecking=true
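Because the same flag has to be added in three separate templates, it is easy to miss one. The following is a minimal sketch of a check that reports which of the three files contain the flag; the function name check_cert_flag is hypothetical, and it assumes you pass the path to the dremio_v2/templates directory.

```shell
# Hypothetical check: report which Dremio templates carry the
# certificate-checking flag. Returns non-zero if any file lacks it.
# Usage: check_cert_flag <path-to-dremio_v2/templates>
check_cert_flag() {
  templates_dir="$1"
  status=0
  for name in dremio-coordinator dremio-executor dremio-master; do
    file="${templates_dir}/${name}.yaml"
    # -F matches the flag literally; -e keeps the leading dash from
    # being parsed as a grep option.
    if grep -qF -e '-Dcom.amazonaws.sdk.disableCertChecking=true' "$file"; then
      echo "ok: ${name}.yaml"
    else
      echo "missing: ${name}.yaml"
      status=1
    fi
  done
  return "$status"
}

# Example usage:
#   check_cert_flag ~/dremio-cloud-tools/charts/dremio_v2/templates
```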
Once all the configs have been updated, install Dremio using the Helm chart. Run the command from ~/dremio-cloud-tools/charts so that the relative dremio_v2 path resolves.
helm install dremio dremio_v2 -f dremio_v2/values.minio.yaml --namespace dremio --create-namespace
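If you would rather not keep the tenant credentials in values.minio.yaml, one option is to override them at install time. The sketch below is an assumption rather than part of the original setup: the function name install_dremio is hypothetical, and it relies on Helm's --set-string flag to override the distStorage.aws.credentials keys from the values file.

```shell
# Hypothetical wrapper: install Dremio, passing the MinIO credentials on
# the command line instead of storing them in values.minio.yaml.
# Usage: install_dremio <access-key> <secret-key>
install_dremio() {
  access_key="$1"
  secret_key="$2"
  # --set-string keeps the values as strings, so an all-digit key is
  # not coerced to a number.
  helm install dremio dremio_v2 \
    -f dremio_v2/values.minio.yaml \
    --set-string distStorage.aws.credentials.accessKey="$access_key" \
    --set-string distStorage.aws.credentials.secret="$secret_key" \
    --namespace dremio --create-namespace
}
```

Values passed this way take precedence over the same keys in the file given with -f. Keys containing commas would need escaping, since Helm treats a comma as a separator in --set-style arguments.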
You may need to wait a few minutes for all Dremio pods to reach the Running state.
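Rather than polling by hand, the wait can be scripted with kubectl wait. This is a minimal sketch: the function name wait_for_dremio and the 600s default timeout are assumptions, and it presumes every pod in the namespace is expected to become Ready.

```shell
# Hypothetical helper: block until every pod in the namespace is Ready,
# or fail once the timeout expires.
# Usage: wait_for_dremio [namespace] [timeout]
wait_for_dremio() {
  namespace="${1:-dremio}"
  timeout="${2:-600s}"
  kubectl wait --for=condition=Ready pods --all \
    -n "$namespace" --timeout="$timeout"
}

# Example usage:
#   wait_for_dremio            # namespace "dremio", 600s timeout
#   wait_for_dremio dremio 15m
```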
Once Dremio is up, verify the new prefixes created in the openlake bucket.
mc ls myminio/openlake/dremio/uploads --insecure
Port forward the dremio-client to access the Dremio console at http://localhost:9047.
kubectl port-forward svc/dremio-client -n dremio 9047
To verify the setup, open the Dremio portal, create a user, load a sample file, and run a query against it:
Create a new user.
Add a new job.
Set the format.
Run test queries.
Verify the sample CSV file uploaded to the bucket.
mc ls --summarize --recursive myminio/openlake/dremio/uploads --insecure
It's as simple as that.
Final Thoughts
MinIO is built to power Modern Datalakes as well as the data analytics and AI/ML workloads that run on top of them. MinIO includes a number of optimizations for working with large datasets consisting of many small files, a common occurrence within the Modern Datalake.
Perhaps more importantly for data lakes, MinIO guarantees durability and immutability. In addition, MinIO encrypts data in transit and on drives, and regulates access to data using IAM and policy-based access controls (PBAC).
If you would like to configure MinIO with Dremio or have any questions, be sure to reach out to us on Slack!