Connect Dremio to MinIO with Self-Signed TLS

Dremio is an open-source, distributed analytics engine that provides a simple, self-service interface for data exploration, transformation, and collaboration. Dremio's architecture is built on top of Apache Arrow, a high-performance columnar memory format, and leverages the Parquet file format for efficient storage. For more on Dremio, please see Getting Started with Dremio.

MinIO is a high-performance, distributed object storage system designed for cloud-native applications. Its combination of scalability and high performance puts every workload, no matter how demanding, within reach. A recent benchmark achieved 325 GiB/s (349 GB/s) on GETs and 165 GiB/s (177 GB/s) on PUTs with just 32 nodes of off-the-shelf NVMe SSDs.

In this tutorial, we'll show you how to configure Dremio to connect to a MinIO deployment that uses self-signed TLS certificates. This is a common setup, and SUBNET customers have asked us repeatedly how to configure it.

MinIO and Dremio

Let's create a kind cluster with the following configuration, saved as kind-config.yml:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
  - role: worker

kind create cluster --config kind-config.yml
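If you don't already have the configuration saved to disk, a heredoc is one way to write it. This is a minimal sketch that writes kind-config.yml into the current directory, which is the file name the kind create cluster command above expects. The node layout is copied from the configuration shown earlier.

```shell
# Write the kind cluster definition (one control-plane node, four workers)
# to kind-config.yml in the current directory.
cat > kind-config.yml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
  - role: worker
  - role: worker
EOF

# Then create the cluster from it (left commented so this snippet only writes the file):
# kind create cluster --config kind-config.yml
```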

Deploy the MinIO operator to the kind cluster we created above.

kubectl minio init

Create a MinIO tenant so that we can create a bucket for Dremio.

kubectl create ns tenant-ns

kubectl minio tenant create tenant-1 --servers 4 --volumes 4 --capacity 4Gi --namespace tenant-ns

Fetch the MinIO tenant credentials and make a note of them.

kubectl get secrets/tenant-1-user-1 -n tenant-ns -oyaml | yq '.data."CONSOLE_ACCESS_KEY"' | base64 -d

kubectl get secrets/tenant-1-user-1 -n tenant-ns -oyaml | yq '.data."CONSOLE_SECRET_KEY"' | base64 -d
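Rather than copying the keys by hand, we can capture them in shell variables for the mc step. The sketch below uses kubectl's built-in jsonpath output instead of yq. The function name tenant_secret and the variable names are illustrative and not part of any tool; the secret name and namespace are the ones used in this tutorial.

```shell
# Hypothetical helper: print one decoded key from the tenant's user secret.
# Assumes the secret tenant-1-user-1 exists in the tenant-ns namespace.
tenant_secret() {
  key="$1"
  kubectl get secret tenant-1-user-1 -n tenant-ns \
    -o jsonpath="{.data.${key}}" | base64 -d
}

# Usage (commented out so the snippet can be sourced without a cluster):
# MINIO_ACCESS_KEY="$(tenant_secret CONSOLE_ACCESS_KEY)"
# MINIO_SECRET_KEY="$(tenant_secret CONSOLE_SECRET_KEY)"
```

The variables can then be passed to mc alias set in place of the literal keys.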

Port forward to the tenant's minio service so we can access it using mc in the next steps.

kubectl port-forward svc/minio -n tenant-ns 9443:443

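Create an alias for the tenant using the credentials fetched above (the keys shown here are examples), then create a sample bucket for testing with Dremio. The --insecure flag tells mc to skip verification of the tenant's self-signed certificate.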

mc alias set myminio https://localhost:9443/ WZaBqLMGYViJ0Sba XMPAlfUUM4rnaAnGTxPKzeYYcBiRlUVr --insecure

mc mb myminio/openlake --insecure
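The --insecure flag is needed because mc does not trust the certificate the tenant presents. To see what that certificate looks like, one option is to inspect it with openssl while the port-forward is running. This is a sketch that assumes openssl is installed locally; show_server_cert is a hypothetical helper name.

```shell
# Hypothetical helper: print the subject, issuer, and validity dates of the
# certificate served at host:port (defaults to the port-forward used above).
show_server_cert() {
  endpoint="${1:-localhost:9443}"
  openssl s_client -connect "$endpoint" </dev/null 2>/dev/null \
    | openssl x509 -noout -subject -issuer -dates
}

# Usage (requires the kubectl port-forward from the previous step):
# show_server_cert localhost:9443
```

If the subject and issuer are identical, the certificate is self-signed; otherwise it was issued by a CA that your machine does not trust by default.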

Clone the openlake and dremio-cloud-tools GitHub repos.

git clone https://github.com/minio/openlake

git clone https://github.com/dremio/dremio-cloud-tools

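Copy the MinIO Helm values file into the Dremio chart directory and update it as shown below, replacing the sample access key and secret with credentials that are valid for your tenant.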

cp ~/openlake/dremio/charts/values.minio.yaml ~/dremio-cloud-tools/charts/dremio_v2/

distStorage:
  type: "aws"
  aws:
    bucketName: "openlake"
    path: "/dremio"
    authentication: "accessKeySecret"
    credentials:
      accessKey: "9RW081BM1STLAWQHXS07"
      secret: "L2GCeGRpHUbaQwrCEcW7tnmExuhmUkYN4c2ly49E"
    extraProperties: |
      <property>
        <name>fs.s3a.endpoint</name>
        <value>minio.tenant-ns.svc.cluster.local</value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
      <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
      </property>
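Before installing, it can be worth rendering the chart locally to confirm that the values file is picked up. The sketch below assumes it is run from the ~/dremio-cloud-tools/charts directory and that the chart renders extraProperties into its generated manifests, which we have not verified for every chart version. rendered_has is a hypothetical helper.

```shell
# Hypothetical helper: render the chart with the MinIO values file and
# succeed only if the given string appears in the rendered manifests.
rendered_has() {
  needle="$1"
  helm template dremio dremio_v2 -f dremio_v2/values.minio.yaml \
    | grep -qF -- "$needle"
}

# Usage (run from the charts directory):
# rendered_has 'fs.s3a.endpoint' && echo "endpoint property rendered"
```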

Update the Dremio Helm templates to disable certificate checking by appending -Dcom.amazonaws.sdk.disableCertChecking=true to DREMIO_JAVA_SERVER_EXTRA_OPTS. Please note that this needs to be done in three separate files.

dremio_v2/templates/dremio-coordinator.yaml

    - name: DREMIO_JAVA_SERVER_EXTRA_OPTS
      value: >-
        {{- include "dremio.coordinator.extraStartParams" $ | nindent 12 -}}
        -Dzookeeper=zk-hs:2181
        -Dservices.coordinator.enabled=true
        -Dservices.coordinator.master.enabled=false
        -Dservices.coordinator.master.embedded-zookeeper.enabled=false
        -Dservices.executor.enabled=false
        -Dservices.conduit.port=45679
        -Dcom.amazonaws.sdk.disableCertChecking=true

dremio_v2/templates/dremio-executor.yaml

    - name: DREMIO_JAVA_SERVER_EXTRA_OPTS
      value: >-
        {{- include "dremio.executor.extraStartParams" (list $ $engineName) | nindent 12 -}}
        -Dzookeeper=zk-hs:2181
        -Dservices.coordinator.enabled=false
        -Dservices.coordinator.master.enabled=false
        -Dservices.coordinator.master.embedded-zookeeper.enabled=false
        -Dservices.executor.enabled=true
        -Dservices.conduit.port=45679
        -Dservices.node-tag={{ $engineName }}
        -Dcom.amazonaws.sdk.disableCertChecking=true

dremio_v2/templates/dremio-master.yaml

    - name: DREMIO_JAVA_SERVER_EXTRA_OPTS
      value: >-
        {{- include "dremio.coordinator.extraStartParams" $ | nindent 12 -}}
        -Dzookeeper=zk-hs:2181
        -Dservices.coordinator.enabled=true
        -Dservices.coordinator.master.enabled=true
        -Dservices.coordinator.master.embedded-zookeeper.enabled=false
        -Dservices.executor.enabled=false
        -Dservices.conduit.port=45679
        -Dcom.amazonaws.sdk.disableCertChecking=true
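Because the flag has to be added in three places, it is easy to miss one. A quick way to double-check is to grep each template for it. The helper below is a sketch; check_cert_flag is a hypothetical name, and it takes the chart's templates directory as its only argument.

```shell
# Hypothetical helper: report which of the three templates carry the flag.
# Returns non-zero if any of them is missing it.
check_cert_flag() {
  dir="$1"
  missing=0
  for f in dremio-coordinator.yaml dremio-executor.yaml dremio-master.yaml; do
    if grep -qF -- '-Dcom.amazonaws.sdk.disableCertChecking=true' "$dir/$f"; then
      echo "ok       $f"
    else
      echo "MISSING  $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (run from the charts directory):
# check_cert_flag dremio_v2/templates
```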

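Once all the configs have been updated, install Dremio using the Helm chart. Run the command from the ~/dremio-cloud-tools/charts directory so that the relative dremio_v2 path resolves.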

helm install dremio dremio_v2 -f dremio_v2/values.minio.yaml --namespace dremio --create-namespace

You may need to wait a few minutes for all of the Dremio pods to reach the Running state.
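Instead of polling by hand, kubectl can block until the pods are ready. This is a minimal sketch; wait_for_dremio is a hypothetical helper, and the ten-minute default timeout is an arbitrary choice rather than a recommendation.

```shell
# Hypothetical helper: block until every pod in the dremio namespace is Ready,
# or until the timeout (default 600s) expires.
wait_for_dremio() {
  timeout="${1:-600s}"
  kubectl wait --for=condition=Ready pods --all -n dremio --timeout="$timeout"
}

# Usage:
# wait_for_dremio 900s && kubectl get pods -n dremio
```

Note that kubectl wait returns an error if no pods exist yet, so give the install a few seconds before calling it.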

Once Dremio is up, verify that it has created new prefixes in the openlake bucket.

mc ls myminio/openlake/dremio/uploads --insecure

Port forward the dremio-client service to access the Dremio console at http://localhost:9047.

kubectl port-forward svc/dremio-client -n dremio 9047

To verify the setup in the Dremio console, create a user, load a sample file, and run a query against it, following these steps:

Create a new user.

Add a new job.

Set the format.


Test queries to run

Verify that the sample CSV file was uploaded to the bucket.

mc ls --summarize --recursive myminio/openlake/dremio/uploads --insecure

It's as simple as that.

Final Thoughts

MinIO is built to power Modern Datalakes as well as the data analytics and AI/ML workloads that run on top of them. MinIO includes a number of optimizations for working with large datasets consisting of many small files, a common occurrence within the Modern Datalake.

Perhaps more importantly for data lakes, MinIO guarantees durability and immutability. In addition, MinIO encrypts data in transit and on drives, and regulates access to data using IAM and policy-based access controls (PBAC).

If you would like to configure MinIO with Dremio or have any questions, be sure to reach out to us on Slack!