
Dremio

Dremio is the lakehouse SQL query engine running over the platform’s S3 data lake. It exposes a single SQL interface across raw and curated data sets and is the query layer for the data products produced by Airflow pipelines.

Field        Value
Flux path    flux-clusters/stefanzhelev/apps/dremio
Base path    flux-apps/dremio
Namespace    dremio
Sync wave    1
Depends on

What gets deployed
  • GitRepository source pointing at the external dremio-cloud-tools chart repo
  • HelmRelease for the charts/dremio_v2 chart, Dremio OSS 25.1.1
  • ExternalSecrets pulling AWS credentials from Vault
  • Terraform CR (via Tofu Controller) that materializes those credentials in Vault
  • Traefik Ingress exposing the UI

Configuration
  • Coordinator + 1 executor, executor sized at 4 CPU / 8 GB memory
  • ZooKeeper: 3.8.4 (bundled)
  • Distributed storage: S3, with AWS credentials pulled from Vault via ExternalSecrets
  • CloudCache: enabled for hot data
  • Ingress: Traefik to dremio.local:9047
  • Install timeout: 15 minutes (Dremio is slow to initialize)

Dependencies
  • Vault + Tofu Controller: AWS access key/secret stored in Vault, synced via ExternalSecrets
  • S3: distributed storage and the data lake itself
  • External chart repo: dremio-cloud-tools GitRepository instead of an OCI Helm registry
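
The chart source, chart path, and install timeout listed above could come together in a Flux HelmRelease roughly as sketched below. This is an illustration, not the manifest from the platform repository: the resource names, the source namespace, the reconcile interval, and the `apiVersion` are assumptions, and the 15-minute timeout might equally be set under `spec.install.timeout` rather than `spec.timeout`.

```shell
#!/usr/bin/env sh
# Sketch only: prints a HelmRelease manifest shaped like the settings listed above.
# Assumed (not from the source): resource names, the source namespace, the
# interval, and the apiVersion, which depends on the installed Flux version.

render_dremio_helmrelease() {
  cat <<'EOF'
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: dremio                      # assumed release name
  namespace: dremio
spec:
  interval: 10m                     # assumed reconcile interval
  timeout: 15m                      # Dremio is slow to initialize
  chart:
    spec:
      chart: ./charts/dremio_v2     # chart path inside the Git repository
      sourceRef:
        kind: GitRepository         # Git source instead of an OCI Helm registry
        name: dremio-cloud-tools    # assumed source name
        namespace: flux-system      # assumed source namespace
EOF
}

# Print the manifest for inspection.
render_dremio_helmrelease
```

In a Flux setup a manifest like this would live in Git under the base path and be reconciled by the controllers, so the function only prints it. Executor sizing, S3 distributed storage, and CloudCache settings would go under `spec.values`, using the keys defined by the chart's own values file.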
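
Useful commands

# List the pods in the dremio namespace
kubectl get pods -n dremio
# Coordinator and executor logs (with -l, only the last 10 lines per pod are shown unless --tail is set)
kubectl logs -n dremio -l app=dremio-coordinator
kubectl logs -n dremio -l app=dremio-executor
# Forward the Dremio UI to http://localhost:9047
kubectl port-forward -n dremio svc/dremio-client 9047:9047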
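
For repeated checks, the pod and log commands above can be wrapped in a small helper. The sketch below is hypothetical: the function name `dremio_status` and the `KUBECTL`, `NAMESPACE`, and `TAIL` variables are introduced here for illustration and are not part of the platform repository.

```shell
#!/usr/bin/env sh
# Hypothetical helper wrapping the commands above. KUBECTL, NAMESPACE, and TAIL
# are illustrative variables, not part of the platform repository.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-dremio}"

dremio_status() {
  # Pod overview first, so pending or crash-looping pods are visible immediately.
  $KUBECTL get pods -n "$NAMESPACE"
  for app in dremio-coordinator dremio-executor; do
    echo "== ${app} =="
    # With a label selector, kubectl logs shows only the last 10 lines per pod
    # by default, so the tail length is set explicitly.
    $KUBECTL logs -n "$NAMESPACE" -l "app=${app}" --tail="${TAIL:-50}"
  done
}
```

After sourcing the file, calling `dremio_status` prints the pod list followed by the most recent coordinator and executor log lines; `TAIL=200 dremio_status` widens the log window. The port-forward is left out of the helper on purpose, since it blocks until interrupted.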