Dremio
Dremio is the lakehouse SQL query engine running over the platform’s S3 data lake. It exposes a single SQL interface across raw and curated data sets and is the query layer for the data products produced by Airflow pipelines.
Deployment
| Field | Value |
|---|---|
| Flux path | flux-clusters/stefanzhelev/apps/dremio |
| Base path | flux-apps/dremio |
| Namespace | dremio |
| Sync wave | 1 |
| Depends on | — |
What it deploys
- `GitRepository` source pointing at the external `dremio-cloud-tools` chart repo
- HelmRelease for the `charts/dremio_v2` chart, Dremio OSS 25.1.1
- ExternalSecrets pulling AWS credentials from Vault
- Terraform CR (via Tofu Controller) that materializes those credentials in Vault
- Traefik `Ingress` exposing the UI
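
As a rough illustration of how the first two items fit together, the sketch below pairs a Flux `GitRepository` with a `HelmRelease` that references a chart path inside it. This is not the actual manifest from `flux-apps/dremio`: the repository URL, branch, object names, and intervals are assumptions chosen for the example.

```yaml
# Sketch only: URL, branch, names, and intervals are assumed, not taken from the repo.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: dremio-cloud-tools
  namespace: dremio
spec:
  interval: 1h                                          # assumed polling interval
  url: https://github.com/dremio/dremio-cloud-tools     # assumed upstream URL
  ref:
    branch: master                                      # assumed branch
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: dremio
  namespace: dremio
spec:
  interval: 10m                                         # assumed reconcile interval
  chart:
    spec:
      chart: ./charts/dremio_v2                         # chart path inside the Git source
      sourceRef:
        kind: GitRepository
        name: dremio-cloud-tools
```

Pointing `sourceRef` at a `GitRepository` lets Flux build the chart straight from the Git checkout, so no packaged chart in a Helm or OCI registry is needed.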
Configuration highlights
- Coordinator + 1 executor, executor sized at 4 CPU / 8 GB memory
- ZooKeeper: 3.8.4 (bundled)
- Distributed storage: S3, with AWS credentials pulled from Vault via ExternalSecrets
- CloudCache: enabled for hot data
- Ingress: Traefik to `dremio.local:9047`
- Install timeout: 15 minutes (Dremio is slow to initialize)
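
The settings above could map onto the `HelmRelease` spec roughly as follows. This is a hedged sketch: the value keys are assumed to follow the `dremio_v2` chart's `values.yaml` and should be checked against the chart version in use, and the bucket name is a placeholder.

```yaml
# Fragment of a HelmRelease spec; value keys are assumed from the dremio_v2 chart.
spec:
  timeout: 15m                  # install timeout; Dremio is slow to initialize
  values:
    imageTag: "25.1.1"          # Dremio OSS version (assumed key name)
    executor:
      count: 1
      cpu: 4
      memory: 8192              # assumed to be in MB, i.e. 8 GB
      cloudCache:
        enabled: true           # CloudCache for hot data
    distStorage:
      type: aws
      aws:
        bucketName: example-dremio-bucket   # placeholder, not the real bucket
        path: /
        authentication: accessKeySecret
        # credentials are supplied from Vault via ExternalSecrets, not inlined here
```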
Integrations
- Vault + Tofu Controller: AWS access key/secret stored in Vault, synced via ExternalSecrets
- S3: distributed storage and the data lake itself
- External chart repo: `dremio-cloud-tools` GitRepository instead of an OCI Helm registry
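
To make the Vault integration concrete, an `ExternalSecret` that syncs the AWS key pair into the `dremio` namespace might look like the sketch below. The store name, secret name, Vault path, and property names are all hypothetical; only the overall shape follows the External Secrets Operator API.

```yaml
# Sketch only: store name, secret name, Vault path, and properties are hypothetical.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: dremio-aws-credentials
  namespace: dremio
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault                    # hypothetical store backed by Vault
  target:
    name: dremio-aws-credentials   # Kubernetes Secret created by the operator
  data:
    - secretKey: accessKey
      remoteRef:
        key: dremio/aws            # hypothetical Vault path
        property: access_key
    - secretKey: secret
      remoteRef:
        key: dremio/aws
        property: secret_key
```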
Key commands
- List pods: `kubectl get pods -n dremio`
- Coordinator logs: `kubectl logs -n dremio -l app=dremio-coordinator`
- Executor logs: `kubectl logs -n dremio -l app=dremio-executor`
- Forward the UI to localhost: `kubectl port-forward -n dremio svc/dremio-client 9047:9047`