
Airflow

Apache Airflow schedules and runs the data pipelines that feed the lakehouse. It’s the orchestrator that ties together extraction, transformation, and load steps across the platform’s storage and compute services.

Field        Value
Flux path    flux-clusters/stefanzhelev/apps/airflow
Base path    flux-apps/airflow
Namespace    airflow
Sync wave    1
Depends on

Deployed resources

  • HelmRelease pulling the official Apache Airflow chart 1.18.0
  • ExternalSecrets pulling ten secrets from Vault (Fernet key, JWT secret, API secret, broker URL, git-sync credentials, database credentials, webserver credentials, S3 logging credentials)
  • Configuration for git-sync against an external DAGs repository

Configuration

  • Executor: CeleryExecutor with 2 worker replicas
  • Broker: Built-in Redis
  • Metadata DB: External PostgreSQL via CloudNative-PG, fronted by pgBouncer for connection pooling
  • DAG delivery: git-sync sidecar pulling DAGs from an external Git repository
  • Logging: S3 remote logging so logs survive worker restarts
  • UI: Flower enabled for Celery worker monitoring

Integrations

  • Vault: every credential is sourced through ExternalSecrets; Airflow itself never holds secret material
  • CloudNative-PG: metadata database
  • External Git: DAG repository synced into the workers
  • S3: remote log storage and pipeline artifact landing zone
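
Because every credential arrives through ExternalSecrets, a stalled sync is a likely cause when Airflow pods fail to start. The helper below is a minimal sketch of one way to spot that. It assumes the External Secrets Operator CRDs are installed and that each ExternalSecret reports a `Ready` condition; the function name `unsynced_external_secrets` is illustrative and not part of this repository.

```shell
# Sketch: print the names of ExternalSecrets that are not Ready.
# Assumption: kubectl is pointed at the right cluster and the
# External Secrets Operator CRDs are installed.
unsynced_external_secrets() {
  local ns="${1:-airflow}"
  # Emit "<name> <Ready status>" per ExternalSecret, then keep only
  # the names whose status is anything other than "True".
  kubectl get externalsecrets -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}' \
    | awk 'NF && $2 != "True" { print $1 }'
}
```

Running `unsynced_external_secrets airflow` prints nothing when every ExternalSecret in the namespace is Ready; any name it does print is worth inspecting with `kubectl describe externalsecret`.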
# Check Airflow pods
kubectl get pods -n airflow
# View scheduler logs
kubectl logs -n airflow -l component=scheduler
# Port-forward the webserver
kubectl port-forward -n airflow svc/airflow-webserver 8080:8080
# Trigger a DAG
kubectl exec -n airflow deploy/airflow-scheduler -- \
  airflow dags trigger <dag-id>
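
The trigger command above can fail if the scheduler is mid-rollout. As a hedged sketch, the wrapper below waits for the rollout to finish before triggering. The function name `trigger_dag` and the 120-second timeout are assumptions chosen for illustration; the Deployment name `airflow-scheduler` is taken from the command above.

```shell
# Sketch: wait for the scheduler rollout, then trigger a DAG.
# Assumption: the scheduler runs as Deployment "airflow-scheduler".
trigger_dag() {
  local dag_id="$1"
  local ns="${2:-airflow}"
  if [ -z "$dag_id" ]; then
    echo "usage: trigger_dag <dag-id> [namespace]" >&2
    return 2
  fi
  # Block until the Deployment reports a completed rollout (or time out).
  kubectl rollout status -n "$ns" deploy/airflow-scheduler --timeout=120s || return 1
  # Run the Airflow CLI inside a scheduler pod.
  kubectl exec -n "$ns" deploy/airflow-scheduler -- airflow dags trigger "$dag_id"
}
```

For example, `trigger_dag <dag-id>` uses the `airflow` namespace by default, and a second argument overrides it.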