Airflow
Apache Airflow schedules and runs the data pipelines that feed the lakehouse. It’s the orchestrator that ties together extraction, transformation, and load steps across the platform’s storage and compute services.
Deployment
| Field | Value |
|---|---|
| Flux path | flux-clusters/stefanzhelev/apps/airflow |
| Base path | flux-apps/airflow |
| Namespace | airflow |
| Sync wave | 1 |
| Depends on | — |
What it deploys
- HelmRelease deploying the official Apache Airflow Helm chart, version 1.18.0
- ExternalSecrets that pull ten secrets from Vault, covering the Fernet key, JWT secret, API secret, broker URL, git-sync credentials, database credentials, webserver credentials, and S3 logging credentials
- Configuration for git-sync against an external DAGs repository
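Because the Fernet key arrives from Vault rather than being generated by the chart, someone has to create it once and store it there. A Fernet key is 32 random bytes encoded as URL-safe base64, which is what `cryptography.fernet.Fernet.generate_key()` returns; the sketch below produces the same format using only the standard library. Where the value is written in Vault (path and field name) is not covered here and should match whatever the ExternalSecret references.

```python
# Minimal sketch for seeding Airflow's Fernet key in Vault.
# A Fernet key is 32 random bytes, URL-safe base64 encoded (44 characters).
import base64
import os


def generate_fernet_key() -> str:
    """Return a fresh URL-safe base64-encoded 32-byte key as text."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key: str) -> bool:
    """Check that key is 44 chars and decodes to exactly 32 bytes."""
    if len(key) != 44:
        return False
    try:
        raw = base64.urlsafe_b64decode(key.encode("ascii"))
    except ValueError:  # covers binascii.Error and UnicodeEncodeError
        return False
    return len(raw) == 32


if __name__ == "__main__":
    # Print a key to paste into Vault; never commit it to Git.
    print(generate_fernet_key())
```

Keep the key stable once it is in use: Airflow encrypts connection passwords and variables in the metadata database with it, so swapping it without a rotation leaves existing values unreadable.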
Configuration highlights
- Executor: CeleryExecutor with 2 worker replicas
- Broker: Built-in Redis
- Metadata DB: External PostgreSQL via CloudNative-PG, fronted by pgBouncer for connection pooling
- DAG delivery: git-sync sidecar pulling DAGs from an external Git repository
- Logging: S3 remote logging so logs survive worker restarts
- UI: Flower enabled for Celery worker monitoring
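Every option in `airflow.cfg` can also be overridden by an environment variable named `AIRFLOW__{SECTION}__{KEY}`, which is handy when checking what a running pod actually sees. The sketch below maps a few settings from the highlights above to those variable names. The bucket path and connection ID are hypothetical placeholders, not the platform's real values.

```python
# Minimal sketch: map airflow.cfg (section, key) pairs to the environment
# variable names Airflow reads as overrides: AIRFLOW__{SECTION}__{KEY}.


def airflow_env_var(section: str, key: str) -> str:
    """Return the env var name that overrides [section] key."""
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


def render_env(settings: dict[tuple[str, str], str]) -> dict[str, str]:
    """Turn {(section, key): value} into {ENV_VAR_NAME: value}."""
    return {airflow_env_var(s, k): v for (s, k), v in settings.items()}


# Values mirroring the highlights; bucket and conn ID are hypothetical.
settings = {
    ("core", "executor"): "CeleryExecutor",
    ("celery", "worker_concurrency"): "16",
    ("logging", "remote_logging"): "True",
    ("logging", "remote_base_log_folder"): "s3://example-airflow-logs/logs",
    ("logging", "remote_log_conn_id"): "s3_logs",
}

if __name__ == "__main__":
    for name, value in render_env(settings).items():
        print(f"{name}={value}")
```

In the HelmRelease these settings would more typically go under the chart's `config:` values, which the chart renders into `airflow.cfg`. With 2 worker replicas and Airflow's default `worker_concurrency` of 16, the cluster runs at most 2 × 16 = 32 Celery tasks at once unless that value is overridden.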
Integrations
- Vault: every credential is sourced from Vault through ExternalSecrets, so no secret material lives in the Airflow manifests or Helm values
- CloudNative-PG: metadata database
- External Git: DAG repository synced into the workers
- S3: remote log storage and pipeline artifact landing zone
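The metadata database credentials from Vault end up as a SQLAlchemy connection URL (Airflow's `[database] sql_alchemy_conn`; the official chart reads it from the `connection` key of the secret named by `data.metadataSecretName`). The sketch below shows how that URL is assembled so that generated passwords containing `@`, `:` or `/` don't break parsing. The pgBouncer host name, port, and database name used in the example are hypothetical; substitute the pooler service that fronts the CloudNative-PG cluster.

```python
# Minimal sketch: build the metadata DB URL Airflow connects to via pgBouncer.
# Credentials are percent-encoded so special characters survive URL parsing.
from urllib.parse import quote


def metadata_db_url(user: str, password: str, host: str,
                    port: int, database: str) -> str:
    """Return a postgresql:// SQLAlchemy URL with encoded credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Hypothetical pooler service and database name.
    print(metadata_db_url("airflow", "p@ss:w/rd",
                          "airflow-pooler-rw.airflow.svc", 5432, "airflow"))
```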
Key commands
```bash
# Check Airflow pods
kubectl get pods -n airflow

# View scheduler logs
kubectl logs -n airflow -l component=scheduler

# Port-forward the webserver
kubectl port-forward -n airflow svc/airflow-webserver 8080:8080

# Trigger a DAG
kubectl exec -n airflow deploy/airflow-scheduler -- \
  airflow dags trigger <dag-id>
```
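When triggering DAGs from a script, passing run parameters through `airflow dags trigger --conf` as a JSON string is easy to get wrong with shell quoting. The sketch below builds the same `kubectl exec` command as an argument list and runs it with `subprocess`, so no shell is involved. The namespace and deployment name follow the commands above; the DAG ID and conf payload in the usage line are hypothetical.

```python
# Minimal sketch: trigger a DAG via kubectl exec without shell quoting issues.
import json
import shlex
import subprocess


def trigger_dag_argv(dag_id, conf=None, namespace="airflow",
                     target="deploy/airflow-scheduler"):
    """Build the kubectl exec argv for `airflow dags trigger`."""
    argv = ["kubectl", "exec", "-n", namespace, target, "--",
            "airflow", "dags", "trigger", dag_id]
    if conf is not None:
        # --conf takes a JSON string that becomes the DAG run's conf.
        argv += ["--conf", json.dumps(conf)]
    return argv


def trigger_dag(dag_id, conf=None):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(trigger_dag_argv(dag_id, conf),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical DAG ID; print the equivalent shell command only.
    print(shlex.join(trigger_dag_argv("example_dag", {"run_date": "2024-01-01"})))
```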