dbt
dbt owns the SQL transformation layer of the lakehouse. The fortune project (under dbt/ in the repo) reads raw scraped data out of ClickHouse and produces the sources → calculations → mart model layers consumed by Lightdash, Evidence, and any ad-hoc query that hits ${ENV}__mart.
Project layout
Section titled “Project layout”| Directory | What lives there |
|---|---|
dbt/models/sources/ | Typed views over raw ingestion tables, one subdir per scraper (bet365, moderndatastack, …) |
dbt/models/calculations/ | Business-logic transformations |
dbt/models/mart/ | End-user analytics tables — these are what BI tools read |
dbt/models/elementary_extensions/ | Project-specific Elementary views |
dbt/lightdash/ | Lightdash content-as-code (charts, dashboards) — see Lightdash |
Schemas follow ${ENV}__${schema} (e.g. dev__mart, main__sources). All models are partitioned by event_date and every dbt invocation passes it as a var.
Workflow phases
Section titled “Workflow phases”Day-to-day changes flow through these phases:
1. Local edit
Section titled “1. Local edit”Edit models under dbt/models/. Python and dbt versions are pinned in pyproject.toml + uv.lock; install with uv:
uv syncuv run dbt deps2. Local validate
Section titled “2. Local validate”Build and test the day(s) you touched against a dev profile. dbt-run.sh wraps dbt run over an inclusive [start, end] range, one day per invocation:
uv run dbt build --vars "event_date: 2024-10-01"./dbt-run.sh --start-date 2024-10-01 --end-date 2024-10-10uv run dbt test --vars "event_date: 2024-10-01"All tests filter by event_date so they only validate the date being processed.
Open a PR against stefanzhelev. Two CI workflows fire on dbt/** changes:
| Workflow | Trigger | What it does |
|---|---|---|
lightdash-preview | PR open / sync / close | Spins up a per-PR Lightdash preview project (pr-<N>) so reviewers can poke at the new explores in a real UI |
dbt-build | Push to stefanzhelev (after merge) | Builds the dbt image and pushes it to Harbor |
4. Image build
Section titled “4. Image build”On merge to stefanzhelev, .forgejo/workflows/dbt-build.yml builds dbt/Dockerfile and pushes two tags to Harbor:
harbor.internal.stefanzhelev.com/data-workflows/dbt:latestharbor.internal.stefanzhelev.com/data-workflows/dbt:<git-sha>The :<git-sha> tag is what makes a SHA rollback a one-line image-tag bump.
5. Lightdash deploy
Section titled “5. Lightdash deploy”The same merge fires .forgejo/workflows/lightdash-deploy.yml, which renders profiles, runs dbt compile, and lightdash deploy --select tag:lightdash against the live project. Schema is in sync the moment the workflow finishes — see Lightdash for the full pipeline.
6. Production run
Section titled “6. Production run”Airflow’s KubernetesPodOperator (DAGs in airflow-dags/, e.g. bet365_dbt.py) pulls data-workflows/dbt:latest and runs:
dbt build --vars "event_date: {{ ds }}"one pod per date. Backfills use airflow dags backfill against the per-year DAGs (bet365_dbt_2016, bet365_dbt_2017, …).
Connections and credentials
Section titled “Connections and credentials”- Warehouse: ClickHouse, reached as
clickhouse-clickhouse.clickhouse.svc.cluster.local:8123(HTTP — dbt-clickhouse 1.10+ usesclickhouse-connect, not native protocol on 9000) - Production profiles.yml: rendered from Vault by the
dbt-secretsExternalSecret in thedata-workflowsnamespace - dbt user: owned by the
data-workflows-databaseTofu module (clickhousedbops + vault providers); the same user doubles as Lightdash’s read user - CI profiles: rendered inline by each Forgejo Actions workflow from
LIGHTDASH_CLICKHOUSE_*/EVIDENCE_CLICKHOUSE_*secrets
uv run dbt docs generateuv run dbt docs serve --port 8001Integrations
Section titled “Integrations”- ClickHouse: the only warehouse target — every model materializes here
- Airflow: orchestrates the daily
dbt buildruns viaKubernetesPodOperator - Harbor: stores the
data-workflows/dbtimage - Forgejo: hosts the repo and runs the build / deploy workflows
- Elementary: test logging and anomaly detection layered on top of dbt
- Lightdash / Evidence: the BI consumers reading
${ENV}__mart