Skip to content

dbt

dbt owns the SQL transformation layer of the lakehouse. The fortune project (under dbt/ in the repo) reads raw scraped data out of ClickHouse and produces the sources → calculations → mart model layers consumed by Lightdash, Evidence, and any ad-hoc query that hits ${ENV}__mart.

DirectoryWhat lives there
dbt/models/sources/Typed views over raw ingestion tables, one subdir per scraper (bet365, moderndatastack, …)
dbt/models/calculations/Business-logic transformations
dbt/models/mart/End-user analytics tables — these are what BI tools read
dbt/models/elementary_extensions/Project-specific Elementary views
dbt/lightdash/Lightdash content-as-code (charts, dashboards) — see Lightdash

Schemas follow ${ENV}__${schema} (e.g. dev__mart, main__sources). All models are partitioned by event_date and every dbt invocation passes it as a var.

Day-to-day changes flow through these phases:

Edit models under dbt/models/. Python and dbt versions are pinned in pyproject.toml + uv.lock; install with uv:

Terminal window
uv sync
uv run dbt deps

Build and test the day(s) you touched against a dev profile. dbt-run.sh wraps dbt run over an inclusive [start, end] range, one day per invocation:

Terminal window
uv run dbt build --vars "event_date: 2024-10-01"
./dbt-run.sh --start-date 2024-10-01 --end-date 2024-10-10
uv run dbt test --vars "event_date: 2024-10-01"

All tests filter by event_date so they only validate the date being processed.

Open a PR against stefanzhelev. Two CI workflows fire on dbt/** changes:

WorkflowTriggerWhat it does
lightdash-previewPR open / sync / closeSpins up a per-PR Lightdash preview project (pr-<N>) so reviewers can poke at the new explores in a real UI
dbt-buildPush to stefanzhelev (after merge)Builds the dbt image and pushes it to Harbor

On merge to stefanzhelev, .forgejo/workflows/dbt-build.yml builds dbt/Dockerfile and pushes two tags to Harbor:

harbor.internal.stefanzhelev.com/data-workflows/dbt:latest
harbor.internal.stefanzhelev.com/data-workflows/dbt:<git-sha>

The :<git-sha> tag is what makes a SHA rollback a one-line image-tag bump.

The same merge fires .forgejo/workflows/lightdash-deploy.yml, which renders profiles, runs dbt compile, and lightdash deploy --select tag:lightdash against the live project. Schema is in sync the moment the workflow finishes — see Lightdash for the full pipeline.

Airflow’s KubernetesPodOperator (DAGs in airflow-dags/, e.g. bet365_dbt.py) pulls data-workflows/dbt:latest and runs:

dbt build --vars "event_date: {{ ds }}"

one pod per date. Backfills use airflow dags backfill against the per-year DAGs (bet365_dbt_2016, bet365_dbt_2017, …).

  • Warehouse: ClickHouse, reached as clickhouse-clickhouse.clickhouse.svc.cluster.local:8123 (HTTP — dbt-clickhouse 1.10+ uses clickhouse-connect, not native protocol on 9000)
  • Production profiles.yml: rendered from Vault by the dbt-secrets ExternalSecret in the data-workflows namespace
  • dbt user: owned by the data-workflows-database Tofu module (clickhousedbops + vault providers); the same user doubles as Lightdash’s read user
  • CI profiles: rendered inline by each Forgejo Actions workflow from LIGHTDASH_CLICKHOUSE_* / EVIDENCE_CLICKHOUSE_* secrets
Terminal window
uv run dbt docs generate
uv run dbt docs serve --port 8001
  • ClickHouse: the only warehouse target — every model materializes here
  • Airflow: orchestrates the daily dbt build runs via KubernetesPodOperator
  • Harbor: stores the data-workflows/dbt image
  • Forgejo: hosts the repo and runs the build / deploy workflows
  • Elementary: test logging and anomaly detection layered on top of dbt
  • Lightdash / Evidence: the BI consumers reading ${ENV}__mart