ClickHouse Backup

clickhouse-backup runs the Altinity clickhouse-backup binary as a long-running server that holds the persistent backup index and exposes an HTTP API. Airflow DAGs issue the daily scheduled backups and the manual restores by POSTing to that API. The server itself runs the SQL BACKUP TABLE ... TO Disk('s3_backup', ...) against ClickHouse in embedded mode, so the actual data movement is done by the ClickHouse server using the s3_backup disk it has configured.

Deployment

Field	Value
Flux path	`flux-clusters/stefanzhelev/apps/clickhouse-backup`
Base path	`flux-apps/clickhouse-backup`
Namespace	`clickhouse`
Sync wave	7 (after `clickhouse`)
Depends on	`clickhouse`

What it deploys

A Deployment running altinity/clickhouse-backup:latest with args: [server] — single replica, Recreate strategy so two pods can never compete for the same backup name during a rollout
A Service exposing the API at clickhouse-backup.clickhouse.svc.cluster.local:7171 (ClusterIP)
An emptyDir mount at /var/lib/clickhouse/ because the binary needs a writable working directory even in embedded mode (it stages a local index there); the actual data lives on the ClickHouse-side s3_backup disk so nothing important is lost when the volume is wiped on pod restart

The matching ExternalSecret (clickhouse-backup-secrets, rendering CLICKHOUSE_USERNAME / CLICKHOUSE_PASSWORD from secret/clickhouse) is owned by the clickhouse Kustomization (under clickhouse-external-secrets/) rather than this one.

Embedded mode wiring

The binary doesn’t read or write S3 directly. It opens a SQL connection to ClickHouse, issues BACKUP TABLE ... TO Disk('s3_backup', '<name>'), and the ClickHouse server takes care of the upload. That requires the three embedded-mode env vars below to be set with the CLICKHOUSE_ prefix: the embedded fields live under the binary’s clickhouse: config section, not general:, and without the prefix the binary silently falls back to the FREEZE-based local flow (which writes to the pod’s filesystem and is lost when the pod terminates). The fourth variable, BACKUPS_TO_KEEP_LOCAL, controls retention and takes no prefix.

Env var	Value
`CLICKHOUSE_USE_EMBEDDED_BACKUP_RESTORE`	`true`
`CLICKHOUSE_EMBEDDED_BACKUP_DISK`	`s3_backup`
`CLICKHOUSE_TIMEOUT`	`4h` (embedded mode rejects the default 30m)
`BACKUPS_TO_KEEP_LOCAL`	`14` (rolling 14-backup retention; dependency-aware so a full with incremental children is kept)

The s3_backup disk itself is configured on the ClickHouse server via config.d/backup.xml (see ClickHouse) and points at the manually-created Hetzner Object Storage bucket stefanzhelev-clickhouse-backup.
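
Because a missing prefix fails silently, it can help to check the container environment after a rollout. The following is a minimal sketch, not part of clickhouse-backup: `check_embedded_env` is a hypothetical helper that reads `NAME=VALUE` lines on stdin and compares them with the four values tabulated above. It assumes the container image provides an `env` command.

```shell
# Hypothetical preflight helper (not shipped with clickhouse-backup).
# Reads `env`-style NAME=VALUE lines on stdin, prints one line per
# variable that is missing or has a different value, and returns
# non-zero if any problem was found.
check_embedded_env() {
  env_dump=$(cat)
  problems=0
  for expected in \
    "CLICKHOUSE_USE_EMBEDDED_BACKUP_RESTORE=true" \
    "CLICKHOUSE_EMBEDDED_BACKUP_DISK=s3_backup" \
    "CLICKHOUSE_TIMEOUT=4h" \
    "BACKUPS_TO_KEEP_LOCAL=14"
  do
    # -x matches the whole line, -F treats the pattern as a literal string
    if ! printf '%s\n' "$env_dump" | grep -qxF "$expected"; then
      echo "missing or different: $expected"
      problems=$((problems + 1))
    fi
  done
  [ "$problems" -eq 0 ]
}
```

One possible invocation, assuming `env` exists in the pod: `kubectl exec -n clickhouse deploy/clickhouse-backup -- env | check_embedded_env`.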

API endpoints

Method	Path	Purpose
`GET`	`/backup/list`	List backups known to the server’s index
`GET`	`/backup/status`	Last/current operation + state
`GET`	`/backup/actions?last=N`	Recent action history
`POST`	`/backup/create?name=NAME`	Async create (returns 200 with `operation_id`)
`POST`	`/backup/restore?name=NAME`	Async restore — walks diff chains automatically for incrementals
`POST`	`/backup/clean_remote_broken`	Drop incomplete/broken remote artifacts

POST operations return immediately with an acknowledgement; consumers poll /backup/status until the response no longer contains "status":"in progress", then inspect /backup/actions to assert outcome.
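
The poll-until-idle step described above could be sketched as follows. The names `BACKUP_API`, `POLL_SECONDS`, `backup_api_get`, and `wait_until_idle` are illustrative, the 10-second interval is an assumption (the source does not specify one), and `curl` is assumed to be available in the calling environment.

```shell
# Illustrative names; override BACKUP_API when calling through a port-forward.
BACKUP_API="${BACKUP_API:-http://clickhouse-backup.clickhouse.svc.cluster.local:7171}"
POLL_SECONDS="${POLL_SECONDS:-10}"   # assumed interval, not taken from the source

# GET a path from the API and print the response body.
# -f makes curl return non-zero on HTTP error statuses.
backup_api_get() {
  curl -fsS "${BACKUP_API}$1"
}

# Block until /backup/status no longer reports an operation in progress.
# Returns 1 if the API cannot be reached, so an outage is not mistaken
# for "idle".
wait_until_idle() {
  while :; do
    body=$(backup_api_get /backup/status) || return 1
    case "$body" in
      *'"status":"in progress"'*) sleep "$POLL_SECONDS" ;;
      *) return 0 ;;
    esac
  done
}
```

A caller would typically POST the operation first, then run `wait_until_idle`, and only afterwards read `/backup/actions` to decide whether the operation succeeded.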

DAG triggers

Two Airflow DAGs in airflow-dags/ are thin wrappers around the API:

DAG	Schedule	Behavior
`clickhouse_backup`	`0 2 * * *`	`POST /backup/create?name={{ ds }}` daily at 02:00, named after the logical date
`clickhouse_restore`	manual (`schedule=None`)	`POST /backup/restore?name=<param.backup_name or {{ ds }}>` — pass `--conf '{"backup_name": "..."}'` to choose a different backup

Each DAG runs a KubernetesPodOperator with curlimages/curl (~10 MiB image, ~30s task overhead) that POSTs the operation, polls until idle, then exits non-zero on any non-success result so Airflow surfaces the failure normally.
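
The "exit non-zero on any non-success result" step might look like the sketch below. It is not the DAGs' actual script: `last_action_ok` is a hypothetical helper, and it assumes `/backup/actions` returns one JSON object per line with a `status` field whose success value is `success`, newest entry last.

```shell
# Hypothetical helper: reads a /backup/actions response body on stdin and
# returns 0 only if the last non-blank line reports "status":"success".
# Assumes newline-delimited JSON with the newest action last.
last_action_ok() {
  last_line=$(grep -v '^[[:space:]]*$' | tail -n 1)
  case "$last_line" in
    *'"status":"success"'*)
      return 0
      ;;
    *)
      # Report on stderr so the failure shows up in the task log.
      echo "backup operation did not succeed: $last_line" >&2
      return 1
      ;;
  esac
}
```

Inside the task container this could be chained as `curl -fsS "$BACKUP_API/backup/actions?last=1" | last_action_ok || exit 1`, where `BACKUP_API` is an assumed variable holding the Service URL. An empty response counts as a failure, which errs on the side of surfacing problems in Airflow.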

Endpoint

Field	Value
Service DNS	`clickhouse-backup.clickhouse.svc.cluster.local`
Port	`7171`
Reachable from	in-cluster only (ClusterIP, no Ingress) — invoke via `kubectl port-forward` for ad-hoc API calls

Integrations

ClickHouse: the binary issues SQL against the ClickHouse server; the s3_backup disk lives in ClickHouse’s config.d/backup.xml
External Secrets: admin credentials are synced from secret/clickhouse into the clickhouse-backup-secrets Secret in the clickhouse namespace
Airflow: schedules and observes the daily backup runs and is the trigger surface for restores

Key commands

# Pod + API health
kubectl get deploy,pod,svc -n clickhouse -l app=clickhouse-backup

# List backups in the server's index
kubectl exec -n clickhouse deploy/clickhouse-backup -- \
  wget -qO- http://localhost:7171/backup/list

# Trigger an ad-hoc backup (named "manual-YYYY-MM-DD-HH-MM-SS")
kubectl exec -n clickhouse deploy/clickhouse-backup -- \
  wget --post-data="" -qO- "http://localhost:7171/backup/create?name=manual-$(date -u +%Y-%m-%d-%H-%M-%S)"

# Watch progress
kubectl exec -n clickhouse deploy/clickhouse-backup -- \
  wget -qO- http://localhost:7171/backup/status

# Inspect what's actually on the embedded disk (from the ClickHouse pod)
kubectl exec -n clickhouse chi-clickhouse-default-0-0-0 -c clickhouse -- \
  ls /var/lib/clickhouse/disks/s3_backup/

# Trigger a restore manually via Airflow
kubectl exec -n airflow deploy/airflow-scheduler -c scheduler -- \
  airflow dags trigger clickhouse_restore --conf '{"backup_name": "2026-04-26"}'