ClickHouse Backup

clickhouse-backup runs the Altinity clickhouse-backup binary as a long-running server that holds the persistent backup index and exposes an HTTP API. Airflow DAGs issue the daily scheduled backups and the manual restores by POSTing to that API. The server runs in embedded mode: it executes the SQL statement BACKUP TABLE ... TO Disk('s3_backup', ...) against ClickHouse, so the ClickHouse server itself moves the data, using the s3_backup disk configured on it.

| Field | Value |
| --- | --- |
| Flux path | flux-clusters/stefanzhelev/apps/clickhouse-backup |
| Base path | flux-apps/clickhouse-backup |
| Namespace | clickhouse |
| Sync wave | 7 (after clickhouse) |
| Depends on | clickhouse |

  • A Deployment running altinity/clickhouse-backup:latest with args: [server] — single replica, Recreate strategy so two pods can never compete for the same backup name during a rollout
  • A Service exposing the API at clickhouse-backup.clickhouse.svc.cluster.local:7171 (ClusterIP)
  • An emptyDir mount at /var/lib/clickhouse/ because the binary needs a writable working directory even in embedded mode (it stages a local index there); the actual data lives on the ClickHouse-side s3_backup disk so nothing important is lost when the volume is wiped on pod restart
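
The single-replica and Recreate properties described above can be checked against the live object. The following is a minimal sketch, assuming the Deployment is named clickhouse-backup (the name used by the kubectl commands on this page); `check_rollout_safety` is a hypothetical helper, not something shipped in the repo.

```shell
# check_rollout_safety: succeed only if the Deployment runs one replica
# with the Recreate strategy, so two pods never overlap during a rollout.
# Assumption: the Deployment is called "clickhouse-backup".
check_rollout_safety() {
  spec=$(kubectl get deploy clickhouse-backup -n clickhouse \
    -o jsonpath='{.spec.replicas} {.spec.strategy.type}') || return 1
  if [ "$spec" != "1 Recreate" ]; then
    echo "unexpected replicas/strategy: $spec" >&2
    return 1
  fi
}
```

Calling `check_rollout_safety && echo ok` after a manifest change is one way to catch an accidental switch back to RollingUpdate.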

The matching ExternalSecret (clickhouse-backup-secrets, rendering CLICKHOUSE_USERNAME / CLICKHOUSE_PASSWORD from secret/clickhouse) is owned by the clickhouse Kustomization (under clickhouse-external-secrets/) rather than this one.

The binary doesn’t read or write S3 directly. It opens a SQL connection to ClickHouse, issues BACKUP TABLE ... TO Disk('s3_backup', '<name>'), and the ClickHouse server takes care of the upload. That requires the three embedded-mode env vars in the table below to be set with the CLICKHOUSE_ prefix: the embedded fields live under the binary’s clickhouse: config section, not general:. Without the prefix the binary silently falls back to the FREEZE-based local flow, which writes to the pod’s filesystem and is lost when the pod terminates. The fourth variable in the table, BACKUPS_TO_KEEP_LOCAL, is a retention setting and carries no CLICKHOUSE_ prefix.

| Env var | Value |
| --- | --- |
| CLICKHOUSE_USE_EMBEDDED_BACKUP_RESTORE | true |
| CLICKHOUSE_EMBEDDED_BACKUP_DISK | s3_backup |
| CLICKHOUSE_TIMEOUT | 4h (embedded mode rejects the default 30m) |
| BACKUPS_TO_KEEP_LOCAL | 14 (rolling 14-backup retention; dependency-aware so a full with incremental children is kept) |
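
Because a missing prefix fails silently, a preflight check on the pod's environment may be worth having. The sketch below is illustrative only: `check_embedded_env` is a hypothetical helper, and it assumes the container image provides an `env` command for dumping KEY=VALUE lines.

```shell
# check_embedded_env: read KEY=VALUE lines on stdin and report every
# embedded-mode setting that is missing or differs from the table above.
check_embedded_env() {
  env_dump=$(cat)
  rc=0
  for want in \
    CLICKHOUSE_USE_EMBEDDED_BACKUP_RESTORE=true \
    CLICKHOUSE_EMBEDDED_BACKUP_DISK=s3_backup \
    CLICKHOUSE_TIMEOUT=4h
  do
    # -x matches the whole line, -F treats the pattern as a fixed string.
    if ! printf '%s\n' "$env_dump" | grep -qxF "$want"; then
      echo "missing or different: $want" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

A possible invocation is `kubectl exec -n clickhouse deploy/clickhouse-backup -- env | check_embedded_env`; a non-zero exit means at least one of the three variables is not set as documented.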

The s3_backup disk itself is configured on the ClickHouse server via config.d/backup.xml (see ClickHouse) and points at the manually-created Hetzner Object Storage bucket stefanzhelev-clickhouse-backup.
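
To confirm that ClickHouse actually sees the disk, one option is to query system.disks from inside the ClickHouse pod. This is a sketch under assumptions: the pod name is taken from the commands elsewhere on this page, `disk_query` and `check_backup_disk` are hypothetical helpers, and clickhouse-client may need explicit `--user` / `--password` arguments depending on how the default user is configured.

```shell
# disk_query [NAME]: print the SQL that looks up a disk by name.
disk_query() {
  printf "SELECT name, type, path FROM system.disks WHERE name = '%s'" \
    "${1:-s3_backup}"
}

# check_backup_disk [NAME]: run the lookup inside the ClickHouse pod.
# An empty result means the disk is not configured on the server.
check_backup_disk() {
  kubectl exec -n clickhouse chi-clickhouse-default-0-0-0 -c clickhouse -- \
    clickhouse-client --query "$(disk_query "${1:-s3_backup}")"
}
```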

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /backup/list | List backups known to the server’s index |
| GET | /backup/status | Last/current operation + state |
| GET | /backup/actions?last=N | Recent action history |
| POST | /backup/create?name=NAME | Async create (returns 200 with operation_id) |
| POST | /backup/restore?name=NAME | Async restore; walks diff chains automatically for incrementals |
| POST | /backup/clean_remote_broken | Drop incomplete/broken remote artifacts |

POST operations return immediately with an acknowledgement; consumers poll /backup/status until the response no longer contains "status":"in progress", then inspect /backup/actions to assert outcome.
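
The submit-then-poll contract could be scripted roughly as follows. This is a sketch, not the DAGs' actual code: the function names are hypothetical, and matching the literal `"status":"success"` in the latest /backup/actions entry is an assumption about the response format rather than something this page specifies.

```shell
# Base URL of the API (Service DNS and port as documented on this page).
API="${API:-http://clickhouse-backup.clickhouse.svc.cluster.local:7171}"

# fetch PATH [METHOD]: thin curl wrapper around the API.
fetch() {
  curl -fsS -X "${2:-GET}" "${API}$1"
}

# in_progress: succeed if the body on stdin still reports a running operation.
in_progress() {
  grep -q '"status":[[:space:]]*"in progress"'
}

# wait_idle [SECONDS]: poll /backup/status until nothing is in progress.
wait_idle() {
  while :; do
    body=$(fetch /backup/status) || return 1
    printf '%s' "$body" | in_progress || return 0
    sleep "${1:-10}"
  done
}

# run_backup NAME [SECONDS]: create a backup, wait, then assert the outcome.
run_backup() {
  fetch "/backup/create?name=$1" POST >/dev/null || return 1
  wait_idle "${2:-10}" || return 1
  # Assumption: the most recent action is ours and reports "success".
  fetch "/backup/actions?last=1" | grep -q '"status":[[:space:]]*"success"'
}
```

For example, `run_backup "manual-$(date -u +%Y-%m-%d-%H-%M-%S)" || echo "backup failed"` exits non-zero on any API error or non-success result. If several callers submit operations at once, the last action may belong to someone else, so a stricter check would match on the backup name as well.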

Two Airflow DAGs in airflow-dags/ are thin wrappers around the API:

| DAG | Schedule | Behavior |
| --- | --- | --- |
| clickhouse_backup | 0 2 * * * | POST /backup/create?name={{ ds }} daily at 02:00, named after the logical date |
| clickhouse_restore | manual (schedule=None) | POST /backup/restore?name=<param.backup_name or {{ ds }}>; pass --conf '{"backup_name": "..."}' to choose a different backup |

Each DAG runs a KubernetesPodOperator with curlimages/curl (~10 MiB image, ~30s task overhead) that POSTs the operation, polls until idle, then exits non-zero on any non-success result so Airflow surfaces the failure normally.

| Field | Value |
| --- | --- |
| Service DNS | clickhouse-backup.clickhouse.svc.cluster.local |
| Port | 7171 |
| Reachable from | in-cluster only (ClusterIP, no Ingress); invoke via kubectl port-forward for ad-hoc API calls |

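
For ad-hoc calls from a workstation, the port-forward route might look like the sketch below. It assumes the Service is named clickhouse-backup (inferred from the DNS name above) and that local port 7171 is free; `api_via_port_forward` is a hypothetical helper, and the fixed wait before the request is a simplification.

```shell
# api_via_port_forward PATH: open a temporary port-forward, issue one GET
# against the API, then tear the forward down again.
api_via_port_forward() {
  kubectl port-forward -n clickhouse svc/clickhouse-backup 7171:7171 \
    >/dev/null 2>&1 &
  pf_pid=$!
  # Give the tunnel a moment to come up (PF_WAIT is an illustrative knob).
  sleep "${PF_WAIT:-2}"
  curl -fsS "http://localhost:7171$1"
  rc=$?
  kill "$pf_pid" 2>/dev/null
  wait "$pf_pid" 2>/dev/null
  return "$rc"
}
```

For instance, `api_via_port_forward /backup/list` prints the backup index without exec-ing into the pod.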
  • ClickHouse: the binary issues SQL against the ClickHouse server; the s3_backup disk lives in ClickHouse’s config.d/backup.xml
  • External Secrets: admin credentials are synced from secret/clickhouse into the clickhouse-backup-secrets Secret in the clickhouse namespace
  • Airflow: schedules + observes the daily backup runs and is the trigger surface for restores
# Pod + API health
kubectl get deploy,pod,svc -n clickhouse -l app=clickhouse-backup

# List backups in the server's index
kubectl exec -n clickhouse deploy/clickhouse-backup -- \
  wget -qO- http://localhost:7171/backup/list

# Trigger an ad-hoc backup (named "manual-YYYY-MM-DD-HH-MM-SS")
kubectl exec -n clickhouse deploy/clickhouse-backup -- \
  wget --post-data="" -qO- "http://localhost:7171/backup/create?name=manual-$(date -u +%Y-%m-%d-%H-%M-%S)"

# Watch progress
kubectl exec -n clickhouse deploy/clickhouse-backup -- \
  wget -qO- http://localhost:7171/backup/status

# Inspect what's actually on the embedded disk (from the ClickHouse pod)
kubectl exec -n clickhouse chi-clickhouse-default-0-0-0 -c clickhouse -- \
  ls /var/lib/clickhouse/disks/s3_backup/

# Trigger a restore manually via Airflow
kubectl exec -n airflow deploy/airflow-scheduler -c scheduler -- \
  airflow dags trigger clickhouse_restore --conf '{"backup_name": "2026-04-26"}'