ClickHouse Backup
clickhouse-backup is a long-running Altinity `clickhouse-backup` server that holds the persistent backup index and exposes an HTTP API. Daily scheduled backups and manual restores are issued by Airflow DAGs that POST to that API. The server runs in embedded mode: it issues the SQL `BACKUP TABLE ... TO Disk('s3_backup', ...)` against ClickHouse, so the actual data movement is done by the ClickHouse server using the `s3_backup` disk it has configured.
Deployment
| Field | Value |
|---|---|
| Flux path | flux-clusters/stefanzhelev/apps/clickhouse-backup |
| Base path | flux-apps/clickhouse-backup |
| Namespace | clickhouse |
| Sync wave | 7 (after clickhouse) |
| Depends on | clickhouse |
What it deploys
- A `Deployment` running `altinity/clickhouse-backup:latest` with `args: [server]`: single replica, `Recreate` strategy so two pods can never compete for the same backup name during a rollout
- A `Service` exposing the API at `clickhouse-backup.clickhouse.svc.cluster.local:7171` (ClusterIP)
- An `emptyDir` mount at `/var/lib/clickhouse/` because the binary needs a writable working directory even in embedded mode (it stages a local index there); the actual data lives on the ClickHouse-side `s3_backup` disk, so nothing important is lost when the volume is wiped on pod restart
The matching ExternalSecret (clickhouse-backup-secrets, rendering CLICKHOUSE_USERNAME / CLICKHOUSE_PASSWORD from secret/clickhouse) is owned by the clickhouse Kustomization (under clickhouse-external-secrets/) rather than this one.
Embedded mode wiring
The binary doesn’t read or write S3 directly. It opens a SQL connection to ClickHouse, issues `BACKUP TABLE ... TO Disk('s3_backup', '<name>')`, and the ClickHouse server takes care of the upload. That requires the four env vars below; the embedded-mode ones must carry the `CLICKHOUSE_` prefix, because the embedded fields live under the binary’s `clickhouse:` config section, not `general:`. Without the prefix the binary silently falls back to the FREEZE-based local flow (which writes to the pod’s filesystem and is lost when the pod terminates).
| Env var | Value |
|---|---|
| `CLICKHOUSE_USE_EMBEDDED_BACKUP_RESTORE` | `true` |
| `CLICKHOUSE_EMBEDDED_BACKUP_DISK` | `s3_backup` |
| `CLICKHOUSE_TIMEOUT` | `4h` (embedded mode rejects the default `30m`) |
| `BACKUPS_TO_KEEP_LOCAL` | `14` (rolling 14-backup retention; dependency-aware so a full with incremental children is kept) |
The s3_backup disk itself is configured on the ClickHouse server via config.d/backup.xml (see ClickHouse) and points at the manually-created Hetzner Object Storage bucket stefanzhelev-clickhouse-backup.
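Because a missing prefix fails silently, it can help to confirm that the running pod actually sees the expected variable names. The following is a minimal sketch, not part of clickhouse-backup or this deployment: `REQUIRED_VARS` and `check_embedded_env` are hypothetical names, and the helper only checks that each name from the table above is present in `env`-style output, not that its value is correct.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not shipped with clickhouse-backup).
# Names taken from the env var table in this section.
REQUIRED_VARS="CLICKHOUSE_USE_EMBEDDED_BACKUP_RESTORE
CLICKHOUSE_EMBEDDED_BACKUP_DISK
CLICKHOUSE_TIMEOUT
BACKUPS_TO_KEEP_LOCAL"

# Read NAME=value lines on stdin; print each missing name to stderr and
# return 1 if any required name is absent, 0 otherwise.
check_embedded_env() {
  env_dump="$(cat)"
  missing=0
  for name in ${REQUIRED_VARS}; do
    if ! printf '%s\n' "${env_dump}" | grep -q "^${name}="; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

Assuming the container image provides an `env` binary, it could be fed from the pod with `kubectl exec -n clickhouse deploy/clickhouse-backup -- env | check_embedded_env`.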
API endpoints
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/backup/list` | List backups known to the server’s index |
| `GET` | `/backup/status` | Last/current operation + state |
| `GET` | `/backup/actions?last=N` | Recent action history |
| `POST` | `/backup/create?name=NAME` | Async create (returns 200 with `operation_id`) |
| `POST` | `/backup/restore?name=NAME` | Async restore; walks diff chains automatically for incrementals |
| `POST` | `/backup/clean_remote_broken` | Drop incomplete/broken remote artifacts |
POST operations return immediately with an acknowledgement; consumers poll /backup/status until the response no longer contains "status":"in progress", then inspect /backup/actions to assert outcome.
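The poll-then-assert flow can be sketched as a pair of shell functions. This is an illustration under stated assumptions, not the script the DAGs run: `API`, `POLL_INTERVAL`, `fetch_status`, `wait_until_idle`, and `last_action_succeeded` are hypothetical names, `curl` is assumed to be available, and `/backup/actions` is assumed to return one JSON object per line with a `"status":"success"` field on success.

```shell
#!/bin/sh
# Sketch only; every name defined here is hypothetical.
API="${API:-http://clickhouse-backup.clickhouse.svc.cluster.local:7171}"
POLL_INTERVAL="${POLL_INTERVAL:-10}"

# GET /backup/status; kept in its own function so it is easy to stub.
fetch_status() {
  curl -fsS "${API}/backup/status"
}

# Poll until the status body no longer contains "status":"in progress".
# $1 = maximum number of polls (default 1440, i.e. 4h at 10s intervals).
# Returns 1 if a request fails or the poll budget runs out.
wait_until_idle() {
  max_polls="${1:-1440}"
  polls=0
  while [ "${polls}" -lt "${max_polls}" ]; do
    body="$(fetch_status)" || { echo "status request failed" >&2; return 1; }
    case "${body}" in
      *'"status":"in progress"'*) ;;  # still running, keep polling
      *) return 0 ;;                  # idle
    esac
    polls=$((polls + 1))
    sleep "${POLL_INTERVAL}"
  done
  echo "gave up after ${max_polls} polls" >&2
  return 1
}

# Read an actions listing on stdin and succeed only if its last line
# reports success (assumed format: one JSON object per line).
last_action_succeeded() {
  last="$(tail -n 1)"
  case "${last}" in
    *'"status":"success"'*) return 0 ;;
    *) echo "last action did not succeed: ${last}" >&2; return 1 ;;
  esac
}
```

A caller might chain these as `curl -fsS -X POST "${API}/backup/create?name=NAME" && wait_until_idle && curl -fsS "${API}/backup/actions?last=1" | last_action_succeeded`. Treating a failed status request as an error rather than as "idle" is deliberate, so a network blip is not mistaken for a finished backup.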
DAG triggers
Two Airflow DAGs in `airflow-dags/` are thin wrappers around the API:

| DAG | Schedule | Behavior |
|---|---|---|
| `clickhouse_backup` | `0 2 * * *` | `POST /backup/create?name={{ ds }}` daily at 02:00, named after the logical date |
| `clickhouse_restore` | manual (`schedule=None`) | `POST /backup/restore?name=<param.backup_name or {{ ds }}>`; pass `--conf '{"backup_name": "..."}'` to choose a different backup |
Each DAG runs a KubernetesPodOperator with curlimages/curl (~10 MiB image, ~30s task overhead) that POSTs the operation, polls until idle, then exits non-zero on any non-success result so Airflow surfaces the failure normally.
Endpoint
| Field | Value |
|---|---|
| Service DNS | `clickhouse-backup.clickhouse.svc.cluster.local` |
| Port | `7171` |
| Reachable from | in-cluster only (ClusterIP, no Ingress); invoke via `kubectl port-forward` for ad-hoc API calls |
Integrations
- ClickHouse: the binary issues SQL against the ClickHouse server; the `s3_backup` disk lives in ClickHouse’s `config.d/backup.xml`
- External Secrets: admin credentials are synced from `secret/clickhouse` into the `clickhouse-backup-secrets` Secret in the `clickhouse` namespace
- Airflow: schedules and observes the daily backup runs and is the trigger surface for restores
Key commands
```shell
# Pod + API health
kubectl get deploy,pod,svc -n clickhouse -l app=clickhouse-backup

# List backups in the server's index
kubectl exec -n clickhouse deploy/clickhouse-backup -- \
  wget -qO- http://localhost:7171/backup/list

# Trigger an ad-hoc backup (named "manual-YYYY-MM-DD-HH-MM-SS")
kubectl exec -n clickhouse deploy/clickhouse-backup -- \
  wget --post-data="" -qO- "http://localhost:7171/backup/create?name=manual-$(date -u +%Y-%m-%d-%H-%M-%S)"

# Watch progress
kubectl exec -n clickhouse deploy/clickhouse-backup -- \
  wget -qO- http://localhost:7171/backup/status

# Inspect what's actually on the embedded disk (from the ClickHouse pod)
kubectl exec -n clickhouse chi-clickhouse-default-0-0-0 -c clickhouse -- \
  ls /var/lib/clickhouse/disks/s3_backup/

# Trigger a restore manually via Airflow
kubectl exec -n airflow deploy/airflow-scheduler -c scheduler -- \
  airflow dags trigger clickhouse_restore --conf '{"backup_name": "2026-04-26"}'
```