TL;DR — Quick Summary

etcd backup and restore for Kubernetes disaster recovery: snapshot methods, verification, multi-node restore, kubeadm certs, and monitoring.

etcd is the heartbeat of every Kubernetes cluster: a strongly consistent, distributed key-value store that holds the entire desired and observed state of your cluster. When etcd is healthy, kubectl commands return in milliseconds and controllers reconcile continuously. When etcd is lost without a backup, your cluster is gone — every Deployment definition, every Secret, every RBAC binding, every CRD, every ConfigMap disappears. This guide covers everything you need to build a production-grade etcd backup and restore strategy, from understanding the internals to running a full disaster recovery on a multi-node kubeadm cluster.

Prerequisites

  • A Kubernetes cluster managed with kubeadm (v1.22+) or access to etcd certificates on a managed cluster
  • etcdctl installed on the control plane node (version must match your etcd version)
  • Root or sudo access on the control plane node
  • kubectl configured with cluster-admin permissions
  • Basic familiarity with Kubernetes control plane components
  • For automated backups: access to S3, GCS, or equivalent offsite storage

etcd’s Role in Kubernetes

Every time you run kubectl apply, the Kubernetes API server validates the request and writes the resulting object to etcd. Every controller (Deployment controller, ReplicaSet controller, Scheduler) watches etcd for changes via the API server’s watch mechanism and reconciles the cluster accordingly. etcd is the only stateful component in the Kubernetes control plane — all other components are stateless and can be restarted from scratch as long as etcd is intact.

What lives in etcd:

  • All API object definitions: Pods, Deployments, StatefulSets, DaemonSets, Services, Ingresses
  • Secrets and ConfigMaps
  • RBAC: Roles, ClusterRoles, RoleBindings, ClusterRoleBindings
  • Custom Resource Definitions and all custom resource instances
  • Namespace definitions, ResourceQuotas, LimitRanges
  • ServiceAccounts and associated tokens
  • Node registrations and lease objects
  • Leader election records for kube-controller-manager and kube-scheduler

What does not live in etcd: the actual data stored in PersistentVolumes. etcd only stores the PersistentVolumeClaim and PersistentVolume API objects (metadata and binding), not the bytes on disk.

etcd Architecture: Raft, WAL, and Snapshots

etcd uses the Raft consensus algorithm to replicate state across a cluster with an odd number of members (typically 3 or 5). Raft elects a leader that processes all writes; followers replicate the leader’s log. The cluster tolerates (n-1)/2 member failures — a 3-node cluster survives 1 failure, a 5-node cluster survives 2.
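The fault-tolerance arithmetic is easy to sanity-check: quorum is floor(n/2)+1 votes, so the tolerated failure count is n minus quorum. A standalone sketch:

```shell
# Quorum = floor(n/2) + 1; tolerated failures = n - quorum = floor((n-1)/2)
for n in 1 3 5 7; do
  quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum tolerated_failures=$(( n - quorum ))"
done
# members=3 quorum=2 tolerated_failures=1
# members=5 quorum=3 tolerated_failures=2
```

Note that a 4-node cluster tolerates no more failures than a 3-node one (quorum 3, one failure), which is why even member counts are avoided.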

Writes are first appended to the Write-Ahead Log (WAL) on disk, then applied to the storage backend, a memory-mapped B+tree database (bbolt). Periodically, etcd takes an internal snapshot of the backend to disk and truncates the WAL to prevent unbounded growth. The combination of WAL + snapshot means etcd can recover from a crash without losing committed data.

The etcdctl snapshot save command triggers an on-demand snapshot of the current backend database. This snapshot is a complete, self-contained backup of all etcd data at the moment it was taken — no WAL is required for restore.

Backup Methods

Method 1: Manual etcdctl Snapshot

This is the canonical backup method. On a kubeadm cluster, etcd runs as a static pod with TLS enabled; the certificates live at /etc/kubernetes/pki/etcd/.

ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

You can also exec into the etcd container directly:

kubectl exec -n kube-system etcd-<control-plane-node> -- \
  etcdctl snapshot save /tmp/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

kubectl cp kube-system/etcd-<control-plane-node>:/tmp/etcd-backup.db /backup/etcd-backup.db

Method 2: Automated CronJob

Deploy a Kubernetes CronJob on the control plane that mounts the host’s etcd certs and writes snapshots to a mounted PVC or cloud storage:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: kube-system
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          hostNetwork: true
          containers:
          - name: etcd-backup
            image: bitnami/etcd:3.5
            command:
            - /bin/sh
            - -c
            - |
              ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db \
                --endpoints=https://127.0.0.1:2379 \
                --cacert=/etc/kubernetes/pki/etcd/ca.crt \
                --cert=/etc/kubernetes/pki/etcd/server.crt \
                --key=/etc/kubernetes/pki/etcd/server.key
            volumeMounts:
            - name: etcd-certs
              mountPath: /etc/kubernetes/pki/etcd
              readOnly: true
            - name: backup-dir
              mountPath: /backup
          volumes:
          - name: etcd-certs
            hostPath:
              path: /etc/kubernetes/pki/etcd
          - name: backup-dir
            persistentVolumeClaim:
              claimName: etcd-backup-pvc
          restartPolicy: OnFailure
          nodeSelector:
            node-role.kubernetes.io/control-plane: ""
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            effect: NoSchedule

Method 3: Velero with etcd Plugin

Velero is a full cluster backup solution. The velero-plugin-for-etcd takes etcd snapshots and stores them in object storage alongside Velero’s PV snapshots, giving you a unified backup for both cluster state and persistent data. Velero is better suited for application-level backup (namespace + PV together); for control-plane-only DR, etcdctl remains the preferred approach.

Tool                   etcd State   PV Data   Restore Granularity   Complexity
etcdctl snapshot       Yes          No        Full cluster          Low
Velero + etcd plugin   Yes          Yes       Namespace or full     Medium
etcd-backup-operator   Yes          No        Full cluster          Medium
kube-backup            Yes          No        Full cluster          Low
Manual CronJob         Yes          No        Full cluster          Low

Snapshot Verification

Never trust a backup you have not verified. After every snapshot:

ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db \
  --write-out=table

Sample output:

+----------+----------+------------+------------+
|   HASH   | REVISION | TOTAL KEYS | TOTAL SIZE |
+----------+----------+------------+------------+
| 3b0d7ab2 |  1234567 |       3821 |    12 MB   |
+----------+----------+------------+------------+

If TOTAL KEYS is 0 or the hash is malformed, the snapshot is corrupt. A healthy production cluster typically has 2000–8000 keys. Build snapshot verification into your backup CronJob and alert on failures.

Restore Procedures

Single-Node Kubeadm Cluster

Step 1: Stop the API server and etcd

# Move static pod manifests out of the manifests directory
mkdir -p /tmp/k8s-manifests-backup
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/k8s-manifests-backup/
mv /etc/kubernetes/manifests/etcd.yaml /tmp/k8s-manifests-backup/

# Wait for processes to exit
sleep 10
ps aux | grep '[e]tcd'   # bracket pattern excludes the grep process itself

Step 2: Back up the existing (corrupt) data directory

mv /var/lib/etcd /var/lib/etcd.bak

Step 3: Restore the snapshot

ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd \
  --initial-cluster=default=https://127.0.0.1:2380 \
  --initial-cluster-token=etcd-cluster-1 \
  --initial-advertise-peer-urls=https://127.0.0.1:2380

Step 4: Fix ownership and restore manifests

chown -R etcd:etcd /var/lib/etcd   # if running as non-root
mv /tmp/k8s-manifests-backup/etcd.yaml /etc/kubernetes/manifests/
mv /tmp/k8s-manifests-backup/kube-apiserver.yaml /etc/kubernetes/manifests/

# Wait for control plane to come up
sleep 30
kubectl get nodes

Multi-Node Kubeadm Cluster

For a 3-node control-plane cluster (HA), restore the same snapshot on every etcd member before starting any of them, using consistent --initial-cluster and --initial-cluster-token values:

# On each control plane node, run the restore with the SAME snapshot and
# the SAME token, but with that node's own --name and
# --initial-advertise-peer-urls. Generate the token once and reuse it:
TOKEN=etcd-restore-token-$(date +%s)   # record this value and set it on every node

# Node 1
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --name=etcd-cp1 \
  --data-dir=/var/lib/etcd \
  --initial-cluster=etcd-cp1=https://10.0.0.1:2380,etcd-cp2=https://10.0.0.2:2380,etcd-cp3=https://10.0.0.3:2380 \
  --initial-cluster-token=${TOKEN} \
  --initial-advertise-peer-urls=https://10.0.0.1:2380

# Node 2 (same token value)
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --name=etcd-cp2 \
  --data-dir=/var/lib/etcd \
  --initial-cluster=etcd-cp1=https://10.0.0.1:2380,etcd-cp2=https://10.0.0.2:2380,etcd-cp3=https://10.0.0.3:2380 \
  --initial-cluster-token=${TOKEN} \
  --initial-advertise-peer-urls=https://10.0.0.2:2380

Use a unique --initial-cluster-token different from the original to prevent the restored cluster from accidentally joining the old (degraded) cluster.
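To avoid token drift between nodes, one option is to generate the full command set in one place and copy the relevant block to each node. A sketch that only prints the per-node commands (the member names and IPs are the illustrative ones from above):

```shell
#!/bin/sh
# Print the restore command for each member with one shared token.
TOKEN="etcd-restore-token-$(date +%s)"
CLUSTER="etcd-cp1=https://10.0.0.1:2380,etcd-cp2=https://10.0.0.2:2380,etcd-cp3=https://10.0.0.3:2380"
i=0
for name in etcd-cp1 etcd-cp2 etcd-cp3; do
  i=$(( i + 1 ))
  cat <<EOF
# --- run on ${name} ---
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \\
  --name=${name} \\
  --data-dir=/var/lib/etcd \\
  --initial-cluster=${CLUSTER} \\
  --initial-cluster-token=${TOKEN} \\
  --initial-advertise-peer-urls=https://10.0.0.${i}:2380
EOF
done
```

Every emitted block carries the identical token, so a re-run of a single node's command cannot silently pick up a fresh timestamp.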

Managed Kubernetes Considerations

EKS, GKE, AKS — The cloud provider manages etcd entirely. You cannot access etcd directly. Use provider-native backup mechanisms:

  • EKS: Velero with S3; AWS does not expose etcd directly
  • GKE: Velero; Google manages etcd with automatic backups on Autopilot
  • AKS: Velero + Azure Blob; Microsoft manages etcd for managed node pools

For managed clusters, focus on application-level backup (Velero namespaces + PV snapshots) rather than etcd-level backup.

etcd Health Monitoring

Monitor etcd continuously — do not wait for a disaster to discover problems:

# Check endpoint health
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Check leader and member status
ETCDCTL_API=3 etcdctl endpoint status --write-out=table \
  --endpoints=https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

Prometheus alerts to configure for etcd:

  • etcd_server_has_leader == 0 — no leader elected (critical)
  • etcd_disk_wal_fsync_duration_seconds{quantile="0.99"} > 0.01 — slow WAL writes (storage degradation)
  • etcd_disk_backend_commit_duration_seconds{quantile="0.99"} > 0.25 — slow B-tree commits
  • etcd_server_proposals_failed_total > 0 — consensus failures
  • etcd_mvcc_db_total_size_in_bytes > 8589934592 — DB approaching 8 GB limit
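Expressed as a Prometheus Operator PrometheusRule, these might look like the sketch below. This assumes the monitoring.coreos.com CRDs are installed and that etcd exposes these durations as histograms (hence histogram_quantile over the _bucket series); the alert names, namespace, and thresholds are illustrative.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: etcd-alerts
  namespace: monitoring
spec:
  groups:
  - name: etcd
    rules:
    - alert: EtcdNoLeader
      expr: etcd_server_has_leader == 0
      for: 1m
      labels:
        severity: critical
    - alert: EtcdSlowWALFsync
      expr: histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) > 0.01
      for: 10m
      labels:
        severity: warning
    - alert: EtcdProposalsFailing
      expr: rate(etcd_server_proposals_failed_total[15m]) > 0
      for: 5m
      labels:
        severity: critical
    - alert: EtcdDBSizeHigh
      expr: etcd_mvcc_db_total_size_in_bytes > 8589934592
      for: 10m
      labels:
        severity: warning
```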

Compaction and Defragmentation

etcd keeps a history of all key revisions to support watch semantics. Over time this consumes significant disk space. Enable auto-compaction in the etcd configuration:

# In etcd static pod manifest or etcd.conf
--auto-compaction-mode=periodic
--auto-compaction-retention=1h

Even with auto-compaction, the on-disk database file (bbolt) does not shrink automatically: bbolt reuses freed pages internally but never returns the space to the filesystem. Run defragmentation periodically during off-peak hours:

ETCDCTL_API=3 etcdctl defrag \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

In a multi-node cluster, defrag one member at a time. Defragging the leader causes a brief leadership change. Schedule defrag monthly or when etcd_mvcc_db_total_size_in_bytes is significantly larger than etcd_mvcc_db_total_size_in_use_in_bytes.
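The one-member-at-a-time rule is easy to script. A sketch with a dry-run guard (the endpoints, the health check between members, and the 30-second settle delay are illustrative; set DRY_RUN=0 to actually execute):

```shell
#!/bin/sh
# Defrag each member sequentially, verifying health before moving on.
export ETCDCTL_API=3
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }
TLS="--cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key"
for ep in https://10.0.0.1:2379 https://10.0.0.2:2379 https://10.0.0.3:2379; do
  run etcdctl defrag --endpoints="$ep" $TLS
  run etcdctl endpoint health --endpoints="$ep" $TLS
  run sleep 30   # let the member settle before touching the next one
done
```

With DRY_RUN=1 (the default) the script only echoes the commands, which makes it safe to review before running against a live cluster.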

Performance Tuning

etcd is extremely sensitive to disk latency. The WAL fsync must complete before a Raft entry is considered committed. Recommendations:

  • Dedicated SSD: Never share etcd’s data disk with application workloads. Use a dedicated NVMe or SSD with sustained random write IOPS > 2000.
  • Heartbeat and election timeouts: The default heartbeat-interval is 100ms and election-timeout is 1000ms. In high-latency environments (cloud VMs with noisy neighbors), increase them to 250ms / 1250ms:

--heartbeat-interval=250
--election-timeout=1250

  • DB quota: The default is 2 GB. Increase to 8 GB for large clusters with many namespaces or frequent object churn: --quota-backend-bytes=8589934592
  • Network: etcd peer traffic should be on a dedicated, low-latency network path. Do not route etcd peer traffic through a shared application load balancer.
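For reference, the 8589934592 figure used for the quota (and in the Prometheus alert above the DB size limit) is simply 8 GiB expressed in bytes:

```shell
echo $(( 8 * 1024 * 1024 * 1024 ))   # 8589934592
```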

Production Backup Script with Alerting

#!/bin/bash
set -euo pipefail

BACKUP_DIR="/opt/etcd-backups"
RETENTION_COUNT=24          # keep last 24 snapshots (24h at hourly)
S3_BUCKET="s3://my-cluster-etcd-backups"
SLACK_WEBHOOK="https://hooks.slack.com/services/xxx/yyy/zzz"
ETCD_ENDPOINTS="https://127.0.0.1:2379"
CACERT="/etc/kubernetes/pki/etcd/ca.crt"
CERT="/etc/kubernetes/pki/etcd/server.crt"
KEY="/etc/kubernetes/pki/etcd/server.key"

TIMESTAMP=$(date +%Y%m%d-%H%M%S)
SNAPSHOT_FILE="${BACKUP_DIR}/etcd-${TIMESTAMP}.db"

alert() {
  local msg="$1"
  curl -s -X POST -H 'Content-type: application/json' \
    --data "{\"text\":\"[etcd-backup] ${msg}\"}" "${SLACK_WEBHOOK}" || true
}

mkdir -p "${BACKUP_DIR}"

if ! ETCDCTL_API=3 etcdctl snapshot save "${SNAPSHOT_FILE}" \
    --endpoints="${ETCD_ENDPOINTS}" \
    --cacert="${CACERT}" --cert="${CERT}" --key="${KEY}"; then
  alert "CRITICAL: etcd snapshot save FAILED on $(hostname) at ${TIMESTAMP}"
  exit 1
fi

# Verify snapshot
KEYS=$(ETCDCTL_API=3 etcdctl snapshot status "${SNAPSHOT_FILE}" \
  --write-out=json | python3 -c "import sys,json; print(json.load(sys.stdin)['totalKey'])")
if [ "${KEYS}" -lt 100 ]; then
  alert "WARNING: snapshot has only ${KEYS} keys — possible empty or corrupt snapshot"
  exit 1
fi

# Upload to S3
aws s3 cp "${SNAPSHOT_FILE}" "${S3_BUCKET}/$(basename "${SNAPSHOT_FILE}")" \
  --storage-class STANDARD_IA

# Rotate local copies
ls -t "${BACKUP_DIR}"/etcd-*.db | tail -n +$((RETENTION_COUNT + 1)) | xargs -r rm -f

echo "Backup complete: ${SNAPSHOT_FILE} (${KEYS} keys)"

Disaster Recovery Scenarios

Scenario 1: Single member failure (quorum intact)

The cluster continues operating with 2/3 or 3/5 healthy members. Replace the failed member using etcdctl member remove + etcdctl member add and join a new etcd process to the cluster without a restore.
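A sketch of that member-replacement flow (the member ID, replacement name, and peer URL are placeholders; DRY_RUN=1 prints the commands instead of running them, and the usual TLS flags are omitted for brevity):

```shell
#!/bin/sh
export ETCDCTL_API=3
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }
# 1. Identify the failed member's ID
run etcdctl member list --write-out=table
# 2. Remove it (the ID shown is a placeholder)
run etcdctl member remove 8e9e05c52164694d
# 3. Register the replacement, then start etcd on the new node with
#    --initial-cluster-state=existing so it joins rather than bootstraps
run etcdctl member add etcd-cp1-new --peer-urls=https://10.0.0.4:2380
```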

Scenario 2: Quorum loss (majority of members down)

The cluster becomes read-only and kubectl writes will fail. If members can be recovered (disk intact, network issue), bring them back online. If data is lost, restore from snapshot on all members.

Scenario 3: Full cluster restore (all data lost)

Stop all control plane components on all nodes, restore the snapshot on each node with a consistent --initial-cluster-token, then restart the control plane components in order: etcd first, then kube-apiserver, then kube-controller-manager and kube-scheduler. Verify all nodes re-register and all Pods report correct status.

Gotchas and Edge Cases

etcdctl version mismatch — Always set ETCDCTL_API=3. Run etcdctl version and verify the client version matches the server version. Mismatches cause silent failures or corrupted restores.

Snapshot from non-leader — Snapshots taken from a follower member are valid but may lag behind the leader by a few entries. For critical restores, take the snapshot from the leader.

Restore overwrites the data directory — etcdctl snapshot restore writes to --data-dir. If the directory already exists, the restore fails. Always move the existing data directory out of the way first.

CronJob on control plane node — etcd CronJobs must tolerate the node-role.kubernetes.io/control-plane: NoSchedule taint and use nodeSelector to land on control plane nodes where the certs are mounted via hostPath.

Clock skew between members — etcd peer TLS certificates are time-sensitive. If node clocks diverge by more than a few minutes, certificate validation fails. Ensure NTP is configured and synchronised on all control plane nodes.

Managed cluster surprises — On GKE or EKS, attempting to exec into the etcd pod will fail or be blocked. If you are on a managed cluster, shift to Velero immediately and do not rely on etcd-level backup.

Summary

  • etcd stores all Kubernetes cluster state; losing it without a backup means rebuilding from scratch
  • Use etcdctl snapshot save with TLS flags pointing to /etc/kubernetes/pki/etcd/ for kubeadm clusters
  • Always run etcdctl snapshot status to verify snapshots after creation
  • Restore requires stopping the API server and etcd, running etcdctl snapshot restore, and restarting the control plane
  • Multi-node restore requires consistent --initial-cluster-token and correct per-node --initial-advertise-peer-urls on all members
  • Enable auto-compaction (--auto-compaction-retention=1h) and run etcdctl defrag monthly
  • Dedicate a low-latency SSD to etcd data; monitor WAL fsync latency with Prometheus
  • Store snapshots offsite (S3/GCS) with at least 24h retention; automate with a CronJob + alerting script
  • For EKS, GKE, AKS: etcd is managed internally — use Velero for application-level backup instead