etcd is the distributed key-value store at the heart of every Kubernetes cluster, storing all cluster state from pod definitions to secrets. When etcd goes down or loses data, the entire control plane stops working. This guide walks you through deploying etcd as a standalone service and as a three-node cluster, securing it with mutual TLS, taking and restoring snapshots, and understanding how Kubernetes uses it — so you can operate it confidently in production.

Prerequisites

  • Linux server(s) running Ubuntu 22.04 or later (one for single-node, three for cluster)
  • Root or sudo access
  • Basic familiarity with systemd service management
  • curl and tar installed
  • For TLS: cfssl or openssl available
  • Ports 2379 (client) and 2380 (peer) open between etcd nodes

What Is etcd and How It Works

etcd is an open-source, strongly consistent distributed key-value store originally built by CoreOS and now a Cloud Native Computing Foundation (CNCF) graduated project. It uses the Raft consensus algorithm to ensure all nodes agree on the current state even when network partitions or node failures occur.

The key properties you need to understand:

  • Strong consistency: every read returns the latest committed write, never a stale value
  • Watch API: clients subscribe to key changes and receive notifications in real time
  • Atomic transactions: compare-and-swap operations let you build distributed locks
  • Leases: keys can expire automatically, enabling heartbeat-based leader election

Kubernetes relies on etcd for every piece of cluster state. The API server is the only component that reads and writes to etcd directly — all other components (scheduler, controller manager, kubelet) communicate through the API server.
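These primitives map directly onto etcdctl subcommands. The sketch below is hedged and runs in dry-run mode by default (ETCDCTL expands to an echo, so each command is printed rather than executed); export ETCDCTL=etcdctl, plus the TLS flags shown later, to run it against a real cluster. The key paths are illustrative only.

```shell
#!/bin/sh
# Dry-run tour of the core etcd primitives. ETCDCTL defaults to "echo etcdctl"
# so the sequence can be read or executed without a live cluster.
ETCDCTL="${ETCDCTL:-echo etcdctl}"

$ETCDCTL put /demo/key v1        # strong consistency: the write is committed via Raft
$ETCDCTL get /demo/key           # a subsequent read always sees the latest commit
$ETCDCTL watch /demo/key         # watch: streams every change (blocks on a real cluster)
$ETCDCTL txn                     # transactions: the compare-and-swap building block
$ETCDCTL lease grant 10          # leases: keys attached with --lease expire after the TTL
```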

Feature                  etcd   Redis               Consul   ZooKeeper
Consensus                Raft   None (standalone)   Raft     ZAB
Strong consistency       Yes    No (eventual)       Yes      Yes
Watch API                Yes    Pub/sub only        Yes      Yes
Kubernetes native        Yes    No                  No       No
TLS built-in             Yes    Optional            Yes      Optional
Operational complexity   Low    Very low            Medium   High

etcd wins for Kubernetes because it was purpose-built for control plane use cases: small values, high read rate, infrequent writes, and correctness over raw throughput.

Installing etcd

Download the latest release from the official GitHub repository. At the time of writing, 3.5.x is the stable series recommended for Kubernetes 1.29+.

ETCD_VER=v3.5.12
DOWNLOAD_URL=https://github.com/etcd-io/etcd/releases/download

curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz \
  -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz

tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /tmp/

sudo mv /tmp/etcd-${ETCD_VER}-linux-amd64/etcd /usr/local/bin/
sudo mv /tmp/etcd-${ETCD_VER}-linux-amd64/etcdctl /usr/local/bin/
sudo mv /tmp/etcd-${ETCD_VER}-linux-amd64/etcdutl /usr/local/bin/

etcd --version
etcdctl version

Create the data directory and a dedicated system user:

sudo groupadd --system etcd
sudo useradd -s /usr/sbin/nologin --system -g etcd etcd
sudo mkdir -p /var/lib/etcd
sudo chown -R etcd:etcd /var/lib/etcd
sudo chmod 700 /var/lib/etcd

Generating TLS Certificates

Never run etcd without TLS in any environment beyond a local laptop. etcd stores secrets and credentials — any unencrypted listener is an immediate security risk.

Install cfssl:

CFSSL_VER=1.6.4
sudo curl -Lo /usr/local/bin/cfssl \
  https://github.com/cloudflare/cfssl/releases/download/v${CFSSL_VER}/cfssl_${CFSSL_VER}_linux_amd64
sudo curl -Lo /usr/local/bin/cfssljson \
  https://github.com/cloudflare/cfssl/releases/download/v${CFSSL_VER}/cfssljson_${CFSSL_VER}_linux_amd64
sudo chmod +x /usr/local/bin/cfssl /usr/local/bin/cfssljson

Create a CA and server certificate. Save the following as ca-config.json:

{
  "signing": {
    "default": { "expiry": "87600h" },
    "profiles": {
      "etcd": {
        "expiry": "87600h",
        "usages": ["signing","key encipherment","server auth","client auth"]
      }
    }
  }
}
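The cfssl commands in the next block also read ca-csr.json and etcd-csr.json, which are not listed in this guide. A minimal etcd-csr.json might look like the following; the CN, O, and hosts values are placeholders, and the hosts array must include every node IP and hostname plus 127.0.0.1:

```json
{
  "CN": "etcd",
  "key": { "algo": "rsa", "size": 2048 },
  "hosts": [
    "127.0.0.1",
    "10.0.0.11", "10.0.0.12", "10.0.0.13",
    "etcd-1", "etcd-2", "etcd-3"
  ],
  "names": [{ "O": "etcd-cluster" }]
}
```

ca-csr.json takes the same shape with a CA name (for example "CN": "etcd-ca") and no hosts array.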

Generate the CA and server certificates:

# CA
cfssl gencert -initca ca-csr.json | cfssljson -bare ca

# Server + peer cert (add all node IPs/hostnames in the CSR hosts array)
cfssl gencert \
  -ca=ca.pem -ca-key=ca-key.pem \
  -config=ca-config.json \
  -profile=etcd \
  etcd-csr.json | cfssljson -bare etcd

sudo mkdir -p /etc/etcd/pki
sudo cp ca.pem etcd.pem etcd-key.pem /etc/etcd/pki/
sudo chown -R etcd:etcd /etc/etcd/pki
sudo chmod 600 /etc/etcd/pki/*-key.pem

Setting Up a Three-Node etcd Cluster

A production etcd cluster needs three or five members. With three members, it tolerates one failure and still maintains quorum. With five, it tolerates two simultaneous failures.
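The sizing rule follows from Raft's majority requirement: quorum is floor(n/2) + 1, and fault tolerance is n minus quorum. A quick sketch in pure shell arithmetic (no etcd required) makes the three-versus-four comparison concrete:

```shell
#!/bin/sh
# Quorum and fault tolerance for an n-member Raft cluster.
quorum()    { echo $(( $1 / 2 + 1 )); }
tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(quorum $n) tolerates=$(tolerance $n)"
done
# members=3 and members=4 both tolerate exactly 1 failure:
# the fourth member adds coordination cost but no resilience.
```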

Assume three nodes with the following IPs:

  • etcd-1: 10.0.0.11
  • etcd-2: 10.0.0.12
  • etcd-3: 10.0.0.13

Create /etc/etcd/etcd.conf.yml on each node, substituting the node-specific values:

# /etc/etcd/etcd.conf.yml — example for etcd-1
name: etcd-1
data-dir: /var/lib/etcd

listen-peer-urls: https://10.0.0.11:2380
listen-client-urls: https://10.0.0.11:2379,https://127.0.0.1:2379

advertise-client-urls: https://10.0.0.11:2379
initial-advertise-peer-urls: https://10.0.0.11:2380

# keep this on one line: a YAML folded block would insert spaces after the commas
initial-cluster: etcd-1=https://10.0.0.11:2380,etcd-2=https://10.0.0.12:2380,etcd-3=https://10.0.0.13:2380
initial-cluster-token: etcd-cluster-prod-01
initial-cluster-state: new

client-transport-security:
  cert-file: /etc/etcd/pki/etcd.pem
  key-file: /etc/etcd/pki/etcd-key.pem
  trusted-ca-file: /etc/etcd/pki/ca.pem
  client-cert-auth: true

peer-transport-security:
  cert-file: /etc/etcd/pki/etcd.pem
  key-file: /etc/etcd/pki/etcd-key.pem
  trusted-ca-file: /etc/etcd/pki/ca.pem
  peer-client-cert-auth: true

Create the systemd unit /etc/systemd/system/etcd.service on all nodes:

[Unit]
Description=etcd distributed key-value store
Documentation=https://etcd.io
Wants=network-online.target
After=network-online.target

[Service]
User=etcd
Group=etcd
Type=notify
ExecStart=/usr/local/bin/etcd --config-file /etc/etcd/etcd.conf.yml
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

Start the cluster. Each member waits for its peers during the initial bootstrap, so start all three nodes promptly; the cluster becomes available once a majority is up and has elected a leader:

# Run on all three nodes in quick succession
sudo systemctl daemon-reload
sudo systemctl enable --now etcd

Verify the cluster formed correctly:

export ETCDCTL_API=3
export ETCDCTL_ENDPOINTS=https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379
export ETCDCTL_CACERT=/etc/etcd/pki/ca.pem
export ETCDCTL_CERT=/etc/etcd/pki/etcd.pem
export ETCDCTL_KEY=/etc/etcd/pki/etcd-key.pem

etcdctl endpoint health
etcdctl endpoint status --write-out=table
etcdctl member list --write-out=table

etcd Backup and Snapshot Restore

etcd backup is non-negotiable in production. A snapshot captures the entire key-value store at a point in time. Always back up before upgrading etcd or upgrading Kubernetes.

Taking a Snapshot

sudo mkdir -p /backup && sudo chown etcd:etcd /backup

ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/pki/ca.pem \
  --cert=/etc/etcd/pki/etcd.pem \
  --key=/etc/etcd/pki/etcd-key.pem

# Verify the snapshot offline with etcdutl (etcdctl snapshot status is deprecated in 3.5)
etcdutl snapshot status /backup/etcd-snapshot-*.db --write-out=table

Automate this with a cron job that rotates old snapshots:

#!/bin/sh
# /usr/local/bin/etcd-backup.sh — invoked by cron as the etcd user
set -eu
etcdctl snapshot save "/backup/etcd-$(date +%Y%m%d).db" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/pki/ca.pem \
  --cert=/etc/etcd/pki/etcd.pem \
  --key=/etc/etcd/pki/etcd-key.pem
find /backup/ -name "etcd-*.db" -mtime +7 -delete

# /etc/cron.d/etcd-backup — a crontab entry cannot span multiple lines, so it calls the script
0 2 * * * etcd /usr/local/bin/etcd-backup.sh
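Before trusting the rotation with real snapshots, the find expression can be sanity-checked offline against a throwaway directory of backdated dummy files:

```shell
#!/bin/sh
# Simulate snapshot rotation: create backdated dummy snapshots (names are
# placeholders), apply the same find expression the backup job uses, count survivors.
dir=$(mktemp -d)
touch -d "10 days ago" "$dir/etcd-20240101.db"   # older than 7 days: should be deleted
touch -d "2 days ago"  "$dir/etcd-20240109.db"   # recent: should survive
find "$dir" -name "etcd-*.db" -mtime +7 -delete
remaining=$(find "$dir" -name "etcd-*.db" | wc -l)
echo "remaining=$remaining"
rm -rf "$dir"
```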

Restoring from a Snapshot

Restoration replaces the data directory. Perform this on every member in the cluster:

# 1. Stop etcd on ALL nodes first
sudo systemctl stop etcd

# 2. Restore on each node (use a unique --name and --initial-advertise-peer-urls per node)
ETCDCTL_API=3 etcdutl snapshot restore /backup/etcd-snapshot.db \
  --name etcd-1 \
  --initial-cluster "etcd-1=https://10.0.0.11:2380,etcd-2=https://10.0.0.12:2380,etcd-3=https://10.0.0.13:2380" \
  --initial-cluster-token etcd-cluster-prod-01 \
  --initial-advertise-peer-urls https://10.0.0.11:2380 \
  --data-dir /var/lib/etcd-restored

# 3. Swap the data directory
sudo mv /var/lib/etcd /var/lib/etcd-old
sudo mv /var/lib/etcd-restored /var/lib/etcd
sudo chown -R etcd:etcd /var/lib/etcd

# 4. Start etcd on ALL nodes
sudo systemctl start etcd

Real-World Scenario: Recovering a Kubernetes Control Plane

You have a three-node Kubernetes cluster where one control plane node lost its disk. The etcd member on that node is gone. The two surviving members still hold quorum, so the cluster keeps serving reads and writes, but it can no longer tolerate another failure, and any kubectl traffic pointed at the dead node's API server hangs. Replace the member promptly.

Here is the recovery path:

  1. Check current membership from a healthy node: etcdctl member list. Note the ID of the failed member — it remains in the member list even though the node is gone.
  2. Remove the dead member: etcdctl member remove <MEMBER_ID>
  3. Provision a new node with the same hostname and IP (or update DNS).
  4. Add the replacement: etcdctl member add etcd-1-new --peer-urls=https://10.0.0.11:2380
  5. Start etcd on the new node with initial-cluster-state: existing in the config (or ETCD_INITIAL_CLUSTER_STATE=existing) — do not use new.
  6. Watch the new member sync: etcdctl endpoint status will show the revision number catch up in real time.

The entire process typically takes five to ten minutes for a cluster with less than 1 GB of etcd data. Throughout it the two surviving members keep serving both reads and writes; the risk is that a second failure before the new member finishes syncing would cost quorum.
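The steps above can be consolidated into a runbook script. This is a hedged sketch: it dry-runs by default (commands are echoed, not executed), and MEMBER_ID is a placeholder you must take from the member list output. Set ETCDCTL=etcdctl with the TLS environment variables from the verification section to run it for real.

```shell
#!/bin/sh
# Dry-run runbook for replacing a failed etcd member.
ETCDCTL="${ETCDCTL:-echo etcdctl}"
MEMBER_ID="PLACEHOLDER"                      # take this from member list output

$ETCDCTL member list --write-out=table       # 1. find the failed member's ID
$ETCDCTL member remove "$MEMBER_ID"          # 2. remove the dead member
$ETCDCTL member add etcd-1-new \
  --peer-urls=https://10.0.0.11:2380         # 4. register the replacement
# 5. on the new node: set initial-cluster-state: existing, then start etcd
$ETCDCTL endpoint status --write-out=table   # 6. watch the revision catch up
```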

Gotchas and Edge Cases

Never run an even number of etcd members. A two-member cluster has no fault tolerance — losing one member immediately loses quorum. A four-member cluster tolerates only one failure, same as three members, but requires an extra node. Stick to 3 or 5.

Disk I/O is the primary etcd bottleneck. etcd calls fdatasync on every write. If your disk latency consistently exceeds 10 ms, you will see leader elections and request timeout errors in Kubernetes. Use SSD or NVMe storage. Check disk latency with fio:

fio --rw=write --ioengine=sync --fdatasync=1 \
  --directory=/var/lib/etcd --size=22m --bs=2300 --name=etcd-bench

etcd is not a general-purpose database. The default storage quota is 2 GB. Kubernetes clusters with many objects or heavy use of custom resources can hit this limit. Monitor etcd_mvcc_db_total_size_in_bytes in Prometheus and compact/defragment regularly:

# Compact old revisions (keep last 1000)
rev=$(etcdctl endpoint status --write-out=json | jq '.[0].Status.header.revision')
etcdctl compaction $((rev - 1000))
etcdctl defrag --cluster

Clock skew causes subtle failures. Lease TTLs are tracked by the current leader, so significant skew between nodes makes lease expiry (and anything built on it, such as leader election) behave unpredictably across leader changes; severe skew can also break TLS certificate validation. Run NTP (chrony or systemd-timesyncd) on all etcd nodes and verify with chronyc tracking.

Snapshot restoration clears cluster membership. After restoring, all members are treated as a brand new cluster with the membership encoded in the snapshot. Never restore a snapshot onto a running cluster without stopping all members first — you will create a split-brain situation.

Common Issues and Troubleshooting

etcdserver: request timed out — Usually disk latency. Check iostat -x 1 on etcd nodes. Also check peer connectivity: etcdctl endpoint health from each node individually.

etcdserver: mvcc: database space exceeded — The 2 GB storage quota was hit. Run compaction and defrag as shown above, or raise the quota with --quota-backend-bytes=4294967296 (4 GB; quota-backend-bytes in the YAML config).

raft: failed to send message — Firewall blocking port 2380 between peers, or a peer certificate CN mismatch. Verify with openssl s_client -connect 10.0.0.12:2380 from another etcd node.

certificate has expired — etcd peer and client certificates need rotation before expiry. kubeadm issues these with a one-year lifetime and renews them during cluster upgrades (or via kubeadm certs renew), but manually managed clusters renew nothing automatically. Check expiry: openssl x509 -in /etc/etcd/pki/etcd.pem -noout -dates.
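To turn the -dates output into a days-remaining number, something like the following works. The script generates a throwaway self-signed certificate so it can be tried safely; on a real node, point CERT at /etc/etcd/pki/etcd.pem instead.

```shell
#!/bin/sh
# Days until a certificate expires. Creates a one-year demo cert if CERT is
# unset; set CERT=/etc/etcd/pki/etcd.pem to check a real etcd certificate.
CERT="${CERT:-/tmp/demo-cert.pem}"
if [ ! -f "$CERT" ]; then
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=demo" -keyout /tmp/demo-key.pem -out "$CERT" 2>/dev/null
fi
end=$(openssl x509 -in "$CERT" -noout -enddate | cut -d= -f2)
days_left=$(( ($(date -d "$end" +%s) - $(date +%s)) / 86400 ))
echo "$CERT expires in $days_left days"
```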

etcd using too much memory — etcd caches the working set in memory. On clusters with many objects, RSS can exceed 8 GB. This is normal. Set --snapshot-count=5000 (default 100000) to trigger more frequent snapshots and reduce the raft log size in memory.

Summary

  • etcd stores all Kubernetes cluster state; it is the most critical service in the control plane
  • Always deploy three or five members for production — odd numbers only
  • Enable mutual TLS on both client and peer ports; never expose etcd without authentication
  • Take daily snapshots with etcdctl snapshot save and verify them with snapshot status
  • Restore by stopping all members, running etcdutl snapshot restore on each with unique node parameters, swapping the data directory, then restarting
  • Monitor disk latency (keep below 10ms), database size (compact when approaching 2 GB), and leader stability
  • Use etcdctl member remove + member add to replace a failed node without a full restore