TL;DR — Quick Summary

Master HashiCorp Consul for service discovery and service mesh: architecture, health checks, ACLs, Envoy sidecar proxies, and multi-datacenter federation.

HashiCorp Consul is a distributed, highly available service networking platform that solves one of microservices’ hardest problems: how services find each other, verify one another’s health, and communicate securely across dynamic infrastructure where IP addresses change constantly. Consul answers all three needs through a unified control plane that combines service discovery, health checking, a key/value store, and a full service mesh with mutual TLS enforcement. This guide covers every major Consul capability, from a three-node cluster setup through Envoy sidecar proxies and multi-datacenter federation.

Prerequisites

Before proceeding, ensure the following are available on your system:

  • Linux or macOS host — Consul runs on any Unix-like OS and Windows; examples use Linux.
  • Docker 24+ — for the containerized production example at the end of this guide.
  • curl and dig — for testing DNS and HTTP API responses.
  • Basic understanding of microservices — familiarity with concepts like load balancing, health checks, and TLS will help.
  • Consul binary 1.17+ — install via the HashiCorp APT/YUM repository or download from releases.hashicorp.com.

Install on Ubuntu/Debian:

wget -O - https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install consul
consul version
# Consul v1.20.2

Consul Architecture

Understanding Consul’s components is essential for correct cluster design. Consul separates concerns across three distinct layers.

Agents: Servers vs Clients

Every node in a Consul cluster runs a Consul agent. Agents operate in one of two modes:

Server agents form the control plane. They participate in the Raft consensus algorithm to elect a leader, store the authoritative cluster state (services, health results, KV data, ACL policies), and replicate that state across the cluster. The leader processes all writes. Followers forward writes to the leader and serve reads. You must run an odd number of servers: 3 servers tolerate 1 failure; 5 servers tolerate 2 failures.

Client agents run on every non-server node (your application hosts). They register local services and health checks, forward queries to servers, and participate in the gossip network. Clients are lightweight — they hold no cluster state themselves.

Gossip Protocol (Serf)

Consul uses Serf, an implementation of the SWIM gossip protocol, for cluster membership and failure detection. All agents participate in two gossip pools:

  • LAN gossip pool — agents within the same datacenter. Used for member discovery, health propagation, and event broadcasting.
  • WAN gossip pool — servers across datacenters. Enables multi-datacenter federation.

Gossip is eventually consistent and operates over UDP by default (with TCP fallback). It scales to thousands of nodes: membership state converges in roughly O(log N) gossip rounds, while each node's message load stays constant.
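To get a feel for that scaling, here is a rough back-of-the-envelope sketch — the round counts are illustrative approximations of SWIM-style convergence, not measured values:

```shell
# Illustrative only: gossip converges in roughly O(log N) rounds,
# so doubling the cluster adds about one round of propagation.
for n in 10 100 1000 10000; do
  awk -v n="$n" 'BEGIN { printf "nodes=%-6d approx_rounds=%d\n", n, int(log(n)/log(2)) + 1 }'
done
```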

Raft Consensus

Raft is used only among Consul servers for strong consistency of the cluster state. The leader accepts all write operations and replicates log entries to followers before committing. This means:

  • Reads from a server return consistent data by default; append the ?consistent query parameter when you need strict linearizability.
  • A network partition that isolates the leader causes a new election — the cluster is briefly unavailable for writes until a new leader is elected.
  • With 3 servers, you need 2 alive for a quorum. With 5 servers, you need 3.
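The quorum arithmetic generalizes to any cluster size; a tiny helper makes the rule explicit (purely illustrative):

```shell
# Raft quorum math: a write needs floor(n/2)+1 acknowledgments, so an
# n-server cluster tolerates floor((n-1)/2) failures. Note that an even
# count (e.g. 4) tolerates no more failures than the odd count below it.
quorum() {
  local n=$1
  echo "servers=$n quorum=$(( n / 2 + 1 )) tolerated_failures=$(( (n - 1) / 2 ))"
}

quorum 3   # servers=3 quorum=2 tolerated_failures=1
quorum 5   # servers=5 quorum=3 tolerated_failures=2
```

Running `quorum 4` shows why even server counts are discouraged: quorum rises to 3 but fault tolerance stays at 1.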

Datacenter Model

A Consul datacenter is an isolated unit: one set of servers sharing a LAN gossip pool. Multiple datacenters communicate via WAN gossip or mesh gateways. Within a datacenter, service discovery is automatic and fast. Cross-datacenter queries are possible but involve a WAN round-trip.

Installation Methods

Binary Installation

Download and install the single Go binary:

# Download specific version
CONSUL_VERSION="1.20.2"
wget "https://releases.hashicorp.com/consul/${CONSUL_VERSION}/consul_${CONSUL_VERSION}_linux_amd64.zip"
unzip consul_${CONSUL_VERSION}_linux_amd64.zip
sudo install consul /usr/local/bin/
consul version

Docker

# Run a single dev-mode server (non-production)
docker run -d \
  --name consul-dev \
  -p 8500:8500 \
  -p 8600:8600/udp \
  hashicorp/consul:1.20 agent -dev -client=0.0.0.0

# Verify
curl http://localhost:8500/v1/agent/self | jq .Config.NodeName

Helm Chart for Kubernetes

helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update

# Install with minimal production values
helm install consul hashicorp/consul \
  --namespace consul \
  --create-namespace \
  --values consul-values.yaml

Minimal consul-values.yaml for Kubernetes:

global:
  name: consul
  datacenter: dc1
  tls:
    enabled: true
  acls:
    manageSystemACLs: true
server:
  replicas: 3
  bootstrapExpect: 3
connectInject:
  enabled: true

Server Cluster Setup

Configuration File

Create /etc/consul.d/consul.hcl on each server node:

datacenter  = "dc1"
data_dir    = "/opt/consul"
log_level   = "INFO"
node_name   = "consul-server-01"   # unique per node
server      = true
bootstrap_expect = 3               # number of servers in cluster

# Other server IPs for initial join
retry_join = [
  "10.0.1.11",
  "10.0.1.12",
  "10.0.1.13"
]

# Gossip encryption (generate with: consul keygen)
encrypt = "BASE64_GOSSIP_KEY_HERE"

# UI and API bindings
ui_config {
  enabled = true
}
client_addr    = "0.0.0.0"
bind_addr      = "{{ GetPrivateIP }}"
advertise_addr = "{{ GetPrivateIP }}"

# Performance tuning
performance {
  raft_multiplier = 1   # 1 = fast LAN, increase for WAN-latency
}

Generate the gossip key once and use it across all nodes:

consul keygen
# Output: j5Y3fIFjQKs2b9s9rG2ufA==

Starting the Cluster
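The systemd service referenced below can be a minimal unit at /etc/systemd/system/consul.service along these lines — a sketch based on the common deployment pattern; adjust the binary path and user to match your install:

```ini
[Unit]
Description=HashiCorp Consul agent
Requires=network-online.target
After=network-online.target

[Service]
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-dir=/etc/consul.d/
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```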

# Create data directory and set permissions
sudo mkdir -p /opt/consul
sudo chown consul:consul /opt/consul

# Start Consul (systemd service in production)
sudo systemctl enable consul
sudo systemctl start consul

# Verify cluster formation (run on any server)
consul members
# Node               Address          Status  Type    Build   Protocol  DC   Segment
# consul-server-01   10.0.1.11:8301   alive   server  1.20.2  2         dc1  <all>
# consul-server-02   10.0.1.12:8301   alive   server  1.20.2  2         dc1  <all>
# consul-server-03   10.0.1.13:8301   alive   server  1.20.2  2         dc1  <all>

# Check Raft leader election
consul operator raft list-peers

Service Registration

Service Definition Files

Create a JSON file in /etc/consul.d/ on the client agent running the service:

{
  "service": {
    "id":   "web-01",
    "name": "web",
    "port": 8080,
    "tags": ["v2", "primary"],
    "meta": {
      "version": "2.1.0",
      "environment": "production"
    },
    "check": {
      "id":       "web-http",
      "name":     "HTTP health check",
      "http":     "http://localhost:8080/health",
      "interval": "10s",
      "timeout":  "3s",
      "deregister_critical_service_after": "90s"
    }
  }
}

Reload the agent to pick up the new definition:

consul reload
# or
kill -HUP $(pidof consul)

HTTP API Registration

Register a service programmatically without a config file:

curl -s -X PUT http://localhost:8500/v1/agent/service/register \
  -H "Content-Type: application/json" \
  -d '{
    "ID":   "api-01",
    "Name": "api",
    "Port": 3000,
    "Tags": ["v1"],
    "Check": {
      "HTTP":     "http://localhost:3000/healthz",
      "Interval": "15s",
      "Timeout":  "5s"
    }
  }'
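When many similar services need registering, the payload can be generated from a couple of parameters. A sketch — the ID naming scheme and the /healthz path are assumptions for illustration, not Consul conventions:

```shell
# Sketch: emit a registration payload for a given service name and port.
register_payload() {
  local name=$1 port=$2
  cat <<EOF
{
  "ID": "${name}-01",
  "Name": "${name}",
  "Port": ${port},
  "Check": {
    "HTTP": "http://localhost:${port}/healthz",
    "Interval": "15s",
    "Timeout": "5s"
  }
}
EOF
}

# Pipe it straight into the agent API:
#   register_payload api 3000 | curl -s -X PUT --data-binary @- \
#     http://localhost:8500/v1/agent/service/register
register_payload api 3000
```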

Consul Catalog vs Agent

The agent API (/v1/agent/service/*) manages services registered on the local agent. The catalog API (/v1/catalog/service/*) queries the global registry, which agents keep in sync via periodic anti-entropy — updates propagate within seconds — and its responses include node information:

# Agent view — local node only
curl http://localhost:8500/v1/agent/services | jq .

# Catalog view — all nodes, all datacenters
curl http://localhost:8500/v1/catalog/service/web | jq '[.[] | {node: .Node, addr: .ServiceAddress, port: .ServicePort}]'

Health Checks

Consul supports several health check types; the five most common are covered below. Polling checks share interval and timeout parameters, and every check type supports deregister_critical_service_after.

HTTP Check

Consul performs an HTTP GET to the specified URL. Status 2xx = passing; 429 = warning; anything else = critical.

"check": {
  "http":     "http://localhost:8080/health",
  "method":   "GET",
  "interval": "10s",
  "timeout":  "3s",
  "header": {
    "Authorization": ["Bearer internal-token"]
  }
}
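The status mapping can be expressed as a tiny case statement — a sketch of the rule for reference, not Consul's own code:

```shell
# How Consul classifies an HTTP check response by status code.
state_for() {
  case $1 in
    429) echo warning ;;     # Too Many Requests -> warning
    2*)  echo passing ;;     # any 2xx -> passing
    *)   echo critical ;;    # everything else -> critical
  esac
}

state_for 200   # passing
state_for 429   # warning
state_for 503   # critical
```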

TCP Check

Opens a TCP connection to the specified host:port. Successful connection = passing.

"check": {
  "tcp":      "localhost:5432",
  "interval": "15s",
  "timeout":  "5s"
}

Script Check

Executes a shell command. Exit code 0 = passing, 1 = warning, any other = critical.

"check": {
  "args":     ["/usr/local/bin/check-redis.sh"],
  "interval": "30s",
  "timeout":  "10s"
}

TTL Check

The application itself must call the Consul API to update its health status. Useful for background workers.

"check": {
  "ttl": "60s"
}

Application update:

# Report passing
curl -X PUT http://localhost:8500/v1/agent/check/pass/service:worker-01 \
  -d "Processing queue normally"

# Report warning
curl -X PUT http://localhost:8500/v1/agent/check/warn/service:worker-01 \
  -d "Queue depth exceeding threshold"
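A worker typically wraps these updates in a small heartbeat helper. A sketch — it prints the API call it would issue (a real worker would swap echo for curl), and the check ID service:worker-01 is assumed from a TTL check registered on a worker-01 service:

```shell
# Sketch: report pass/fail for a TTL check based on a command's exit status.
# Echoes the call instead of issuing it so the sketch runs standalone.
heartbeat() {
  local check_id=$1; shift
  local addr=${CONSUL_HTTP_ADDR:-http://localhost:8500}
  if "$@"; then
    echo "PUT ${addr}/v1/agent/check/pass/${check_id}"
  else
    echo "PUT ${addr}/v1/agent/check/fail/${check_id}"
  fi
}

heartbeat service:worker-01 true
heartbeat service:worker-01 false
```

Call heartbeat on a schedule well inside the TTL window (for a 60s TTL, every 20s or so) so a single slow cycle does not flip the check to critical.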

gRPC Check

For gRPC services implementing the gRPC health protocol:

"check": {
  "grpc":               "localhost:9000",
  "grpc_use_tls":       false,
  "interval":           "10s"
}

Deregistration After Critical

Always set deregister_critical_service_after to automatically remove a service that has been continuously failing, preventing stale entries in the catalog:

"deregister_critical_service_after": "90s"

DNS Interface

Consul runs a DNS server on port 8600. Services are resolved at <service>.service.<datacenter>.consul (datacenter defaults to local).

# Resolve all healthy instances of "web"
dig @127.0.0.1 -p 8600 web.service.consul A

# Resolve with SRV record (includes port)
dig @127.0.0.1 -p 8600 web.service.consul SRV

# Resolve by tag
dig @127.0.0.1 -p 8600 primary.web.service.consul A

# Resolve node by name
dig @127.0.0.1 -p 8600 consul-server-01.node.consul A

DNS Forwarding with systemd-resolved

Configure /etc/systemd/resolved.conf.d/consul.conf so that .consul queries go to Consul:

[Resolve]
DNS=127.0.0.1:8600
Domains=~consul

Then restart the resolver (the port syntax in DNS= requires systemd 246 or newer):

sudo systemctl restart systemd-resolved
# Now you can use: curl http://web.service.consul:8080/

DNS Forwarding with dnsmasq

# /etc/dnsmasq.d/10-consul
server=/consul/127.0.0.1#8600

Prepared Queries

Prepared queries add failover logic at the DNS level. Create a query that falls back to a secondary datacenter:

curl -X POST http://localhost:8500/v1/query \
  -d '{
    "Name": "web",
    "Service": {
      "Service":     "web",
      "OnlyPassing": true,
      "Failover": {
        "NearestN":    2,
        "Datacenters": ["dc2", "dc3"]
      }
    }
  }'

# Query via DNS using prepared query name
dig @127.0.0.1 -p 8600 web.query.consul A

Service Mesh / Connect

Consul Connect is the built-in service mesh. It uses Envoy as the sidecar proxy and issues short-lived TLS certificates signed by Consul’s CA, providing mutual TLS between services without changing application code.

Sidecar Registration

Add a connect.sidecar_service block to the service definition:

{
  "service": {
    "name": "web",
    "port": 8080,
    "connect": {
      "sidecar_service": {
        "port": 21000,
        "proxy": {
          "upstreams": [
            {
              "destination_name": "api",
              "local_bind_port":  9191
            }
          ]
        }
      }
    }
  }
}

Launch the Envoy sidecar:

consul connect envoy -sidecar-for web &

The web service now connects to the api service via localhost:9191 — all traffic is automatically encrypted with mTLS.

Intentions

Intentions define which services are allowed to communicate. With ACLs enabled and default_policy = "deny", all service-to-service traffic is denied unless an intention allows it.

# Allow web to reach api
consul intention create web api

# Allow all services to reach monitoring
consul intention create '*' monitoring

# Deny traffic from payments to logging
consul intention create -deny payments logging

# List all intentions
consul intention list

Intentions can also be defined as config entries for version control:

cat <<EOF | consul config write -
Kind   = "service-intentions"
Name   = "api"
Sources = [
  {
    Name   = "web"
    Action = "allow"
  }
]
EOF

Transparent Proxy

With transparent proxy mode, all outbound traffic is routed through Envoy via iptables redirection — no upstreams need to be declared in the service definition. Kubernetes applies the redirect rules automatically through the connect injector; on VMs you install them yourself (as root):

# Enable transparent proxy for a service
consul connect redirect-traffic \
  -proxy-id web-sidecar-proxy \
  -proxy-inbound-port 20000 \
  -proxy-outbound-port 15001

Key/Value Store

Consul’s KV store is a distributed, strongly consistent store suitable for configuration data, feature flags, and leader election.

# Write a value
consul kv put config/app/log_level info
consul kv put config/app/max_connections 100

# Read a value
consul kv get config/app/log_level
# info

# Read with metadata
consul kv get -detailed config/app/log_level

# List all keys under a prefix
consul kv list config/app/

# Delete a key
consul kv delete config/app/log_level

# Delete a tree
consul kv delete -recurse config/app/

Watches for Configuration Management

Consul watches allow reacting to KV changes in real time:

consul watch -type=key -key=config/app/log_level \
  /usr/local/bin/reload-app.sh
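consul watch pipes the watched key's JSON — with a base64-encoded Value field — to the handler's stdin. A sketch of what /usr/local/bin/reload-app.sh might do; jq is assumed to be installed, and the reload command is a placeholder:

```shell
# Sketch of a watch handler: decode the new value from the JSON on stdin.
handle_key_update() {
  local new_level
  new_level=$(jq -r '.Value // empty' | base64 -d)
  echo "log_level is now: ${new_level}"
  # a real handler would reload the app here, e.g. systemctl reload myapp
}

# What the handler prints for a canned watch payload ("aW5mbw==" is "info"):
echo '{"Key":"config/app/log_level","Value":"aW5mbw=="}' | handle_key_update
```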

consul-template

consul-template renders templates from Consul KV and service data, reloading services when values change:

# Install
wget https://releases.hashicorp.com/consul-template/0.39.0/consul-template_0.39.0_linux_amd64.zip
unzip consul-template_0.39.0_linux_amd64.zip
sudo install consul-template /usr/local/bin/

Template file (nginx.conf.ctmpl):

upstream web_backend {
  {{ range service "web" }}
  server {{ .Address }}:{{ .Port }};
  {{ end }}
}

Run consul-template:

consul-template \
  -template "nginx.conf.ctmpl:/etc/nginx/conf.d/web.conf:nginx -s reload" \
  -once

ACL System

The ACL system provides authentication (token-based identity) and authorization (policy-based access) for all Consul operations.

Bootstrap ACLs

# Enable ACLs in consul.hcl
cat >> /etc/consul.d/consul.hcl <<EOF
acl {
  enabled                  = true
  default_policy           = "deny"
  enable_token_persistence = true
}
EOF

# Bootstrap — do this once, save the output
consul acl bootstrap
# AccessorID: 0a1a0b1a-...
# SecretID:   db14c...   ← this is your bootstrap (root) token

Policies and Tokens

# Set the token in environment
export CONSUL_HTTP_TOKEN="db14c..."

# Create a policy that allows reading the web service
consul acl policy create \
  -name "web-service-read" \
  -rules 'service "web" { policy = "read" }
node_prefix "" { policy = "read" }'

# Create a token with this policy
consul acl token create \
  -description "Web service read token" \
  -policy-name "web-service-read"

Agent Token

Each client agent needs a token to register itself and its services:

# Policy for agent
consul acl policy create \
  -name "agent-policy" \
  -rules 'node_prefix "" { policy = "write" }
service_prefix "" { policy = "read" }'

# Token for agent
consul acl token create \
  -description "Client agent token" \
  -policy-name "agent-policy"

# Set the token in agent config
consul acl set-agent-token agent "TOKEN_SECRET_ID"

Auth Methods

Consul supports auth methods for platform-based identity:

  • Kubernetes — workloads present a Kubernetes service account JWT; Consul validates it with the API server and issues a Consul token.
  • AWS IAM — EC2/ECS workloads authenticate with AWS IAM credentials.
  • JWT/OIDC — integration with external identity providers.

# Configure Kubernetes auth method
consul acl auth-method create \
  -type kubernetes \
  -name k8s \
  -kubernetes-host https://kubernetes.default.svc \
  -kubernetes-ca-cert @/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -kubernetes-service-account-jwt @/var/run/secrets/kubernetes.io/serviceaccount/token

Consul UI

The Consul web UI is available at http://localhost:8500/ui when ui_config { enabled = true } is set in the server config. Key sections:

  • Services — all registered services with health status, tags, and instance count. Click a service to see all instances, their health checks, and metadata.
  • Nodes — all agents (servers and clients) with their health status and registered services.
  • Key/Value — a browser for the KV store; supports create, edit, delete, and folder-style navigation.
  • Intentions — visualize and manage service-to-service authorization rules.
  • Access Controls — manage tokens, policies, and roles (visible only when ACLs are enabled).

In production, put the UI behind a reverse proxy (e.g. Nginx) to add TLS and authentication.

Multi-Datacenter Federation

WAN Gossip Federation

The traditional method — servers in each datacenter join the WAN gossip pool:

# On dc2 servers, join the WAN pool of dc1
consul join -wan 10.0.1.11   # dc1 server IP

With WAN federation, services can query remote datacenters:

# Query web service in dc2
dig @127.0.0.1 -p 8600 web.service.dc2.consul A

# HTTP API cross-datacenter query
curl "http://localhost:8500/v1/catalog/service/web?dc=dc2"

Mesh Gateway Federation

Mesh gateways are the modern approach — they proxy traffic between datacenters without exposing internal services to the WAN:

# Register a mesh gateway
service {
  name = "mesh-gateway"
  kind = "mesh-gateway"
  port = 8443
  proxy {
    config {
      envoy_gateway_bind_addresses = {
        default = {
          address = "0.0.0.0"
          port    = 8443
        }
      }
    }
  }
}

# Launch mesh gateway Envoy
consul connect envoy -gateway=mesh -register -address $(hostname -I | awk '{print $1}'):8443

Comparison: Consul vs Alternatives

| Feature | Consul | etcd | ZooKeeper | Eureka | Istio | Linkerd |
|---|---|---|---|---|---|---|
| Service discovery | DNS + HTTP API | API only | API only | HTTP API | Via Kubernetes | Via Kubernetes |
| Health checks | Built-in (multiple types) | None native | None native | Client heartbeat | Kubernetes probes | Kubernetes probes |
| Service mesh | Yes (Connect) | No | No | No | Yes (Envoy) | Yes (linkerd2-proxy) |
| KV store | Yes | Yes (primary use) | Yes | No | No | No |
| ACL system | Yes (rich) | Basic | SASL/ACL | No | RBAC | RBAC |
| Multi-datacenter | Native | No | No | Replicated peers | No | No |
| Consensus | Raft | Raft | ZAB | Eventual | N/A | N/A |
| Language | Go | Go | Java | Java | Go | Rust/Go |
| Kubernetes native | Helm + CRDs | Native | No | No | Native | Native |
| Standalone (no K8s) | Yes | Yes | Yes | Yes | No | No |

Use Consul when: you need service discovery + health checks + service mesh + KV store in a single tool, especially in hybrid (VMs + Kubernetes) or multi-datacenter environments.

Use Istio/Linkerd when: your entire workload is on Kubernetes and you want tight integration with Kubernetes-native resources (Gateway API, NetworkPolicy).

Use etcd when: you need a simple, high-performance KV store (it underpins Kubernetes itself) without service discovery.

Production Deployment with Docker Compose

A three-server Consul cluster plus a client agent and the UI:

name: consul-cluster

services:
  consul-server-1:
    image: hashicorp/consul:1.20
    command: >
      agent -server
      -node=server-1
      -bootstrap-expect=3
      -datacenter=dc1
      -data-dir=/consul/data
      -config-dir=/consul/config
      -bind=0.0.0.0
      -client=0.0.0.0
      -retry-join=consul-server-2
      -retry-join=consul-server-3
      -ui
    volumes:
      - consul-server-1-data:/consul/data
      - ./consul-config:/consul/config:ro
    ports:
      - "8500:8500"
      - "8600:8600/udp"
    networks:
      - consul-net
    restart: unless-stopped

  consul-server-2:
    image: hashicorp/consul:1.20
    command: >
      agent -server
      -node=server-2
      -bootstrap-expect=3
      -datacenter=dc1
      -data-dir=/consul/data
      -config-dir=/consul/config
      -bind=0.0.0.0
      -client=0.0.0.0
      -retry-join=consul-server-1
      -retry-join=consul-server-3
    volumes:
      - consul-server-2-data:/consul/data
      - ./consul-config:/consul/config:ro
    networks:
      - consul-net
    restart: unless-stopped

  consul-server-3:
    image: hashicorp/consul:1.20
    command: >
      agent -server
      -node=server-3
      -bootstrap-expect=3
      -datacenter=dc1
      -data-dir=/consul/data
      -config-dir=/consul/config
      -bind=0.0.0.0
      -client=0.0.0.0
      -retry-join=consul-server-1
      -retry-join=consul-server-2
    volumes:
      - consul-server-3-data:/consul/data
      - ./consul-config:/consul/config:ro
    networks:
      - consul-net
    restart: unless-stopped

  consul-client:
    image: hashicorp/consul:1.20
    command: >
      agent
      -node=client-1
      -datacenter=dc1
      -data-dir=/consul/data
      -config-dir=/consul/config
      -bind=0.0.0.0
      -client=0.0.0.0
      -retry-join=consul-server-1
    volumes:
      - consul-client-data:/consul/data
      - ./consul-config:/consul/config:ro
      - ./services:/consul/services:ro
    networks:
      - consul-net
    restart: unless-stopped

volumes:
  consul-server-1-data:
  consul-server-2-data:
  consul-server-3-data:
  consul-client-data:

networks:
  consul-net:
    driver: bridge

Generate a gossip key into consul-config/encryption.hcl, then start the cluster:

GOSSIP_KEY=$(docker run --rm hashicorp/consul:1.20 keygen)
echo "encrypt = \"${GOSSIP_KEY}\"" > consul-config/encryption.hcl
docker compose up -d
# Wait for cluster to form
sleep 5
docker compose exec consul-server-1 consul members

Gotchas and Edge Cases

  • bootstrap_expect mismatch — all servers must have the same bootstrap_expect value. A mismatch prevents cluster formation.
  • Gossip key rotation — Consul supports gossip key rotation with consul keyring, but all nodes must complete the rotation before removing the old key.
  • ACL token distribution — avoid hardcoding tokens in config files; use Vault’s Consul secrets engine or Kubernetes secrets for token injection.
  • Prepared query staleness — DNS responses from prepared queries default to allow_stale=true. Use consistent=true only for workloads that require it; it adds latency.
  • Envoy version compatibility — always match the Envoy version to Consul’s supported matrix. Running an unsupported Envoy version causes the proxy to fail silently.
  • Health check timeouts — set timeout less than interval. A check with interval: 10s and timeout: 15s creates overlapping check executions.
  • Large cluster scaling — for clusters with hundreds of agents, tune raft_multiplier and the gossip RetransmitMult setting, and consider running 5 servers; going beyond 5 is not recommended, since every additional server adds Raft replication overhead.
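The interval/timeout rule is easy to lint before deploying. A sketch that handles plain "Ns" second durations only — an assumption for brevity; Consul also accepts ms, m, and h units:

```shell
# Sketch: flag a check whose timeout is not shorter than its interval.
check_timing() {
  local interval=${1%s} timeout=${2%s}   # strip the trailing "s"
  if [ "$timeout" -lt "$interval" ]; then
    echo "ok: timeout ${timeout}s < interval ${interval}s"
  else
    echo "bad: timeout ${timeout}s >= interval ${interval}s"
  fi
}

check_timing 10s 3s    # ok
check_timing 10s 15s   # bad: executions would overlap
```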

Summary

  • Three or five servers with Raft consensus provide the control plane; run an odd number for quorum tolerance.
  • Gossip (Serf) handles cluster membership and failure detection at O(log N) scale across both LAN and WAN.
  • Service definitions (JSON files or HTTP API) register services with rich health checks (HTTP, TCP, script, TTL, gRPC) that automatically remove unhealthy instances from DNS.
  • DNS interface on port 8600 lets any application resolve service.service.consul without code changes; forward .consul via systemd-resolved or dnsmasq.
  • Consul Connect adds mTLS between services via Envoy sidecars; intentions enforce authorization without touching application code.
  • Prepared queries provide DNS-level failover across datacenters for high availability.
  • The KV store combined with consul-template enables dynamic configuration management and automatic service reloads.
  • ACLs with default_policy = deny lock down the cluster; use auth methods (Kubernetes, AWS IAM) for zero-secret token issuance.
  • Mesh gateways enable secure multi-datacenter federation without exposing internal services across the WAN.