TL;DR — Quick Summary
Complete Envoy Proxy guide for service mesh and edge proxy. Covers xDS API, load balancing, mTLS, observability, circuit breaking, and Kubernetes sidecar setup.
Envoy Proxy is the data plane at the heart of modern service meshes. Originally built at Lyft to solve microservice observability and reliability at scale, Envoy is now the de-facto sidecar for Istio, the edge proxy for many ingress controllers, and a general-purpose L3/L4/L7 proxy used by Google, AWS, and thousands of companies. This guide covers Envoy’s architecture, static and dynamic configuration, load balancing algorithms, observability pipeline, TLS management, and advanced filters — everything needed to run Envoy as a front proxy or service mesh data plane.
Prerequisites
- Docker (for standalone testing) or a Kubernetes cluster.
- Basic understanding of HTTP, TLS, and reverse proxy concepts.
- Familiarity with YAML configuration syntax.
- curl and optionally jq for testing endpoints.
Envoy Architecture
Envoy operates as an out-of-process network proxy — it runs alongside your application rather than as a library inside it. This keeps the proxy language-agnostic and allows independent upgrades.
Core components:
- Listeners — Network ports Envoy binds to (downstream connections arrive here).
- Filter chains — Ordered list of network and HTTP filters applied to each connection.
- Clusters — Named groups of upstream endpoints (your backend services).
- Endpoints — Individual IP:port pairs within a cluster, discovered via EDS or static config.
- Routes — Rules mapping incoming requests to clusters based on path, header, or query parameters.
Thread model: Envoy uses a single main thread for management plus one worker thread per CPU core. Each worker thread independently handles connections using non-blocking I/O via libevent. There is no lock contention on the hot path — each worker has its own connection pool.
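Hot restart: Envoy supports zero-downtime binary upgrades via a handshake between the old and new process, coordinated over shared memory and a Unix domain socket. The new process takes over the listening sockets and starts accepting new connections, while the old process drains its existing connections before exiting. Traffic is not dropped during the handover, which is critical for production deployments.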
Static Configuration (envoy.yaml)
The fastest way to start is a static YAML config with all resources defined inline:
    admin:
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 9901
    static_resources:
      listeners:
      - name: listener_0
        address:
          socket_address:
            address: 0.0.0.0
            port_value: 10000
        filter_chains:
        - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: ingress_http
              route_config:
                name: local_route
                virtual_hosts:
                - name: backend
                  domains: ["*"]
                  routes:
                  - match:
                      prefix: "/"
                    route:
                      cluster: service_backend
                      timeout: 15s
                      retry_policy:
                        retry_on: "5xx,reset,connect-failure"
                        num_retries: 3
              http_filters:
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      clusters:
      - name: service_backend
        connect_timeout: 0.5s
        type: STRICT_DNS
        lb_policy: ROUND_ROBIN
        load_assignment:
          cluster_name: service_backend
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: backend-service
                    port_value: 8080
        health_checks:
        - timeout: 1s
          interval: 10s
          unhealthy_threshold: 2
          healthy_threshold: 2
          http_health_check:
            path: /health
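The admin block exposes Envoy’s management API on port 9901. Use it to query /stats, /clusters, and /config_dump. A POST to /healthcheck/fail makes Envoy fail inbound health checks, which is useful for draining an instance before shutdown. The admin interface is unauthenticated, so bind it to 127.0.0.1 or restrict access to it outside of local testing.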
Dynamic Configuration: The xDS API
Static configs work for simple deployments but become unwieldy at scale. Envoy’s xDS (x Discovery Service) protocol lets a control plane push configuration changes at runtime — no reload, no restart.
xDS resource types:
| API | Manages |
|---|---|
| LDS (Listener Discovery Service) | Listeners and filter chains |
| RDS (Route Discovery Service) | Virtual hosts and route tables |
| CDS (Cluster Discovery Service) | Cluster definitions and policies |
| EDS (Endpoint Discovery Service) | Individual endpoint IP:port health |
| SDS (Secret Discovery Service) | TLS certificates and private keys |
ADS (Aggregated Discovery Service): Combines all xDS APIs into a single bidirectional gRPC stream. This is the recommended mode because a single stream lets the control plane sequence updates: it can deliver a new cluster before the route that references it, preventing temporary 503 errors during updates.
Delta xDS: Rather than sending the full state on every update, delta xDS sends only the added, modified, or removed resources. Essential for large meshes with thousands of clusters.
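The difference between a full-state update and a delta update can be sketched in a few lines of Python. This is only an illustration of the idea, not the xDS wire protocol: the function name `delta_update` and the `{name: version}` snapshot shape are assumptions made for the example.

```python
def delta_update(previous, current):
    """Compare two {resource_name: version} snapshots.

    Returns the resources that were added or modified, plus the names
    that disappeared. A delta-style update only needs to carry these.
    """
    changed = {name: ver for name, ver in current.items()
               if previous.get(name) != ver}
    removed = sorted(name for name in previous if name not in current)
    return changed, removed


# Hypothetical cluster snapshots before and after a deployment.
previous = {"users": "v1", "orders": "v1", "legacy": "v3"}
current = {"users": "v1", "orders": "v2", "payments": "v1"}

changed, removed = delta_update(previous, current)
print(changed, removed)
```

Only `orders`, `payments`, and the removal of `legacy` would need to be sent. The unchanged `users` cluster is left out, and that omission is where the savings come from in a mesh with thousands of clusters.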
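To enable dynamic config, add a dynamic_resources block pointing at your control plane. The listeners and application clusters move out of static_resources, but the cluster used to reach the control plane itself (xds_cluster below) must still be defined there: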
    dynamic_resources:
      ads_config:
        api_type: GRPC
        transport_api_version: V3
        grpc_services:
        - envoy_grpc:
            cluster_name: xds_cluster
      lds_config:
        resource_api_version: V3
        ads: {}
      cds_config:
        resource_api_version: V3
        ads: {}
Control planes that implement xDS: Istio istiod, Consul Connect, solo.io Gloo, and the reference go-control-plane library for custom implementations.
Load Balancing Algorithms
Envoy supports six load balancing policies selectable per cluster:
| Policy | Best For |
|---|---|
| ROUND_ROBIN | Uniform backend capacity, default choice |
| LEAST_REQUEST | Variable request duration, avoids hot backends |
| RING_HASH | Consistent hashing: cache affinity, stateful services |
| RANDOM | Simple, low overhead, resilient to slow endpoints |
| MAGLEV | Google’s consistent hash, more even distribution than ring hash |
| CLUSTER_PROVIDED | Delegates decision to upstream cluster type |
Least Request uses a power-of-two random choices algorithm: picks two random endpoints and routes to the one with fewer active requests. This outperforms round-robin when request durations vary significantly.
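The power-of-two-choices idea is small enough to sketch directly. The following is a minimal Python illustration, not Envoy's implementation: the function name, the endpoint addresses, and the request counts are made up, and this version samples two distinct hosts, which may differ from how Envoy draws its candidates.

```python
import random


def pick_least_request(active_requests, rng=random):
    """Sample two hosts at random and return the one with fewer active requests."""
    hosts = list(active_requests)
    if len(hosts) == 1:
        return hosts[0]
    first, second = rng.sample(hosts, 2)  # two distinct candidates
    if active_requests[first] <= active_requests[second]:
        return first
    return second


# Hypothetical snapshot of in-flight requests per endpoint.
active = {"10.0.0.1:8080": 12, "10.0.0.2:8080": 3, "10.0.0.3:8080": 7}

rng = random.Random(42)  # seeded so the run is repeatable
picks = [pick_least_request(active, rng) for _ in range(1000)]
```

With a static snapshot like this, the busiest host loses every comparison and is never chosen. In a real proxy the counters change with every request, so load evens out without scanning the whole host list on each pick.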
Ring Hash maps requests to endpoints using a consistent hash ring. Useful for upstream caches where the same key should always reach the same backend. Configure minimum_ring_size (default 1024) and maximum_ring_size for distribution quality.
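To make the consistent-hashing property concrete, here is a small Python sketch of a hash ring with virtual nodes. It is an illustration under stated assumptions: the `HashRing` class, the MD5-based hash, and the `vnodes` parameter are choices made for this example, and Envoy's own hash function and ring sizing work differently.

```python
import bisect
import hashlib


def _hash(key):
    # Stable 64-bit value derived from MD5 (illustrative choice only).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, hosts, vnodes=100):
        # Each host is placed on the ring many times to smooth the distribution.
        self._ring = sorted((_hash(f"{host}#{i}"), host)
                            for host in hosts for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def lookup(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"user-{i}" for i in range(200)]
ring = HashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.lookup(k) for k in keys}

# Remove one host and see which keys move.
smaller = HashRing(["cache-a", "cache-b"])
after = {k: smaller.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` was previously served by `cache-c`. Keys on the surviving hosts stay put, and that stability is what preserves cache affinity when the endpoint set changes.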
Observability
Envoy is opinionated about observability — it was built to make distributed systems debuggable. Three built-in pillars:
Stats: Envoy emits thousands of counters, gauges, and histograms. Expose them to Prometheus at the admin endpoint:
curl http://localhost:9901/stats/prometheus
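Key metrics: envoy_cluster_upstream_rq_total, envoy_cluster_upstream_rq_time, envoy_http_downstream_rq_xx (filter on the label envoy_response_code_class="5" for 5xx responses), and the circuit breaker gauges envoy_cluster_circuit_breakers_default_cx_open, envoy_cluster_circuit_breakers_default_rq_pending_open, and envoy_cluster_circuit_breakers_default_rq_open.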
Distributed Tracing: Envoy automatically generates and propagates trace context headers for Jaeger, Zipkin, and OpenTelemetry. Add a tracing block to the HttpConnectionManager:
    tracing:
      provider:
        name: envoy.tracers.opentelemetry
        typed_config:
          "@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
          grpc_service:
            envoy_grpc:
              cluster_name: opentelemetry_collector
          service_name: my-service
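Envoy generates the x-request-id header for correlation and propagates traceparent / b3 headers to the upstream service. Your application still has to copy these headers from each inbound request onto its own outbound calls, otherwise the spans cannot be stitched into one trace. Envoy handles span creation and export.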
Access Logging: Structured JSON access logs with all request metadata:
    access_log:
    - name: envoy.access_loggers.stdout
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
        log_format:
          json_format:
            start_time: "%START_TIME%"
            method: "%REQ(:METHOD)%"
            path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
            response_code: "%RESPONSE_CODE%"
            duration: "%DURATION%"
            upstream_cluster: "%UPSTREAM_CLUSTER%"
            bytes_sent: "%BYTES_SENT%"
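Because each log line is a JSON object, the access log is easy to post-process. The Python sketch below computes a 5xx rate per upstream cluster, using the field names from the log format above. The function name `error_rates` and the sample lines are invented for illustration.

```python
import json
from collections import defaultdict


def error_rates(lines):
    """Return {upstream_cluster: fraction of responses with status >= 500}."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for line in lines:
        entry = json.loads(line)
        cluster = entry["upstream_cluster"]
        totals[cluster] += 1
        # int() accepts the code whether it was logged as a number or a string.
        if int(entry["response_code"]) >= 500:
            errors[cluster] += 1
    return {cluster: errors[cluster] / totals[cluster] for cluster in totals}


# Hypothetical access log lines in the JSON format configured above.
sample = [
    '{"method": "GET", "path": "/orders/1", "response_code": 200, "upstream_cluster": "orders_v1"}',
    '{"method": "GET", "path": "/orders/2", "response_code": 200, "upstream_cluster": "orders_v2"}',
    '{"method": "POST", "path": "/orders", "response_code": 503, "upstream_cluster": "orders_v2"}',
]

rates = error_rates(sample)
print(rates)
```

In practice the lines would be read from the container's stdout or a log shipper rather than an in-memory list.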
TLS and mTLS with SDS
Manually distributing certificates across hundreds of services does not scale. Envoy’s Secret Discovery Service (SDS) solves this: certificates are fetched from a control plane at runtime and rotated without process restart.
For mutual TLS between services, configure a cluster’s transport_socket:
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        common_tls_context:
          tls_certificate_sds_secret_configs:
          - name: client_cert
            sds_config:
              resource_api_version: V3
              ads: {}
          combined_validation_context:
            default_validation_context:
              match_subject_alt_names:
              - exact: "spiffe://cluster.local/ns/default/sa/backend"
            validation_context_sds_secret_config:
              name: validation_context
              sds_config:
                resource_api_version: V3
                ads: {}
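The match_subject_alt_names field enforces SPIFFE identity. Because this is an upstream TLS context, Envoy completes the handshake only if the upstream presents a certificate with the expected SPIFFE URI, and the upstream’s own listener applies the matching check to the client certificate. This is how Istio implements zero-trust networking: every pod-to-pod connection is mutually authenticated via short-lived certificates, which are issued and rotated by istiod by default, with SPIRE as an optional alternative.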
Advanced Filters
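Circuit Breaking: Prevents cascading failures by capping connections, pending requests, active requests, and concurrent retries per upstream cluster. It is configured on the cluster rather than as a filter: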
    circuit_breakers:
      thresholds:
      - priority: DEFAULT
        max_connections: 1000
        max_pending_requests: 1000
        max_requests: 1000
        max_retries: 3
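These thresholds behave as simple resource limits, not as the open/half-open/closed state machine that some libraries use. The Python sketch below models one of them, max_pending_requests. The `PendingLimit` class is a hypothetical stand-in for this example, not Envoy code.

```python
class PendingLimit:
    """Toy model of a max_pending_requests threshold for one cluster."""

    def __init__(self, max_pending_requests):
        self.max_pending = max_pending_requests
        self.pending = 0
        self.overflow = 0  # Envoy counts these in upstream_rq_pending_overflow

    def try_enqueue(self):
        if self.pending >= self.max_pending:
            # Limit reached: fail fast instead of queueing more work.
            self.overflow += 1
            return False
        self.pending += 1
        return True

    def dequeue(self):
        # A queued request was handed to a connection or completed.
        self.pending -= 1


limit = PendingLimit(max_pending_requests=2)
results = [limit.try_enqueue() for _ in range(3)]  # third attempt is rejected
limit.dequeue()                                    # one slot frees up
accepted_again = limit.try_enqueue()
```

The breaker "opens" only while the limit is saturated and closes again as soon as a slot frees up, so a rejected request costs the caller almost nothing.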
Outlier Detection: Automatically ejects unhealthy hosts from the load balancing pool:
    outlier_detection:
      consecutive_5xx: 5
      interval: 10s
      base_ejection_time: 30s
      max_ejection_percent: 10
After five consecutive 5xx responses, the endpoint is ejected for 30 seconds on its first ejection, and the ejection time grows each time the same host is ejected again. max_ejection_percent prevents ejecting all hosts when the upstream degrades globally.
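The interaction between the consecutive-5xx counter and the ejection cap can be sketched as follows. This is a simplified Python model under stated assumptions: the `OutlierDetector` class is hypothetical, it ignores base_ejection_time and the un-ejection timer entirely, and it assumes at least one host may always be ejected regardless of the percentage.

```python
class OutlierDetector:
    """Toy consecutive-5xx detector with an ejection cap (no timers)."""

    def __init__(self, hosts, consecutive_5xx=5, max_ejection_percent=10):
        self.failures = {host: 0 for host in hosts}
        self.ejected = set()
        self.threshold = consecutive_5xx
        # Assumption: at least one host may be ejected even in tiny clusters.
        self.max_ejected = max(1, len(hosts) * max_ejection_percent // 100)

    def record(self, host, status):
        """Record a response; return True if this response ejects the host."""
        if status < 500:
            self.failures[host] = 0  # any non-5xx response resets the streak
            return False
        self.failures[host] += 1
        if (self.failures[host] >= self.threshold
                and host not in self.ejected
                and len(self.ejected) < self.max_ejected):
            self.ejected.add(host)
            return True
        return False


detector = OutlierDetector(["a", "b", "c"])
first = [detector.record("a", 503) for _ in range(5)]   # fifth failure ejects "a"
second = [detector.record("b", 503) for _ in range(5)]  # cap reached, "b" stays
```

Host `b` keeps receiving traffic despite its failures because ejecting it would exceed the cap. This trade-off is deliberate: a degraded pool is better than an empty one.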
Fault Injection: Inject latency or errors into a percentage of requests for chaos testing:
    - name: envoy.filters.http.fault
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
        delay:
          fixed_delay: 2s
          percentage:
            numerator: 10
            denominator: HUNDRED
        abort:
          http_status: 503
          percentage:
            numerator: 5
            denominator: HUNDRED
External Authorization: Delegate authorization decisions to an external gRPC service (e.g., OPA, Ory Keto). The ext_authz filter sends request headers to the authz service before forwarding upstream — enabling policy-as-code without application changes.
Wasm Extensions: Envoy supports WebAssembly filter plugins for custom business logic in any language that compiles to Wasm (Go, Rust, C++, AssemblyScript). Wasm modules can be fetched from a remote URL and updated through configuration, without rebuilding or upgrading the Envoy binary.
Envoy vs Other Proxies
| Feature | Envoy | Nginx | HAProxy | Traefik | Linkerd | MOSN |
|---|---|---|---|---|---|---|
| Dynamic config | xDS API (no reload) | nginx -s reload | Runtime API | Auto-discovers K8s | Own gRPC control-plane API (not xDS) | xDS API |
| Service mesh | Yes (Istio, Consul) | No | No | No (ingress only) | Yes (Linkerd2) | Yes (MOSN mesh) |
| L7 protocols | HTTP/1.1, HTTP/2, gRPC, Thrift, Kafka | HTTP/1.1, HTTP/2 | HTTP/1.1, HTTP/2 | HTTP/1.1, HTTP/2 | HTTP/1.1, HTTP/2, gRPC | HTTP/1.1, HTTP/2, Dubbo |
| Observability | Built-in stats + tracing | Module-based | Stats socket | Prometheus plugin | Built-in golden signals | Built-in stats |
| mTLS | SDS + SPIFFE | Manual certs | Manual certs | Manual certs | Automatic | SDS |
| Wasm filters | Yes | No | No | No | No | Yes |
| Config language | YAML/Protobuf | nginx.conf | haproxy.cfg | YAML/Labels | Linkerd CRDs | YAML/JSON |
Practical Example: Envoy as a Front Proxy
A real-world scenario: you have three microservices (users, orders, products) behind Envoy as an edge proxy, with traffic split between v1 and v2 of the orders service for canary deployment.
    virtual_hosts:
    - name: microservices
      domains: ["api.example.com"]
      routes:
      - match:
          prefix: "/users"
        route:
          cluster: users_service
      - match:
          prefix: "/products"
        route:
          cluster: products_service
      - match:
          prefix: "/orders"
        route:
          weighted_clusters:
            clusters:
            - name: orders_v1
              weight: 90
            - name: orders_v2
              weight: 10
            total_weight: 100
This weighted cluster configuration sends 10% of /orders traffic to the v2 canary without any application code changes. Envoy’s stats will show per-cluster request rates — allowing you to compare error rates and latencies before shifting more traffic.
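What a 90/10 split means per request can be shown with a short Python simulation. It is a sketch of weighted random selection in general, not of Envoy's internal picker. The function name `pick_cluster` and the seed are arbitrary.

```python
import random


def pick_cluster(weights, rng=random):
    """Choose a cluster name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Weights taken from the route configuration above.
weights = {"orders_v1": 90, "orders_v2": 10}

rng = random.Random(7)  # seeded so the run is repeatable
counts = {name: 0 for name in weights}
for _ in range(10_000):
    counts[pick_cluster(weights, rng)] += 1

canary_share = counts["orders_v2"] / 10_000
print(counts, canary_share)
```

The split is probabilistic per request, not an exact quota. Over a few dozen requests the canary share can drift well away from 10%, so compare error rates only after a meaningful volume of traffic.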
Gotchas and Edge Cases
- Header case sensitivity: HTTP/2 headers are lowercase by default. Envoy normalizes header names to lowercase, so ensure your upstream services handle lowercase content-type, authorization, etc.
- Upstream timeouts vs route timeouts: Cluster connect_timeout (TCP) and route timeout (request) are independent. A missing route timeout defaults to 15 seconds, so set it explicitly.
- Retry budget: Without limits, retries under load can amplify failures. Always cap num_retries and pair retries with a circuit breaker (max_retries, or a retry_budget threshold).
- EDS vs STRICT_DNS: Use EDS for dynamic service discovery via a control plane. Use STRICT_DNS or LOGICAL_DNS for simpler setups where DNS resolves the upstream. STATIC is for fixed IP:port lists.
- Wasm filter isolation: Each Wasm VM instance is isolated per worker thread, so plugin initialization runs once per thread. Shared state across workers requires external storage (Redis, etc.).
Troubleshooting
| Symptom | Likely Cause | Fix |
|---|---|---|
| 503 upstream_reset_before_response_started | Upstream closed connection before responding | Check upstream health check path; increase connect_timeout |
| 404 from Envoy (not upstream) | No matching route | Run /config_dump on admin port; check virtual host domain match |
| Circuit breaker open in stats | Upstream overwhelmed | Increase max_pending_requests or scale upstream |
| mTLS handshake failure | Certificate SAN mismatch | Verify match_subject_alt_names matches actual SPIFFE URI |
| High P99 latency | Thread starvation | Increase worker thread count via the --concurrency command-line flag |
| xDS update not applied | Control plane version mismatch | Ensure control plane uses xDS v3 proto; check Envoy version compatibility |
Summary
- xDS API enables fully dynamic configuration without restarts — clusters, routes, listeners, and certificates all update live.
- Load balancing offers six algorithms including Ring Hash for cache affinity and Least Request for heterogeneous workloads.
- Built-in observability provides Prometheus stats, distributed tracing headers, and structured JSON access logs out of the box.
- mTLS via SDS + SPIFFE delivers zero-trust networking with short-lived, automatically rotated certificates.
- Advanced filters (circuit breaking, outlier detection, fault injection, ext_authz, Wasm) make Envoy extensible without touching application code.
- Front proxy or sidecar — Envoy works standalone as an edge proxy or as the Istio/Consul data plane sidecar.