PostgreSQL streaming replication provides a robust mechanism for maintaining real-time copies of your database across multiple servers. Whether you need high availability, disaster recovery, or read scaling, streaming replication delivers byte-level consistency with minimal lag. This guide walks you through setting up a primary-standby architecture from scratch, configuring WAL archiving, monitoring replication health, and planning failover strategies.
Prerequisites
- Two Linux servers (Ubuntu 22.04+ or RHEL 9+) with network connectivity
- PostgreSQL 16 or later installed on both servers
- Sufficient disk space for WAL files and base backups
- Root or sudo access on both machines
- Port 5432 open between primary and standby servers
- Basic familiarity with PostgreSQL configuration files
Understanding PostgreSQL Replication
PostgreSQL supports two built-in replication modes: streaming replication and logical replication. Streaming replication operates at the WAL (Write-Ahead Log) byte level, sending a continuous stream of WAL records from the primary to standbys. This creates an exact binary copy of the entire cluster.
The replication process follows this flow:
- The primary writes changes to WAL segments
- The WAL sender process streams records to connected standbys
- The standby WAL receiver writes records to local WAL
- The standby startup process replays WAL records to update data files
Streaming replication can operate in asynchronous (default) or synchronous mode. Asynchronous provides better performance with minimal lag, while synchronous guarantees zero data loss at the cost of write latency.
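The commit-durability guarantee can also be relaxed per transaction, which lets latency-sensitive writes skip the synchronous wait while critical writes keep it. A minimal SQL sketch (the table name is illustrative only):

```sql
-- Relax durability for this transaction only; the setting reverts at COMMIT
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO activity_log (event) VALUES ('page_view');  -- hypothetical table
COMMIT;
```

This is useful when only a subset of your workload (for example, audit or telemetry writes) can tolerate a small loss window.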
Configuring the Primary Server
Edit the primary server’s postgresql.conf to enable replication:
# /etc/postgresql/16/main/postgresql.conf
listen_addresses = '*'
wal_level = replica
max_wal_senders = 5
wal_keep_size = '1GB'
hot_standby = on
The wal_level = replica setting ensures WAL contains enough information for replication. max_wal_senders controls how many concurrent streaming connections are allowed. wal_keep_size prevents the primary from recycling WAL segments before the standby has consumed them.
Next, configure pg_hba.conf to allow the standby to connect for replication:
# /etc/postgresql/16/main/pg_hba.conf
# TYPE DATABASE USER ADDRESS METHOD
host replication replicator 10.0.1.20/32 scram-sha-256
Replace 10.0.1.20 with your standby server’s IP address.
Create a dedicated replication user:
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'your_secure_password';
Restart PostgreSQL to apply configuration changes:
sudo systemctl restart postgresql
Setting Up the Standby Server
On the standby server, stop PostgreSQL and clear the existing data directory:
sudo systemctl stop postgresql
sudo rm -rf /var/lib/postgresql/16/main/*
Run pg_basebackup to clone the primary’s data:
sudo -u postgres pg_basebackup \
-h 10.0.1.10 \
-U replicator \
-D /var/lib/postgresql/16/main \
-Fp -Xs -P -R
The flags serve specific purposes:
- -Fp: plain format output
- -Xs: stream WAL during the backup to avoid gaps
- -P: show progress
- -R: automatically create standby.signal and write primary_conninfo to postgresql.auto.conf
If you prefer manual configuration, create the standby signal and connection settings yourself:
touch /var/lib/postgresql/16/main/standby.signal
Add to postgresql.conf on the standby:
primary_conninfo = 'host=10.0.1.10 port=5432 user=replicator password=your_secure_password application_name=standby1'
hot_standby = on
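To keep the password out of primary_conninfo, you can instead store credentials in the postgres user's ~/.pgpass file (which must be mode 0600) and omit password= from the connection string. For replication connections, libpq matches the pseudo-database name "replication":

```
# /var/lib/postgresql/.pgpass  (chmod 600)
# hostname:port:database:username:password
10.0.1.10:5432:replication:replicator:your_secure_password
```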
Start the standby server:
sudo systemctl start postgresql
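Once the standby is running, it is worth confirming that replication is actually established. Two quick checks:

```sql
-- On the standby: returns 't' while the server is in standby (recovery) mode
SELECT pg_is_in_recovery();

-- On the primary: one row per connected standby; state should be 'streaming'
SELECT application_name, state FROM pg_stat_replication;
```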
WAL Archiving
WAL archiving provides an additional safety net by copying completed WAL segments to an archive location. This enables point-in-time recovery (PITR) and protects against scenarios where the standby falls too far behind.
Configure archiving on the primary:
# /etc/postgresql/16/main/postgresql.conf
archive_mode = on
archive_command = 'test ! -f /var/lib/postgresql/wal_archive/%f && cp %p /var/lib/postgresql/wal_archive/%f'
Create the archive directory:
sudo mkdir -p /var/lib/postgresql/wal_archive
sudo chown postgres:postgres /var/lib/postgresql/wal_archive
For production environments, use rsync or a cloud storage solution instead of local cp:
archive_command = 'rsync -a %p backup-server:/wal_archive/%f'
Configure the standby to use archived WAL as a fallback with restore_command:
# On the standby postgresql.conf
restore_command = 'cp /mnt/wal_archive/%f %p'
This allows the standby to recover WAL segments from the archive when streaming falls behind.
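A physical replication slot offers a stronger guarantee than wal_keep_size: the primary retains WAL until the slot's consumer confirms receipt, so a disconnected standby can always catch up. A sketch, with an example slot name:

```sql
-- On the primary: create a slot for this standby
SELECT pg_create_physical_replication_slot('standby1_slot');
```

Then set primary_slot_name = 'standby1_slot' in the standby's postgresql.conf. Pair slots with disk-space monitoring, because an abandoned slot pins WAL indefinitely (see the gotchas below).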
Monitoring Replication Lag
Monitoring replication lag is critical for ensuring your standby stays current. On the primary server, query pg_stat_replication:
SELECT
client_addr,
application_name,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes,
write_lag,
flush_lag,
replay_lag
FROM pg_stat_replication;
On the standby server, check the receiver status and calculate lag:
SELECT
status,
received_lsn,
latest_end_lsn,
last_msg_send_time,
last_msg_receipt_time
FROM pg_stat_wal_receiver;
-- Time-based lag estimation
SELECT
now() - pg_last_xact_replay_timestamp() AS replication_delay;
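The LSN values in these views are hexadecimal file/offset pairs, and pg_wal_lsn_diff is effectively 64-bit arithmetic on them. A minimal shell sketch of that calculation, using made-up sample LSNs:

```shell
#!/bin/bash
# Convert an LSN of the form XLOGID/OFFSET (both hex) to an absolute byte position
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( (0x$hi << 32) + 0x$lo ))
}

sent=$(lsn_to_bytes "0/3000148")     # hypothetical sent_lsn
replay=$(lsn_to_bytes "0/3000060")   # hypothetical replay_lsn
echo $(( sent - replay ))            # prints 232 (replay lag in bytes)
```

This mirrors what pg_wal_lsn_diff computes server-side; in practice, prefer the SQL function so you never parse LSNs by hand.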
Set up an alert when lag exceeds your threshold:
#!/bin/bash
LAG_BYTES=$(psql -h primary -U monitor -t -A -c \
  "SELECT pg_wal_lsn_diff(sent_lsn, replay_lsn) FROM pg_stat_replication WHERE application_name='standby1';")
# -t -A returns a bare number; guard against an empty result when the standby is disconnected
if [ -n "$LAG_BYTES" ] && [ "$LAG_BYTES" -gt 104857600 ]; then
    echo "ALERT: Replication lag exceeds 100MB ($LAG_BYTES bytes)" | mail -s "PG Replication Alert" admin@example.com
fi
Failover Strategies
When the primary fails, you need a clear failover plan. PostgreSQL provides several promotion methods:
Manual promotion:
# Using pg_ctl
sudo -u postgres pg_ctl promote -D /var/lib/postgresql/16/main
# Using SQL (PostgreSQL 12+)
SELECT pg_promote(wait := true, wait_seconds := 60);
Trigger file promotion (PostgreSQL 15 and earlier):
The promote_trigger_file parameter was removed in PostgreSQL 16, so this method does not apply to the versions covered by this guide. On PostgreSQL 15 and earlier, you would set it in the standby's postgresql.conf:
promote_trigger_file = '/tmp/postgresql.trigger'
Then create the file to trigger promotion:
touch /tmp/postgresql.trigger
On PostgreSQL 16 and later, use pg_ctl promote or pg_promote() instead.
Automated failover with Patroni:
For production environments, use a tool like Patroni with etcd:
# /etc/patroni/patroni.yml (excerpt)
scope: pg-cluster
namespace: /db/
name: node1
restapi:
listen: 0.0.0.0:8008
etcd:
hosts: 10.0.1.100:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
listen: 0.0.0.0:5432
data_dir: /var/lib/postgresql/16/main
authentication:
replication:
username: replicator
password: your_secure_password
After failover, the old primary must be reintroduced as a standby. pg_rewind can resynchronize it without a full base backup, provided the old primary shut down cleanly and the cluster has data checksums enabled or wal_log_hints = on:
sudo -u postgres pg_rewind \
--target-pgdata=/var/lib/postgresql/16/main \
--source-server="host=10.0.1.20 port=5432 user=postgres"
Streaming vs Logical Replication Comparison
| Feature | Streaming Replication | Logical Replication |
|---|---|---|
| Replication level | Byte-level (WAL) | Row-level (decoded changes) |
| Cluster scope | Entire cluster | Per-table or per-database |
| Cross-version support | Same major version required | Different major versions supported |
| Write on standby | Read-only (hot standby) | Read-write on subscriber |
| DDL replication | Automatic | Manual (not replicated) |
| Performance overhead | Very low | Moderate (decoding cost) |
| Setup complexity | Simple | Moderate |
| Use case | HA and disaster recovery | Selective replication and migrations |
Choose streaming replication for full cluster high availability. Choose logical replication when you need selective table replication, cross-version upgrades, or writable subscribers.
Real-World Scenario
You manage a production e-commerce database handling 5,000 transactions per second. The business requires less than 30 seconds of downtime during any failure. Here is how you architect the solution:
Primary server (10.0.1.10) handles all writes. Synchronous standby (10.0.1.20) in the same datacenter ensures zero data loss. Asynchronous standby (10.0.2.10) in a remote datacenter provides disaster recovery.
Configure synchronous replication on the primary; the names listed must match each standby's application_name from its primary_conninfo:
synchronous_standby_names = 'FIRST 1 (standby_dc1, standby_dc2)'
synchronous_commit = on
Route read queries to standbys using PgBouncer or HAProxy:
# /etc/haproxy/haproxy.cfg (excerpt)
listen pg-read
bind *:5433
mode tcp
balance roundrobin
option pgsql-check user haproxy
server standby1 10.0.1.20:5432 check
server standby2 10.0.2.10:5432 check
This architecture provides zero data loss for local failures and minimal data loss for datacenter-level events, while offloading read traffic to standbys.
Gotchas and Edge Cases
- WAL segment removal: If the standby disconnects for longer than wal_keep_size allows, it cannot catch up via streaming. Always configure WAL archiving as a safety net, or use replication slots: SELECT pg_create_physical_replication_slot('standby1_slot');
- Replication slots and disk usage: Unused replication slots prevent WAL cleanup, potentially filling the disk. Monitor pg_replication_slots and drop inactive slots
- Large transactions: A single massive transaction (bulk import) generates huge WAL volumes that may overwhelm the standby or network. Break large operations into batches
- Synchronous commit latency: Enabling synchronous_commit = on with synchronous standbys adds a network round trip to every commit. Test write throughput before enabling
- Timeline divergence: After promoting a standby, the old primary is on a different timeline. You must use pg_rewind or rebuild it entirely before reintroducing it as a standby
- Password in primary_conninfo: Store credentials in .pgpass instead of plaintext in configuration files
Troubleshooting
Standby not connecting:
# Check standby logs
sudo tail -f /var/log/postgresql/postgresql-16-main.log
# Verify replication connections on primary
sudo -u postgres psql -c "SELECT * FROM pg_stat_replication;"
# Test connectivity from standby (IDENTIFY_SYSTEM is a replication-protocol
# command, so the connection must be opened with replication=yes)
psql "host=10.0.1.10 user=replicator replication=yes" -c "IDENTIFY_SYSTEM;"
Replication lag growing:
-- Check if standby is replaying
SELECT pg_is_in_recovery(), pg_last_wal_replay_lsn();
-- Check for long-running queries blocking replay
SELECT pid, query, state, wait_event FROM pg_stat_activity
WHERE state != 'idle' AND backend_type = 'client backend';
WAL archive filling up:
# Check archive status
sudo -u postgres psql -c "SELECT * FROM pg_stat_archiver;"
# Manually clean old archive files (keep at least 1 day); -type f avoids
# touching the directory itself. pg_archivecleanup is a WAL-aware alternative.
find /var/lib/postgresql/wal_archive -type f -mtime +1 -delete
Summary
- PostgreSQL streaming replication creates real-time binary copies of your entire database cluster
- Configure wal_level = replica, max_wal_senders, and a replication user on the primary
- Use pg_basebackup with the -R flag to bootstrap standbys with automatic configuration
- WAL archiving provides a safety net for standbys that fall behind streaming
- Monitor replication lag via pg_stat_replication and pg_stat_wal_receiver
- Use Patroni or similar tooling for automated failover in production
- Replication slots prevent WAL removal but require monitoring to avoid disk exhaustion
- Always test your failover procedure before you need it in an emergency