What is MariaDB Galera Cluster?
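MariaDB Galera Cluster provides synchronous multi-master replication for MariaDB. Unlike traditional asynchronous replication, where writes go to a single primary, Galera accepts reads and writes on any node in the cluster. Every transaction is replicated to all nodes and certified before its commit returns, so all nodes hold the same committed data (Galera calls this "virtually synchronous" replication). The cluster keeps serving requests when a node fails; redirecting clients away from the failed node is the job of a load balancer or proxy in front of the cluster.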
Key features:
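- Synchronous replication: a transaction commits only after its write set has been replicated to every node, so committed transactions are not lost when a node fails. Other nodes apply the changes slightly later; set wsrep_sync_wait if reads must always see the latest writes.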
- Multi-master: Read and write to any node, simplifying application load balancing.
- Automatic node provisioning: joining or rejoining nodes automatically receive either a full copy of the data (State Snapshot Transfer, SST) or only the transactions they missed (Incremental State Transfer, IST).
- Automatic membership control: Failed nodes are detected and removed from the cluster automatically.
This guide covers installation, configuration, bootstrapping, and troubleshooting of a 3-node MariaDB Galera Cluster on Ubuntu/Debian.
Prerequisites
- Three Linux servers (Ubuntu 22.04+ or Debian 12+ recommended) with static IPs.
- MariaDB 10.6+ or 11.x installed on all nodes.
- Ports open between all nodes: 3306 (MySQL), 4567 (Galera replication), 4568 (IST), 4444 (SST).
- Root or sudo access on all servers.
- Firewall configured to allow inter-node communication.
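If the nodes use ufw, the port list above translates into one rule per peer and port. The sketch below is a hypothetical helper (galera_ufw_rules is not part of MariaDB) that only prints the commands, assuming ufw is the firewall and the example IPs used throughout this guide. Review the output before applying it; for firewalld or nftables the same loop applies with different commands.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) ufw rules that let one node reach its
# Galera peers. Assumes ufw; the ports are the ones listed above.
GALERA_TCP_PORTS="3306 4567 4568 4444"   # client, replication, IST, SST

galera_ufw_rules() {
  local self_ip=$1 peer port
  shift
  for peer in "$@"; do
    # No rule is needed for traffic from the node to itself.
    [ "$peer" = "$self_ip" ] && continue
    # Unquoted on purpose: split the port list into words.
    for port in $GALERA_TCP_PORTS; do
      echo "sudo ufw allow proto tcp from $peer to any port $port"
    done
  done
}

# Usage on node1: review the output, then append "| sh" to apply it.
#   galera_ufw_rules 192.168.1.101 192.168.1.101 192.168.1.102 192.168.1.103
```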
Step-by-Step Solution
1. Install MariaDB and Galera on All Nodes
# Ubuntu/Debian
sudo apt update
sudo apt install -y mariadb-server galera-4 mariadb-backup
# Verify installation
mariadbd --version
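Important: All nodes should run the same MariaDB and Galera versions. Mismatches, particularly across major versions, can cause SST failures; mixed versions are only appropriate briefly during a rolling upgrade.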
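One way to confirm this before bootstrapping is to compare the version string each node reports. This is a sketch under stated assumptions: passwordless SSH from an admin host to every node, and the Debian/Ubuntu binary path /usr/sbin/mariadbd. The function names are illustrative, not MariaDB tools.

```shell
#!/bin/sh
# Hypothetical helpers for comparing MariaDB versions across nodes.

# Extract the version from `mariadbd --version` output, which looks like
# "mariadbd  Ver 10.11.6-MariaDB-0+deb12u1 for debian-linux-gnu on x86_64 (Debian 12)"
mariadb_version_from() {
  sed -n 's/.* Ver \([^ ]*\).*/\1/p'
}

# Read "host version" lines on stdin; succeed only if exactly one version appears.
versions_consistent() {
  [ "$(awk '{ print $2 }' | sort -u | wc -l)" -eq 1 ]
}

# Collect versions over SSH (assumes passwordless SSH to every node).
check_cluster_versions() {
  local host
  for host in "$@"; do
    echo "$host $(ssh "$host" /usr/sbin/mariadbd --version | mariadb_version_from)"
  done | versions_consistent
}

# Usage:
#   check_cluster_versions 192.168.1.101 192.168.1.102 192.168.1.103 && echo "versions match"
```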
2. Configure Galera on Each Node
Create or edit /etc/mysql/mariadb.conf.d/60-galera.cnf on each node:
[galera]
# Galera provider
wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
# Cluster configuration
wsrep_cluster_name = "my_galera_cluster"
wsrep_cluster_address = "gcomm://192.168.1.101,192.168.1.102,192.168.1.103"
# Node-specific settings (change on each node)
wsrep_node_address = "192.168.1.101" # This node's IP
wsrep_node_name = "node1" # This node's name
# SST method (mariabackup is recommended for production)
wsrep_sst_method = mariabackup
wsrep_sst_auth = "sstuser:sstpassword"
# Required by Galera
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
# Recommended: reject new tables without a primary key
innodb_force_primary_key = 1
# Accept client connections on all interfaces (the Debian/Ubuntu default is 127.0.0.1)
bind-address = 0.0.0.0
# Performance tuning
wsrep_slave_threads = 4
# Flush the InnoDB log about once per second instead of at every commit;
# a crashed node can lose the last second, which it recovers from the cluster
innodb_flush_log_at_trx_commit = 2
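Because only wsrep_node_address and wsrep_node_name differ between nodes, generating the file can be less error-prone than hand-editing three copies. This is a sketch, not an official tool: render_galera_cnf and GALERA_PEERS are hypothetical names, the values mirror the example settings above, and the bind-address line assumes clients connect from other hosts.

```shell
#!/bin/sh
# Hypothetical helper: render 60-galera.cnf for one node of the example cluster.
# Only wsrep_node_name and wsrep_node_address change between nodes.
GALERA_PEERS="192.168.1.101,192.168.1.102,192.168.1.103"

render_galera_cnf() {
  local node_name=$1 node_ip=$2
  # Refuse addresses that are not part of wsrep_cluster_address.
  case ",$GALERA_PEERS," in
    *",$node_ip,"*) ;;
    *) echo "error: $node_ip is not in GALERA_PEERS" >&2; return 1 ;;
  esac
  cat <<EOF
[galera]
wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name = "my_galera_cluster"
wsrep_cluster_address = "gcomm://${GALERA_PEERS}"
wsrep_node_address = "${node_ip}"
wsrep_node_name = "${node_name}"
wsrep_sst_method = mariabackup
wsrep_sst_auth = "sstuser:sstpassword"
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
innodb_force_primary_key = 1
bind-address = 0.0.0.0
wsrep_slave_threads = 4
innodb_flush_log_at_trx_commit = 2
EOF
}

# Usage on node2:
#   render_galera_cnf node2 192.168.1.102 | sudo tee /etc/mysql/mariadb.conf.d/60-galera.cnf
```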
3. Create the SST User
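Run this on Node 1 (the node you will bootstrap). The start/stop sequence below only works before 60-galera.cnf exists on that node: with the Galera configuration in place, a plain start waits for a cluster that is not running yet and fails. If you have already created the file, skip the start and stop commands and run the SQL on Node 1 right after bootstrapping in step 4; the user then replicates to the joining nodes.
# Start MariaDB normally first
sudo systemctl start mariadb
# Open a root session in the mariadb client
sudo mariadb
-- In the client: create the SST user
CREATE USER 'sstuser'@'localhost' IDENTIFIED BY 'sstpassword';
GRANT RELOAD, PROCESS, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'sstuser'@'localhost';
FLUSH PRIVILEGES;
EXIT;
# Back in the shell: stop MariaDB
sudo systemctl stop mariadb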
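To keep the SQL password and wsrep_sst_auth from drifting apart, the statements can be generated from the same two values. This hypothetical helper (sst_user_sql) emits the CREATE USER and GRANT statements shown above. It refuses quotes and backslashes, which would break the SQL string literal, and colons, which would break the user:password format of wsrep_sst_auth.

```shell
#!/bin/sh
# Hypothetical helper: print the SST-user SQL for given credentials so the
# same values can be used in wsrep_sst_auth = "user:password".
sst_user_sql() {
  local user=$1 pass=$2
  case "$user$pass" in
    *"'"* | *'\'* | *:*)
      echo "error: quotes, backslashes and colons are not allowed here" >&2
      return 1 ;;
  esac
  cat <<EOF
CREATE USER '${user}'@'localhost' IDENTIFIED BY '${pass}';
GRANT RELOAD, PROCESS, LOCK TABLES, REPLICATION CLIENT ON *.* TO '${user}'@'localhost';
EOF
}

# Usage on Node 1 (the server must be running):
#   sst_user_sql sstuser 'choose-a-strong-password' | sudo mariadb
```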
4. Bootstrap the First Node
On Node 1 only:
sudo galera_new_cluster
Verify the cluster has started by running these queries in the mariadb client (sudo mariadb):
SHOW STATUS LIKE 'wsrep_cluster_size';
-- Should return: 1
SHOW STATUS LIKE 'wsrep_cluster_status';
-- Should return: Primary
SHOW STATUS LIKE 'wsrep_ready';
-- Should return: ON
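For scripts and monitoring, the three checks above can be combined with wsrep_local_state_comment, which reads Synced once a node is fully caught up. A minimal sketch: wsrep_healthy is a hypothetical function that reads the tab-separated output of the mariadb client in batch mode and exits non-zero if any value is off.

```shell
#!/bin/sh
# Hypothetical health check. Reads "name<TAB>value" lines as printed by
#   sudo mariadb -N -B -e "SHOW STATUS LIKE 'wsrep_%'"
# and exits non-zero unless the node is Primary, ready, synced, and the
# cluster has the expected number of members.
wsrep_healthy() {
  awk -F'\t' -v want="$1" '
    { v[$1] = $2 }
    END {
      ok = 1
      if (v["wsrep_cluster_size"] != want) { print "wsrep_cluster_size=" v["wsrep_cluster_size"] " (expected " want ")"; ok = 0 }
      if (v["wsrep_cluster_status"] != "Primary") { print "wsrep_cluster_status=" v["wsrep_cluster_status"]; ok = 0 }
      if (v["wsrep_ready"] != "ON") { print "wsrep_ready=" v["wsrep_ready"]; ok = 0 }
      if (v["wsrep_local_state_comment"] != "Synced") { print "wsrep_local_state_comment=" v["wsrep_local_state_comment"]; ok = 0 }
      if (ok) print "healthy"
      exit (ok ? 0 : 1)
    }'
}

# Usage right after bootstrapping Node 1:
#   sudo mariadb -N -B -e "SHOW STATUS LIKE 'wsrep_%'" | wsrep_healthy 1
```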
5. Join Remaining Nodes
On Node 2 and Node 3, simply start MariaDB normally:
sudo systemctl start mariadb
Each node will automatically connect to the cluster and receive data via SST. Monitor the process:
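# Watch the MariaDB log (Debian/Ubuntu send it to the systemd journal unless log_error is set)
sudo journalctl -u mariadb -f
# If log_error = /var/log/mysql/error.log is configured, use:
sudo tail -f /var/log/mysql/error.log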
Verify the cluster size increases:
SHOW STATUS LIKE 'wsrep_cluster_size';
-- Should return: 3 (after all nodes join)
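When joining nodes from a script, polling is more convenient than re-running the query by hand. The sketch assumes the mariadb client can log in without a password prompt (for example, run as root with the default unix_socket authentication on Debian/Ubuntu). The function names are hypothetical and the timeout and interval defaults are arbitrary.

```shell
#!/bin/sh
# Hypothetical helpers: read one wsrep status value, and wait for the
# cluster to reach an expected size.
wsrep_status() {
  mariadb -N -B -e "SHOW STATUS LIKE '$1'" | cut -f2
}

# wait_for_cluster_size WANT [TIMEOUT_S] [INTERVAL_S]
wait_for_cluster_size() {
  local want=$1 timeout=${2:-600} interval=${3:-5} waited=0 size
  while :; do
    size=$(wsrep_status wsrep_cluster_size)
    if [ "$size" = "$want" ]; then
      echo "cluster size is $size"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s; cluster size is ${size:-unknown}" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Usage on any node while Node 2 and Node 3 join:
#   wait_for_cluster_size 3 900 10
```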
Troubleshooting Common Issues
Split-Brain Recovery
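A network partition can split the cluster into groups of nodes that cannot reach each other. Galera prevents a true split-brain through quorum: only the partition holding a majority of the nodes stays Primary and keeps serving queries, while nodes in the minority become Non-Primary and stop accepting queries (wsrep_ready = OFF).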
-- Check for split-brain
SHOW STATUS LIKE 'wsrep_cluster_status';
-- "Non-Primary" means this node is in the minority partition
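The quorum rule behind Primary and Non-Primary comes down to simple arithmetic. This simplified model assumes every node has the default equal weight and that the unreachable nodes failed rather than leaving gracefully (nodes that shut down cleanly are removed from the count). has_quorum and describe_partition are illustrative names.

```shell
#!/bin/sh
# Simplified quorum rule: a partition stays Primary only if it holds strictly
# more than half of the nodes of the last Primary component. Assumes equal
# node weights and nodes that failed rather than leaving gracefully.
has_quorum() {
  local partition=$1 total=$2
  [ $((2 * partition)) -gt "$total" ]
}

describe_partition() {
  if has_quorum "$1" "$2"; then
    echo "$1 of $2 nodes: Primary"
  else
    echo "$1 of $2 nodes: Non-Primary"
  fi
}

describe_partition 2 3   # 2 of 3 nodes: Primary
describe_partition 1 3   # 1 of 3 nodes: Non-Primary
# In an even split, the other half is Non-Primary as well.
describe_partition 2 4   # 2 of 4 nodes: Non-Primary
describe_partition 3 5   # 3 of 5 nodes: Primary
```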
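Recovery: once connectivity is restored, a Non-Primary node normally rejoins the Primary component on its own. If it does not, restart it:
# On the minority partition node, stop MariaDB
sudo systemctl stop mariadb
# Fix the network issue, then restart
sudo systemctl start mariadb
# The node rejoins the Primary component and catches up via IST or SST
If no partition kept quorum (for example, a 2-2 split of a 4-node cluster), every node is Non-Primary. In that case, choose the node with the most recent data and re-establish a Primary component there with SET GLOBAL wsrep_provider_options='pc.bootstrap=YES'; the other nodes then reconnect to it.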
Full Cluster Crash Recovery
When all nodes crash or are stopped simultaneously:
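# On each node, with MariaDB stopped, recover the last committed position
sudo galera_recovery
# It prints the position as --wsrep_start_position=<uuid>:<seqno>
# The node with the highest seqno has the latest data
# After a crash, safe_to_bootstrap is 0 on every node; set it to 1 on that node ONLY
sudo sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
# Bootstrap from the node with the highest seqno
sudo galera_new_cluster # On the most recent node ONLY
# Start remaining nodes normally
sudo systemctl start mariadb # On other nodes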
Warning: Never bootstrap from a node that is not the most up-to-date. This can lead to data loss.
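Comparing recovered positions by eye across several terminals is error-prone. The hypothetical pick_bootstrap_node below takes one "node output" line per node, where output is what galera_recovery printed, and reports the node with the highest seqno (the first one wins a tie). It refuses to choose if the nodes report different cluster UUIDs, because that means they do not share a history. The SSH collection loop in the comments is an assumption about your environment.

```shell
#!/bin/sh
# Hypothetical helper: read "node --wsrep_start_position=<uuid>:<seqno>" lines
# and print "node seqno" for the most advanced node.
pick_bootstrap_node() {
  awk '
    {
      pos = $2
      sub(/^--wsrep_start_position=/, "", pos)
      n = split(pos, part, ":")
      uuid = part[1]; seqno = part[n] + 0
      if (NR == 1) first_uuid = uuid
      else if (uuid != first_uuid) mismatch = 1
      if (NR == 1 || seqno > best) { best = seqno; node = $1 }
    }
    END {
      if (mismatch) { print "error: nodes report different cluster UUIDs" > "/dev/stderr"; exit 1 }
      if (NR > 0) print node, best
    }'
}

# Usage from an admin host (assumes SSH access and passwordless sudo on each node):
#   for n in 192.168.1.101 192.168.1.102 192.168.1.103; do
#     echo "$n $(ssh "$n" sudo galera_recovery)"
#   done | pick_bootstrap_node
```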
SST Failures
If SST fails during node joining:
# Check the error log (use sudo journalctl -u mariadb if log_error is not set)
sudo tail -100 /var/log/mysql/error.log | grep -i "sst\|wsrep"
# Common causes:
# 1. Wrong SST user credentials: verify wsrep_sst_auth matches the SST user on the donor
# 2. mariabackup not installed: install mariadb-backup on every node
# 3. Firewall blocking port 4444: open it
# 4. Disk full on the joining node: free space
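Causes 2 and 4 can be checked on the joiner before retrying. A sketch with hypothetical helper names, assuming the mariabackup SST method, whose transfer script streams data with socat by default; the space estimate assumes the joiner needs roughly as much room as the donor's data directory occupies.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers for a joining node before retrying SST.

# Print the name of each tool the mariabackup SST method needs but is missing.
missing_sst_tools() {
  command -v mariabackup >/dev/null 2>&1 ||
    command -v mariadb-backup >/dev/null 2>&1 ||
    echo "mariabackup (package mariadb-backup)"
  command -v socat >/dev/null 2>&1 || echo "socat"
}

# Succeed if the filesystem holding DIR has at least NEED_KB kilobytes free.
has_free_kb() {
  local need_kb=$1 dir=$2 avail_kb
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "${avail_kb:-0}" -ge "$need_kb" ]
}

# Usage on the joiner, with NEED_KB taken from `sudo du -sk /var/lib/mysql` on the donor:
#   missing_sst_tools
#   has_free_kb 52428800 /var/lib/mysql || echo "not enough space for SST"
```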
Cluster Won’t Start After Safe Shutdown
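If you gracefully stopped all nodes and the cluster refuses to start, check which node may bootstrap. On a clean shutdown, the last node to stop records safe_to_bootstrap: 1 and should be bootstrapped first.
# Check seqno and safe_to_bootstrap in grastate.dat on every node (the file belongs to the mysql user)
sudo cat /var/lib/mysql/grastate.dat
# If safe_to_bootstrap: 0 on all nodes, choose the node with the highest seqno
# and manually set safe_to_bootstrap to 1 on that node only
sudo sed -i 's/safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat
sudo galera_new_cluster
# Then start the other nodes with: sudo systemctl start mariadb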
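Before flipping safe_to_bootstrap, compare the seqno recorded on every node. A minimal parser for the "key: value" format of grastate.dat; the function names are hypothetical, and it should be run as root because the file belongs to the mysql user.

```shell
#!/bin/sh
# Hypothetical helpers for reading grastate.dat, whose lines look like
#   uuid:    <cluster uuid>
#   seqno:   <number, -1 if unknown>
#   safe_to_bootstrap: <0 or 1>

# grastate_field FILE KEY: print the value of KEY.
grastate_field() {
  awk -F':[ \t]*' -v k="$2" '$1 == k { print $2; exit }' "$1"
}

# One-line summary of a node's saved state.
grastate_summary() {
  local file=${1:-/var/lib/mysql/grastate.dat}
  echo "seqno=$(grastate_field "$file" seqno) safe_to_bootstrap=$(grastate_field "$file" safe_to_bootstrap)"
}

# Usage: run as root on each node, then bootstrap only from the highest seqno.
#   grastate_summary
```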
Gotchas and Edge Cases
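- No ALTER TABLE on large tables during traffic: DDL runs under Total Order Isolation (TOI) by default, so a long-running ALTER TABLE blocks replication for the entire cluster until it finishes. Use pt-online-schema-change or rolling schema upgrades (RSU) instead.
- Only InnoDB: Galera fully supports only InnoDB tables. MyISAM, Aria, and other engines are not supported for production use (MyISAM and Aria replication exists only as an experimental option).
- Auto-increment gaps: with wsrep_auto_increment_control (ON by default), each node sets auto_increment_increment to the cluster size and uses its own auto_increment_offset, so auto-increment values have gaps. This is expected and keeps concurrent inserts on different nodes from colliding; innodb_autoinc_lock_mode = 2 is required alongside it.
- Minimum 3 nodes: Always run an odd number of nodes (3, 5, 7). A network partition then can never split the cluster into two equal halves, where neither half has quorum and the whole cluster stops accepting queries.
- gcache sizing: Increase the write-set cache, e.g. wsrep_provider_options = "gcache.size=1G" (the default is 128M), so nodes that were offline for a while can still catch up with IST, which is much faster than a full SST.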
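A gcache large enough for IST has to hold every write set produced during the outage you want to survive. One rough estimate, sketched below under that assumption, samples wsrep_replicated_bytes plus wsrep_received_bytes twice and scales the rate to the target outage. The function names are hypothetical, and the result is a starting point to round up generously, not a guarantee.

```shell
#!/bin/sh
# Rough gcache sizing (hypothetical helpers): write-set bytes per second,
# scaled to the longest outage IST should cover, rounded up to whole MiB.

# Current total of replicated + received write-set bytes on this node.
# Assumes the mariadb client logs in without a password prompt.
wsrep_write_bytes() {
  mariadb -N -B -e "SHOW STATUS WHERE Variable_name IN ('wsrep_replicated_bytes', 'wsrep_received_bytes')" |
    awk '{ s += $2 } END { printf "%.0f\n", s }'
}

# gcache_size_mb BYTES_T0 BYTES_T1 INTERVAL_S OUTAGE_S
gcache_size_mb() {
  local needed=$(( ($2 - $1) * $4 / $3 ))
  echo $(( (needed + 1048575) / 1048576 ))
}

# Example: 6,000,000 bytes in a 60 s sample, sized for a 1-hour outage.
gcache_size_mb 0 6000000 60 3600   # prints 344
```

In practice, take the two samples during peak write traffic (t0=$(wsrep_write_bytes); sleep 60; t1=$(wsrep_write_bytes)) and set gcache.size to the result in megabytes or higher.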
Summary
- MariaDB Galera Cluster provides synchronous multi-master replication with automatic failover.
- Always bootstrap from the node with the most recent data after a full cluster shutdown.
- Use mariabackup as the SST method for production clusters.
- Monitor wsrep_cluster_size, wsrep_cluster_status, and wsrep_ready for cluster health.
- Run an odd number of nodes (minimum 3) to avoid split-brain scenarios.