If you manage a Linux server, waking up to find your critical database or web server process unexpectedly dead is a nightmare. Often, the culprit is the Out of Memory (OOM) Killer, a merciless but necessary component of the Linux kernel. This guide explains why the OOM Killer strikes, how to definitively diagnose an OOM event, and how to configure your system to prevent it from crippling your infrastructure.
What is the OOM Killer?
When a Linux system runs completely out of physical RAM and Swap space, it faces a total system freeze. To prevent this, the kernel invokes the OOM Killer.
The OOM Killer examines all running processes, calculates a “badness” score (oom_score) based primarily on memory usage, and then violently terminates (SIGKILL) the process with the highest score to instantly free up memory and save the system.
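The score the kernel computes is visible under /proc for every process. A quick way to inspect it (here for the current shell itself, via /proc/self):

```shell
# Read the kernel's current "badness" score for this shell process.
# oom_score is computed by the kernel; oom_score_adj is the user-tunable knob.
cat /proc/self/oom_score
cat /proc/self/oom_score_adj
```

A process with a higher oom_score than its neighbors is the one the OOM Killer will reap first.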
The Error: How to Catch the OOM Killer in Action
Because the OOM Killer operates at the kernel level, your application rarely has time to write an error to its own log file. Instead, the process simply vanishes.
To confirm an OOM kill, you must check the kernel logs. Run:
dmesg -T | grep -i "Out of memory"
Or, to see exactly which process was assassinated:
dmesg -T | grep -i "killed process"
You will see output resembling this:
[Fri Feb 27 10:15:30 2026] Out of memory: Killed process 12345 (mysqld) total-vm:4567890kB, anon-rss:3456780kB, file-rss:0kB, shmem-rss:0kB, UID:111 pgtables:8000kB oom_score_adj:0
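If you want to script a response (or just confirm which PID died), the PID can be pulled out of that log line with standard tools. The sample line below mirrors the dmesg output above:

```shell
# Sample kernel log line (copied from the dmesg output shown above).
line='[Fri Feb 27 10:15:30 2026] Out of memory: Killed process 12345 (mysqld) total-vm:4567890kB'

# Extract the PID of the killed process.
echo "$line" | sed -n 's/.*Killed process \([0-9]*\).*/\1/p'   # prints 12345
```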
Root Cause: Why Did It Happen?
The OOM Killer only strikes when the system is starving for memory. The common scenarios include:
- A Memory Leak: A buggy application (often custom code, or memory-managed languages like Java or Node.js without proper limits) slowly consumes RAM over time until the server is exhausted.
- Sudden Traffic Spike: A burst of web traffic spawns too many PHP-FPM or Apache worker processes; each worker carries its own memory footprint, so usage multiplies quickly.
- Improper Database Configuration: A database like MySQL is configured to use more memory (e.g., innodb_buffer_pool_size) than the server physically possesses.
- No Swap Space: The server has exactly 0 bytes of Swap configured, leaving the kernel with no safety net when RAM fills up.
Step-by-Step Solution
Step 1: Add or Increase Swap Space
If your server has no swap space, adding even a small amount can absorb sudden, temporary spikes in memory usage and give you time to react before the OOM Killer acts.
Create a 2GB swap file:
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
Make it permanent by adding it to /etc/fstab:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
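Afterwards, confirm the swap is actually active:

```shell
# List active swap devices/files and the overall memory picture.
swapon --show
free -h      # the Swap: line should now show roughly 2.0Gi total
```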
Step 2: Optimize Application Configuration
If MySQL or PostgreSQL is constantly being killed, it is likely misconfigured.
Check your database configuration (e.g., /etc/mysql/my.cnf) and ensure buffers are sized appropriately for your hardware. A common rule for innodb_buffer_pool_size is 50-70% of total RAM, but if you run a web server on the exact same machine, that must be significantly lower.
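As a rough starting point, the 50-70% rule can be turned into a concrete number for your host. This sketch computes 60% of physical RAM (60% is an assumed midpoint; pick a lower percentage if the machine also runs a web server):

```shell
# Compute 60% of total RAM in MB as a candidate innodb_buffer_pool_size.
# The 60% figure is an assumption: the middle of the 50-70% rule for a
# dedicated database host.
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "innodb_buffer_pool_size = $(( total_kb * 60 / 100 / 1024 ))M"
```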
For PHP/Apache/Nginx: Reduce the maximum number of worker processes or children allowed to spawn simultaneously.
Step 3: Manipulate the OOM Score (Protecting Critical Processes)
Every process has an oom_score_adj value ranging from -1000 to 1000. The higher the number, the more likely the OOM Killer will target it. A value of -1000 makes the process immune.
If you have a critical process (like sshd so you can always log in, or a central daemon) that you absolutely cannot lose, you can adjust its score.
First, find the PID of the process (e.g., PID 1234).
echo -500 | sudo tee /proc/1234/oom_score_adj
(The value does not survive a process restart, so you will need a script to reapply it automatically, or use systemd.)
Using Systemd (The Better Way):
If the process is managed by systemd, you can configure the OOM score natively.
Edit the service file (e.g., sudo systemctl edit sshd):
[Service]
OOMScoreAdjust=-1000
Then run sudo systemctl daemon-reload and restart the service.
Step 4: Prevent Kernel Overcommit
By default, Linux allows applications to request more memory than the system actually has (Memory Overcommit). It assumes not all applications will use what they request. When they actually do use it, you get an OOM event.
You can restrict this behavior via sysctl:
sudo sysctl -w vm.overcommit_memory=2
sudo sysctl -w vm.overcommit_ratio=80
This tells the kernel to refuse any memory allocation that would push total commitments beyond (Swap + 80% of RAM). Be aware that under strict overcommit, applications see allocation failures (malloc() returning NULL) instead of being OOM-killed, so they must handle that gracefully. Add these settings to /etc/sysctl.conf to persist across reboots.
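The resulting hard ceiling is visible in /proc/meminfo: CommitLimit is the (Swap + ratio% of RAM) figure, and Committed_AS is how much memory is currently promised to processes:

```shell
# CommitLimit = Swap + (overcommit_ratio% of RAM);
# Committed_AS = memory currently promised to all processes.
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo
```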
Prevention
To keep the OOM killer at bay permanently:
- Implement Alerting: Set up Prometheus, Datadog, or simple Monit scripts to alert you when memory usage exceeds 85%.
- Setup Cgroups: If running Docker or Kubernetes, set strict memory limits on containers (e.g., --memory="512m"). If a process inside the container runs away, the cgroup OOM Killer terminates only processes within that container, leaving the host system unharmed.
- Right-size Your Server: Sometimes there is no software fix for a lack of hardware. If your application legitimately requires 8GB of RAM to run efficiently and you are on a 4GB VPS, you need to upgrade.
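The alerting idea above can be sketched as a cron-able shell check (the 85% threshold is the one suggested above; replace the echo with your mailer or pager of choice):

```shell
# Alert when memory usage crosses a threshold (assumed 85%, per the text).
threshold=85
used_pct=$(free | awk '/^Mem:/ {printf "%d", $3 * 100 / $2}')
if [ "$used_pct" -ge "$threshold" ]; then
    echo "ALERT: memory usage at ${used_pct}%"
fi
```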
Summary
- The Linux OOM Killer is a kernel defense mechanism that kills the most memory-intensive process to save a starving system from freezing.
- Diagnosing an OOM event requires checking the kernel ring buffer using dmesg.
- Prevent OOM kills by configuring Swap space, tuning application memory limits (like MySQL buffer sizes), and setting strict limits via cgroups or systemd.
- You can protect critical daemons by lowering their OOMScoreAdjust via systemd.