strace on Linux is the go-to tool for understanding what a process is doing at the kernel level. When a program fails silently, hangs, or behaves unexpectedly, strace reveals the exact system calls being made — file opens, network connections, memory allocations, and signal handling. This guide covers practical strace usage from basic tracing through production debugging techniques, with real-world scenarios that demonstrate how to diagnose the problems you actually encounter.
Prerequisites
- A Linux system (any distribution — strace works on all of them)
- Root or sudo access (required for tracing processes owned by other users)
- Basic understanding of Linux processes and file descriptors
- strace installed (covered in the installation section below)
Installing strace on Linux
strace is available in every major distribution’s package repository. It may already be installed on your system.
# Debian / Ubuntu
sudo apt install strace
# RHEL / CentOS / Fedora
sudo dnf install strace
# Arch Linux
sudo pacman -S strace
# Alpine Linux
sudo apk add strace
# Check installed version
strace --version
On minimal container images (Alpine, distroless), strace is usually missing. Install it temporarily for debugging, then remove it from production images.
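On a Debian-based image, for instance, a temporary install for one debugging session might look like this (the container name `my_container` is illustrative):

```shell
# Install strace inside a running Debian/Ubuntu-based container for one
# debugging session, then remove it so it never ships in the image
docker exec my_container sh -c 'apt-get update && apt-get install -y strace'
# ... run your strace session via docker exec ...
docker exec my_container apt-get purge -y strace
```

For distroless images, which have no package manager at all, a sidecar or ephemeral debug container that includes strace is the usual workaround.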
Tracing a Command from Startup
The simplest use of strace is running it in front of any command:
strace ls /tmp
This prints every system call ls makes from execve() to exit_group(). The output goes to stderr, so the normal command output still appears on stdout.
Reading strace Output
Each line follows this format:
syscall_name(arguments...) = return_value
For example:
openat(AT_FDCWD, "/tmp", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
getdents64(3, [{d_ino=2, d_off=1, d_type=DT_DIR, d_name="."},...], 32768) = 480
close(3) = 0
This tells you: ls opened /tmp as file descriptor 3, read directory entries, then closed it. A return value of -1 means the call failed, and strace shows the errno:
openat(AT_FDCWD, "/etc/shadow", O_RDONLY) = -1 EACCES (Permission denied)
That single line often reveals why a program fails — no need to read source code or add debug logging.
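You can reproduce this yourself: as an unprivileged user, tracing a read of a root-only file surfaces the EACCES immediately:

```shell
# As a non-root user, /etc/shadow is unreadable — the trace shows
# exactly which openat() call fails and with what errno
strace -e trace=openat cat /etc/shadow
```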
Useful Startup Flags
# Follow child processes (fork/clone)
strace -f ./my_app
# Print timestamps for each syscall
strace -t ls /tmp
# Print relative time between syscalls (find hangs)
strace -r ls /tmp
# Print string arguments fully (default truncates at 32 chars)
strace -s 256 ./my_app
# Write output to a file instead of stderr
strace -o /tmp/trace.log ls /tmp
Attaching strace to a Running Process
You do not need to restart a process to trace it. Attach to any running process by PID:
# Find the PID
pidof nginx
# or
ps aux | grep my_app
# Attach to the process
sudo strace -p 12345
# Attach and follow all threads/children
sudo strace -fp 12345
Press Ctrl+C to detach. The traced process continues running normally — strace does not kill it on detach.
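For unattended captures, you can bound the attachment window with timeout(1) — strace detaches cleanly when it receives SIGTERM, and the traced process keeps running (the PID and duration below are illustrative):

```shell
# Capture 30 seconds of activity, then detach automatically.
# timeout sends SIGTERM to strace; the traced process is unaffected.
sudo timeout 30 strace -fp 12345 -o /tmp/attach_30s.log
```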
Real-World Scenario: Debugging a Hanging Application
You have a production web application that occasionally stops responding to requests. Logs show nothing. Instead of restarting blindly:
# Attach to the stuck worker, logging network and file activity
sudo strace -fp $(pidof my_app) -e trace=network,file -s 256 -o /tmp/hang_trace.log
Send a test request, then check the trace:
grep -E 'futex|poll|select|epoll_wait' /tmp/hang_trace.log | tail -20
If you see the process stuck in futex(FUTEX_WAIT) — it is blocked waiting on a lock or condition variable, which in a hung process usually means a deadlock. If stuck in connect() or poll() with a long timeout — it is waiting on an upstream service that is not responding.
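Once you know which file descriptor the process is blocked on (say fd 4 from a poll() or connect() line), /proc tells you what it points at — the PID and fd number here are illustrative:

```shell
# What is file descriptor 4? (the symlink shows a file, pipe, or socket inode)
ls -l /proc/12345/fd/4
# If it is a socket, find the remote peer in the socket table
ss -tpn | grep 'pid=12345'
```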
Filtering System Calls with strace
Full traces are noisy. Use -e trace= to focus on what matters:
# File operations only (open, read, write, close, stat, etc.)
strace -e trace=file ls /tmp
# Network operations only (socket, connect, send, recv, etc.)
strace -e trace=network curl https://example.com
# Process management (fork, clone, execve, wait, exit)
strace -e trace=process bash -c "ls | grep foo"
# Memory operations (mmap, mprotect, brk)
strace -e trace=memory ./my_app
# Specific syscalls by name
strace -e trace=openat,read,write cat /etc/hostname
# Negate — trace everything EXCEPT these (quote the pattern so the shell
# does not interpret the !)
strace -e 'trace=!mmap,mprotect,brk' ./my_app
Filtering by Return Value
Find only failed system calls — extremely useful for debugging:
# Show only calls that returned an error
strace -Z ./my_app
# Show only successful calls
strace -z ./my_app
The -Z flag (strace 5.2+) is a game-changer for production debugging. Instead of wading through thousands of successful calls, you see only the failures.
Performance Analysis with strace
System Call Summary
The -c flag produces a statistical summary instead of a live trace:
strace -c ls /tmp
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
25.00 0.000045 5 9 mmap
19.44 0.000035 5 7 close
16.67 0.000030 5 6 openat
11.11 0.000020 4 5 fstat
8.33 0.000015 5 3 mprotect
5.56 0.000010 3 3 read
...
------ ----------- ----------- --------- --------- ----------------
100.00 0.000180 4 42 2 total
This tells you which system calls consume the most time. If read or write dominates, you have an I/O-bound process. If futex dominates, you have lock contention.
Timing Individual Calls
# Wall-clock time per syscall
strace -T ./my_app
# Combined: timestamps + duration
strace -tT ./my_app
The -T flag appends the duration in angle brackets:
openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY) = 3 <0.000024>
read(3, "nameserver 8.8.8.8\n", 4096) = 19 <0.000011>
connect(4, {sa_family=AF_INET, sin_port=htons(443)}, 16) = -1 EINPROGRESS <0.000089>
That connect() returned EINPROGRESS after 89 microseconds — normal for a non-blocking socket, where the actual connection wait shows up in a later poll() or epoll_wait(). If that wait takes 5+ seconds, your DNS or upstream is the bottleneck.
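The durations from -T are easy to post-process. A quick awk sketch (assuming the <seconds> duration is the last angle-bracketed token on each line) pulls out anything slower than 100 ms:

```shell
# Save a timed trace, then list the syscalls that took longer than 0.1s
strace -T -o /tmp/trace.log ./my_app
# Split on < and >; the second-to-last field is the duration in seconds
awk -F'[<>]' '$(NF-1) + 0 > 0.1' /tmp/trace.log
```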
Comparing strace with Alternative Tracing Tools
| Feature | strace | ltrace | perf trace | bpftrace |
|---|---|---|---|---|
| Traces | Kernel syscalls | Library calls | Kernel syscalls | Kernel + userspace |
| Overhead | High (10-100x) | High | Low (~5%) | Very low (~2%) |
| Attach to running PID | Yes | Yes | Yes | Yes |
| Filter by syscall | Yes (-e trace=) | Yes (-e) | Yes (-e) | Yes (custom scripts) |
| Statistical summary | Yes (-c) | Yes (-c) | Yes (-s/--summary) | Custom scripts |
| Kernel version needed | Any | Any | 3.7+ | 4.9+ (eBPF) |
| Production safe | Brief use only | Brief use only | Yes | Yes |
| Best for | Quick debugging | Library call issues | Low-overhead profiling | Complex tracing logic |
Rule of thumb: Use strace for quick diagnosis (attach, find the problem, detach). Switch to perf trace or bpftrace for sustained production monitoring where overhead matters.
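As a taste of the low-overhead option, a bpftrace one-liner can count syscalls per process system-wide — a sketch that assumes root and bpftrace 0.9 or later:

```shell
# Count system calls by process name until Ctrl+C, with minimal overhead
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
```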
Common strace Debugging Patterns
Pattern 1: Find Why a Program Cannot Find a File
strace -e trace=openat,stat,access ./my_app 2>&1 | grep -i "no such file\|enoent\|eacces"
This instantly shows every file the program tries to open and fails. Common for configuration file issues, missing shared libraries, or wrong paths.
Pattern 2: Find What Configuration Files a Program Reads
strace -e trace=openat ./my_app 2>&1 | grep -v ENOENT | grep '= [0-9]'
This filters to only successful file opens — showing you exactly which config files, libraries, and data files the program actually uses.
Pattern 3: Debug DNS Resolution Issues
strace -e trace=network -s 256 curl https://example.com 2>&1 | grep -E 'connect|sendto|recvfrom'
You will see the DNS query going to /etc/resolv.conf nameservers, the response, and then the actual HTTPS connection. If the DNS query takes seconds, you have found your latency source.
Pattern 4: Find Why a Service Fails on Startup
# systemctl only sends a D-Bus request — systemd itself spawns the service,
# so tracing systemctl won't show the service's syscalls. Run the service's
# ExecStart command directly under strace instead:
sudo strace -f -o /tmp/service_trace.log /path/to/service_binary
# Then search the trace for errors
grep '= -1' /tmp/service_trace.log | grep -v 'ENOENT.*locale\|ENOENT.*lib' | head -30
The grep filters out harmless “file not found” errors from locale and library probing (which are normal), leaving you with the real failures.
Pattern 5: Monitor File Writes in Real-Time
sudo strace -fp $(pidof my_app) -e trace=write -s 1024 2>&1 | grep 'write([0-9]*,'
Watch every byte the process writes to any file descriptor — useful for debugging logging issues or unexpected file modifications.
Gotchas and Edge Cases
Tracing setuid binaries: when an unprivileged user runs a setuid program under strace, the kernel drops the setuid privilege elevation for security — the program runs with your privileges and may behave differently than normal. Run strace as root to trace setuid programs with their real privileges.
Multi-threaded applications: Always use -f (follow forks) with multi-threaded apps. Without it, you only see the main thread’s syscalls and miss the worker threads where the actual problem lives.
Container environments: Inside Docker containers, strace requires the SYS_PTRACE capability. Run with --cap-add=SYS_PTRACE or use --privileged for debugging:
docker run --cap-add=SYS_PTRACE my_image strace ./my_app
In Kubernetes, add the capability to your pod security context:
securityContext:
  capabilities:
    add: ["SYS_PTRACE"]
Performance impact is real: strace intercepts every system call via ptrace, which stops the traced process twice per call (on entry and on exit), with context switches to strace and back each time. A process making 100,000 syscalls/second will be dramatically slowed. Never leave strace attached to a production process longer than needed to capture the issue.
strace -c masks individual slow calls: The -c summary shows averages. A process might make 10,000 fast read() calls and one 30-second read() — the average looks fine. Use -C (capital) to get both the summary and the live trace to catch outliers.
Troubleshooting Common strace Issues
"Operation not permitted" when attaching
# Check ptrace scope (Ubuntu/Debian)
cat /proc/sys/kernel/yama/ptrace_scope
If the value is 1 (default on Ubuntu), you can only trace your own processes. To temporarily allow tracing any process:
# Temporary — resets on reboot
sudo sysctl kernel.yama.ptrace_scope=0
# Or just use sudo with strace
sudo strace -p 12345
Trace output is overwhelming
# Combine filters: only failed file operations with timing
strace -Z -e trace=file -T -s 256 ./my_app 2>&1 | head -50
Need to trace a short-lived process
For processes that start and exit quickly (like cron jobs):
# Wrap the command
strace -f -o /tmp/cron_trace.log /path/to/cron_script.sh
# Or trace the parent that spawns it
sudo strace -fp $(pidof crond) -o /tmp/cron_trace.log
Summary
- strace traces kernel system calls — use it when a process fails silently, hangs, or misbehaves and logs give no clues
- Attach to running processes with strace -p PID without restarting them — add -f for multi-threaded applications
- Filter with -e trace= to focus on file, network, process, or memory operations instead of drowning in noise
- Use -Z to show only failed calls — the fastest way to find why something is broken
- Use -c for performance summaries — identify which system calls consume the most time
- Performance overhead is significant — attach briefly in production, then detach. For sustained tracing, use perf trace or bpftrace
- Containers need the SYS_PTRACE capability — add it explicitly in Docker or Kubernetes