strace on Linux is the go-to tool for understanding what a process is doing at the kernel level. When a program fails silently, hangs, or behaves unexpectedly, strace reveals the exact system calls being made — file opens, network connections, memory allocations, and signal handling. This guide covers practical strace usage from basic tracing through production debugging techniques, with real-world scenarios that demonstrate how to diagnose the problems you actually encounter.

Prerequisites

  • A Linux system (any distribution — strace works on all of them)
  • Root or sudo access (required for tracing processes owned by other users)
  • Basic understanding of Linux processes and file descriptors
  • strace installed (covered in the installation section below)

Installing strace on Linux

strace is available in every major distribution’s package repository. It may already be installed on your system.

# Debian / Ubuntu
sudo apt install strace

# RHEL / CentOS / Fedora (use yum on releases older than RHEL 8)
sudo dnf install strace

# Arch Linux
sudo pacman -S strace

# Alpine Linux
sudo apk add strace

# Check installed version
strace --version

On minimal container images (Alpine, distroless), strace is usually missing. Install it temporarily for debugging, then remove it from production images.
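In a running Alpine-based container, for example, you can pull strace in just for the debugging session and drop it again afterwards (a sketch; adjust for your base image and package manager):

```shell
# Install strace only for this debugging session (Alpine package manager)
apk add --no-cache strace
# ... debug with strace ...
# Remove it again so it never ships in the final image
apk del strace
```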

Tracing a Command from Startup

The simplest use of strace is running it in front of any command:

strace ls /tmp

This prints every system call ls makes from execve() to exit_group(). The output goes to stderr, so the normal command output still appears on stdout.

Reading strace Output

Each line follows this format:

syscall_name(arguments...) = return_value

For example:

openat(AT_FDCWD, "/tmp", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
getdents64(3, [{d_ino=2, d_off=1, d_type=DT_DIR, d_name="."},...], 32768) = 480
close(3)                                = 0

This tells you: ls opened /tmp as file descriptor 3, read directory entries, then closed it. A return value of -1 means the call failed, and strace shows the errno:

openat(AT_FDCWD, "/etc/shadow", O_RDONLY) = -1 EACCES (Permission denied)

That single line often reveals why a program fails — no need to read source code or add debug logging.
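You can reproduce that failure line yourself as an unprivileged user: /etc/shadow is readable only by root, so the openat() fails with EACCES (assuming strace is installed and you are not running as root):

```shell
# Run as a regular user: the openat() on /etc/shadow fails with EACCES
strace -e trace=openat cat /etc/shadow 2>&1 | grep EACCES
```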

Useful Startup Flags

# Follow child processes (fork/clone)
strace -f ./my_app

# Print timestamps for each syscall
strace -t ls /tmp

# Print relative time between syscalls (find hangs)
strace -r ls /tmp

# Print string arguments fully (default truncates at 32 chars)
strace -s 256 ./my_app

# Write output to a file instead of stderr
strace -o /tmp/trace.log ls /tmp

Attaching strace to a Running Process

You do not need to restart a process to trace it. Attach to any running process by PID:

# Find the PID
pidof nginx
# or
ps aux | grep my_app

# Attach to the process
sudo strace -p 12345

# Attach and follow all threads/children
sudo strace -fp 12345

Press Ctrl+C to detach. The traced process continues running normally — strace does not kill it on detach.

Real-World Scenario: Debugging a Hanging Application

You have a production web application that occasionally stops responding to requests. Logs show nothing. Instead of restarting blindly:

# Attach to the app and capture network and file activity
sudo strace -fp "$(pidof my_app)" -e trace=network,file -s 256 -o /tmp/hang_trace.log

Send a test request, then check the trace:

grep -E 'futex|poll|select|epoll_wait' /tmp/hang_trace.log | tail -20

If the process is stuck in futex(FUTEX_WAIT), it is waiting on a lock: likely lock contention or a deadlock. If it is stuck in connect() or poll() with a long timeout, it is waiting on an upstream service that is not responding.
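Because -f prefixes every line of the trace file with the thread's PID, you can also count which thread is parked on a lock. A small sketch over the trace file captured above:

```shell
# Count futex waits per thread: the first column of -f output is the thread ID
grep 'futex(' /tmp/hang_trace.log | awk '{print $1}' | sort | uniq -c | sort -rn | head
```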

Filtering System Calls with strace

Full traces are noisy. Use -e trace= to focus on what matters:

# File operations only (open, read, write, close, stat, etc.)
strace -e trace=file ls /tmp

# Network operations only (socket, connect, send, recv, etc.)
strace -e trace=network curl https://example.com

# Process management (fork, clone, execve, wait, exit)
strace -e trace=process bash -c "ls | grep foo"

# Memory operations (mmap, mprotect, brk)
strace -e trace=memory ./my_app

# Specific syscalls by name
strace -e trace=openat,read,write cat /etc/hostname

# Negate: trace everything EXCEPT these (quoted so the shell
# does not treat ! as history expansion)
strace -e 'trace=!mmap,mprotect,brk' ./my_app

Filtering by Return Value

Find only failed system calls — extremely useful for debugging:

# Show only calls that returned an error
strace -Z ./my_app

# Show only successful calls
strace -z ./my_app

The -Z flag (strace 5.2+) is a game-changer for production debugging. Instead of wading through thousands of successful calls, you see only the failures.

Performance Analysis with strace

System Call Summary

The -c flag produces a statistical summary instead of a live trace:

strace -c ls /tmp
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 25.00    0.000045           5         9           mmap
 19.44    0.000035           5         7           close
 16.67    0.000030           5         6           openat
 11.11    0.000020           4         5           fstat
  8.33    0.000015           5         3           mprotect
  5.56    0.000010           3         3           read
  ...
------ ----------- ----------- --------- --------- ----------------
100.00    0.000180           4        42         2 total

This tells you which system calls consume the most time. If read or write dominates, you have an I/O-bound process. If futex dominates, you have lock contention.
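To profile a running process the same way, attach with -c for a bounded window; a sketch (12345 is a placeholder PID, and strace prints the summary table when it detaches):

```shell
# Sample a live process for 10 seconds, then print the syscall summary on detach
sudo timeout 10 strace -c -f -p 12345
```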

Timing Individual Calls

# Wall-clock time per syscall
strace -T ./my_app

# Combined: timestamps + duration
strace -tT ./my_app

The -T flag appends the duration in angle brackets:

openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY) = 3 <0.000024>
read(3, "nameserver 8.8.8.8\n", 4096)  = 19 <0.000011>
connect(4, {sa_family=AF_INET, sin_port=htons(443)}, 16) = -1 EINPROGRESS <0.000089>

That connect() taking 89 microseconds is fine. If you see it taking 5+ seconds, your DNS or upstream is the bottleneck.
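The duration field also makes it easy to rank calls by latency in a saved trace. A sketch, assuming the trace was captured with strace -T -o trace.log:

```shell
# Print the 10 slowest syscalls: the duration sits between the final < > on each line
awk -F'[<>]' 'NF > 2 { print $(NF-1), $0 }' trace.log | sort -rn | head -10
```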

Comparing strace with Alternative Tracing Tools

Feature               | strace          | ltrace              | perf trace             | bpftrace
Traces                | Kernel syscalls | Library calls       | Kernel syscalls        | Kernel + userspace
Overhead              | High (10-100x)  | High                | Low (~5%)              | Very low (~2%)
Attach to running PID | Yes             | Yes                 | Yes                    | Yes
Filter by syscall     | Yes (-e trace=) | Yes (-e)            | Yes (--filter)         | Yes (custom scripts)
Statistical summary   | Yes (-c)        | Yes (-c)            | Yes (--summary)        | Custom scripts
Kernel version needed | Any             | Any                 | 3.7+                   | 4.9+ (eBPF)
Production safe       | Brief use only  | Brief use only      | Yes                    | Yes
Best for              | Quick debugging | Library call issues | Low-overhead profiling | Complex tracing logic

Rule of thumb: Use strace for quick diagnosis (attach, find the problem, detach). Switch to perf trace or bpftrace for sustained production monitoring where overhead matters.
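For comparison, the perf trace equivalents of the attach and summary workflows look roughly like this (perf must be installed, and 12345 is a placeholder PID):

```shell
# Live syscall trace with much lower overhead than ptrace
sudo perf trace -p 12345
# Statistical summary, comparable to strace -c
sudo perf trace -p 12345 --summary
```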

Common strace Debugging Patterns

Pattern 1: Find Why a Program Cannot Find a File

strace -e trace=openat,stat,access ./my_app 2>&1 | grep -i "no such file\|enoent\|eacces"

This instantly shows every file the program tries to open and fails. Common for configuration file issues, missing shared libraries, or wrong paths.

Pattern 2: Find What Configuration Files a Program Reads

strace -e trace=openat ./my_app 2>&1 | grep -v ENOENT | grep '= [0-9]'

This filters to only successful file opens — showing you exactly which config files, libraries, and data files the program actually uses.

Pattern 3: Debug DNS Resolution Issues

strace -e trace=network -s 256 curl https://example.com 2>&1 | grep -E 'connect|sendto|recvfrom'

You will see the DNS query going to /etc/resolv.conf nameservers, the response, and then the actual HTTPS connection. If the DNS query takes seconds, you have found your latency source.

Pattern 4: Find Why a Service Fails on Startup

systemctl itself only sends a D-Bus request to systemd; the service is spawned by systemd, not by systemctl, so tracing the systemctl command captures little beyond the IPC. Trace the service's start command directly instead:

# Find the command the unit runs
systemctl cat my_service | grep ExecStart
# Run that command under strace in the foreground
sudo strace -f -o /tmp/service_trace.log /path/to/my_service_binary
# Then search the trace for errors
grep '= -1' /tmp/service_trace.log | grep -v 'ENOENT.*locale\|ENOENT.*lib' | head -30

The grep filters out harmless “file not found” errors from locale and library probing (which are normal), leaving you with the real failures.

Pattern 5: Monitor File Writes in Real-Time

sudo strace -fp "$(pidof my_app)" -e trace=write -s 1024 2>&1 | grep 'write([0-9]*,'

Watch every byte the process writes to any file descriptor — useful for debugging logging issues or unexpected file modifications.

Gotchas and Edge Cases

Tracing setuid binaries: when a non-root user runs strace on a setuid program, the kernel drops the elevated privileges for security, so the program runs with the user's own UID. That can change its behavior or make it fail in ways it would not on its own. Run strace itself as root to trace setuid programs with their real privileges.

Multi-threaded applications: Always use -f (follow forks) with multi-threaded apps. Without it, you only see the main thread’s syscalls and miss the worker threads where the actual problem lives.

Container environments: Inside Docker containers, strace requires the SYS_PTRACE capability. Run with --cap-add=SYS_PTRACE or use --privileged for debugging:

docker run --cap-add=SYS_PTRACE my_image strace ./my_app

In Kubernetes, add the capability to your pod security context:

securityContext:
  capabilities:
    add: ["SYS_PTRACE"]

Performance impact is real: strace intercepts every system call via ptrace, which requires two context switches per call. A process making 100,000 syscalls/second will be dramatically slowed. Never leave strace attached to a production process longer than needed to capture the issue.
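A quick way to feel this overhead on your own machine is to time a syscall-heavy command with and without strace (numbers vary widely by system; this is only illustrative):

```shell
# Baseline wall-clock time for a syscall-heavy workload
time ls -lR /usr/share > /dev/null 2>&1
# Same workload under strace; the trace is discarded so only ptrace overhead is measured
time strace -f -o /dev/null ls -lR /usr/share > /dev/null 2>&1
```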

strace -c masks individual slow calls: The -c summary shows averages. A process might make 10,000 fast read() calls and one 30-second read() — the average looks fine. Use -C (capital) to get both the summary and the live trace to catch outliers.

Troubleshooting Common strace Issues

“Operation not permitted” when attaching

# Check ptrace scope (Ubuntu/Debian)
cat /proc/sys/kernel/yama/ptrace_scope

If the value is 1 (the default on Ubuntu), a process may only trace its own direct children (or processes that opt in via prctl(PR_SET_PTRACER)); attaching to an already-running process fails even if you own it. To temporarily allow tracing any process:

# Temporary — resets on reboot
sudo sysctl kernel.yama.ptrace_scope=0

# Or just use sudo with strace
sudo strace -p 12345
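If you need the relaxed setting to survive reboots, a drop-in sysctl file works (the filename below is arbitrary; anything in /etc/sysctl.d/ is read at boot — and consider the security implications first):

```shell
# Persist the relaxed ptrace scope across reboots
echo 'kernel.yama.ptrace_scope = 0' | sudo tee /etc/sysctl.d/10-ptrace.conf
sudo sysctl --system
```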

Trace output is overwhelming

# Combine filters: only failed file operations with timing
strace -Z -e trace=file -T -s 256 ./my_app 2>&1 | head -50

Need to trace a short-lived process

For processes that start and exit quickly (like cron jobs):

# Wrap the command
strace -f -o /tmp/cron_trace.log /path/to/cron_script.sh

# Or trace the parent that spawns it
sudo strace -fp "$(pidof crond)" -o /tmp/cron_trace.log

Summary

  • strace traces kernel system calls — use it when a process fails silently, hangs, or misbehaves and logs give no clues
  • Attach to running processes with strace -p PID without restarting them — add -f for multi-threaded applications
  • Filter with -e trace= to focus on file, network, process, or memory operations instead of drowning in noise
  • Use -Z to show only failed calls — the fastest way to find why something is broken
  • Use -c for performance summaries — identify which system calls consume the most time
  • Performance overhead is significant — attach briefly in production, then detach. For sustained tracing, use perf trace or bpftrace
  • Containers need SYS_PTRACE capability — add it explicitly in Docker or Kubernetes