strace on Linux is the go-to tool for understanding what a process is doing at the kernel level. When a program fails silently, hangs, or behaves unexpectedly, strace reveals the exact system calls being made — file opens, network connections, memory allocations, and signal handling. This guide covers practical strace usage from basic tracing through production debugging techniques, with real-world scenarios that demonstrate how to diagnose the problems you actually encounter.
Prerequisites
- A Linux system (any distribution — strace works on all of them)
- Root or sudo access (required for tracing processes owned by other users)
- Basic understanding of Linux processes and file descriptors
- strace installed (covered in the installation section below)
Installing strace on Linux
strace is available in every major distribution’s package repository. It may already be installed on your system.
# Debian / Ubuntu
sudo apt install strace
# RHEL / CentOS / Fedora
sudo dnf install strace
# Arch Linux
sudo pacman -S strace
# Alpine Linux
sudo apk add strace
# Check installed version
strace --version
On minimal container images (Alpine, distroless), strace is usually missing. Install it temporarily for debugging, then remove it from production images.
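On a Debian-based image, for instance, a temporary install for one debugging session might look like this (the container name `my_container` is illustrative):

```shell
# Install strace inside a running Debian/Ubuntu-based container for one
# debugging session, then remove it so it never ships in the image
docker exec my_container sh -c 'apt-get update && apt-get install -y strace'
# ... run your strace session via docker exec ...
docker exec my_container apt-get purge -y strace
```

For distroless images, which have no package manager at all, a sidecar or ephemeral debug container that includes strace is the usual workaround.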
Tracing a Command from Startup
The simplest use of strace is running it in front of any command:
strace ls /tmp
This prints every system call ls makes from execve() to exit_group(). The output goes to stderr, so the normal command output still appears on stdout.
Reading strace Output
Each line follows this format:
syscall_name(arguments...) = return_value
For example:
openat(AT_FDCWD, "/tmp", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 3
getdents64(3, [{d_ino=2, d_off=1, d_type=DT_DIR, d_name="."},...], 32768) = 480
close(3) = 0
This tells you: ls opened /tmp as file descriptor 3, read directory entries, then closed it. A return value of -1 means the call failed, and strace shows the errno:
openat(AT_FDCWD, "/etc/shadow", O_RDONLY) = -1 EACCES (Permission denied)
That single line often reveals why a program fails — no need to read source code or add debug logging.
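You can reproduce this yourself: as an unprivileged user, tracing a read of a root-only file surfaces the EACCES immediately:

```shell
# As a non-root user, /etc/shadow is unreadable — the trace shows
# exactly which openat() call fails and with what errno
strace -e trace=openat cat /etc/shadow
```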
Useful Startup Flags
# Follow child processes (fork/clone)
strace -f ./my_app
# Print timestamps for each syscall
strace -t ls /tmp
# Print relative time between syscalls (find hangs)
strace -r ls /tmp
# Print string arguments fully (default truncates at 32 chars)
strace -s 256 ./my_app
# Write output to a file instead of stderr
strace -o /tmp/trace.log ls /tmp
Attaching strace to a Running Process
You do not need to restart a process to trace it. Attach to any running process by PID:
# Find the PID
pidof nginx
# or
ps aux | grep my_app
# Attach to the process
sudo strace -p 12345
# Attach and follow all threads/children
sudo strace -fp 12345
Press Ctrl+C to detach. The traced process continues running normally — strace does not kill it on detach.
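For unattended captures, you can bound the attachment window with timeout(1) — strace detaches cleanly when it receives SIGTERM, and the traced process keeps running (the PID and duration below are illustrative):

```shell
# Capture 30 seconds of activity, then detach automatically.
# timeout sends SIGTERM to strace; the traced process is unaffected.
sudo timeout 30 strace -fp 12345 -o /tmp/attach_30s.log
```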
Real-World Scenario: Debugging a Hanging Application
You have a production web application that occasionally stops responding to requests. Logs show nothing. Instead of restarting blindly:
# Attach to the stuck worker, logging network and file activity
sudo strace -fp $(pidof my_app) -e trace=network,file -s 256 -o /tmp/hang_trace.log
Send a test request, then check the trace:
grep -E 'futex|poll|select|epoll_wait' /tmp/hang_trace.log | tail -20
If you see the process stuck in futex(FUTEX_WAIT) — it is blocked waiting on a lock or condition variable, which in a hung process usually means a deadlock. If stuck in connect() or poll() with a long timeout — it is waiting on an upstream service that is not responding.
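Once you know which file descriptor the process is blocked on (say fd 4 from a poll() or connect() line), /proc tells you what it points at — the PID and fd number here are illustrative:

```shell
# What is file descriptor 4? (the symlink shows a file, pipe, or socket inode)
ls -l /proc/12345/fd/4
# If it is a socket, find the remote peer in the socket table
ss -tpn | grep 'pid=12345'
```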
Filtering System Calls with strace
Full traces are noisy. Use -e trace= to focus on what matters:
# File operations only (open, read, write, close, stat, etc.)
strace -e trace=file ls /tmp
# Network operations only (socket, connect, send, recv, etc.)
strace -e trace=network curl https://example.com
# Process management (fork, clone, execve, wait, exit)
strace -e trace=process bash -c "ls | grep foo"
# Memory operations (mmap, mprotect, brk)
strace -e trace=memory ./my_app
# Specific syscalls by name
strace -e trace=openat,read,write cat /etc/hostname
# Negate — trace everything EXCEPT these (quote the pattern so the shell
# does not interpret the !)
strace -e 'trace=!mmap,mprotect,brk' ./my_app
Filtering by Return Value
Find only failed system calls — extremely useful for debugging:
# Show only calls that returned an error
strace -Z ./my_app
# Show only successful calls
strace -z ./my_app
The -Z flag (strace 5.2+) is a game-changer for production debugging. Instead of wading through thousands of successful calls, you see only the failures.
Performance Analysis with strace
System Call Summary
The -c flag produces a statistical summary instead of a live trace:
strace -c ls /tmp
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
25.00 0.000045 5 9 mmap
19.44 0.000035 5 7 close
16.67 0.000030 5 6 openat
11.11 0.000020 4 5 fstat
8.33 0.000015 5 3 mprotect
5.56 0.000010 3 3 read
...
------ ----------- ----------- --------- --------- ----------------
100.00 0.000180 4 42 2 total
This tells you which system calls consume the most time. If read or write dominates, you have an I/O-bound process. If futex dominates, you have lock contention.
Timing Individual Calls
# Wall-clock time per syscall
strace -T ./my_app
# Combined: timestamps + duration
strace -tT ./my_app
The -T flag appends the duration in angle brackets:
openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY) = 3 <0.000024>
read(3, "nameserver 8.8.8.8\n", 4096) = 19 <0.000011>
connect(4, {sa_family=AF_INET, sin_port=htons(443)}, 16) = -1 EINPROGRESS <0.000089>
That connect() returned EINPROGRESS after 89 microseconds — normal for a non-blocking socket, where the actual connection wait shows up in a later poll() or epoll_wait(). If that wait takes 5+ seconds, your DNS or upstream is the bottleneck.
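The durations from -T are easy to post-process. A quick awk sketch (assuming the <seconds> duration is the last angle-bracketed token on each line) pulls out anything slower than 100 ms:

```shell
# Save a timed trace, then list the syscalls that took longer than 0.1s
strace -T -o /tmp/trace.log ./my_app
# Split on < and >; the second-to-last field is the duration in seconds
awk -F'[<>]' '$(NF-1) + 0 > 0.1' /tmp/trace.log
```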
Comparing strace with Alternative Tracing Tools
| Feature | strace | ltrace | perf trace | bpftrace |
|---|---|---|---|---|
| Traces | Kernel syscalls | Library calls | Kernel syscalls | Kernel + userspace |
| Overhead | High (10-100x) | High | Low (~5%) | Very low (~2%) |
| Attach to running PID | Yes | Yes | Yes | Yes |
| Filter by syscall | Yes (-e trace=) | Yes (-e) | Yes (-e) | Yes (custom scripts) |
| Statistical summary | Yes (-c) | Yes (-c) | Yes (-s/--summary) | Custom scripts |
| Kernel version needed | Any | Any | 3.7+ | 4.9+ (eBPF) |
| Production safe | Brief use only | Brief use only | Yes | Yes |
| Best for | Quick debugging | Library call issues | Low-overhead profiling | Complex tracing logic |
Rule of thumb: Use strace for quick diagnosis (attach, find the problem, detach). Switch to perf trace or bpftrace for sustained production monitoring where overhead matters.
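As a taste of the low-overhead option, a bpftrace one-liner can count syscalls per process system-wide — a sketch that assumes root and bpftrace 0.9 or later:

```shell
# Count system calls by process name until Ctrl+C, with minimal overhead
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
```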
Common strace Debugging Patterns
Pattern 1: Find Why a Program Cannot Find a File
strace -e trace=openat,stat,access ./my_app 2>&1 | grep -i "no such file\|enoent\|eacces"
This instantly shows every file the program tries to open and fails. Common for configuration file issues, missing shared libraries, or wrong paths.
Pattern 2: Find What Configuration Files a Program Reads
strace -e trace=openat ./my_app 2>&1 | grep -v ENOENT | grep '= [0-9]'
This filters to only successful file opens — showing you exactly which config files, libraries, and data files the program actually uses.
Pattern 3: Debug DNS Resolution Issues
strace -e trace=network -s 256 curl https://example.com 2>&1 | grep -E 'connect|sendto|recvfrom'
You will see the DNS query going to /etc/resolv.conf nameservers, the response, and then the actual HTTPS connection. If the DNS query takes seconds, you have found your latency source.
Pattern 4: Find Why a Service Fails on Startup
# systemctl only sends a D-Bus request — systemd itself spawns the service,
# so tracing systemctl won't show the service's syscalls. Run the service's
# ExecStart command directly under strace instead:
sudo strace -f -o /tmp/service_trace.log /path/to/service_binary
# Then search the trace for errors
grep '= -1' /tmp/service_trace.log | grep -v 'ENOENT.*locale\|ENOENT.*lib' | head -30
The grep filters out harmless “file not found” errors from locale and library probing (which are normal), leaving you with the real failures.
Pattern 5: Monitor File Writes in Real-Time
sudo strace -fp $(pidof my_app) -e trace=write -s 1024 2>&1 | grep 'write([0-9]*,'
Watch every byte the process writes to any file descriptor — useful for debugging logging issues or unexpected file modifications.
Gotchas and Edge Cases
Tracing setuid binaries: when an unprivileged user runs a setuid program under strace, the kernel drops the setuid privilege elevation for security — the program runs with your privileges and may behave differently than normal. Run strace as root to trace setuid programs with their real privileges.
Multi-threaded applications: Always use -f (follow forks) with multi-threaded apps. Without it, you only see the main thread’s syscalls and miss the worker threads where the actual problem lives.
Container environments: Inside Docker containers, strace requires the SYS_PTRACE capability. Run with --cap-add=SYS_PTRACE or use --privileged for debugging:
docker run --cap-add=SYS_PTRACE my_image strace ./my_app
In Kubernetes, add the capability to your pod security context:
securityContext:
  capabilities:
    add: ["SYS_PTRACE"]
Performance impact is real: strace intercepts every system call via ptrace, which stops the traced process twice per call (on entry and on exit), with context switches to strace and back each time. A process making 100,000 syscalls/second will be dramatically slowed. Never leave strace attached to a production process longer than needed to capture the issue.
strace -c masks individual slow calls: The -c summary shows averages. A process might make 10,000 fast read() calls and one 30-second read() — the average looks fine. Use -C (capital) to get both the summary and the live trace to catch outliers.
Troubleshooting Common strace Issues
"Operation not permitted" when attaching
# Check ptrace scope (Ubuntu/Debian)
cat /proc/sys/kernel/yama/ptrace_scope
If the value is 1 (default on Ubuntu), you can only trace your own processes. To temporarily allow tracing any process:
# Temporary — resets on reboot
sudo sysctl kernel.yama.ptrace_scope=0
# Or just use sudo with strace
sudo strace -p 12345
Trace output is overwhelming
# Combine filters: only failed file operations with timing
strace -Z -e trace=file -T -s 256 ./my_app 2>&1 | head -50
Need to trace a short-lived process
For processes that start and exit quickly (like cron jobs):
# Wrap the command
strace -f -o /tmp/cron_trace.log /path/to/cron_script.sh
# Or trace the parent that spawns it
sudo strace -fp $(pidof crond) -o /tmp/cron_trace.log
Summary
- strace traces kernel system calls — use it when a process fails silently, hangs, or misbehaves and logs give no clues
- Attach to running processes with strace -p PID without restarting them — add -f for multi-threaded applications
- Filter with -e trace= to focus on file, network, process, or memory operations instead of drowning in noise
- Use -Z to show only failed calls — the fastest way to find why something is broken
- Use -c for performance summaries — identify which system calls consume the most time
- Performance overhead is significant — attach briefly in production, then detach. For sustained tracing, use perf trace or bpftrace
- Containers need the SYS_PTRACE capability — add it explicitly in Docker or Kubernetes