grep is the backbone of text search on Linux. Whether you need to find a string inside a log file, filter pipeline output, or hunt for a pattern across thousands of source files, mastering grep and regular expressions is one of the most valuable skills a Linux user or sysadmin can have. This guide covers everything from basic usage to advanced regex patterns, real-world log filtering scenarios, and a practical comparison of grep, ripgrep, and ack.
Prerequisites
- A Linux system (Ubuntu, Debian, CentOS, Arch, or similar)
- Basic terminal familiarity (see Use the Terminal on Ubuntu)
- grep installed (it ships with virtually every Linux distribution; very minimal container images are the usual exception)
- Optional: ripgrep (apt install ripgrep or dnf install ripgrep) and ack (apt install ack)
Basic grep Usage
grep reads one or more files (or standard input) and prints lines that match a pattern. The simplest form is:
grep 'pattern' filename
Key flags you will use every day:
| Flag | Meaning |
|---|---|
| -i | Case-insensitive match |
| -n | Show line numbers |
| -c | Count matching lines |
| -l | Print only filenames that match |
| -L | Print filenames with NO match |
| -v | Invert: print non-matching lines |
| -w | Match whole words only |
| -r | Recurse into directories |
| -A N | Show N lines after match |
| -B N | Show N lines before match |
| -C N | Show N lines before and after |
| --color | Highlight match in output |
Examples:
# Find all lines with "error" (case-insensitive)
grep -i 'error' /var/log/syslog
# Show line numbers in the result
grep -n 'Failed password' /var/log/auth.log
# Count how many times a pattern appears
grep -c 'GET /api' access.log
# Search recursively in all .py files
grep -r 'import os' --include='*.py' ./project
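The counting, inversion, whole-word, and context flags from the table are easiest to judge on a small file. The following is a minimal sketch, not a recipe: the sample log contents and variable names are made up for illustration.

```shell
# Build a small sample log in a temporary directory (contents are illustrative).
tmp=$(mktemp -d)
cat > "$tmp/app.log" <<'EOF'
10:00:01 INFO service started
10:00:02 WARN disk usage at 91%
10:00:03 ERROR failed to open socket
10:00:04 INFO retrying in 5s
10:00:05 error: retry failed
10:00:06 INFO errors cleared
EOF

# -c counts matching lines; -i makes the match case-insensitive.
error_lines=$(grep -ci 'error' "$tmp/app.log")

# -w only matches "error" as a whole word, so "errors" no longer counts.
whole_word=$(grep -ciw 'error' "$tmp/app.log")

# -v inverts the match: count every line that is NOT an INFO line.
non_info=$(grep -cv 'INFO' "$tmp/app.log")

# -A 1 prints each match plus one line of trailing context.
with_context=$(grep -A 1 'ERROR' "$tmp/app.log")

echo "error lines: $error_lines, whole word: $whole_word, non-INFO: $non_info"
echo "$with_context"
rm -rf "$tmp"
```

Comparing the -i count with the -iw count is a quick way to see how many hits come from longer words such as "errors".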
Basic Regular Expressions (BRE)
By default grep uses Basic Regular Expressions. The essential metacharacters:
| Metachar | Meaning | Example |
|---|---|---|
| . | Any single character | gr.p matches grep, grip, gr p |
| * | Zero or more of previous | go*d matches gd, god, good |
| ^ | Start of line | ^ERROR matches lines starting with ERROR |
| $ | End of line | \.log$ matches lines ending with .log |
| \b | Word boundary (GNU extension) | \bfail\b matches fail but not failure |
| [ ] | Character class | [aeiou] matches any vowel |
| [^ ] | Negated class | [^0-9] matches any non-digit |
| \{n,m\} | Repetition range | [0-9]\{2,4\} matches 2–4 digits |
# Lines starting with a date stamp (e.g., 2026-02-21)
grep '^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}' app.log
# Lines ending with a connection refused message
grep 'refused$' /var/log/nginx/error.log
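A quick way to check a BRE is to run it against a few lines whose outcome you already know. The sketch below does that for the anchors, the interval operator, and the escaped dot; the sample lines are invented for the purpose.

```shell
# Sample lines (made up for illustration) fed to grep on standard input.
sample='2026-02-21 12:00:01 connection refused
note: see 2026-02-21 for details
26-02-21 short year
2026-02-21 12:00:05 request ok'

# ^ anchors the date stamp to the start of the line; \{n\} is a BRE interval.
dated=$(printf '%s\n' "$sample" | grep -c '^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}')

# $ anchors "refused" to the end of the line.
refused=$(printf '%s\n' "$sample" | grep -c 'refused$')

# An unescaped dot matches any character; \. matches only a literal dot.
any_char=$(printf 'a.b\naxb\n' | grep -c 'a.b')
literal_dot=$(printf 'a.b\naxb\n' | grep -c 'a\.b')

echo "dated=$dated refused=$refused any_char=$any_char literal_dot=$literal_dot"
```

Only the two lines that begin with a four-digit year count as dated; the line that mentions the date mid-sentence is excluded by the ^ anchor.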
Extended Regular Expressions (ERE) with grep -E
Extended regex (ERE) drops the backslashes from repetition and grouping operators, making patterns more readable. Use grep -E; the older egrep command is deprecated, and recent GNU grep releases print a warning when it is used:
grep -E 'pattern' file
ERE additions over BRE:
| Metachar | Meaning | BRE equivalent |
|---|---|---|
| + | One or more | \+ |
| ? | Zero or one | \? |
| \| (pipe) | Alternation (OR) | backslash + pipe |
| ( ) | Grouping | \( \) |
Practical ERE examples:
# Match ERROR or WARN or CRITICAL
grep -E 'ERROR|WARN|CRITICAL' /var/log/app.log
# Match IP addresses (simplified)
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log
# Match lines with an HTTP 4xx or 5xx status code
grep -E ' [45][0-9]{2} ' access.log
# Extract email addresses from a file
grep -Eo '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt
The -o flag (--only-matching) prints just the matched portion, not the entire line, which makes it essential for extracting data.
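As a rough illustration of -o as an extraction tool, the sketch below pulls HTTP status codes out of a few synthetic access-log lines and builds a histogram. It assumes the common layout where the status code follows the quoted request; the sample lines and variable names are not from any real system.

```shell
# Illustrative access-log lines in combined format (not real traffic).
sample='203.0.113.7 - - [21/Feb/2026:14:01:02 +0000] "GET /api/users HTTP/1.1" 200 512
203.0.113.7 - - [21/Feb/2026:14:01:03 +0000] "GET /api/orders HTTP/1.1" 500 87
198.51.100.4 - - [21/Feb/2026:14:01:04 +0000] "POST /login HTTP/1.1" 404 19
198.51.100.4 - - [21/Feb/2026:14:01:05 +0000] "GET /api/orders HTTP/1.1" 500 87'

# First -o keeps the closing quote plus status code; the second keeps the digits only.
codes=$(printf '%s\n' "$sample" | grep -Eo '" [1-5][0-9]{2} ' | grep -Eo '[0-9]{3}')

# Feed the extracted values to sort/uniq for a quick histogram.
histogram=$(printf '%s\n' "$codes" | sort | uniq -c | sort -rn)

# Without -o, the same kind of pattern selects whole lines; -c counts them.
errors_only=$(printf '%s\n' "$sample" | grep -Ec '" [45][0-9]{2} ')

echo "$histogram"
echo "4xx/5xx lines: $errors_only"
```

Chaining two -o passes keeps each pattern simple: the first uses surrounding context to find the right number, the second strips that context away.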
Filtering Log Output in Real Time
Combine grep with tail -f to monitor logs live:
# Follow syslog and show only error lines
tail -f /var/log/syslog | grep -i 'error'
# Show nginx access log entries but exclude healthcheck noise
tail -f /var/log/nginx/access.log | grep -v '/health'
# Highlight multiple patterns simultaneously
tail -f /var/log/app.log | grep --color -E 'ERROR|WARN|INFO'
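One detail matters once grep is no longer the last command in a live pipeline: when its output goes to a pipe, grep buffers it in blocks, so later stages may see matches late. The --line-buffered flag asks grep to flush after every line. The sketch below shows the shape of such a pipeline; the emit_log function is a hypothetical stand-in for tail -f so that the example terminates, and the log format is an assumption.

```shell
# Stand-in for `tail -f /var/log/app.log`: a finite stream of sample lines,
# so the sketch terminates. The log format here is an assumption.
emit_log() {
  printf '%s\n' \
    '14:00:01 INFO started' \
    '14:00:02 ERROR db timeout' \
    '14:00:03 WARN slow query' \
    '14:00:04 ERROR db timeout'
}

# --line-buffered makes grep flush after every line, so the awk stage
# receives each match immediately instead of waiting for a full buffer.
summary=$(emit_log \
  | grep --line-buffered -E 'ERROR|WARN' \
  | awk '{ count[$2]++ } END { printf "ERROR=%d WARN=%d\n", count["ERROR"], count["WARN"] }')

echo "$summary"
```

With a real log, replacing emit_log with tail -f and the awk stage with any per-line consumer gives the same structure.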
Pipe chains are grep’s strongest use case for ops work — you can compose complex filters without touching the log file:
# Count failed SSH logins per IP (last 1000 lines)
tail -n 1000 /var/log/auth.log \
| grep 'Failed password' \
| grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' \
| sort | uniq -c | sort -rn
Comparison: grep vs ripgrep vs ack
| Feature | grep | ripgrep (rg) | ack |
|---|---|---|---|
| Built-in on Linux | Yes | No | No |
| Speed on large trees | Good | Excellent | Good |
| Respects .gitignore | No | Yes | Partial |
| File type filters | --include | -t py | --python |
| Default color | With --color | Yes | Yes |
| Binary file handling | Skips or errors | Smart skip | Skip |
| Config file | No | .ripgreprc | ~/.ackrc |
| Best for | System logs, scripts | Code search | Code search |
When to use grep: system log analysis, shell scripts, one-off searches on servers where you only have standard tools.
When to use ripgrep: searching codebases, CI pipelines, anywhere speed matters. It parallelizes file reading and uses SIMD for pattern matching.
When to use ack: legacy environments or teams already using ack; otherwise ripgrep is the better choice for new setups.
# grep — explicit include filter
grep -r 'TODO' --include='*.js' ./src
# ripgrep — same, with file type shorthand
rg 'TODO' -t js ./src
# ack — type-aware by default
ack --js 'TODO' ./src
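On machines where ripgrep may or may not be installed, a small wrapper can pick the faster tool when it is present. The following is a sketch under that assumption; search_code is a hypothetical helper name, and the sample tree exists only to exercise it.

```shell
# search_code PATTERN DIR: hypothetical helper, not part of any of the three tools.
# Uses ripgrep when it is installed and falls back to recursive grep otherwise.
search_code() {
  if command -v rg >/dev/null 2>&1; then
    # --no-heading keeps the path:line:text layout that grep -rn produces.
    rg --line-number --no-heading -- "$1" "$2"
  else
    grep -rn --exclude-dir=.git -- "$1" "$2"
  fi
}

# Try it on a throwaway tree (file names and contents are illustrative).
tmp=$(mktemp -d)
mkdir -p "$tmp/src"
printf '// TODO: remove debug flag\nconst x = 1;\n' > "$tmp/src/app.js"
printf 'nothing to see here\n' > "$tmp/src/readme.txt"

hits=$(search_code 'TODO' "$tmp")
echo "$hits"
rm -rf "$tmp"
```

The two branches are not perfectly equivalent: ripgrep also skips hidden files and anything listed in .gitignore, which the grep branch does not.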
Practical Example: Production Server Log Triage
You have a production web server generating 50 GB of logs per day. After an incident alert, you need to find all requests that returned 500 errors between 14:00 and 15:00, identify the slowest ones, and extract unique client IPs.
# Step 1 — isolate the time window
grep '21/Feb/2026:14:' /var/log/nginx/access.log > /tmp/window.log
# Step 2 — filter to 500 errors only
grep -E ' 500 ' /tmp/window.log > /tmp/errors_500.log
echo "Total 500 errors in window: $(wc -l < /tmp/errors_500.log)"
# Step 3 — extract and rank client IPs
grep -Eo '^[0-9.]+ ' /tmp/errors_500.log \
| sort | uniq -c | sort -rn | head -20
# Step 4 — find requests with the highest response time (last field)
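# sort cannot address the last field directly, so awk copies it to the front as the sort key
awk '{ print $NF, $0 }' /tmp/errors_500.log | sort -rn | head -n 10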
This multi-step pipeline narrows 50 GB of logs down to actionable data with a handful of commands: no database, no special tools, just grep and standard Unix utilities.
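Before pointing a triage pipeline at a multi-gigabyte file, it can help to rehearse it on a few synthetic lines with a known answer. The sketch below does that for the 14:00 to 15:00 window. It assumes an nginx log format in which the request time is appended as the last field; the sample lines and file names are invented.

```shell
tmp=$(mktemp -d)
# Synthetic access log; the trailing number stands for the request time in seconds.
cat > "$tmp/access.log" <<'EOF'
203.0.113.7 - - [21/Feb/2026:13:59:58 +0000] "GET /a HTTP/1.1" 500 87 0.120
203.0.113.7 - - [21/Feb/2026:14:05:10 +0000] "GET /b HTTP/1.1" 500 87 2.350
198.51.100.4 - - [21/Feb/2026:14:20:41 +0000] "GET /c HTTP/1.1" 200 512 0.030
198.51.100.4 - - [21/Feb/2026:14:45:00 +0000] "GET /d HTTP/1.1" 500 87 0.900
203.0.113.7 - - [21/Feb/2026:14:59:59 +0000] "GET /e HTTP/1.1" 500 87 5.010
203.0.113.7 - - [21/Feb/2026:15:00:01 +0000] "GET /f HTTP/1.1" 500 87 9.999
EOF

# Isolate 14:00:00-14:59:59, then keep only responses with status 500.
grep '21/Feb/2026:14:' "$tmp/access.log" | grep '" 500 ' > "$tmp/errors_500.log"
total=$(grep -c '' "$tmp/errors_500.log")

# Rank client IPs by number of 500s; awk prints the first field of each line.
top_ip=$(awk '{ print $1 }' "$tmp/errors_500.log" | sort | uniq -c | sort -rn | head -n 1 | awk '{ print $2 }')

# Slowest request: prefix each line with its last field, sort numerically, strip the prefix.
slowest=$(awk '{ print $NF, $0 }' "$tmp/errors_500.log" | sort -rn | head -n 1 | cut -d' ' -f2-)

echo "total=$total top_ip=$top_ip"
echo "slowest: $slowest"
rm -rf "$tmp"
```

The lines stamped 13:59:58 and 15:00:01 sit just outside the window on purpose, so an off-by-one-hour pattern shows up immediately in the total.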
Gotchas and Edge Cases
Grepping binary files: grep will print “Binary file matches” and skip output. Force text mode with -a (--text) or use strings first.
Special characters in patterns: If your search term contains ., *, [, or \, either escape them with \ or use grep -F (fixed string, no regex). grep -F '1.2.3.4' finds literal dots, not “any character”.
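The difference between a regex dot and a literal dot is easy to demonstrate with one line that should match and one that should not. This is a small sketch with invented sample input:

```shell
# Sample input: one real match and one line that only matches when "." is a wildcard.
sample='client 1.2.3.4 connected
build 1a2b3c4 deployed'

# As a regex, each dot matches any character, so both lines match.
regex_hits=$(printf '%s\n' "$sample" | grep -c '1.2.3.4')

# -F treats the pattern as a fixed string, so only the literal address matches.
fixed_hits=$(printf '%s\n' "$sample" | grep -cF '1.2.3.4')

# Escaping the dots gives the same result as -F for this pattern.
escaped_hits=$(printf '%s\n' "$sample" | grep -c '1\.2\.3\.4')

echo "regex=$regex_hits fixed=$fixed_hits escaped=$escaped_hits"
```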
Newline handling: grep is line-oriented — it cannot match patterns that span multiple lines by default. Use pcregrep -M or awk for multiline matching (see sed and awk Text Processing Guide).
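Performance on huge files: grep is single-threaded. ripgrep parallelizes across files, which speeds up large directory trees but does little for a single multi-gigabyte file; there, a literal search with grep -F under LC_ALL=C is usually the cheapest speedup. On compressed logs, use zgrep or zcat file.gz | grep.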
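For compressed logs, the two approaches can be compared side by side. The sketch below assumes gzip and zgrep are installed (they normally ship together) and uses a tiny made-up log so it runs anywhere.

```shell
tmp=$(mktemp -d)
printf 'INFO ok\nERROR disk full\nINFO ok\n' > "$tmp/app.log"
# gzip -c writes the compressed copy to stdout and leaves the original in place.
gzip -c "$tmp/app.log" > "$tmp/app.log.gz"

# zgrep decompresses on the fly and passes grep options such as -c through.
via_zgrep=$(zgrep -c 'ERROR' "$tmp/app.log.gz")

# Equivalent explicit pipeline; gzip -dc behaves like zcat. LC_ALL=C and -F
# keep the search to plain byte comparison of a literal string.
via_pipe=$(gzip -dc "$tmp/app.log.gz" | LC_ALL=C grep -cF 'ERROR')

echo "zgrep=$via_zgrep pipeline=$via_pipe"
rm -rf "$tmp"
```

The explicit pipeline is slightly more typing, but it makes it easy to add further stages without decompressing the file twice.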
Locale issues: On some systems [a-z] includes accented characters depending on the LC_ALL locale. Use LC_ALL=C grep for predictable ASCII-only behavior.
Anchoring vs whole-word: ^error only matches lines starting with “error”. -w matches whole words anywhere on the line. Know which you need.
Troubleshooting
grep returns exit code 1 (no match) in scripts: This is expected behavior — grep exits 1 when no lines match, which causes set -e scripts to abort. Use grep ... || true or check [ $? -ne 2 ] to distinguish “no match” from errors.
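One way to handle the three outcomes explicitly is a small wrapper that treats "no match" as success and still reports real errors. The sketch below is one possible shape; count_matches is a hypothetical helper name, and the sample file is created only for the demonstration.

```shell
# count_matches PATTERN FILE: hypothetical helper for scripts that use `set -e`.
# Prints the match count, treats "no match" (status 1) as success, and
# passes real errors (status 2 or higher) back to the caller.
count_matches() {
  # The && / || chain records grep's status without tripping `set -e`.
  n=$(grep -c -- "$1" "$2" 2>/dev/null) && status=0 || status=$?
  if [ "$status" -gt 1 ]; then
    return "$status"
  fi
  printf '%s\n' "${n:-0}"
}

tmp=$(mktemp -d)
printf 'ok\nok\n' > "$tmp/app.log"

found=$(count_matches 'ok' "$tmp/app.log")
missing=$(count_matches 'ERROR' "$tmp/app.log")
# A nonexistent file is a genuine error, so the helper returns non-zero here.
count_matches 'ok' "$tmp/does-not-exist.log" && err_status=0 || err_status=$?

echo "found=$found missing=$missing err_status=$err_status"
rm -rf "$tmp"
```

Unlike a blanket `|| true`, this keeps failures such as a missing or unreadable file visible to the calling script. Note that the helper silences grep's own error message; drop the 2>/dev/null if you want it in the script's output.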
Pattern not matching despite visible text: Check for carriage returns (\r) in files from Windows. Run grep -P '\r' (Perl regex) to detect them, then dos2unix to strip them.
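Where grep -P or dos2unix is not available, the same check and fix can be done with portable tools. The sketch below assumes nothing beyond printf, grep, and tr; the sample file is made up.

```shell
tmp=$(mktemp -d)
# A two-line file with Windows (CRLF) line endings.
printf 'status=ok\r\nstatus=fail\r\n' > "$tmp/win.txt"

# The $ anchor fails because the line really ends in a carriage return, not "ok".
anchored=$(grep -c 'ok$' "$tmp/win.txt" || true)

# Portable detection: put a literal carriage return in a variable and search for it.
cr=$(printf '\r')
crlf_lines=$(grep -c "$cr" "$tmp/win.txt" || true)

# tr -d '\r' strips the carriage returns when dos2unix is not installed.
tr -d '\r' < "$tmp/win.txt" > "$tmp/unix.txt"
fixed=$(grep -c 'ok$' "$tmp/unix.txt" || true)

echo "anchored=$anchored crlf_lines=$crlf_lines fixed=$fixed"
rm -rf "$tmp"
```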
“Argument list too long” error: When using grep pattern * with thousands of files, the shell expands * into too many arguments. Use grep -r pattern . instead.
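When the search has to stay limited to certain file names, find can hand the files to grep in batches instead of relying on the shell glob. The sketch below uses five sample files to stand in for the thousands that would trigger the error; all names are illustrative.

```shell
tmp=$(mktemp -d)
# A handful of sample files stands in for the thousands that overflow the argument list.
for i in 1 2 3 4 5; do
  printf 'line %s\n' "$i" > "$tmp/file$i.log"
done
printf 'needle\n' >> "$tmp/file3.log"

# find passes file names to grep in batches that fit the system limit,
# so the shell never has to expand a giant glob. -l prints matching file names.
matches=$(find "$tmp" -type f -name '*.log' -exec grep -l 'needle' {} +)

echo "$matches"
match_name=$(basename "$matches")
rm -rf "$tmp"
```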
Slow recursive search: Add --exclude-dir=.git (or use ripgrep which does this by default) to avoid crawling the .git directory.
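The effect of --exclude-dir is easy to verify on a throwaway tree. This sketch builds a fake .git directory next to a source file (both invented) and compares the two searches:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/.git/objects" "$tmp/src"
printf 'TODO in packed object\n' > "$tmp/.git/objects/blob"
printf 'TODO fix parser\n' > "$tmp/src/parser.c"

# Plain recursive search also reports the file inside .git.
all_hits=$(grep -rl 'TODO' "$tmp" | grep -c '')

# --exclude-dir skips any directory named .git while recursing.
filtered=$(grep -rl --exclude-dir=.git 'TODO' "$tmp")
filtered_name=$(basename "$filtered")

echo "all=$all_hits filtered=$filtered_name"
rm -rf "$tmp"
```

The option can be repeated, for example to skip node_modules as well.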
Summary
- grep searches files and stdin for lines matching a pattern; -i, -n, -r, and -v are your most-used flags
- Basic regex (BRE) is the default; use grep -E for extended regex with +, ?, and |
- -o extracts only the matched portion, which is essential for data extraction from logs
- Pipe tail -f into grep for real-time log monitoring
- Use grep -F for literal string searches to avoid regex metacharacter surprises
- ripgrep is faster and smarter for code search; grep remains king for server log work and scripts
- Exit code 1 means “no match”; handle it explicitly in shell scripts to avoid false failures