grep is the backbone of text search on Linux. Whether you need to find a string inside a log file, filter pipeline output, or hunt for a pattern across thousands of source files, mastering grep and regular expressions is one of the most valuable skills a Linux user or sysadmin can have. This guide covers everything from basic usage to advanced regex patterns, real-world log filtering scenarios, and a practical comparison of grep, ripgrep, and ack.

Prerequisites

  • A Linux system (Ubuntu, Debian, CentOS, Arch, or similar)
  • Basic terminal familiarity (see Use the Terminal on Ubuntu)
  • grep pre-installed (it ships by default on virtually every Linux distribution)
  • Optional: ripgrep (apt install ripgrep or dnf install ripgrep) and ack (apt install ack)

Basic grep Usage

grep reads one or more files (or standard input) and prints lines that match a pattern. The simplest form is:

grep 'pattern' filename

Key flags you will use every day:

Flag      Meaning
-i        Case-insensitive match
-n        Show line numbers
-c        Count matching lines
-l        Print only filenames that match
-L        Print filenames with NO match
-v        Invert match: print non-matching lines
-w        Match whole words only
-r        Recurse into directories
-A N      Show N lines after each match
-B N      Show N lines before each match
-C N      Show N lines before and after each match
--color   Highlight matches in output

Examples:

# Find all lines with "error" (case-insensitive)
grep -i 'error' /var/log/syslog

# Show line numbers in the result
grep -n 'Failed password' /var/log/auth.log

# Count how many times a pattern appears
grep -c 'GET /api' access.log

# Search recursively in all .py files
grep -r 'import os' --include='*.py' ./project
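
The context flags (-A, -B, -C) from the table are easiest to see on a small file. A quick sketch; the file name and contents are throwaway examples:

```shell
# Create a small sample file (hypothetical content)
printf 'line 1\nline 2\nERROR here\nline 4\nline 5\n' > /tmp/sample.txt

# Print the matching line plus one line of context on each side
grep -C 1 'ERROR' /tmp/sample.txt

# -A 1 would print only the line after; -B 1 only the line before
```

This prints "line 2", "ERROR here", and "line 4". With multiple matches, grep separates each context group with a `--` line.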

Basic Regular Expressions (BRE)

By default grep uses Basic Regular Expressions. The essential metacharacters:

Metachar   Meaning                    Example
.          Any single character       gr.p matches grep, grip, gr p
*          Zero or more of previous   go*d matches gd, god, good
^          Start of line              ^ERROR matches lines starting with ERROR
$          End of line                \.log$ matches lines ending with .log
\b         Word boundary              \bfail\b matches fail but not failure
[ ]        Character class            [aeiou] matches any vowel
[^ ]       Negated class              [^0-9] matches any non-digit
\{n,m\}    Repetition range           [0-9]\{2,4\} matches 2 to 4 digits

Examples:

# Lines starting with a date stamp (e.g., 2026-02-21)
grep '^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}' app.log

# Lines ending with a connection refused message
grep 'refused$' /var/log/nginx/error.log
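
The \b word boundary is easy to confuse with the -w flag, so a quick side-by-side sketch helps (GNU grep assumed, since \b is a GNU extension; file contents are illustrative):

```shell
# Throwaway sample file (hypothetical content)
printf 'the job will fail\nfailure detected\n' > /tmp/words.txt

# \b anchors the pattern at word edges: matches "fail", not "failure"
grep '\bfail\b' /tmp/words.txt

# -w does the same for a plain word pattern
grep -w 'fail' /tmp/words.txt
```

Both commands print only "the job will fail". The difference: -w applies to the whole pattern, while \b can anchor individual parts of a larger regex.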

Extended Regular Expressions (ERE) with grep -E

Extended regex (ERE) drops the backslashes from repetition and grouping operators, making patterns more readable. Use grep -E (the old egrep alias still works but is deprecated in recent GNU grep releases):

grep -E 'pattern' file

ERE additions over BRE:

Metachar   Meaning            BRE equivalent
+          One or more        \+
?          Zero or one        \?
|          Alternation (OR)   \|
( )        Grouping           \( \)

Practical ERE examples:

# Match ERROR or WARN or CRITICAL
grep -E 'ERROR|WARN|CRITICAL' /var/log/app.log

# Match IP addresses (simplified)
grep -E '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' access.log

# Match lines with an HTTP 4xx or 5xx status code
grep -E ' [45][0-9]{2} ' access.log

# Extract email addresses from a file
grep -Eo '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' contacts.txt

The -o flag (--only-matching) prints just the matched portion, not the entire line, which makes it essential for extracting data.
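
To see exactly what -o changes, compare the two invocations below on a throwaway file (the file name and contents are hypothetical):

```shell
# Sample input (hypothetical content)
printf 'contact: alice@example.com, backup: bob@example.org\n' > /tmp/contacts.txt

pattern='[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

# Without -o: the entire matching line prints once
grep -E "$pattern" /tmp/contacts.txt

# With -o: each match prints on its own line, nothing else
grep -Eo "$pattern" /tmp/contacts.txt
```

The second command emits two lines, one address per line, ready to feed into sort and uniq.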

Filtering Log Output in Real Time

Combine grep with tail -f to monitor logs live:

# Follow syslog and show only error lines
tail -f /var/log/syslog | grep -i 'error'

# Show nginx errors but exclude healthcheck noise
tail -f /var/log/nginx/access.log | grep -v '/health'

# Highlight multiple patterns simultaneously
tail -f /var/log/app.log | grep --color -E 'ERROR|WARN|INFO'

Pipe chains are grep’s strongest use case for ops work — you can compose complex filters without touching the log file:

# Count failed SSH logins per IP (last 1000 lines)
tail -1000 /var/log/auth.log \
  | grep 'Failed password' \
  | grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' \
  | sort | uniq -c | sort -rn

Comparison: grep vs ripgrep vs ack

Feature                 grep                  ripgrep (rg)   ack
Built-in on Linux       Yes                   No             No
Speed on large trees    Good                  Excellent      Good
Respects .gitignore     No                    Yes            Partial
File type filters       --include             -t py          --python
Color by default        With --color          Yes            Yes
Binary file handling    Skips or errors       Smart skip     Skip
Config file             No                    .ripgreprc     ~/.ackrc
Best for                System logs, scripts  Code search    Code search

When to use grep: system log analysis, shell scripts, one-off searches on servers where you only have standard tools.

When to use ripgrep: searching codebases, CI pipelines, anywhere speed matters. It parallelizes file reading and uses SIMD for pattern matching.

When to use ack: legacy environments or teams already using ack; otherwise ripgrep is the better choice for new setups.

# grep — explicit include filter
grep -r 'TODO' --include='*.js' ./src

# ripgrep — same, with file type shorthand
rg 'TODO' -t js ./src

# ack — type-aware by default
ack --js 'TODO' ./src

Practical Example: Production Server Log Triage

You have a production web server generating 50 GB of logs per day. After an incident alert, you need to find all requests that returned 500 errors between 14:00 and 15:00, identify the slowest ones, and extract unique client IPs.

# Step 1 — isolate the time window
grep '21/Feb/2026:14:' /var/log/nginx/access.log > /tmp/window.log

# Step 2 — filter to 500 errors only
grep -E ' 500 ' /tmp/window.log > /tmp/errors_500.log

echo "Total 500 errors in window: $(wc -l < /tmp/errors_500.log)"

# Step 3 — extract and rank client IPs
grep -Eo '^[0-9.]+ ' /tmp/errors_500.log \
  | sort | uniq -c | sort -rn | head -20

# Step 4 — find requests with the highest response time (last field)
awk '{print $NF, $0}' /tmp/errors_500.log | sort -rn | head -10

This multi-step pipeline goes from 50 GB down to actionable data in seconds — no database, no special tools, just grep and standard Unix utilities.

Gotchas and Edge Cases

Grepping binary files: grep will print “Binary file matches” and skip output. Force text mode with -a (--text) or use strings first.
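
A minimal reproduction of the binary-file behavior, built with a fake "binary" file containing a NUL byte (the file name and contents are illustrative):

```shell
# A NUL byte is enough to make grep classify the file as binary
printf 'ok line\nhidden ERROR line\0trailing\n' > /tmp/mixed.bin

# Default: grep reports a binary match instead of printing the line
# (the exact message and whether it goes to stdout or stderr varies by version)
grep 'ERROR' /tmp/mixed.bin

# -a (--text) forces text mode; combined with -c it counts matching lines
grep -ac 'ERROR' /tmp/mixed.bin
```

The second command prints 1, confirming the match is reachable once text mode is forced.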

Special characters in patterns: If your search term contains ., *, [, or \, either escape them with \ or use grep -F (fixed string, no regex). grep -F '1.2.3.4' finds literal dots, not “any character”.

Newline handling: grep is line-oriented — it cannot match patterns that span multiple lines by default. Use pcregrep -M or awk for multiline matching (see sed and awk Text Processing Guide).
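
When pcregrep is not available, GNU grep can emulate multiline matching with -z, which treats input as NUL-separated so a file without NUL bytes becomes one giant record. A sketch, assuming GNU grep built with PCRE (-P) support; the log contents are hypothetical:

```shell
# Sample log with a message continued on a second line (hypothetical)
printf 'begin\nERROR: step one\n  continued on next line\nend\n' > /tmp/multi.log

# -z: whole file is one record; -P: PCRE, so \n can appear in the pattern;
# -o: print only the match. tr strips the NUL that -z appends to output.
grep -Pzo 'ERROR:[^\n]*\n[^\n]*' /tmp/multi.log | tr -d '\0'
```

This prints the ERROR line together with its continuation line, something plain grep cannot do in one pass.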

Performance on huge files: grep is single-threaded. For files over a few GB, consider ripgrep (parallel) or mmap-based tools. On compressed logs, use zgrep or zcat file.gz | grep.

Locale issues: On some systems [a-z] includes accented characters depending on the LC_ALL locale. Use LC_ALL=C grep for predictable ASCII-only behavior.

Anchoring vs whole-word: ^error only matches lines starting with “error”. -w matches whole words anywhere on the line. Know which you need.

Troubleshooting

grep returns exit code 1 (no match) in scripts: This is expected behavior. grep exits 0 on a match, 1 when no lines match, and 2 on an actual error. Under set -e, a no-match exit aborts the script. Use grep ... || true, or wrap the call in an if condition, and treat only exit code 2 as a real failure.
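
A small pattern that stays safe under set -e; the log path and messages are illustrative:

```shell
set -e
printf 'all good\n' > /tmp/app.log

# grep -q: quiet mode; exit 0 = match, 1 = no match, 2 = real error.
# Using grep as an if-condition absorbs the non-zero exit,
# so set -e does not abort the script on "no match".
if grep -q 'ERROR' /tmp/app.log; then
  status="errors found"
else
  status="clean"
fi
echo "$status"
```

Here the script prints "clean" and keeps running, even though grep itself exited 1.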

Pattern not matching despite visible text: Check for carriage returns (\r) in files from Windows. Run grep -P '\r' (Perl regex) to detect them, then dos2unix to strip them.

“Argument list too long” error: When using grep pattern * with thousands of files, the shell expands * into too many arguments. Use grep -r pattern . instead.

Slow recursive search: Add --exclude-dir=.git (or use ripgrep which does this by default) to avoid crawling the .git directory.

Summary

  • grep searches files and stdin for lines matching a pattern; -i, -n, -r, -v are your most-used flags
  • Basic regex (BRE) is the default; use grep -E for extended regex with +, ?, and |
  • -o extracts only the matched portion — essential for data extraction from logs
  • Pipe grep into tail -f for real-time log monitoring
  • Use grep -F for literal string searches to avoid regex metacharacter surprises
  • ripgrep is faster and smarter for code search; grep remains king for server log work and scripts
  • Exit code 1 means “no match” — handle it explicitly in shell scripts to avoid false failures