Note: This article was originally published in 2013. The regex techniques covered here still apply in most modern text editors and programming languages, although the backreference syntax varies by tool. Examples have been expanded with C#, JavaScript, and Python.
One of the most powerful everyday uses of regular expressions is transforming a raw list of data into a structured set of instructions or commands. Whether you need to generate firewall rules from a list of IP addresses, create SQL statements from a CSV export, or build configuration entries from a text dump, regex find-and-replace is often the fastest way to do it.
The Core Concept: Capture and Replace
The fundamental technique involves two steps:
- Capture the data you need using a regex pattern with capture groups (parentheses).
- Replace each match with a template that references the captured groups via backreferences.
For example, if you have a list of IP addresses and you want to generate IIS IP restriction rules:
Input:
192.168.1.10
10.0.0.55
172.16.0.100
Find pattern:
\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b
Replace pattern:
<add ipAddress="\1" allowed="false" />
Output:
<add ipAddress="192.168.1.10" allowed="false" />
<add ipAddress="10.0.0.55" allowed="false" />
<add ipAddress="172.16.0.100" allowed="false" />
The \1 backreference inserts whatever text was captured by the first set of parentheses in the find pattern.
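The find pattern above accepts any four groups of one to three digits, so a string such as 999.1.1.1 would also match. A minimal Python sketch, assuming you want to drop such entries before generating rules, could validate each candidate with the standard `ipaddress` module. The function name `to_iis_rules` and the sample input are illustrative, not part of the original example.

```python
import ipaddress
import re

# Same pattern as the find expression above
IP_PATTERN = re.compile(r'\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b')

def to_iis_rules(text):
    """Return one <add .../> entry per valid IPv4 address found in text."""
    rules = []
    for candidate in IP_PATTERN.findall(text):
        try:
            # Raises a ValueError subclass for out-of-range octets
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue
        rules.append(f'<add ipAddress="{candidate}" allowed="false" />')
    return rules

sample = "192.168.1.10\n999.1.1.1\n10.0.0.55"
rules = to_iis_rules(sample)
print("\n".join(rules))
```

With this sample, the 999.1.1.1 line is matched by the regex but rejected by the validation step, so only two rules are produced.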
Step-by-Step: Using Notepad++
Notepad++ is one of the most popular tools for regex-based text transformation:
- Open your file containing the raw list in Notepad++.
- Press Ctrl+H to open the Find and Replace dialog.
- At the bottom, select Search Mode: Regular expression.
- In the Find what field, enter your regex pattern with capture groups.
- In the Replace with field, enter your template using \1, \2, etc. for backreferences.
- Click Replace All.
Example: Generate DNS Records from a Hostname List
Input:
webserver01
webserver02
dbserver01
mailserver01
Find:
^(.+)$
Replace:
\1 IN A 10.0.1.1
Output:
webserver01 IN A 10.0.1.1
webserver02 IN A 10.0.1.1
dbserver01 IN A 10.0.1.1
mailserver01 IN A 10.0.1.1
Example: Generate SQL DELETE Statements from a List of IDs
Input:
1024
1025
1037
1099
Find:
^(\d+)$
Replace:
DELETE FROM users WHERE id = \1;
Output:
DELETE FROM users WHERE id = 1024;
DELETE FROM users WHERE id = 1025;
DELETE FROM users WHERE id = 1037;
DELETE FROM users WHERE id = 1099;
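In an editor, Replace All leaves any line that does not match the pattern untouched, so a stray non-numeric line would survive in the output next to the generated statements. A small Python sketch, assuming you would rather emit statements only for lines that match, could collect the captured IDs instead of substituting in place. The sample input, including the deliberately invalid `abc` line, is made up for illustration.

```python
import re

# Same pattern as above; re.MULTILINE lets ^ and $ anchor at each line
ID_LINE = re.compile(r'^(\d+)$', re.MULTILINE)

raw = "1024\n1025\nabc\n1037"

# findall returns only the captured IDs, so the non-numeric line is skipped
statements = [f"DELETE FROM users WHERE id = {user_id};" for user_id in ID_LINE.findall(raw)]
print("\n".join(statements))
```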
Doing It Programmatically in C#
In C#, you can use Regex.Replace from the System.Text.RegularExpressions namespace to accomplish the same transformation:
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
string input = @"192.168.1.10
10.0.0.55
172.16.0.100";
// Pattern to match IP addresses
string pattern = @"\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b";
string replacement = "<add ipAddress=\"$1\" allowed=\"false\" />";
string result = Regex.Replace(input, pattern, replacement);
Console.WriteLine(result);
}
}
Output:
<add ipAddress="192.168.1.10" allowed="false" />
<add ipAddress="10.0.0.55" allowed="false" />
<add ipAddress="172.16.0.100" allowed="false" />
Note that in C# (and .NET generally), backreferences in the replacement string use $1, $2, etc., rather than \1.
Advanced C# Example: Parsing CSV into SQL INSERT Statements
using System;
using System.Text.RegularExpressions;
class Program
{
static void Main()
{
string csvData = @"John,Doe,john@example.com
Jane,Smith,jane@example.com
Bob,Johnson,bob@example.com";
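// \r? keeps a Windows carriage return out of the third capture group,
// since $ in Multiline mode matches only before \n
string pattern = @"^(.+?),(.+?),(.+?)\r?$";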
string replacement = "INSERT INTO contacts (first_name, last_name, email) VALUES ('$1', '$2', '$3');";
string result = Regex.Replace(csvData, pattern, replacement, RegexOptions.Multiline);
Console.WriteLine(result);
}
}
Output:
INSERT INTO contacts (first_name, last_name, email) VALUES ('John', 'Doe', 'john@example.com');
INSERT INTO contacts (first_name, last_name, email) VALUES ('Jane', 'Smith', 'jane@example.com');
INSERT INTO contacts (first_name, last_name, email) VALUES ('Bob', 'Johnson', 'bob@example.com');
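A plain replacement template inserts captured text verbatim, so a value containing a single quote (for example O'Brien) would produce a broken SQL string literal. The following Python sketch, offered as one possible approach rather than the article's method, passes a function to `re.sub` and doubles single quotes in each field. The helper names `sql_quote` and `row_to_insert` and the sample rows are assumptions for illustration.

```python
import re

# Three comma-separated fields per line; fields may not contain commas
ROW = re.compile(r'^([^,\n]+),([^,\n]+),([^,\n]+)$', re.MULTILINE)

def sql_quote(value):
    """Wrap value in single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"

def row_to_insert(match):
    """Build one INSERT statement from the three captured fields."""
    first, last, email = (sql_quote(group) for group in match.groups())
    return (
        "INSERT INTO contacts (first_name, last_name, email) "
        f"VALUES ({first}, {last}, {email});"
    )

csv_data = "John,Doe,john@example.com\nPat,O'Brien,pat@example.com"
result = ROW.sub(row_to_insert, csv_data)
print(result)
```

The second generated statement contains `'O''Brien'`, which is how a single quote is written inside a standard SQL string literal. When the statements are executed from code rather than pasted into a script, parameterized queries are usually the safer choice.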
Doing It in JavaScript
JavaScript’s String.prototype.replace() method supports regex with capture groups:
const input = `192.168.1.10
10.0.0.55
172.16.0.100`;
const pattern = /\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b/g;
const result = input.replace(pattern, '<add ipAddress="$1" allowed="false" />');
console.log(result);
JavaScript: Using a Callback Function for Complex Transformations
For more complex logic, you can pass a function to replace():
const hostnames = `webserver01
webserver02
dbserver01`;
let counter = 1;
const result = hostnames.replace(/^(.+)$/gm, (match, hostname) => {
const ip = `10.0.1.${counter++}`;
return `${hostname} IN A ${ip}`;
});
console.log(result);
// webserver01 IN A 10.0.1.1
// webserver02 IN A 10.0.1.2
// dbserver01 IN A 10.0.1.3
Doing It in Python
Python’s re module provides the same functionality:
import re
input_text = """192.168.1.10
10.0.0.55
172.16.0.100"""
pattern = r'\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b'
replacement = r'<add ipAddress="\1" allowed="false" />'
result = re.sub(pattern, replacement, input_text)
print(result)
Python: Batch Processing a File
A common real-world scenario is reading a file, transforming each line, and writing the output:
import re
def generate_firewall_rules(input_file, output_file):
ip_pattern = re.compile(r'^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})$')
with open(input_file, 'r') as infile, open(output_file, 'w') as outfile:
for line in infile:
line = line.strip()
match = ip_pattern.match(line)
if match:
ip = match.group(1)
outfile.write(f'iptables -A INPUT -s {ip} -j DROP\n')
generate_firewall_rules('blocked_ips.txt', 'firewall_rules.sh')
Common Regex Patterns for List Transformation
Here is a reference table of useful patterns you can adapt for various transformation tasks:
| Data Type | Regex Pattern | Description |
|---|---|---|
| IP Address | \b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b | Matches IPv4-shaped strings (does not check that each octet is 0-255) |
| Email Address | ([\w.+-]+@[\w-]+\.[\w.]+) | Matches common email formats |
| Integer ID | ^(\d+)$ | Matches a line containing only digits |
| Hostname | ^([a-zA-Z0-9._-]+)$ | Matches hostnames |
| CSV (3 columns) | ^(.+?),(.+?),(.+?)$ | Captures three comma-separated fields |
| Key=Value | ^(\w+)=(.+)$ | Captures key-value pairs |
| URL | (https?://[^\s]+) | Matches HTTP/HTTPS URLs |
| GUID | ([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}) | Matches standard GUIDs |
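
Before adapting a pattern from the table, it can help to try it against a representative sample. The Python sketch below runs four of the patterns against sample strings and records what each one captures. The sample strings and the dictionary names `PATTERNS` and `SAMPLES` are invented for illustration.

```python
import re

# A few patterns copied from the table above
PATTERNS = {
    "ip": r'\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b',
    "email": r'([\w.+-]+@[\w-]+\.[\w.]+)',
    "key_value": r'^(\w+)=(.+)$',
    "guid": r'([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})',
}

# Hypothetical sample inputs, one per pattern
SAMPLES = {
    "ip": "deny 10.0.0.55 now",
    "email": "contact: jane@example.com",
    "key_value": "timeout=30",
    "guid": "id=123e4567-e89b-12d3-a456-426614174000",
}

captures = {}
for name, pattern in PATTERNS.items():
    match = re.search(pattern, SAMPLES[name])
    # groups() is a tuple with one entry per capture group
    captures[name] = match.groups() if match else None
    print(name, captures[name])
```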
Backreference Syntax by Tool and Language
Different tools and languages use different syntax for backreferences in replacement strings:
| Tool / Language | Backreference Syntax | Example |
|---|---|---|
| Notepad++ | \1, \2 | \1 for first group |
| Visual Studio Code | $1, $2 | $1 for first group |
| C# (.NET) | $1, $2 | Regex.Replace(input, pattern, "$1") |
| JavaScript | $1, $2 | str.replace(pattern, "$1") |
| Python | \1, \2 or \g<1> | re.sub(pattern, r'\1', input) |
| sed (Linux) | \1, \2 | sed 's/pattern/\1/' |
| PowerShell | $1, $2 | -replace 'pattern', '$1' |
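
The two Python forms in the table are not always interchangeable. When a digit must follow the backreference directly, `\1` followed by `0` is read as a reference to group 10, whereas `\g<1>` marks the group number explicitly. A short sketch with made-up input:

```python
import re

ids = "7\n8\n9"

# Goal: append a literal 0 to each captured number.
# r'\10' is parsed as a reference to group 10, which this pattern
# does not define, so re.sub raises re.error.
try:
    re.sub(r'^(\d+)$', r'\10', ids, flags=re.MULTILINE)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

# \g<1> delimits the group number, so the trailing 0 stays literal
result = re.sub(r'^(\d+)$', r'\g<1>0', ids, flags=re.MULTILINE)
print(ambiguous_failed)
print(result)
```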
Practical Use Cases
1. Generating Firewall Rules
Transform a list of malicious IPs into iptables rules:
# Using sed on the command line
sed 's/^\(.*\)$/iptables -A INPUT -s \1 -j DROP/' blocked_ips.txt > rules.sh
2. Creating Bulk User Accounts
Transform a CSV of user data into PowerShell commands:
Find: ^(.+?),(.+?),(.+?)$
Replace: New-ADUser -SamAccountName "$1" -Name "$2" -EmailAddress "$3" -Enabled $true
3. Building HTML from a List
Transform a plain list into an HTML unordered list:
Find: ^(.+)$
Replace: <li>\1</li>
Then wrap the result in <ul>...</ul> tags.
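If the list items may contain characters such as & or <, a plain `\1` template copies them into the markup unescaped. A minimal Python sketch, assuming the standard `html` module is acceptable for escaping, wraps each line and then adds the surrounding `<ul>` tags. The sample items and the helper name `to_li` are illustrative.

```python
import html
import re

items = "Apples & pears\nBread\n<Milk>"

def to_li(match):
    """Wrap one captured line in <li> tags, escaping HTML special characters."""
    return f"<li>{html.escape(match.group(1))}</li>"

# re.MULTILINE makes ^ and $ match at every line boundary
body = re.sub(r'^(.+)$', to_li, items, flags=re.MULTILINE)
html_list = f"<ul>\n{body}\n</ul>"
print(html_list)
```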
4. Converting Log Entries into CSV
Input:
[2024-01-15 08:30:22] ERROR: Connection timeout for host db01
[2024-01-15 08:31:45] WARN: Slow query detected on host web03
Find: ^\[(.+?)\] (\w+): (.+)$
Replace: "\1","\2","\3"
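The quoted template above works as long as the log message contains no double quotes. A hedged Python sketch of the same conversion, using the standard `csv` module so that embedded quotes are escaped automatically, might look like this. The in-memory `io.StringIO` buffer stands in for a real output file.

```python
import csv
import io
import re

# Same pattern as the find expression above
LOG_LINE = re.compile(r'^\[(.+?)\] (\w+): (.+)$')

log_text = """[2024-01-15 08:30:22] ERROR: Connection timeout for host db01
[2024-01-15 08:31:45] WARN: Slow query detected on host web03"""

buffer = io.StringIO()
# QUOTE_ALL mirrors the "\1","\2","\3" template: every field is quoted
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
for line in log_text.splitlines():
    match = LOG_LINE.match(line)
    if match:
        # groups() yields (timestamp, level, message)
        writer.writerow(match.groups())

csv_output = buffer.getvalue()
print(csv_output, end="")
```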
Tips and Best Practices
- Always test on a small sample first. A wrong regex applied to thousands of lines can produce garbage output.
- Use non-greedy quantifiers (.+? instead of .+) when matching delimited fields to avoid over-matching.
- Enable multiline mode (RegexOptions.Multiline in C#, re.MULTILINE in Python, the m flag in JavaScript) when your pattern uses ^ and $ anchors on multi-line input.
- Escape special characters in your replacement string if the output contains characters like $, \, or { that have special meaning.
- Back up your data before running a Replace All operation in a text editor.
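Two of the tips above can be illustrated with a short Python sketch: the effect of multiline mode on ^ and $, and one way to keep backslashes in the output literal. The Windows-style path is a made-up value. Passing a function as the replacement is one option among several, chosen here because Python inserts the function's return value without interpreting escape sequences in it.

```python
import re

lines = "alpha\nbeta"

# Without re.MULTILINE, ^ and $ anchor only at the ends of the whole string,
# so this single-line pattern finds nothing in the two-line input.
without_flag = re.sub(r'^(\w+)$', r'[\1]', lines)

# With re.MULTILINE, each line is matched separately.
with_flag = re.sub(r'^(\w+)$', r'[\1]', lines, flags=re.MULTILINE)

# The prefix contains \n and \d, which a template string would treat as
# escape sequences. A replacement function's return value is used as-is.
literal_prefix = r'C:\new\data'
paths = re.sub(
    r'^(\w+)$',
    lambda m: literal_prefix + '\\' + m.group(1),
    lines,
    flags=re.MULTILINE,
)

print(without_flag)
print(with_flag)
print(paths)
```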
Summary
Regular expressions combined with find-and-replace give you a rapid, flexible way to transform raw lists into structured instructions, commands, configuration entries, or code. The technique works the same way in text editors like Notepad++ and VS Code and in programming languages like C#, JavaScript, and Python; only the backreference syntax (\1 versus $1) differs. By mastering capture groups and backreferences, you can automate tedious text manipulation tasks that would otherwise take hours of manual editing.