Note: This article was originally published in 2013. The regex techniques covered here still apply in most modern text editors and programming languages, although the replacement syntax varies by tool. The examples have been expanded with C#, JavaScript, and Python.

One of the most powerful everyday uses of regular expressions is transforming a raw list of data into a structured set of instructions or commands. Whether you need to generate firewall rules from a list of IP addresses, create SQL statements from a CSV export, or build configuration entries from a text dump, regex find-and-replace is often the fastest way to do it.

The Core Concept: Capture and Replace

The fundamental technique involves two steps:

  1. Capture the data you need using a regex pattern with capture groups (parentheses).
  2. Replace each match with a template that references the captured groups via backreferences.

For example, if you have a list of IP addresses and you want to generate IIS IP restriction rules:

Input:

192.168.1.10
10.0.0.55
172.16.0.100

Find pattern:

\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b

Replace pattern:

<add ipAddress="\1" allowed="false" />

Output:

<add ipAddress="192.168.1.10" allowed="false" />
<add ipAddress="10.0.0.55" allowed="false" />
<add ipAddress="172.16.0.100" allowed="false" />

The \1 backreference inserts whatever text was captured by the first set of parentheses in the find pattern.

Step-by-Step: Using Notepad++

Notepad++ is one of the most popular tools for regex-based text transformation:

  1. Open your file containing the raw list in Notepad++.
  2. Press Ctrl+H to open the Find and Replace dialog.
  3. At the bottom, select Search Mode: Regular expression.
  4. In the Find what field, enter your regex pattern with capture groups.
  5. In the Replace with field, enter your template using \1, \2, etc. for backreferences.
  6. Click Replace All.

Example: Generate DNS Records from a Hostname List

Input:

webserver01
webserver02
dbserver01
mailserver01

Find:

^(.+)$

Replace:

\1    IN  A   10.0.1.1

Output:

webserver01    IN  A   10.0.1.1
webserver02    IN  A   10.0.1.1
dbserver01    IN  A   10.0.1.1
mailserver01    IN  A   10.0.1.1

Example: Generate SQL DELETE Statements from a List of IDs

Input:

1024
1025
1037
1099

Find:

^(\d+)$

Replace:

DELETE FROM users WHERE id = \1;

Output:

DELETE FROM users WHERE id = 1024;
DELETE FROM users WHERE id = 1025;
DELETE FROM users WHERE id = 1037;
DELETE FROM users WHERE id = 1099;
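
When the list is long, one statement per ID can get unwieldy. As a rough sketch in Python, the same capture pattern can collect the IDs and emit a single DELETE with an IN list instead. The function name build_delete is hypothetical; the users table and id column are taken from the example above.

```python
import re

def build_delete(raw_ids: str) -> str:
    """Collect every digits-only line and emit one DELETE statement."""
    # re.MULTILINE makes ^ and $ match at each line boundary.
    ids = re.findall(r'^(\d+)$', raw_ids, flags=re.MULTILINE)
    if not ids:
        raise ValueError("no numeric IDs found")
    return f"DELETE FROM users WHERE id IN ({', '.join(ids)});"

raw = """1024
1025
1037
1099"""
print(build_delete(raw))
# DELETE FROM users WHERE id IN (1024, 1025, 1037, 1099);
```

Lines that are not purely numeric simply do not match, so stray headers or blank lines are skipped rather than turned into broken SQL.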

Doing It Programmatically in C#

In C#, you can use Regex.Replace from the System.Text.RegularExpressions namespace to accomplish the same transformation:

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = @"192.168.1.10
10.0.0.55
172.16.0.100";

        // Pattern to match IP addresses
        string pattern = @"\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b";
        string replacement = "<add ipAddress=\"$1\" allowed=\"false\" />";

        string result = Regex.Replace(input, pattern, replacement);
        Console.WriteLine(result);
    }
}

Output:

<add ipAddress="192.168.1.10" allowed="false" />
<add ipAddress="10.0.0.55" allowed="false" />
<add ipAddress="172.16.0.100" allowed="false" />

Note that in C# (and .NET generally), backreferences in the replacement string use $1, $2, etc., rather than \1.

Advanced C# Example: Parsing CSV into SQL INSERT Statements

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string csvData = @"John,Doe,john@example.com
Jane,Smith,jane@example.com
Bob,Johnson,bob@example.com";

        // \r?$ keeps a trailing carriage return (CRLF input) out of the last group
        string pattern = @"^(.+?),(.+?),(.+?)\r?$";
        string replacement = "INSERT INTO contacts (first_name, last_name, email) VALUES ('$1', '$2', '$3');";

        string result = Regex.Replace(csvData, pattern, replacement, RegexOptions.Multiline);
        Console.WriteLine(result);
    }
}

Output:

INSERT INTO contacts (first_name, last_name, email) VALUES ('John', 'Doe', 'john@example.com');
INSERT INTO contacts (first_name, last_name, email) VALUES ('Jane', 'Smith', 'jane@example.com');
INSERT INTO contacts (first_name, last_name, email) VALUES ('Bob', 'Johnson', 'bob@example.com');
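
A plain template like this breaks as soon as a field contains a single quote (for example O'Brien). One possible way around that, sketched here in Python, is to use a function as the replacement so each captured field can be escaped before it is inserted. This assumes simple CSV with no quoted fields or embedded commas; the names ROW, sql_quote, and row_to_insert are hypothetical.

```python
import re

# Three comma-free fields per line; [^,\r\n]+ cannot run past a delimiter.
ROW = re.compile(r'^([^,\r\n]+),([^,\r\n]+),([^,\r\n]+)\r?$', re.MULTILINE)

def sql_quote(value: str) -> str:
    """Double any single quotes so the value is a valid SQL string literal."""
    return "'" + value.replace("'", "''") + "'"

def row_to_insert(match: re.Match) -> str:
    """Build one INSERT statement from the three captured fields."""
    first, last, email = (sql_quote(g) for g in match.groups())
    return (
        "INSERT INTO contacts (first_name, last_name, email) "
        f"VALUES ({first}, {last}, {email});"
    )

csv_data = "John,Doe,john@example.com\nMiles,O'Brien,miles@example.com"
print(ROW.sub(row_to_insert, csv_data))
```

For anything beyond a one-off script, parameterized queries are usually the safer choice than generated SQL text.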

Doing It in JavaScript

JavaScript’s String.prototype.replace() method supports regex with capture groups:

const input = `192.168.1.10
10.0.0.55
172.16.0.100`;

const pattern = /\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b/g;
const result = input.replace(pattern, '<add ipAddress="$1" allowed="false" />');

console.log(result);

JavaScript: Using a Callback Function for Complex Transformations

For more complex logic, you can pass a function to replace():

const hostnames = `webserver01
webserver02
dbserver01`;

let counter = 1;
const result = hostnames.replace(/^(.+)$/gm, (match, hostname) => {
    const ip = `10.0.1.${counter++}`;
    return `${hostname}    IN  A   ${ip}`;
});

console.log(result);
// webserver01    IN  A   10.0.1.1
// webserver02    IN  A   10.0.1.2
// dbserver01    IN  A   10.0.1.3

Doing It in Python

Python’s re module provides the same functionality:

import re

input_text = """192.168.1.10
10.0.0.55
172.16.0.100"""

pattern = r'\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b'
replacement = r'<add ipAddress="\1" allowed="false" />'

result = re.sub(pattern, replacement, input_text)
print(result)
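
Note that \d{1,3} also matches out-of-range values such as 999.1.1.1. If that matters, one option is to pass a function to re.sub and validate each candidate with the standard ipaddress module, leaving anything invalid untouched. This is only a sketch; the names IP_PATTERN and to_rule are hypothetical.

```python
import ipaddress
import re

IP_PATTERN = re.compile(r'\b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b')

def to_rule(match: re.Match) -> str:
    """Return an IIS rule for valid IPv4 addresses, else the original text."""
    candidate = match.group(1)
    try:
        ipaddress.IPv4Address(candidate)
    except ValueError:
        return match.group(0)  # leave the original text unchanged
    return f'<add ipAddress="{candidate}" allowed="false" />'

text = "192.168.1.10\n999.1.1.1"
print(IP_PATTERN.sub(to_rule, text))
# <add ipAddress="192.168.1.10" allowed="false" />
# 999.1.1.1
```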

Python: Batch Processing a File

A common real-world scenario is reading a file, transforming each line, and writing the output:

import re

def generate_firewall_rules(input_file, output_file):
    ip_pattern = re.compile(r'^(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})$')

    with open(input_file, 'r') as infile, open(output_file, 'w') as outfile:
        for line in infile:
            line = line.strip()
            match = ip_pattern.match(line)
            if match:
                ip = match.group(1)
                outfile.write(f'iptables -A INPUT -s {ip} -j DROP\n')

generate_firewall_rules('blocked_ips.txt', 'firewall_rules.sh')

Common Regex Patterns for List Transformation

Here is a reference table of useful patterns you can adapt for various transformation tasks:

| Data Type | Regex Pattern | Description |
|---|---|---|
| IP Address | \b(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\b | Matches IPv4 addresses |
| Email Address | ([\w.+-]+@[\w-]+\.[\w.]+) | Matches common email formats |
| Integer ID | ^(\d+)$ | Matches a line containing only digits |
| Hostname | ^([a-zA-Z0-9._-]+)$ | Matches hostnames |
| CSV (3 columns) | ^(.+?),(.+?),(.+?)$ | Captures three comma-separated fields |
| Key=Value | ^(\w+)=(.+)$ | Captures key-value pairs |
| URL | (https?://[^\s]+) | Matches HTTP/HTTPS URLs |
| GUID | ([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}) | Matches standard GUIDs |
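
As a quick illustration of adapting one of these patterns, here is a small Python sketch that uses the Key=Value pattern to turn configuration lines into shell export statements. The sample keys and values are made up for the example.

```python
import re

config = """DB_HOST=db01.internal
DB_PORT=5432
# comment lines do not match and pass through unchanged"""

# \1 is the key and \2 is the value, as in the Key=Value pattern above.
result = re.sub(r'^(\w+)=(.+)$', r'export \1="\2"', config, flags=re.MULTILINE)
print(result)
# export DB_HOST="db01.internal"
# export DB_PORT="5432"
# # comment lines do not match and pass through unchanged
```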

Backreference Syntax by Tool and Language

Different tools and languages use different syntax for backreferences in replacement strings:

| Tool / Language | Backreference Syntax | Example |
|---|---|---|
| Notepad++ | \1, \2 | \1 for first group |
| Visual Studio Code | $1, $2 | $1 for first group |
| C# (.NET) | $1, $2 | Regex.Replace(input, pattern, "$1") |
| JavaScript | $1, $2 | str.replace(pattern, "$1") |
| Python | \1, \2 or \g<1> | re.sub(pattern, r'\1', input) |
| sed (Linux) | \1, \2 | sed 's/pattern/\1/' |
| PowerShell | $1, $2 | -replace 'pattern', '$1' |
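
The \g<1> form in Python matters when the backreference is followed directly by a digit. A minimal sketch, with made-up input values, that appends a literal 0 to each captured number:

```python
import re

ids = "7\n42"

# r'\10' would be read as a reference to group 10, so \g<1> is used
# to say "group 1, followed by a literal 0".
result = re.sub(r'^(\d+)$', r'\g<1>0', ids, flags=re.MULTILINE)
print(result)
# 70
# 420
```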

Practical Use Cases

1. Generating Firewall Rules

Transform a list of malicious IPs into iptables rules:

# Using sed on the command line
sed 's/^\(.*\)$/iptables -A INPUT -s \1 -j DROP/' blocked_ips.txt > rules.sh

2. Creating Bulk User Accounts

Transform a CSV of user data into PowerShell commands:

Find:

^(.+?),(.+?),(.+?)$

Replace:

New-ADUser -SamAccountName "$1" -Name "$2" -EmailAddress "$3" -Enabled $true

3. Building HTML from a List

Transform a plain list into an HTML unordered list:

Find:

^(.+)$

Replace:

<li>\1</li>

Then wrap the result in <ul>...</ul> tags.
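
The same transformation, including the wrapping step, might look like this in Python. This sketch also escapes HTML special characters with the standard html module, which a plain find-and-replace does not do; the function name to_html_list is hypothetical.

```python
import html
import re

def to_html_list(raw: str) -> str:
    """Wrap each non-empty line in <li> tags and the whole list in <ul>."""
    items = re.sub(
        r'^(.+)$',
        lambda m: f"  <li>{html.escape(m.group(1))}</li>",
        raw,
        flags=re.MULTILINE,
    )
    return f"<ul>\n{items}\n</ul>"

print(to_html_list("Apples\nFish & chips"))
# <ul>
#   <li>Apples</li>
#   <li>Fish &amp; chips</li>
# </ul>
```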

4. Converting Log Entries into CSV

Input:

[2024-01-15 08:30:22] ERROR: Connection timeout for host db01
[2024-01-15 08:31:45] WARN: Slow query detected on host web03

Find:

^\[(.+?)\] (\w+): (.+)$

Replace:

"\1","\2","\3"
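
If a log message can itself contain double quotes, the simple quoted template produces invalid CSV. One way to handle that, sketched here in Python, is to capture the fields with the same pattern and let the standard csv module do the quoting. The names LOG_LINE and log_to_csv are hypothetical.

```python
import csv
import io
import re

LOG_LINE = re.compile(r'^\[(.+?)\] (\w+): (.+)$', re.MULTILINE)

def log_to_csv(log_text: str) -> str:
    """Write timestamp, level, and message as fully quoted CSV rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for match in LOG_LINE.finditer(log_text):
        writer.writerow(match.groups())
    return buffer.getvalue()

sample = """[2024-01-15 08:30:22] ERROR: Connection timeout for host db01
[2024-01-15 08:31:45] WARN: Slow query detected on host web03"""
print(log_to_csv(sample), end="")
# "2024-01-15 08:30:22","ERROR","Connection timeout for host db01"
# "2024-01-15 08:31:45","WARN","Slow query detected on host web03"
```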

Tips and Best Practices

  1. Always test on a small sample first. A wrong regex applied to thousands of lines can produce garbage output.
  2. Use non-greedy quantifiers (.+? instead of .+), or a negated character class such as [^,]+, when matching delimited fields to avoid over-matching.
  3. Enable multiline mode (RegexOptions.Multiline in C#, re.MULTILINE in Python, m flag in JavaScript) when your pattern uses ^ and $ anchors on multi-line input.
  4. Escape special characters in your replacement string if the output contains characters like $, \, or { that have special meaning.
  5. Back up your data before running a Replace All operation in a text editor.
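
Tips 3 and 4 are easy to see in a few lines of Python. The sketch below uses made-up hostnames and shows what happens without multiline mode, with it, and with a function replacement that sidesteps escaping in the output.

```python
import re

hosts = "web01\nweb02"

# Without re.MULTILINE, ^ and $ anchor only to the whole string, and
# '.' does not cross the newline, so nothing matches.
unchanged = re.sub(r'^(.+)$', r'host: \1', hosts)

# With re.MULTILINE, the anchors apply to every line.
per_line = re.sub(r'^(.+)$', r'host: \1', hosts, flags=re.MULTILINE)

# A function replacement is inserted as-is, so backslashes in the output
# need no extra escaping in the replacement.
unc = re.sub(r'^(.+)$', lambda m: "\\\\" + m.group(1) + "\\share",
             hosts, flags=re.MULTILINE)

print(unchanged)  # input comes back untouched
print(per_line)   # each line prefixed with "host: "
print(unc)        # each line turned into a \\host\share path
```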

Summary

Regular expressions combined with find-and-replace give you a rapid, flexible way to transform raw lists into structured instructions, commands, configuration entries, or code. The technique is the same across text editors like Notepad++ and VS Code and programming languages like C#, JavaScript, and Python; only the backreference syntax (\1 versus $1) differs. By mastering capture groups and backreferences, you can automate tedious text manipulation tasks that would otherwise take hours of manual editing.
