Python re.sub() Function
last modified April 20, 2025
Introduction to re.sub
The re.sub function is a powerful tool in Python's re module for performing substitutions using regular expressions. It searches for patterns in strings and replaces them with specified text.
This function is essential for text processing tasks like data cleaning, formatting, and transformation. It offers more flexibility than plain string methods such as str.replace, which only handle literal text.
re.sub can use both literal replacements and callbacks for dynamic substitutions. It supports flags to modify matching behavior and can reference matched groups in replacements.
Basic Syntax
The basic syntax of re.sub has three required parameters:
re.sub(pattern, repl, string, count=0, flags=0)
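pattern is the regex to match. repl is the replacement, either a string or a function that is called for each match. string is the input text to process.
The optional count limits the number of replacements; the default of 0 replaces every match. flags modifies matching behavior. The function returns the modified string.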
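The return behavior described above can be checked with a short sketch. The sample text and variable names are made up for illustration; the point is that re.sub returns a new string and leaves its input untouched, and that a pattern with no match returns the text unchanged.

```python
import re

text = "cat bat rat"

# re.sub returns a new string; the input is not modified.
replaced = re.sub(r"[cbr]at", "hat", text)
print(replaced)   # hat hat hat
print(text)       # cat bat rat

# When the pattern matches nothing, the original text comes back as is.
unchanged = re.sub(r"dog", "hat", text)
print(unchanged)  # cat bat rat
```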
Basic Text Replacement
Let's start with a simple example replacing colors in a sentence.
#!/usr/bin/python

import re

text = "The sky is blue and the grass is green"

result = re.sub(r'blue', 'gray', text)
print(result)
This replaces all occurrences of 'blue' with 'gray' in the input text. Matching is case-sensitive by default, so 'Blue' would be left alone.
result = re.sub(r'blue', 'gray', text)
The first argument is the pattern to match. The second is the replacement string. The third is the text to process.
Using Regular Expression Patterns
re.sub shines when using regex patterns for matching.
#!/usr/bin/python

import re

text = "Order 12345 shipped, Order 67890 processing"

result = re.sub(r'Order \d+', 'Order XXXX', text)
print(result)
This replaces all order numbers with 'XXXX'. The \d+ pattern matches one or more digits.
The example demonstrates how regex patterns can match variable text for consistent replacements.
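Another common case of matching variable text is whitespace cleanup. The following is a minimal sketch with a made-up input string; \s+ matches any run of spaces, tabs, or newlines, and each run is collapsed to a single space.

```python
import re

messy = "too    many   spaces\tand\ttabs"

# Collapse every run of whitespace into one space.
clean = re.sub(r"\s+", " ", messy)
print(clean)  # too many spaces and tabs
```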
Referencing Matched Groups
We can reference matched groups in the replacement string.
#!/usr/bin/python

import re

text = "2023-04-20"

result = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', text)
print(result)
This reformats a date from YYYY-MM-DD to MM/DD/YYYY. Parentheses create capture groups referenced as \1, \2, etc.
r'(\d{4})-(\d{2})-(\d{2})'
The pattern captures year, month, and day as separate groups. Each \d matches a digit, with {n} requiring exactly n repetitions.
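Numbered references can become hard to read as patterns grow. As a sketch of an alternative, the same date conversion can use named groups, which are referenced in the replacement as \g<name>. The group names year, month, and day are chosen here for illustration. The \g<1> form is also useful when a group reference is followed directly by a digit.

```python
import re

text = "2023-04-20"

# Named groups make the replacement string self-describing.
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
result = re.sub(pattern, r"\g<month>/\g<day>/\g<year>", text)
print(result)  # 04/20/2023

# \g<1>0 means "group 1, then a literal 0"; \10 would be read as group 10.
padded = re.sub(r"(\d)", r"\g<1>0", "a7")
print(padded)  # a70
```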
Using a Replacement Function
For dynamic replacements, we can use a callback function.
#!/usr/bin/python

import re

def double_match(match):
    return str(int(match.group()) * 2)

text = "Scores: 10, 20, 30"

result = re.sub(r'\d+', double_match, text)
print(result)
This doubles all numbers in the text. The callback receives a match object and returns the replacement string.
The function approach enables complex transformations based on matched content. It's more flexible than static replacement strings.
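A callback is also a convenient place for a table lookup. The sketch below expands street abbreviations; the ABBREVIATIONS table, the expand function, and the sample address are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical lookup table used only for this example.
ABBREVIATIONS = {"St": "Street", "Ave": "Avenue", "Rd": "Road"}

def expand(match):
    # group(1) is the abbreviation without its trailing dot.
    return ABBREVIATIONS[match.group(1)]

text = "12 Main St. and 5 Oak Ave."

# The pattern only matches keys that exist in the table.
result = re.sub(r"\b(St|Ave|Rd)\.", expand, text)
print(result)  # 12 Main Street and 5 Oak Avenue
```

Because the alternation in the pattern lists exactly the keys of the table, the lookup inside the callback cannot fail.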
Limiting Replacements with Count
The count parameter limits how many substitutions occur.
#!/usr/bin/python

import re

text = "apple apple apple apple"

result = re.sub(r'apple', 'orange', text, count=2)
print(result)
This replaces only the first two occurrences of 'apple'. The remaining matches stay unchanged.
Controlling replacement count is useful when you want partial substitutions or to process only certain matches.
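One practical use is rewriting only the first separator in a line. This is a small sketch with an invented key=value string. Passing count by keyword, as here, is the safer habit, since the Python documentation deprecates passing count and flags positionally as of version 3.13.

```python
import re

line = "name=Ada=Lovelace"

# Only the first "=" is treated as the key/value separator.
result = re.sub(r"=", ": ", line, count=1)
print(result)  # name: Ada=Lovelace
```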
Case-Insensitive Replacement
Flags like re.IGNORECASE modify matching behavior.
#!/usr/bin/python

import re

text = "Python is GREAT, really great!"

result = re.sub(r'great', 'awesome', text, flags=re.IGNORECASE)
print(result)
This replaces all case variants of 'great' with 'awesome'. The flag makes the match case-insensitive.
Flags can be combined using bitwise OR (|) when multiple behaviors are needed simultaneously.
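As a sketch of combining flags, the following marks to-do lines as done. The input text is invented for the example. re.MULTILINE lets ^ match at the start of each line, and re.IGNORECASE lets the lowercase pattern match the uppercase variant.

```python
import re

text = "todo: write tests\nTODO: update docs\nnote: todo later"

# Both flags are active at once thanks to the bitwise OR.
result = re.sub(r"^todo:", "DONE:", text, flags=re.IGNORECASE | re.MULTILINE)
print(result)
# DONE: write tests
# DONE: update docs
# note: todo later
```

The third line is untouched because its "todo" is not at the start of the line.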
Advanced: Swapping Words
Here's a more complex example swapping word positions.
#!/usr/bin/python

import re

text = "John Doe, Jane Smith, Mike Johnson"

result = re.sub(r'(\w+) (\w+)', r'\2, \1', text)
print(result)
This swaps first and last names, adding a comma between them. The \w+ pattern matches one or more word characters (letters, digits, and underscores).
The example shows how regex groups can restructure text in powerful ways. This technique is useful for data reformatting.
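The swap above prints "Doe, John, Smith, Jane, Johnson, Mike", where the comma both separates names and splits each name. One possible way to keep the result unambiguous, sketched here as a variation and not as part of the original example, is to process each name separately and join the results with a different separator.

```python
import re

text = "John Doe, Jane Smith, Mike Johnson"

# Split into individual names, then swap within each one.
names = text.split(", ")
swapped = [re.sub(r"^(\w+) (\w+)$", r"\2, \1", name) for name in names]

# A semicolon keeps the entries distinct from the inner commas.
result = "; ".join(swapped)
print(result)  # Doe, John; Smith, Jane; Johnson, Mike
```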
Best Practices
When using re.sub, consider these best practices:
- Use raw strings (r'') for patterns to avoid escaping issues
- Pre-compile patterns with re.compile if reused frequently
- Be specific with patterns to avoid unintended matches
- Use callback functions for complex replacement logic
- Test patterns thoroughly with various input cases
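Several of these points can be combined in one short sketch: a raw-string pattern, compiled once, made specific with word boundaries, and reused across inputs. The PHONE name and the sample lines are assumptions made for illustration.

```python
import re

# Compiled once and reused; the raw string keeps \b and \d intact.
PHONE = re.compile(r"\b\d{3}-\d{4}\b")

lines = ["Call 555-1234 today", "Fax: 555-9876", "No number here"]

# A compiled pattern has its own sub method with the same semantics.
masked = [PHONE.sub("XXX-XXXX", line) for line in lines]
print(masked)
# ['Call XXX-XXXX today', 'Fax: XXX-XXXX', 'No number here']
```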
Performance Considerations
re.sub performance depends on pattern complexity and input size. Simple patterns on small texts are fast, while complex patterns on large texts may need optimization.
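For repeated substitutions, pre-compiling the pattern with re.compile avoids looking the pattern up on every call. The re module already caches recently compiled patterns, so the gain is usually modest. Avoid unnecessary capturing groups when possible; a non-capturing group, written (?:...), groups a subpattern without storing its match.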
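As a small sketch of the last point, the pattern below needs grouping for the alternation but never refers back to the group, so a non-capturing group is enough. The sample sentence is invented, and any speed difference on an input this small would be negligible; the example only shows the syntax.

```python
import re

text = "Mr. Smith met Mrs. Jones and Ms. Lee"

# (?:...) groups the alternatives without creating a capture.
titles = re.compile(r"\b(?:Mr|Mrs|Ms)\.\s+")

result = titles.sub("", text)
print(result)  # Smith met Jones and Lee
```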
This tutorial covered the essential aspects of Python's re.sub function. Mastering pattern substitution will enhance your text processing capabilities significantly.