Python Match.span Method
last modified April 20, 2025
Introduction to Match.span
The Match.span method is part of Python's re module.
It returns a tuple containing the start and end positions of a match.
This method is available on match objects returned by regex operations. It provides precise location information about where matches occur in text.
Understanding span is crucial for text processing tasks that
require position information, such as highlighting or extracting substrings.
Basic Syntax
The syntax for Match.span is straightforward:
Match.span(group=0)
The optional group parameter specifies which capture group to
return positions for. Default is 0 (the entire match).
Basic Match Position Retrieval
Let's start with a simple example of finding a word's position in text.
#!/usr/bin/python
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = re.compile(r'fox')
match = pattern.search(text)
if match:
start, end = match.span()
print(f"Found 'fox' from position {start} to {end}")
print(f"Matched text: '{text[start:end]}'")
This example shows how to get the start and end positions of a match. The span is used to extract the matched substring from the original text.
start, end = match.span()
The span method returns a tuple with two integers.
The first is the start index, the second is the end index.
text[start:end]
Using the span indices, we can slice the original string to get exactly
the matched portion. This is more efficient than using group.
Span with Capture Groups
The span method can also return positions for specific groups.
#!/usr/bin/python
import re
text = "Date: 2023-12-25"
pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
match = pattern.search(text)
if match:
print(f"Full match span: {match.span()}")
print(f"Year span: {match.span(1)}")
print(f"Month span: {match.span(2)}")
print(f"Day span: {match.span(3)}")
This demonstrates getting spans for different capture groups. Group 0 is always the entire match, while groups 1+ are captures.
Multiple Matches with finditer
When processing multiple matches, span helps locate each one.
#!/usr/bin/python
import re
text = "cat bat hat mat"
pattern = re.compile(r'[a-z]at')
for match in pattern.finditer(text):
start, end = match.span()
word = text[start:end]
print(f"Found '{word}' at positions {start}-{end}")
The finditer method returns match objects for all occurrences.
We use span to get each match's position in the original text.
Span with Named Groups
Named groups make span positions more readable and maintainable.
#!/usr/bin/python
import re
text = "John Doe, age 30"
pattern = re.compile(r'(?P<first>\w+) (?P<last>\w+), age (?P<age>\d+)')
match = pattern.search(text)
if match:
print(f"Name span: {match.span('first')} to {match.span('last')}")
print(f"Age span: {match.span('age')}")
Named groups allow accessing spans by meaningful names instead of numbers. This makes code more self-documenting and less prone to errors.
Span in Replacement Operations
Span information can guide complex text replacement operations.
#!/usr/bin/python
import re
text = "Error 404: Not Found; Error 500: Server Error"
pattern = re.compile(r'Error (\d{3}): ([A-Za-z ]+)')
def replace_error(match):
code_span = match.span(1)
desc_span = match.span(2)
code = match.group(1)
desc = match.group(2).lower()
return f"Code {code} ({desc})"
result = pattern.sub(replace_error, text)
print(result)
This example uses span information within a replacement function. The spans help understand the structure of each match for processing.
Span with Overlapping Matches
The span method helps identify overlapping matches.
#!/usr/bin/python
import re
text = "ababababab"
pattern = re.compile(r'(?=(abab))')
for i, match in enumerate(pattern.finditer(text), 1):
start, end = match.span(1)
print(f"Match {i}: '{match.group(1)}' at {start}-{end}")
This finds all overlapping occurrences of 'abab' in the text.
The lookahead pattern with span captures each position.
Span in Multiline Text
Handling multiline text requires understanding how span counts positions.
#!/usr/bin/python
import re
text = """First line
Second line
Third line"""
pattern = re.compile(r'^(\w+)', re.MULTILINE)
for match in pattern.finditer(text):
start, end = match.span()
line = text[start:end]
print(f"Found '{line}' at positions {start}-{end}")
The re.MULTILINE flag makes ^ match line starts.
Span positions are counted across the entire string, including newlines.
Best Practices
When using Match.span, follow these best practices:
- Always check if a match was found before calling
span - Use named groups for better readability with span positions
- Remember span indices follow Python's slicing rules (end is exclusive)
- Combine with
groupwhen you need both text and position - Handle Unicode characters carefully as they may affect position counts
Performance Considerations
The span method is highly optimized and has minimal overhead.
It's more efficient than using start and end separately.
For very large texts, be mindful that span positions are byte offsets. With Unicode, character counts may differ from byte positions.
Source
Python Match.span() documentation
This tutorial covered the essential aspects of Python's Match.span
method. Mastering position retrieval will enhance your text processing capabilities.
Author
List all Python tutorials.