Python Match.span Method
last modified April 20, 2025
Introduction to Match.span
The Match.span method is part of Python's re module. It returns a tuple containing the start and end positions of a match.
This method is available on match objects returned by regex operations. It provides precise location information about where matches occur in text.
Understanding span is crucial for text processing tasks that require position information, such as highlighting or extracting substrings.
Basic Syntax
The syntax for Match.span is straightforward:
Match.span(group=0)
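The optional group parameter selects the capture group, by number or by name, whose positions are returned. The default is 0, which is the entire match. If the group exists but did not take part in the match, span returns (-1, -1).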
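The behavior of the group parameter can be sketched as follows. The sample text and the optional port group are illustrative assumptions, chosen so that one group takes no part in the match.

```python
import re

# Hypothetical sample: the port group is optional and absent from the text.
pattern = re.compile(r'(\w+)(?::(\d+))?')
match = pattern.search("localhost")

full_span = match.span()     # same as match.span(0)
host_span = match.span(1)    # first capture group
port_span = match.span(2)    # group 2 did not participate in the match

print(full_span)   # (0, 9)
print(host_span)   # (0, 9)
print(port_span)   # (-1, -1)
```

Checking for (-1, -1) before slicing avoids silently extracting the wrong substring.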
Basic Match Position Retrieval
Let's start with a simple example of finding a word's position in text.
#!/usr/bin/python

import re

text = "The quick brown fox jumps over the lazy dog"
pattern = re.compile(r'fox')
match = pattern.search(text)

if match:
    start, end = match.span()
    print(f"Found 'fox' from position {start} to {end}")
    print(f"Matched text: '{text[start:end]}'")
This example shows how to get the start and end positions of a match. The span is used to extract the matched substring from the original text.
start, end = match.span()
The span method returns a tuple with two integers. The first is the start index, the second is the end index.
text[start:end]
Using the span indices, we can slice the original string to get exactly the matched portion. The slice is the same text that group returns; the indices are mainly useful when the position itself is needed as well.
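As a quick check of how span relates to the other match accessors, the sketch below reuses the sample sentence from the example above and compares span with start, end, and group.

```python
import re

text = "The quick brown fox jumps over the lazy dog"
match = re.search(r'fox', text)

start, end = match.span()

# span() bundles the values that start() and end() return individually.
same_positions = (start, end) == (match.start(), match.end())

# Slicing with the span reproduces the text that group() returns.
same_text = text[start:end] == match.group()

print(start, end)                  # 16 19
print(same_positions, same_text)   # True True
```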
Span with Capture Groups
The span method can also return positions for specific groups.
#!/usr/bin/python

import re

text = "Date: 2023-12-25"
pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
match = pattern.search(text)

if match:
    print(f"Full match span: {match.span()}")
    print(f"Year span: {match.span(1)}")
    print(f"Month span: {match.span(2)}")
    print(f"Day span: {match.span(3)}")
This demonstrates getting spans for different capture groups. Group 0 is always the entire match, while groups 1 and up are the capture groups, numbered left to right by their opening parenthesis.
Multiple Matches with finditer
When processing multiple matches, span helps locate each one.
#!/usr/bin/python

import re

text = "cat bat hat mat"
pattern = re.compile(r'[a-z]at')

for match in pattern.finditer(text):
    start, end = match.span()
    word = text[start:end]
    print(f"Found '{word}' at positions {start}-{end}")
The finditer method returns match objects for all occurrences. We use span to get each match's position in the original text.
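One common use of per-match positions is highlighting. The following is a minimal sketch, not a library feature: the highlight helper and its bracket markers are hypothetical names introduced for illustration.

```python
import re

def highlight(text, pattern, left="[", right="]"):
    """Wrap every match of pattern in text with the given markers."""
    pieces = []
    last_end = 0
    for match in re.finditer(pattern, text):
        start, end = match.span()
        pieces.append(text[last_end:start])            # text before the match
        pieces.append(left + text[start:end] + right)  # the marked match
        last_end = end
    pieces.append(text[last_end:])                     # trailing text
    return "".join(pieces)

highlighted = highlight("cat bat hat mat", r'[a-z]at')
print(highlighted)   # [cat] [bat] [hat] [mat]
```

Building the result from slices between spans leaves the unmatched text untouched.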
Span with Named Groups
Named groups make span positions more readable and maintainable.
#!/usr/bin/python

import re

text = "John Doe, age 30"
pattern = re.compile(r'(?P<first>\w+) (?P<last>\w+), age (?P<age>\d+)')
match = pattern.search(text)

if match:
    print(f"Name span: {match.span('first')} to {match.span('last')}")
    print(f"Age span: {match.span('age')}")
Named groups allow accessing spans by meaningful names instead of numbers. This makes code more self-documenting and less prone to errors.
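When a pattern has several named groups, all their spans can be collected in one pass. This sketch assumes the same sample text as above and uses the compiled pattern's groupindex mapping, which maps group names to group numbers.

```python
import re

text = "John Doe, age 30"
pattern = re.compile(r'(?P<first>\w+) (?P<last>\w+), age (?P<age>\d+)')
match = pattern.search(text)

# Look up the span of every named group defined in the pattern.
spans = {name: match.span(name) for name in pattern.groupindex}

print(spans['first'])   # (0, 4)
print(spans['last'])    # (5, 8)
print(spans['age'])     # (14, 16)
```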
Span in Replacement Operations
Span information can guide complex text replacement operations.
#!/usr/bin/python

import re

text = "Error 404: Not Found; Error 500: Server Error"
pattern = re.compile(r'Error (\d{3}): ([A-Za-z ]+)')

def replace_error(match):
    # Spans are positions in the original text; they are computed here
    # for illustration and are not needed to build the replacement.
    code_span = match.span(1)
    desc_span = match.span(2)

    code = match.group(1)
    desc = match.group(2).lower()
    return f"Code {code} ({desc})"

result = pattern.sub(replace_error, text)
print(result)

This example reads span information inside a replacement function. The spans refer to positions in the original text, so they describe where each part of a match sits, while the replacement itself is built from the group text.
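Spans become essential when only part of each match should change. The sketch below is one possible approach, not the only one: it rewrites just the description group and works from right to left so that earlier spans stay valid while the string changes length.

```python
import re

text = "Error 404: Not Found; Error 500: Server Error"
pattern = re.compile(r'Error (\d{3}): ([A-Za-z ]+)')

# Replace only group 2 of each match, leaving everything else intact.
# Editing from the last match to the first keeps earlier offsets correct.
result = text
for match in reversed(list(pattern.finditer(text))):
    start, end = match.span(2)
    result = result[:start] + "<" + match.group(2).lower() + ">" + result[end:]

print(result)   # Error 404: <not found>; Error 500: <server error>
```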
Span with Overlapping Matches
The span method helps identify overlapping matches.
#!/usr/bin/python

import re

text = "ababababab"
pattern = re.compile(r'(?=(abab))')

for i, match in enumerate(pattern.finditer(text), 1):
    start, end = match.span(1)
    print(f"Match {i}: '{match.group(1)}' at {start}-{end}")
This finds all overlapping occurrences of 'abab' in the text. The lookahead itself matches an empty string, so the example asks for span(1), the span of the captured group, rather than the span of the whole match.
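To make the difference between the whole-match span and the group span concrete, this sketch runs the same lookahead pattern and collects both.

```python
import re

text = "ababababab"
pattern = re.compile(r'(?=(abab))')

# The lookahead consumes nothing, so the whole-match span is empty.
whole_spans = [m.span() for m in pattern.finditer(text)]
# The capture group inside the lookahead carries the real positions.
group_spans = [m.span(1) for m in pattern.finditer(text)]

print(whole_spans)   # [(0, 0), (2, 2), (4, 4), (6, 6)]
print(group_spans)   # [(0, 4), (2, 6), (4, 8), (6, 10)]
```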
Span in Multiline Text
Handling multiline text requires understanding how span counts positions.
#!/usr/bin/python

import re

text = """First line
Second line
Third line"""

pattern = re.compile(r'^(\w+)', re.MULTILINE)

for match in pattern.finditer(text):
    start, end = match.span()
    line = text[start:end]
    print(f"Found '{line}' at positions {start}-{end}")
The re.MULTILINE flag makes ^ match line starts. Span positions are counted across the entire string, including newlines.
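Because positions are absolute, reporting a line and column takes a small conversion. The line_and_column helper below is a hypothetical name introduced for this sketch; it assumes lines are separated by "\n".

```python
import re

text = "First line\nSecond line\nThird line"

def line_and_column(text, offset):
    """Convert an absolute offset into a 1-based line and 0-based column."""
    line = text.count("\n", 0, offset) + 1
    line_start = text.rfind("\n", 0, offset) + 1   # 0 when no newline precedes
    return line, offset - line_start

spans = [m.span() for m in re.finditer(r'line', text)]
locations = [line_and_column(text, start) for start, _ in spans]

print(spans)       # [(6, 10), (18, 22), (29, 33)]
print(locations)   # [(1, 6), (2, 7), (3, 6)]
```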
Best Practices
When using Match.span, follow these best practices:
- Always check if a match was found before calling span
- Use named groups for better readability with span positions
- Remember span indices follow Python's slicing rules (end is exclusive)
- Combine with group when you need both text and position
- Handle Unicode text carefully: positions in a str count code points, which can differ from byte offsets in the encoded data
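The first practice matters because search returns None when nothing matches, and calling span on None raises AttributeError. A minimal sketch of a guard follows; find_span is a hypothetical helper name, not part of the re module.

```python
import re

def find_span(pattern, text):
    """Return the (start, end) span of the first match, or None if there is none."""
    match = re.search(pattern, text)
    return match.span() if match is not None else None

found = find_span(r'fox', "The quick brown fox")
missing = find_span(r'cat', "The quick brown fox")

print(found)     # (16, 19)
print(missing)   # None
```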
Performance Considerations
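The span method only reads positions that the match object already holds, so its overhead is minimal. A single span call is a convenient alternative to calling start and end separately.
For str patterns, span positions are indices into the string (code points), not byte offsets. They are byte offsets only when a bytes pattern is matched against bytes, so positions found in encoded data can differ from positions in the decoded text.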
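The difference between str positions and bytes positions can be seen by matching the same text in both forms. The sample text is an illustrative assumption and is written with escapes so the characters are unambiguous.

```python
import re

# "naïve café" written with escapes: U+00EF and U+00E9 are single code points.
text = "na\u00efve caf\u00e9"
data = text.encode("utf-8")          # each accented letter takes two bytes

str_match = re.search("caf\u00e9", text)
bytes_match = re.search("caf\u00e9".encode("utf-8"), data)

print(str_match.span())     # (6, 10)
print(bytes_match.span())   # (7, 12)
```

Positions taken from one form should not be used to slice the other.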
Source
Python Match.span() documentation
This tutorial covered the essential aspects of Python's Match.span method. Mastering position retrieval will enhance your text processing capabilities.