ZetCode

Python Match.span Method

last modified April 20, 2025

Introduction to Match.span

The Match.span method is part of Python's re module. It returns a tuple containing the start and end positions of a match.

This method is available on match objects returned by regex operations. It provides precise location information about where matches occur in text.

Understanding span is crucial for text processing tasks that require position information, such as highlighting or extracting substrings.

Basic Syntax

The syntax for Match.span is straightforward:

Match.span(group=0)

The optional group parameter specifies which capture group to return positions for. Default is 0 (the entire match).

Basic Match Position Retrieval

Let's start with a simple example of finding a word's position in text.

basic_span.py
#!/usr/bin/python

import re

text = "The quick brown fox jumps over the lazy dog"
pattern = re.compile(r'fox')

match = pattern.search(text)
if match:
    start, end = match.span()
    print(f"Found 'fox' from position {start} to {end}")
    print(f"Matched text: '{text[start:end]}'")

This example shows how to get the start and end positions of a match. The span is used to extract the matched substring from the original text.

start, end = match.span()

The span method returns a tuple with two integers. The first is the start index, the second is the end index.

text[start:end]

Using the span indices, we can slice the original string to get exactly the matched portion. This is more efficient than using group.

Span with Capture Groups

The span method can also return positions for specific groups.

group_span.py
#!/usr/bin/python

import re

text = "Date: 2023-12-25"
pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

match = pattern.search(text)
if match:
    print(f"Full match span: {match.span()}")
    print(f"Year span: {match.span(1)}")
    print(f"Month span: {match.span(2)}")
    print(f"Day span: {match.span(3)}")

This demonstrates getting spans for different capture groups. Group 0 is always the entire match, while groups 1+ are captures.

Multiple Matches with finditer

When processing multiple matches, span helps locate each one.

multi_span.py
#!/usr/bin/python

import re

text = "cat bat hat mat"
pattern = re.compile(r'[a-z]at')

for match in pattern.finditer(text):
    start, end = match.span()
    word = text[start:end]
    print(f"Found '{word}' at positions {start}-{end}")

The finditer method returns match objects for all occurrences. We use span to get each match's position in the original text.

Span with Named Groups

Named groups make span positions more readable and maintainable.

named_span.py
#!/usr/bin/python

import re

text = "John Doe, age 30"
pattern = re.compile(r'(?P<first>\w+) (?P<last>\w+), age (?P<age>\d+)')

match = pattern.search(text)
if match:
    print(f"Name span: {match.span('first')} to {match.span('last')}")
    print(f"Age span: {match.span('age')}")

Named groups allow accessing spans by meaningful names instead of numbers. This makes code more self-documenting and less prone to errors.

Span in Replacement Operations

Span information can guide complex text replacement operations.

replace_span.py
#!/usr/bin/python

import re

text = "Error 404: Not Found; Error 500: Server Error"
pattern = re.compile(r'Error (\d{3}): ([A-Za-z ]+)')

def replace_error(match):
    code_span = match.span(1)
    desc_span = match.span(2)
    code = match.group(1)
    desc = match.group(2).lower()
    return f"Code {code} ({desc})"

result = pattern.sub(replace_error, text)
print(result)

This example uses span information within a replacement function. The spans help understand the structure of each match for processing.

Span with Overlapping Matches

The span method helps identify overlapping matches.

overlap_span.py
#!/usr/bin/python

import re

text = "ababababab"
pattern = re.compile(r'(?=(abab))')

for i, match in enumerate(pattern.finditer(text), 1):
    start, end = match.span(1)
    print(f"Match {i}: '{match.group(1)}' at {start}-{end}")

This finds all overlapping occurrences of 'abab' in the text. The lookahead pattern with span captures each position.

Span in Multiline Text

Handling multiline text requires understanding how span counts positions.

multiline_span.py
#!/usr/bin/python

import re

text = """First line
Second line
Third line"""
pattern = re.compile(r'^(\w+)', re.MULTILINE)

for match in pattern.finditer(text):
    start, end = match.span()
    line = text[start:end]
    print(f"Found '{line}' at positions {start}-{end}")

The re.MULTILINE flag makes ^ match line starts. Span positions are counted across the entire string, including newlines.

Best Practices

When using Match.span, follow these best practices:

Performance Considerations

The span method is highly optimized and has minimal overhead. It's more efficient than using start and end separately.

For very large texts, be mindful that span positions are byte offsets. With Unicode, character counts may differ from byte positions.

Source

Python Match.span() documentation

This tutorial covered the essential aspects of Python's Match.span method. Mastering position retrieval will enhance your text processing capabilities.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.

List all Python tutorials.