ZetCode

Python glob

last modified May 28, 2026

In this article, we show how to use the glob module in Python. The glob module finds all pathnames matching a specified pattern according to the rules used by the Unix shell. It supports the wildcards *, ?, and [...], as well as recursive matching with **.

The glob module is part of Python's standard library and requires no additional installation. It is particularly useful in scripts that need to process batches of files selected by name pattern rather than enumerating them explicitly.

Basic Pattern Matching

The glob.glob function returns a list of pathnames that match the given pattern. The * wildcard matches any number of characters within a single directory component (but not a path separator).

main.py
import glob
from pathlib import Path

# Build a small directory of sample files
Path('data').mkdir(exist_ok=True)
for name in ['report.txt', 'summary.txt', 'notes.md', 'data.csv', 'backup.txt']:
    Path(f'data/{name}').touch()

# Match all .txt files in data/
txt_files = glob.glob('data/*.txt')
print("Text files:")
for f in sorted(txt_files):
    print(f"  {f}")

# Match all files in data/ regardless of extension
all_files = glob.glob('data/*')
print(f"\nAll files in data/: {len(all_files)}")

glob.glob returns an unsorted list of matching paths. The order depends on the filesystem, so it is good practice to call sorted when a predictable order is required. File names that begin with a dot are not matched by * or ?; they are matched only when the corresponding pattern component itself starts with a dot, or when include_hidden=True is passed (Python 3.11 and later).
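The dot-file rule can be checked in isolation. The following is a minimal sketch that uses a temporary directory and two hypothetical file names (.env and visible.txt); it assumes that the include_hidden argument is only available on Python 3.11 or newer and falls back to combining two patterns otherwise.

```python
import glob
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files: one hidden, one regular
    for name in ['.env', 'visible.txt']:
        open(os.path.join(tmp, name), 'w').close()

    # * skips names that start with a dot
    plain = sorted(os.path.basename(p)
                   for p in glob.glob(os.path.join(tmp, '*')))

    # A pattern component that itself starts with a dot matches hidden names
    dotted = sorted(os.path.basename(p)
                    for p in glob.glob(os.path.join(tmp, '.*')))

    # Python 3.11+ can include hidden names in wildcard matches
    if sys.version_info >= (3, 11):
        everything = sorted(
            os.path.basename(p)
            for p in glob.glob(os.path.join(tmp, '*'), include_hidden=True))
    else:
        everything = sorted(plain + dotted)

print(plain)       # ['visible.txt']
print(dotted)      # ['.env']
print(everything)  # ['.env', 'visible.txt']
```

The sample files are created in a temporary directory so that the example leaves nothing behind.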

Single-Character Wildcard

The ? wildcard matches exactly one character within a single directory component. It can appear multiple times in a pattern, and each occurrence matches one arbitrary character.

main.py
import glob
from pathlib import Path

Path('logs').mkdir(exist_ok=True)
for name in ['log1.txt', 'log2.txt', 'log3.txt', 'log10.txt', 'error.txt']:
    Path(f'logs/{name}').touch()

# Match files whose name is exactly log?.txt (one digit)
single = glob.glob('logs/log?.txt')
print("Single-digit log files:")
for f in sorted(single):
    print(f"  {f}")

# Match files whose name is exactly log??.txt (two characters after log)
double = glob.glob('logs/log??.txt')
print("\nTwo-character suffix log files:")
for f in sorted(double):
    print(f"  {f}")

Because ? matches exactly one character, log?.txt matches log1.txt but not log10.txt. Use * when the number of characters is variable, and ? when an exact length is required.
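The difference between ? and * can also be checked without touching the filesystem. The sketch below relies on fnmatch.fnmatchcase from the standard library, which applies the same wildcard syntax to plain strings; the file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, no files are created
names = ['log1.txt', 'log2.txt', 'log10.txt', 'log.txt', 'error.txt']

# ? must consume exactly one character
one_char = [n for n in names if fnmatchcase(n, 'log?.txt')]

# ?? must consume exactly two characters
two_chars = [n for n in names if fnmatchcase(n, 'log??.txt')]

# * consumes zero or more characters
any_len = [n for n in names if fnmatchcase(n, 'log*.txt')]

print(one_char)   # ['log1.txt', 'log2.txt']
print(two_chars)  # ['log10.txt']
print(any_len)    # ['log1.txt', 'log2.txt', 'log10.txt', 'log.txt']
```

Note that log.txt is matched only by the * pattern, because ? never matches an empty string.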

Character Ranges

Square bracket notation [...] matches any single character listed inside the brackets. A range such as [a-z] or [0-9] matches any character in that range. Prefixing with ! negates the set.

main.py
import glob
from pathlib import Path

Path('src').mkdir(exist_ok=True)
for name in ['moduleA.py', 'moduleB.py', 'moduleC.py',
             'module1.py', 'module2.py', 'helper.py']:
    Path(f'src/{name}').touch()

# Match only modules ending with an uppercase letter
upper = glob.glob('src/module[A-Z].py')
print("Modules with uppercase suffix:")
for f in sorted(upper):
    print(f"  {f}")

# Match only modules ending with a digit
digit = glob.glob('src/module[0-9].py')
print("\nModules with digit suffix:")
for f in sorted(digit):
    print(f"  {f}")

# Match anything that is NOT a digit suffix
non_digit = glob.glob('src/module[!0-9].py')
print("\nModules without digit suffix:")
for f in sorted(non_digit):
    print(f"  {f}")

Character ranges follow Unix shell-style globbing rules. Matching is case-sensitive on Linux and macOS and case-insensitive on Windows. glob.glob has no option to change this; the case_sensitive parameter introduced in Python 3.12 belongs to pathlib.Path.glob.
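A character class may combine ranges and individual characters. The following sketch illustrates this with fnmatch.fnmatchcase, which tests plain strings and is case-sensitive on every platform; the module names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, no files are created
names = ['moduleA.py', 'moduleB.py', 'moduleZ.py', 'module1.py', 'module_.py']

# A class may mix single characters and ranges: A, B, or any digit
mixed = [n for n in names if fnmatchcase(n, 'module[AB0-9].py')]

# ! as the first character inside the brackets negates the whole class
negated = [n for n in names if fnmatchcase(n, 'module[!A-Z].py')]

print(mixed)    # ['moduleA.py', 'moduleB.py', 'module1.py']
print(negated)  # ['module1.py', 'module_.py']
```

A class always matches exactly one character, so module[0-9].py would not match a name such as module10.py.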

Recursive Directory Search

When recursive=True is passed to glob.glob, the pattern ** matches zero or more directories and subdirectories. This allows a single pattern to search an entire directory tree.

main.py
import glob
import os

# Build a nested directory tree
for path in ['project/src', 'project/src/utils', 'project/tests']:
    os.makedirs(path, exist_ok=True)

files = {
    'project/src/main.py': '',
    'project/src/utils/helpers.py': '',
    'project/src/utils/validators.py': '',
    'project/tests/test_main.py': '',
    'project/README.md': '',
}
for path, content in files.items():
    with open(path, 'w') as f:
        f.write(content)

# Find all Python files anywhere in the project tree
py_files = glob.glob('project/**/*.py', recursive=True)
print("All Python files:")
for f in sorted(py_files):
    print(f"  {f}")

# Find all files at any depth
all_files = glob.glob('project/**/*', recursive=True)
all_files = [f for f in all_files if os.path.isfile(f)]
print(f"\nTotal files found: {len(all_files)}")

Without recursive=True, ** behaves like a single * and matches exactly one directory level, so it does not traverse nested subdirectories. For large directory trees, consider using glob.iglob with recursive=True to avoid building the entire list in memory.
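The effect of the recursive flag can be seen by running the same pattern twice. This is a small sketch over a hypothetical three-level tree built in a temporary directory; the file names are chosen only for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical tree: files at depth 0, 1 and 2
    os.makedirs(os.path.join(tmp, 'pkg', 'sub'))
    for rel in ['top.py',
                os.path.join('pkg', 'mid.py'),
                os.path.join('pkg', 'sub', 'deep.py')]:
        open(os.path.join(tmp, rel), 'w').close()

    pattern = os.path.join(tmp, '**', '*.py')

    # Without recursive=True, ** acts like *: exactly one directory level
    shallow = sorted(os.path.relpath(p, tmp) for p in glob.glob(pattern))

    # With recursive=True, ** spans zero or more directory levels
    deep = sorted(os.path.relpath(p, tmp)
                  for p in glob.glob(pattern, recursive=True))

print(shallow)  # only pkg/mid.py
print(deep)     # top.py, pkg/mid.py and pkg/sub/deep.py
```

With recursive=True the result includes top.py, because ** may also match zero directories.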

Iterator-Based Matching

The glob.iglob function works exactly like glob.glob but returns an iterator instead of a list. This avoids storing all matching paths in memory at once, which is beneficial when searching large directory trees or when only the first few results are needed.

main.py
import glob
import os

# Create sample files
os.makedirs('archive', exist_ok=True)
for i in range(1, 6):
    with open(f'archive/file{i:03d}.log', 'w') as f:
        f.write(f'log entry {i}\n')

# Iterate without building a full list
it = glob.iglob('archive/*.log')
print("Log files (via iterator):")
for path in it:
    print(f"  {path}")

# Process only the first matching result
first = next(glob.iglob('archive/*.log'), None)
if first:
    print(f"\nFirst log file: {first}")
else:
    print("\nNo log files found.")

# Count matches without storing them all
count = sum(1 for _ in glob.iglob('archive/*.log'))
print(f"Total log files: {count}")

Because iglob is lazy, it is safe to use even when the number of matching files is unknown and potentially very large. Once the iterator is exhausted it cannot be restarted; create a new one with another call to iglob if you need to iterate again.
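When only a bounded number of results is needed, the iterator can be combined with itertools.islice. The following sketch assumes ten hypothetical log files in a temporary directory and takes the first three paths that iglob produces.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample: ten empty log files
    for i in range(10):
        open(os.path.join(tmp, f'file{i:03d}.log'), 'w').close()

    matches = glob.iglob(os.path.join(tmp, '*.log'))

    # Take at most three paths from the iterator
    first_three = list(itertools.islice(matches, 3))

    # The same iterator continues where islice stopped
    remaining = sum(1 for _ in matches)

print(len(first_three))  # 3
print(remaining)         # 7
```

Which three files come first is not specified, since iglob yields paths in filesystem order.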

Searching in a Specific Root Directory

The root_dir parameter, added in Python 3.10, sets the directory from which glob starts searching. When supplied, the returned paths are relative to that root, making it easy to work with portable patterns independently of the current working directory.

main.py
import glob
import os

# Build sample structure
base = '/tmp/myproject'
for sub in ['src', 'tests', 'docs']:
    os.makedirs(f'{base}/{sub}', exist_ok=True)
for path in ['src/app.py', 'src/config.py', 'tests/test_app.py', 'docs/index.md']:
    with open(f'{base}/{path}', 'w') as f:
        f.write('')

# Search relative to root_dir — results are relative paths
py_files = glob.glob('**/*.py', root_dir=base, recursive=True)
print("Python files relative to project root:")
for f in sorted(py_files):
    print(f"  {f}")

# Combine with root_dir to get full paths
full_paths = [os.path.join(base, f) for f in py_files]
print("\nFull paths:")
for f in sorted(full_paths):
    print(f"  {f}")

The root_dir parameter does not change the process working directory; it only affects where the glob search begins. The related dir_fd parameter works similarly but takes a directory file descriptor, which is useful with low-level OS interfaces. To restrict results to directories, end the pattern with a path separator, as in */.

Escaping Special Characters

The glob.escape function escapes all special glob characters (*, ?, and [) in a string so it is treated as a literal path component. This is essential when a filename or directory name contains characters that would otherwise be interpreted as wildcards.

main.py
import glob
import os

# Create files whose names contain glob special characters
# (names containing ? or * are not valid on Windows)
os.makedirs('special', exist_ok=True)
tricky_names = ['report[2024].txt', 'data?.csv', 'summary*.md', 'normal.txt']
for name in tricky_names:
    with open(f'special/{name}', 'w') as f:
        f.write('')

# Without escaping, brackets are interpreted as a character class
unescaped = glob.glob('special/report[2024].txt')
print(f"Unescaped result (may be wrong): {unescaped}")

# With escaping, the literal filename is matched
escaped_name = glob.escape('report[2024].txt')
escaped = glob.glob(f'special/{escaped_name}')
print(f"Escaped result (correct)       : {escaped}")

# Build a safe pattern from a user-supplied filename
user_input = 'data?.csv'
safe_pattern = os.path.join('special', glob.escape(user_input))
result = glob.glob(safe_pattern)
print(f"Safe lookup for '{user_input}'  : {result}")

Without escaping, [2024] in report[2024].txt is treated as a character class, so the pattern matches report0.txt, report2.txt, or report4.txt but not the literal file name. Always use glob.escape when constructing patterns from user-supplied or externally sourced strings to avoid unintended matches.
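It can help to look at what glob.escape actually returns. The short sketch below prints the escaped form of a few hypothetical file names; no files are created.

```python
import glob

# Hypothetical file names containing glob special characters
names = ['report[2024].txt', 'data?.csv', 'summary*.md', 'normal.txt']

# Each special character is wrapped in its own single-character class
escaped = [glob.escape(name) for name in names]

for name, pattern in zip(names, escaped):
    print(f"{name} -> {pattern}")
# report[2024].txt -> report[[]2024].txt
# data?.csv -> data[?].csv
# summary*.md -> summary[*].md
# normal.txt -> normal.txt
```

A closing bracket needs no escaping, because it is only special after an opening bracket.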

Filtering by Multiple Extensions

The glob module does not support alternation like {*.py,*.js} directly. The standard approach is to call glob.glob once per pattern and combine the results using itertools.chain, or to post-filter a broad match by extension.

main.py
import glob
import itertools
import os
from pathlib import Path

Path('webapp').mkdir(exist_ok=True)
for name in ['index.html', 'style.css', 'app.js', 'utils.js',
             'main.py', 'config.yaml', 'README.md']:
    Path(f'webapp/{name}').touch()

# Approach 1: chain multiple glob calls
extensions = ['*.py', '*.js', '*.html']
matches = list(itertools.chain.from_iterable(
    glob.glob(f'webapp/{ext}') for ext in extensions
))
print("Source files (chained globs):")
for f in sorted(matches):
    print(f"  {f}")

# Approach 2: broad match then filter by suffix
wanted = {'.py', '.js', '.html'}
filtered = [f for f in glob.glob('webapp/*')
            if os.path.splitext(f)[1] in wanted]
print("\nSource files (post-filtered):")
for f in sorted(filtered):
    print(f"  {f}")

Both approaches produce the same result. The chained approach is better when each pattern is complex; post-filtering is simpler when you only need to check the file extension. Deduplication with set is recommended when patterns overlap and the same file could be matched more than once.
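The chaining and deduplication steps can be wrapped in a small function. The following is a sketch only; glob_many is a hypothetical helper name, not part of the glob module, and the sample files are created in a temporary directory.

```python
import glob
import os
import tempfile


def glob_many(directory, patterns):
    """Return sorted, de-duplicated matches for several patterns (hypothetical helper)."""
    found = set()
    for pattern in patterns:
        # glob.escape protects special characters in the directory name itself
        found.update(glob.glob(os.path.join(glob.escape(directory), pattern)))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    for name in ['app.js', 'utils.js', 'main.py', 'style.css']:
        open(os.path.join(tmp, name), 'w').close()

    # '*.js' and 'app.*' both match app.js; the set keeps a single copy
    result = [os.path.basename(p)
              for p in glob_many(tmp, ['*.js', 'app.*', '*.py'])]

print(result)  # ['app.js', 'main.py', 'utils.js']
```

Collecting into a set and sorting at the end gives a deterministic result regardless of the order in which the patterns are listed.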

Case-Sensitive Matching

Python 3.12 introduced the case_sensitive parameter for pathlib.Path.glob and Path.rglob; glob.glob and glob.iglob do not accept it. Setting it to True forces case-sensitive matching even on Windows, while False forces case-insensitive matching on Linux and macOS.

main.py
import glob
import sys
from pathlib import Path

assets = Path('assets')
assets.mkdir(exist_ok=True)
# On a case-insensitive filesystem the three logo names refer to one file
for name in ['Logo.PNG', 'logo.png', 'LOGO.PNG', 'background.jpg']:
    (assets / name).touch()

# Default: platform-native behaviour
native = glob.glob('assets/*.png')
print(f"Native (platform default): {sorted(native)}")

# glob.glob has no case_sensitive parameter; Path.glob does (Python 3.12+)
if sys.version_info >= (3, 12):
    sensitive = [str(p) for p in assets.glob('*.png', case_sensitive=True)]
    print(f"Case-sensitive           : {sorted(sensitive)}")

    insensitive = [str(p) for p in assets.glob('*.png', case_sensitive=False)]
    print(f"Case-insensitive         : {sorted(insensitive)}")
else:
    # Fallback for older Python: filter with str.lower()
    all_files = glob.glob('assets/*')
    insensitive = [f for f in all_files if f.lower().endswith('.png')]
    print(f"Case-insensitive fallback: {sorted(insensitive)}")

With glob.glob, wildcard matching is case-sensitive on Linux and macOS and case-insensitive on Windows; literal (non-wildcard) path components are resolved by the filesystem, so they may still match case-insensitively on macOS. The case_sensitive parameter of Path.glob overrides the platform default, so the matching rule is the same on every operating system without branching on the host platform.
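Where no case-sensitivity switch is available, a version-independent alternative is to translate the pattern into a regular expression and match it with re.IGNORECASE. The sketch below uses fnmatch.translate for that purpose; iglob_nocase is a hypothetical helper name, and it only inspects a single directory.

```python
import fnmatch
import os
import re
import tempfile


def iglob_nocase(directory, pattern):
    """Yield names in directory matching pattern, ignoring case (hypothetical helper)."""
    # fnmatch.translate converts a shell-style pattern into a regular expression
    regex = re.compile(fnmatch.translate(pattern), re.IGNORECASE)
    for name in os.listdir(directory):
        if regex.match(name):
            yield name


with tempfile.TemporaryDirectory() as tmp:
    # Names differ by more than case, so they coexist on any filesystem
    for name in ['a.PNG', 'b.png', 'c.Png', 'd.jpg']:
        open(os.path.join(tmp, name), 'w').close()

    images = sorted(iglob_nocase(tmp, '*.png'))

print(images)  # ['a.PNG', 'b.png', 'c.Png']
```

Unlike glob.glob, this helper applies no special treatment to names that start with a dot, so * also matches hidden files.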

Sorting and Deduplicating Results

glob.glob does not guarantee a specific ordering of its results. When scripts must process files in a deterministic sequence — for example, when combining numbered log files — it is important to sort the list explicitly. Similarly, combining multiple glob calls can produce duplicates that should be removed.

main.py
import glob
import os
import re
from pathlib import Path

Path('releases').mkdir(exist_ok=True)
for name in ['v1.0.0.tar.gz', 'v1.2.0.tar.gz', 'v1.10.0.tar.gz',
             'v2.0.0.tar.gz', 'v1.9.0.tar.gz']:
    Path(f'releases/{name}').touch()

# Lexicographic sort (v1.10 sorts before v1.9 — wrong for versions)
lex_sorted = sorted(glob.glob('releases/*.tar.gz'))
print("Lexicographic order:")
for f in lex_sorted:
    print(f"  {f}")

# Natural / version-aware sort
def version_key(path):
    parts = re.findall(r'\d+', os.path.basename(path))
    return tuple(int(p) for p in parts)

nat_sorted = sorted(glob.glob('releases/*.tar.gz'), key=version_key)
print("\nVersion-aware order:")
for f in nat_sorted:
    print(f"  {f}")

# Deduplicate results from two overlapping patterns
combined = glob.glob('releases/v1*.tar.gz') + glob.glob('releases/v1.0*.tar.gz')
unique = sorted(set(combined))
print(f"\nDeduplicated: {unique}")

Natural sorting is essential for version strings and numbered filenames where lexicographic order gives incorrect results — for example, v1.10 sorts before v1.9 lexicographically. Using a set to remove duplicates before further processing is a simple and efficient approach when patterns overlap.
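The version key above keeps only the numeric parts of a name. For file names that mix text and numbers, a more general natural-sort key can be sketched as follows; natural_key is a hypothetical helper name and the file names are examples only.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks for sorting (hypothetical helper)."""
    # re.split with a capturing group keeps the digit runs in the result;
    # chunks at odd positions are always the captured digit runs
    chunks = re.split(r'(\d+)', name)
    return [int(chunk) if i % 2 else chunk.lower()
            for i, chunk in enumerate(chunks)]


names = ['file10.log', 'file2.log', 'file1.log', 'File3.log']

print(sorted(names))
# ['File3.log', 'file1.log', 'file10.log', 'file2.log']

ordered = sorted(names, key=natural_key)
print(ordered)
# ['file1.log', 'file2.log', 'File3.log', 'file10.log']
```

Because text and number chunks always alternate in the same positions, the keys can be compared without mixing strings and integers.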

Source

Python glob - Documentation

In this article, we have shown how to use the glob module in Python for file pattern matching. We covered the *, ?, and [...] wildcards, recursive search with **, memory-efficient iteration with iglob, scoped searching with root_dir, safe pattern construction with glob.escape, matching multiple extensions, controlling case sensitivity, and sorting results correctly.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.
