ZetCode

Python itertools Module

last modified April 2, 2025

The itertools module provides a set of fast, memory-efficient tools for working with iterators. These functions are inspired by constructs from functional programming languages and are designed to work seamlessly with Python's iterator protocol. This guide covers all itertools functions with practical examples, performance considerations, and real-world applications.

Infinite Iterators

itertools provides three functions for creating infinite iterators: count, cycle, and repeat. These generate values indefinitely until explicitly stopped. This example demonstrates their basic usage patterns and common applications.

infinite_iterators.py
import itertools

# 1. count(start=0, step=1) - infinite arithmetic sequence
counter = itertools.count(start=5, step=3)
print("Count:", [next(counter) for _ in range(5)])  # [5, 8, 11, 14, 17]

# 2. cycle(iterable) - infinitely cycle through an iterable
cycler = itertools.cycle('ABC')
print("Cycle:", [next(cycler) for _ in range(6)])  # ['A', 'B', 'C', 'A', 'B', 'C']

# 3. repeat(object[, times]) - repeat object indefinitely or fixed times
repeater = itertools.repeat('hello', 3)
print("Repeat:", list(repeater))  # ['hello', 'hello', 'hello']

# Additional example: Creating sliding windows with count
data = [10, 20, 30, 40, 50]
windows = zip(itertools.count(), data, data[1:], data[2:])
print("Sliding windows:", list(windows))
# [(0, 10, 20, 30), (1, 20, 30, 40), (2, 30, 40, 50)]

count generates an infinite sequence of numbers with optional start and step values. cycle endlessly repeats the elements of a finite iterable. repeat yields the same value either indefinitely or a specified number of times.

These infinite iterators are memory-efficient as they generate values on-demand. They're often used with zip or islice to create finite sequences or with functions that need indefinite streams of values.

Combinatoric Iterators

The combinatoric iterators (product, permutations, combinations, etc.) generate complex sequences from input iterables. These functions are invaluable for solving problems involving combinations, permutations, or Cartesian products.

combinatoric_iterators.py
import itertools

# 4. product(*iterables, repeat=1) - Cartesian product
dice = itertools.product([1, 2, 3], ['a', 'b'])
print("Product:", list(dice))
# [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b'), (3, 'a'), (3, 'b')]

# 5. permutations(iterable, r=None) - r-length permutations
letters = itertools.permutations('ABC', 2)
print("Permutations:", list(letters))
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]

# 6. combinations(iterable, r) - r-length combinations, no repeats
cards = itertools.combinations(['♥A', '♦K', '♣Q'], 2)
print("Combinations:", list(cards))
# [('♥A', '♦K'), ('♥A', '♣Q'), ('♦K', '♣Q')]

# 7. combinations_with_replacement(iterable, r) - with repeats
dice_rolls = itertools.combinations_with_replacement([1, 2, 3], 2)
print("Combinations w/replacement:", list(dice_rolls))
# [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]

# Additional example: Generating truth tables
variables = [False, True]
truth_table = itertools.product(variables, repeat=2)
print("Truth table:")
for a, b in truth_table:
    print(f"{a} AND {b} = {a and b}")

product computes the Cartesian product of input iterables, equivalent to nested for-loops. permutations generates all possible orderings with no repeated elements. combinations produces subsequences where order doesn't matter, while combinations_with_replacement allows repeated elements.

These functions are particularly useful in probability, statistics, game development, and algorithm design. They can generate large result sets, so they're often used with other itertools to limit output.

Iterators Terminating on Shortest Input

This group includes functions like chain, zip_longest, and filterfalse that process multiple iterables until the shortest is exhausted (except zip_longest). These are essential for working with multiple data streams.

terminating_iterators.py
import itertools

# 8. chain(*iterables) - concatenate iterables
merged = itertools.chain('ABC', [1, 2, 3], (True, False))
print("Chain:", list(merged))
# ['A', 'B', 'C', 1, 2, 3, True, False]

# 9. zip_longest(*iterables, fillvalue=None) - zip to longest iterable
names = ['Alice', 'Bob']
scores = [85, 92, 78]
zipped = itertools.zip_longest(names, scores, fillvalue='N/A')
print("Zip longest:", list(zipped))
# [('Alice', 85), ('Bob', 92), ('N/A', 78)]

# 10. filterfalse(predicate, iterable) - elements where predicate is False
numbers = [0, 1, 0, 2, 3, 0, 4]
non_zeros = itertools.filterfalse(lambda x: x == 0, numbers)
print("Filterfalse:", list(non_zeros))  # [1, 2, 3, 4]

# 11. islice(iterable, stop) or islice(iterable, start, stop[, step]) - slice iterator
infinite = itertools.count()
first_5_evens = itertools.islice(infinite, 0, 10, 2)
print("Islice:", list(first_5_evens))  # [0, 2, 4, 6, 8]

# Additional example: Processing batches
data = range(100)
batch_size = 10
for batch in itertools.islice(data, 0, None, batch_size):
    print("Batch:", list(itertools.islice(data, batch, batch + batch_size)))

chain is particularly useful for combining disparate data sources. zip_longest handles uneven length iterables gracefully. filterfalse provides the inverse of the built-in filter. islice enables efficient slicing of iterators without converting to lists.

These functions shine in data processing pipelines where you need to combine, filter, or window streams of data without loading everything into memory. They're often used with file processing and database queries.

Grouping and Filtering

The groupby and takewhile/dropwhile functions provide powerful tools for organizing and filtering sequential data. These are particularly valuable for data analysis and preprocessing tasks.

grouping_filtering.py
import itertools

# 12. groupby(iterable, key=None) - group consecutive elements
animals = ['ant', 'bee', 'cat', 'dog', 'eagle', 'flamingo']
grouped = itertools.groupby(animals, key=lambda x: x[0])
print("Groupby:")
for key, group in grouped:
    print(f"{key}: {list(group)}")
# a: ['ant']
# b: ['bee']
# c: ['cat']
# d: ['dog']
# e: ['eagle']
# f: ['flamingo']

# 13. takewhile(predicate, iterable) - take until predicate fails
numbers = [1, 4, 6, 8, 2, 5, 3]
taken = itertools.takewhile(lambda x: x < 7, numbers)
print("Takewhile:", list(taken))  # [1, 4, 6]

# 14. dropwhile(predicate, iterable) - drop until predicate fails
dropped = itertools.dropwhile(lambda x: x < 7, numbers)
print("Dropwhile:", list(dropped))  # [8, 2, 5, 3]

# Additional example: Processing log files
log_lines = [
    "INFO: System started",
    "INFO: User logged in",
    "ERROR: File not found",
    "INFO: Request processed",
    "ERROR: Database timeout"
]

# Group by log level
get_level = lambda line: line.split(':')[0]
for level, lines in itertools.groupby(log_lines, key=get_level):
    print(f"\n{level} messages:")
    for line in lines:
        print("  ", line.split(':', 1)[1].strip())

groupby groups consecutive elements sharing a key (requires sorted input for complete grouping). takewhile yields items until the predicate fails, while dropwhile skips items until the predicate fails then yields the rest.

These functions are invaluable for processing sequential data like logs, time series, or any grouped records. They enable efficient processing without loading entire datasets into memory.

Performance Considerations

While itertools functions are memory-efficient, their performance characteristics vary. This section compares common operations and demonstrates optimization techniques for working with large datasets.

performance.py
import itertools
import timeit
import random

# 15. Comparing chain methods
def test_chain():
    list(itertools.chain(range(1000), range(1000, 2000)))

def test_concat():
    list(range(1000)) + list(range(1000, 2000))

print("Chain vs concat:")
print("itertools.chain:", timeit.timeit(test_chain, number=10000))
print("list concatenation:", timeit.timeit(test_concat, number=10000))

# 16. Memory efficiency demonstration
large_range = itertools.count()  # Infinite, uses almost no memory
# Compare with list(range(1000000)) which would consume significant memory

# 17. Early termination with islice
def process_data():
    data = itertools.count()  # Infinite stream
    processed = (x**2 for x in itertools.islice(data, 1000000))
    return sum(processed)  # Doesn't store all squared values

print("\nProcessing 1M numbers:", process_data())

# Additional example: Filtering large datasets
def large_dataset():
    return (random.random() for _ in range(1000000))

# Memory-efficient filtering
positive = itertools.filterfalse(lambda x: x < 0.5, large_dataset())
print("\nCount > 0.5:", sum(1 for _ in itertools.islice(positive, 0, 100000)))

The benchmark shows itertools.chain is faster than list concatenation for large iterables. The memory efficiency example demonstrates how itertools can handle theoretically infinite sequences. The early termination example processes a large range without materializing it in memory.

Key takeaways: itertools functions excel at memory efficiency and lazy evaluation. They're particularly advantageous when working with large or infinite sequences, but for small datasets, built-in functions may be simpler and equally performant.

Real-World Applications

These examples demonstrate practical applications of itertools in common programming scenarios, from data analysis to algorithm implementation.

applications.py
import itertools
import operator

# 18. Running averages
def running_avg(data):
    it = itertools.accumulate(data, operator.add)
    for i, total in enumerate(it, 1):
        yield total / i

print("Running averages:", list(running_avg([10, 20, 30, 40])))
# [10.0, 15.0, 20.0, 25.0]

# 19. Pairwise iteration (Python 3.10+ has itertools.pairwise)
def pairwise(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)

print("Pairwise differences:", [(y-x) for x, y in pairwise([1, 3, 6, 10])])
# [2, 3, 4]

# 20. Pagination with islice
def paginate(items, page_size):
    page_start = 0
    while True:
        page = list(itertools.islice(items, page_start, page_start + page_size))
        if not page:
            break
        yield page
        page_start += page_size

data = range(0, 10)
print("Paginated data:")
for page in paginate(data, 3):
    print(page)
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]

# Additional example: Cartesian product for parameter grids
params = {
    'learning_rate': [0.01, 0.1],
    'batch_size': [32, 64],
    'optimizer': ['adam', 'sgd']
}

param_grid = itertools.product(*params.values())
print("\nParameter combinations:")
for combo in param_grid:
    print(dict(zip(params.keys(), combo)))

The running averages example shows how accumulate can simplify stateful calculations. The pairwise iteration demonstrates a common pattern in time series analysis. The pagination example illustrates handling large datasets in chunks. The parameter grid example is useful in machine learning hyperparameter tuning.

These patterns are widely applicable in data processing, scientific computing, and web development. The itertools functions help keep the code concise and memory-efficient.

Best Practices

Use itertools for memory-efficient processing of large or infinite sequences. Combine multiple itertools functions for complex pipelines. Prefer itertools over manual implementations for common iteration patterns. Remember that many itertools consume iterators (like tee), so they can't be reused. Document complex itertools pipelines for maintainability. Consider generator expressions for simple cases where they're more readable.

Source References

Learn more from these resources: Python itertools Documentation, and more-itertools Library.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.

List all Python tutorials.