ZetCode

Python itertools groupby

last modified February 25, 2025

The groupby function from Python's itertools module is used to group data based on a key function. It is particularly useful for grouping sorted data into meaningful categories. This tutorial covers how to use groupby with practical examples.

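The groupby function starts a new group each time the value of the key function changes. To get exactly one group per key, the data should first be sorted by the same key that will be used for grouping; otherwise the same key can appear in several groups. The function returns an iterator that produces consecutive keys and groups from the iterable.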
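To see why sorting matters, the following minimal sketch runs groupby over a small list of letters, once as given and once sorted. The list and the variable names are made up for illustration and are not part of the tutorial's dataset.

```python
from itertools import groupby

# Hypothetical sample data: equal letters are not all adjacent
letters = ["a", "a", "b", "a", "b", "b"]

# Without sorting, every run of equal neighbours becomes its own group,
# so the same key can show up more than once.
unsorted_groups = [(key, list(group)) for key, group in groupby(letters)]

# After sorting, equal items are adjacent and each key appears exactly once.
sorted_groups = [(key, list(group)) for key, group in groupby(sorted(letters))]

print(unsorted_groups)
print(sorted_groups)
```

In the unsorted run, the keys a and b each appear twice; in the sorted run, each appears once.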

Basic Grouping

This example demonstrates how to group data by a single key.

basic_groupby.py
from itertools import groupby

# Dataset
data = [
    {'Adventurer': 'Lara', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 5, 'Danger_Level': 'Medium'},
    {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 10, 'Danger_Level': 'High'},
    {'Adventurer': 'Nathan', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 3, 'Danger_Level': 'Low'},
    {'Adventurer': 'Lara', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 8, 'Danger_Level': 'High'},
    {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 15, 'Danger_Level': 'High'},
    {'Adventurer': 'Nathan', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 4, 'Danger_Level': 'Medium'},
    {'Adventurer': 'Elena', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 6, 'Danger_Level': 'Low'},
    {'Adventurer': 'Lara', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 12, 'Danger_Level': 'Medium'}
]

# Sort data by Adventurer
data.sort(key=lambda x: x["Adventurer"])

# Group by Adventurer
for key, group in groupby(data, key=lambda x: x["Adventurer"]):
    print(f"Adventurer: {key}")
    for item in group:
        print(item)
    print()

The groupby function groups the data by the Adventurer key. The data is first sorted by the same key to ensure proper grouping.
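Each group yielded by groupby is an iterator that shares the underlying data with groupby itself, so it is no longer usable once the outer loop moves on to the next key. If the groups are needed later, one possible approach is to copy each one into a list. The sketch below assumes a shortened, hypothetical version of the dataset and stores the groups in a dictionary.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, shortened records for illustration
records = [
    {"Adventurer": "Lara", "Quantity": 5},
    {"Adventurer": "Indy", "Quantity": 10},
    {"Adventurer": "Lara", "Quantity": 8},
]

by_adventurer = itemgetter("Adventurer")

# sorted() returns a new list and leaves records untouched;
# list(group) copies each group before groupby advances to the next key.
grouped = {
    key: list(group)
    for key, group in groupby(sorted(records, key=by_adventurer), key=by_adventurer)
}

print(grouped["Lara"])
```

Using sorted instead of data.sort is a design choice here: it keeps the original order of records intact, at the cost of a second list.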

Grouping by Multiple Keys

This example demonstrates how to group data by multiple keys.

groupby_multiple_keys.py
from itertools import groupby
from operator import itemgetter

# Dataset
data = [
    {'Adventurer': 'Lara', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 5, 'Danger_Level': 'Medium'},
    {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 10, 'Danger_Level': 'High'},
    {'Adventurer': 'Nathan', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 3, 'Danger_Level': 'Low'},
    {'Adventurer': 'Lara', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 8, 'Danger_Level': 'High'},
    {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 15, 'Danger_Level': 'High'},
    {'Adventurer': 'Nathan', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 4, 'Danger_Level': 'Medium'},
    {'Adventurer': 'Elena', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 6, 'Danger_Level': 'Low'},
    {'Adventurer': 'Lara', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 12, 'Danger_Level': 'Medium'}
]

# Sort data by Region and Danger_Level
data.sort(key=itemgetter("Region", "Danger_Level"))

# Group by Region and Danger_Level
for key, group in groupby(data, key=itemgetter("Region", "Danger_Level")):
    print(f"Region: {key[0]}, Danger Level: {key[1]}")
    for item in group:
        print(item)
    print()

The data is first sorted by Region and Danger_Level. Because itemgetter is given two field names, it returns a tuple for each item, so groupby produces one group per (Region, Danger_Level) pair. The for loop iterates over each group, where each group consists of a key and an iterator over the items in that group. Inside the loop, key[0] holds the region and key[1] holds the danger level.
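As a small illustration of how a composite key is built, the sketch below applies itemgetter with two field names to a single sample row and compares it with a lambda that builds the same tuple by hand. The row and the variable names are hypothetical.

```python
from operator import itemgetter

# Hypothetical sample row
row = {"Region": "Desert", "Danger_Level": "High", "Quantity": 10}

# With two or more names, itemgetter returns a tuple of the looked-up values
pair_key = itemgetter("Region", "Danger_Level")

# A lambda that builds the same tuple explicitly
lambda_key = lambda r: (r["Region"], r["Danger_Level"])

# With a single name, itemgetter returns the bare value, not a tuple
single_key = itemgetter("Region")

print(pair_key(row))
print(single_key(row))
```

Either form can be passed as the key to both sort and groupby, as long as the same key is used in both places.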

Aggregating Grouped Data

This example demonstrates how to aggregate data within each group.

aggregate_groupby.py
from itertools import groupby

# Dataset
data = [
    {'Adventurer': 'Lara', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 5, 'Danger_Level': 'Medium'},
    {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 10, 'Danger_Level': 'High'},
    {'Adventurer': 'Nathan', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 3, 'Danger_Level': 'Low'},
    {'Adventurer': 'Lara', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 8, 'Danger_Level': 'High'},
    {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 15, 'Danger_Level': 'High'},
    {'Adventurer': 'Nathan', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 4, 'Danger_Level': 'Medium'},
    {'Adventurer': 'Elena', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 6, 'Danger_Level': 'Low'},
    {'Adventurer': 'Lara', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 12, 'Danger_Level': 'Medium'}
]

# Sort data by Treasure
data.sort(key=lambda x: x["Treasure"])

# Group by Treasure and calculate total Quantity
for key, group in groupby(data, key=lambda x: x["Treasure"]):
    total_quantity = sum(item["Quantity"] for item in group)
    print(f"Treasure: {key}, Total Quantity: {total_quantity}")

The groupby function groups the data by Treasure, and the total quantity for each treasure type is calculated using the sum function.
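Since a group is an iterator, it can be consumed only once: a second call such as max over the same group would see no items. When several statistics are needed per group, one option is to collect the values into a list first. The following sketch assumes a shortened, hypothetical dataset that is already sorted by Treasure.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, shortened records; already sorted by Treasure
records = [
    {"Treasure": "Gems", "Quantity": 5},
    {"Treasure": "Gems", "Quantity": 4},
    {"Treasure": "Gold", "Quantity": 10},
    {"Treasure": "Gold", "Quantity": 15},
    {"Treasure": "Gold", "Quantity": 12},
]

summary = {}
for treasure, group in groupby(records, key=itemgetter("Treasure")):
    # Consume the group once and keep the values for several aggregates
    quantities = [item["Quantity"] for item in group]
    summary[treasure] = {
        "count": len(quantities),
        "total": sum(quantities),
        "largest": max(quantities),
    }

for treasure, stats in summary.items():
    print(treasure, stats)
```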

Best Practices for Using groupby

Source

Python itertools groupby Documentation

In this article, we have explored how to use the groupby function from Python's itertools module to group and aggregate data.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.
