Python itertools groupby
last modified February 25, 2025
The groupby
function from Python's itertools
module is
used to group data based on a key function. It is particularly useful for
grouping sorted data into meaningful categories. This tutorial covers how to use
groupby
with practical examples.
The groupby
function requires the data to be sorted by the same key
that will be used for grouping. It returns an iterator that produces consecutive
keys and groups from the iterable.
Basic Grouping
This example demonstrates how to group data by a single key.
from itertools import groupby # Dataset data = [ {'Adventurer': 'Lara', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 5, 'Danger_Level': 'Medium'}, {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 10, 'Danger_Level': 'High'}, {'Adventurer': 'Nathan', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 3, 'Danger_Level': 'Low'}, {'Adventurer': 'Lara', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 8, 'Danger_Level': 'High'}, {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 15, 'Danger_Level': 'High'}, {'Adventurer': 'Nathan', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 4, 'Danger_Level': 'Medium'}, {'Adventurer': 'Elena', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 6, 'Danger_Level': 'Low'}, {'Adventurer': 'Lara', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 12, 'Danger_Level': 'Medium'} ] # Sort data by Adventurer data.sort(key=lambda x: x["Adventurer"]) # Group by Adventurer for key, group in groupby(data, key=lambda x: x["Adventurer"]): print(f"Adventurer: {key}") for item in group: print(item) print()
The groupby
function groups the data by the Adventurer
key. The data is first sorted by the same key to ensure proper grouping.
Grouping by Multiple Keys
This example demonstrates how to group data by multiple keys.
from itertools import groupby from operator import itemgetter # Dataset data = [ {'Adventurer': 'Lara', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 5, 'Danger_Level': 'Medium'}, {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 10, 'Danger_Level': 'High'}, {'Adventurer': 'Nathan', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 3, 'Danger_Level': 'Low'}, {'Adventurer': 'Lara', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 8, 'Danger_Level': 'High'}, {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 15, 'Danger_Level': 'High'}, {'Adventurer': 'Nathan', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 4, 'Danger_Level': 'Medium'}, {'Adventurer': 'Elena', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 6, 'Danger_Level': 'Low'}, {'Adventurer': 'Lara', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 12, 'Danger_Level': 'Medium'} ] # Sort data by Region and Danger_Level data.sort(key=itemgetter("Region", "Danger_Level")) # Group by Region and Danger_Level for key, group in groupby(data, key=itemgetter("Region", "Danger_Level")): print(f"Region: {key[0]}, Danger Level: {key[1]}") for item in group: print(item) print()
The groupby
function groups the items in the dataset based on the
specified key. The for loop iterates over each group, where each group consists
of a key and an iterator over the items in that group. Inside the loop, t he
data by the Adventurer
key. The data is first sorted by the same
key to ensure proper grouping.
Aggregating Grouped Data
This example demonstrates how to aggregate data within each group.
from itertools import groupby # Dataset data = [ {'Adventurer': 'Lara', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 5, 'Danger_Level': 'Medium'}, {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 10, 'Danger_Level': 'High'}, {'Adventurer': 'Nathan', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 3, 'Danger_Level': 'Low'}, {'Adventurer': 'Lara', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 8, 'Danger_Level': 'High'}, {'Adventurer': 'Indy', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 15, 'Danger_Level': 'High'}, {'Adventurer': 'Nathan', 'Region': 'Forest', 'Treasure': 'Gems', 'Quantity': 4, 'Danger_Level': 'Medium'}, {'Adventurer': 'Elena', 'Region': 'Mountain', 'Treasure': 'Relics', 'Quantity': 6, 'Danger_Level': 'Low'}, {'Adventurer': 'Lara', 'Region': 'Desert', 'Treasure': 'Gold', 'Quantity': 12, 'Danger_Level': 'Medium'} ] # Sort data by Treasure data.sort(key=lambda x: x["Treasure"]) # Group by Treasure and calculate total Quantity for key, group in groupby(data, key=lambda x: x["Treasure"]): total_quantity = sum(item["Quantity"] for item in group) print(f"Treasure: {key}, Total Quantity: {total_quantity}")
The groupby
function groups the data by Treasure
, and
the total quantity for each treasure type is calculated using the
sum
function.
Best Practices for Using groupby
- Sort Data First: Always sort the data by the same key used for grouping.
- Use itemgetter for Multiple Keys: The
itemgetter
function simplifies sorting and grouping by multiple keys. - Avoid Large Datasets: For very large datasets, consider using libraries like
pandas
for better performance. - Aggregate Data Efficiently: Use built-in functions like
sum
,max
, ormin
to aggregate grouped data.
Source
Python itertools groupby Documentation
In this article, we have explored how to use the groupby
function
from Python's itertools
module to group and aggregate data.
Author
List all Python tutorials.