Pandas Grouping Data
last modified March 1, 2025
Pandas is a powerful Python library for data manipulation. Grouping data is a common task when analyzing datasets. This tutorial covers how to group and aggregate data using Pandas, with practical examples.
Grouping allows you to split data into groups based on criteria, apply functions
to each group, and combine the results. Pandas provides the groupby
method for this purpose.
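The split, apply, and combine steps described above can be sketched by hand. This is a minimal illustration rather than how Pandas works internally; the names partial_sums, manual, and builtin are chosen here for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50],
})

# Split: iterating over a GroupBy object yields (key, sub-DataFrame) pairs.
# Apply: sum the 'Values' column of each piece.
partial_sums = {}
for key, group in df.groupby('Category'):
    partial_sums[key] = group['Values'].sum()

# Combine: gather the per-group results into one Series.
manual = pd.Series(partial_sums, name='Values')
print(manual)

# The built-in aggregation produces the same numbers in a single call.
builtin = df.groupby('Category')['Values'].sum()
print(builtin)
```

In practice the single groupby call is preferred; the loop only makes the three steps visible.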
Basic Grouping
This example shows how to group data by a single column.
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)

grouped = df.groupby('Category').sum()
print(grouped)
The groupby('Category').sum() call groups the data by the 'Category'
column and calculates the sum of 'Values' for each group, giving 90 for A and
60 for B. This is useful for aggregating data.
Grouping by Multiple Columns
This example demonstrates grouping by multiple columns.
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Subcategory': ['X', 'X', 'Y', 'Y', 'X'],
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)

grouped = df.groupby(['Category', 'Subcategory']).sum()
print(grouped)
The groupby(['Category', 'Subcategory']).sum() call groups the data by
both the 'Category' and 'Subcategory' columns. The result is indexed by a
two-level MultiIndex, one level per grouping column, which is useful for
hierarchical grouping.
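When grouping by two columns, the result carries both keys in its index. The sketch below shows one way to look up a single group and to turn the index levels back into ordinary columns; the variable names a_x_total and flat are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Subcategory': ['X', 'X', 'Y', 'Y', 'X'],
    'Values': [10, 20, 30, 40, 50],
})

grouped = df.groupby(['Category', 'Subcategory']).sum()

# Look up one group by its (Category, Subcategory) key pair.
a_x_total = grouped.loc[('A', 'X'), 'Values']
print(a_x_total)

# Move the index levels back into regular columns.
flat = grouped.reset_index()
print(flat)
```

A flat table like this is often easier to export or merge than one with a hierarchical index.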
Applying Multiple Aggregations
This example shows how to apply multiple aggregation functions to grouped data.
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)

grouped = df.groupby('Category').agg(['sum', 'mean', 'count'])
print(grouped)
The agg(['sum', 'mean', 'count']) call applies all three aggregation
functions to each group and returns one column per function under 'Values'.
This is useful for summarizing each group in a single table.
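As an alternative to passing a list, agg also accepts keyword arguments of the form name=(column, function), which label the output columns directly. The sketch below assumes the same sample data; the labels total, average, and rows are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50],
})

# Each keyword becomes an output column: name=(source column, function).
summary = df.groupby('Category').agg(
    total=('Values', 'sum'),
    average=('Values', 'mean'),
    rows=('Values', 'count'),
)
print(summary)
```

This form yields flat column names instead of a two-level column header.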
Grouping and Filtering
This example demonstrates filtering groups based on aggregation results.
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)

grouped = df.groupby('Category').filter(lambda x: x['Values'].sum() > 50)
print(grouped)
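The filter(lambda x: x['Values'].sum() > 50) call keeps the rows of every
group whose sum of 'Values' is greater than 50 and drops the rows of all other
groups. With this data both groups pass (A sums to 90, B to 60), so every row is
returned. This is useful for conditional grouping.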
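Raising the threshold makes the effect of filter visible. The sketch below uses a threshold of 70, an illustrative value that is not part of the original example, so that only category A (sum 90) passes.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50],
})

# Keep only the rows of groups whose 'Values' total exceeds 70.
kept = df.groupby('Category').filter(lambda g: g['Values'].sum() > 70)
print(kept)
```

Note that filter returns the original rows of the surviving groups, with their original index labels, rather than one aggregated row per group.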
Grouping and Transforming
This example shows how to transform grouped data.
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)

df['Normalized'] = df.groupby('Category')['Values'].transform(
    lambda x: (x - x.mean()) / x.std()
)
print(df)
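The transform(lambda x: (x - x.mean()) / x.std()) call standardizes the
'Values' column within each group by subtracting the group mean and dividing by
the group standard deviation. Unlike an aggregation, transform returns a result
with the same length as the input, so it can be assigned back as a new column.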
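One way to confirm that the normalization worked is to aggregate the new column again: each group should have a mean close to 0 and a standard deviation close to 1. The check variable below is introduced for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50],
})

# Standardize 'Values' within each category (std uses the sample formula, ddof=1).
df['Normalized'] = df.groupby('Category')['Values'].transform(
    lambda x: (x - x.mean()) / x.std()
)

# Re-aggregate the normalized column to verify the per-group statistics.
check = df.groupby('Category')['Normalized'].agg(['mean', 'std'])
print(check.round(6))
```

Small floating-point deviations from exactly 0 and 1 are expected, which is why the output is rounded.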
Grouping and Counting
This example demonstrates counting values in grouped data.
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)

grouped = df.groupby('Category').size()
print(grouped)
The size method counts the number of rows in each group, including rows
that contain missing values. This is useful for frequency analysis.
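The difference between size and count only shows up when data is missing. The sketch below replaces one value with NaN, a change made purely for illustration, and compares the two.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, np.nan, 30, 40, 50],  # one missing value, added for illustration
})

# size counts rows per group, whether or not 'Values' is missing.
sizes = df.groupby('Category').size()

# count reports only the non-missing entries of the selected column.
counts = df.groupby('Category')['Values'].count()

print(sizes)
print(counts)
```

Here group B has two rows but only one usable value, so the two results differ for B and agree for A.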
Grouping and Custom Aggregation
This example shows how to apply custom aggregation functions to grouped data.
import pandas as pd

data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50]
}

df = pd.DataFrame(data)

def custom_agg(x):
    return x.max() - x.min()

grouped = df.groupby('Category').agg(custom_agg)
print(grouped)
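The agg(custom_agg) call applies a custom aggregation function to each
column of each group. Here the function returns the range (maximum minus
minimum) of 'Values', which is 40 for A and 20 for B. This is useful for
specialized calculations.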
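Custom functions can also be mixed with built-in aggregations in a single agg call. In the sketch below, value_range is a hypothetical helper equivalent to the range calculation above; its name becomes the label of the output column.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50],
})

def value_range(x):
    # Hypothetical helper: spread between the largest and smallest value.
    return x.max() - x.min()

# Built-in names and a custom callable can share one list;
# the callable's column is labeled with the function's name.
stats = df.groupby('Category')['Values'].agg(['min', 'max', value_range])
print(stats)
```

Showing min and max next to the custom result makes it easy to check the calculation by eye.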
Best Practices for Grouping Data
- Understand Data: Analyze data structure before grouping.
- Choose Appropriate Aggregations: Use functions like sum, mean, or custom logic.
- Filter Groups: Use filter to exclude irrelevant groups.
- Validate Results: Check grouped data for accuracy and completeness.
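The validation advice above can be made concrete with two simple consistency checks: group totals should add up to the overall total, and group sizes should add up to the number of rows. This is a minimal sketch on the tutorial's sample data, not a complete validation routine.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Values': [10, 20, 30, 40, 50],
})

totals = df.groupby('Category')['Values'].sum()
sizes = df.groupby('Category').size()

# Accuracy: the per-group sums must account for the whole column.
assert totals.sum() == df['Values'].sum()

# Completeness: every row must belong to exactly one group.
assert sizes.sum() == len(df)

print("grouped totals and sizes are consistent")
```

The second check is worth keeping in real pipelines because groupby excludes rows whose grouping key is missing by default, which would make the sizes add up to less than the row count.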
In this article, we have explored how to group and aggregate data in Pandas.