Polars GroupBy Function
last modified March 1, 2025
Polars is a fast, efficient DataFrame library for Python. Its group_by
method groups rows by the values of one or more columns so that each group
can be summarized, aggregated, or analyzed. The method was spelled groupby
before Polars 0.19; that spelling is deprecated and no longer available in
Polars 1.x. This tutorial covers how to use group_by in Polars, with
practical examples.
Basic GroupBy: Count
This example shows how to group data and count the number of rows in each group.
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}
df = pl.DataFrame(data)

# len() returns one row per group, with the row count in a 'len' column
result = df.group_by('Category').len()
print(result)

The group_by('Category').len() call groups the data by 'Category' and
returns the number of rows in each group (3 for A and 3 for B) in a column
named 'len'. On a grouped frame, the older count() method is deprecated in
favor of len().
GroupBy: Sum
This example demonstrates how to group data and calculate the sum of a column.
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}
df = pl.DataFrame(data)

# sum() adds up every non-key column within each group
result = df.group_by('Category').sum()
print(result)

The group_by('Category').sum() call groups the data by 'Category' and
sums each remaining column within every group. Here that is the 'Values'
column, giving 90 for A and 120 for B.
GroupBy: Mean
This example shows how to group data and calculate the mean of a column.
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}
df = pl.DataFrame(data)

# mean() averages every non-key column within each group
result = df.group_by('Category').mean()
print(result)

The group_by('Category').mean() call groups the data by 'Category' and
calculates the mean of the 'Values' column in each group: 30.0 for A and
40.0 for B. This is useful for comparing typical values across groups.
GroupBy: Multiple Aggregations
This example demonstrates how to apply multiple aggregation functions to a group.
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}
df = pl.DataFrame(data)

# Each expression in agg() becomes one output column
result = df.group_by('Category').agg([
    pl.col('Values').sum().alias('Sum'),
    pl.col('Values').mean().alias('Mean')
])
print(result)

The group_by('Category').agg() call evaluates several aggregation
expressions on the 'Values' column in a single pass, and alias() names each
output column. The result has one row per category with its 'Sum' and
'Mean'.
GroupBy: Custom Aggregation
This example shows how to apply a custom aggregation function to a group.
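import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}
df = pl.DataFrame(data)

def value_range(column):
    # Build an expression for max - min; Polars evaluates it per group
    return pl.col(column).max() - pl.col(column).min()

result = df.group_by('Category').agg([
    value_range('Values').alias('Range')
])
print(result)

The custom function builds a Polars expression for the range (max - min),
and group_by('Category').agg() evaluates it within each group, giving 40
for both A and B. Composing native expressions this way avoids Expr.apply,
which was renamed to map_elements in Polars 0.19, and is generally faster
than calling a Python function on each group.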
GroupBy: Multiple Columns
This example demonstrates how to group data by multiple columns.
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'SubCategory': ['X', 'X', 'Y', 'Y', 'X', 'Y'],
    'Values': [10, 20, 30, 40, 50, 60]
}
df = pl.DataFrame(data)

# One group per distinct (Category, SubCategory) pair
result = df.group_by(['Category', 'SubCategory']).sum()
print(result)

The group_by(['Category', 'SubCategory']).sum() call groups the data by
both 'Category' and 'SubCategory' and calculates the sum of the 'Values'
column for each distinct pair. This is useful for multi-level analysis.
GroupBy: Filter Groups
This example shows how to filter groups based on a condition.
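import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}
df = pl.DataFrame(data)

# sum().over('Category') attaches each group's total to its rows
result = df.filter(
    pl.col('Values').sum().over('Category') > 100
)
print(result)

A grouped frame in Polars has no filter() method, so the condition is
written as a window expression instead. The sum().over('Category') part
computes the total of 'Values' per category, and df.filter() keeps the
rows whose group total is greater than 100. Here the three rows of
category B (total 120) are kept and those of category A (total 90) are
dropped.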
GroupBy: Sort Groups
This example demonstrates how to sort groups based on a column.
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}
df = pl.DataFrame(data)

# Aggregate first, then sort the aggregated frame
result = df.group_by('Category').agg([
    pl.col('Values').sum().alias('Sum')
]).sort('Sum', descending=True)
print(result)

The group_by('Category').agg() call computes the sum of 'Values' per
group, and sort('Sum', descending=True) orders the resulting rows from the
largest total to the smallest (B with 120, then A with 90). This is useful
for ranking groups.
Best Practices for GroupBy
- Understand Data: Analyze data structure before grouping.
- Choose Appropriate Columns: Select columns that align with your analysis goals.
- Handle Missing Data: Use fill_null to handle missing values.
- Validate Results: Check grouped data for accuracy and consistency.
In this article, we have explored how to use the group_by method in Polars
(spelled groupby before Polars 0.19) to count, sum, average, filter, and
sort grouped data.