ZetCode

Polars GroupBy Function

last modified March 1, 2025

Polars is a fast, efficient DataFrame library for Python. The groupby function is used to group data based on one or more columns. This tutorial covers how to use the groupby function in Polars, with practical examples.

Grouping data is essential for summarizing, aggregating, and analyzing datasets. Polars provides a powerful groupby method for these tasks.

Basic GroupBy: Count

This example shows how to group data and count the number of rows in each group.

groupby_count.py
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}

df = pl.DataFrame(data)
result = df.groupby('Category').count()

print(result)

The groupby('Category').count() groups the data by 'Category' and counts the number of rows in each group. This is useful for summarizing data.

GroupBy: Sum

This example demonstrates how to group data and calculate the sum of a column.

groupby_sum.py
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}

df = pl.DataFrame(data)
result = df.groupby('Category').sum()

print(result)

The groupby('Category').sum() groups the data by 'Category' and calculates the sum of the 'Values' column. This is useful for aggregating data.

GroupBy: Mean

This example shows how to group data and calculate the mean of a column.

groupby_mean.py
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}

df = pl.DataFrame(data)
result = df.groupby('Category').mean()

print(result)

The groupby('Category').mean() groups the data by 'Category' and calculates the mean of the 'Values' column. This is useful for analyzing trends.

GroupBy: Multiple Aggregations

This example demonstrates how to apply multiple aggregation functions to a group.

groupby_multiple.py
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}

df = pl.DataFrame(data)
result = df.groupby('Category').agg([
    pl.col('Values').sum().alias('Sum'),
    pl.col('Values').mean().alias('Mean')
])

print(result)

The groupby('Category').agg() applies multiple aggregation functions to the 'Values' column. This is useful for detailed analysis.

GroupBy: Custom Aggregation

This example shows how to apply a custom aggregation function to a group.

groupby_custom.py
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}

df = pl.DataFrame(data)

def custom_agg(x):
    return x.max() - x.min()

result = df.groupby('Category').agg([
    pl.col('Values').apply(custom_agg).alias('Range')
])

print(result)

The groupby('Category').agg() applies a custom function to calculate the range (max - min) within each group. This is useful for custom calculations.

GroupBy: Multiple Columns

This example demonstrates how to group data by multiple columns.

groupby_multiple_columns.py
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'SubCategory': ['X', 'X', 'Y', 'Y', 'X', 'Y'],
    'Values': [10, 20, 30, 40, 50, 60]
}

df = pl.DataFrame(data)
result = df.groupby(['Category', 'SubCategory']).sum()

print(result)

The groupby(['Category', 'SubCategory']).sum() groups the data by both 'Category' and 'SubCategory' and calculates the sum of the 'Values' column. This is useful for multi-level analysis.

GroupBy: Filter Groups

This example shows how to filter groups based on a condition.

groupby_filter.py
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}

df = pl.DataFrame(data)
result = df.groupby('Category').filter(
    pl.col('Values').sum() > 100
)

print(result)

The groupby('Category').filter() filters groups where the sum of 'Values' is greater than 100. This is useful for conditional analysis.

GroupBy: Sort Groups

This example demonstrates how to sort groups based on a column.

groupby_sort.py
import polars as pl

data = {
    'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [10, 20, 30, 40, 50, 60]
}

df = pl.DataFrame(data)
result = df.groupby('Category').agg([
    pl.col('Values').sum().alias('Sum')
]).sort('Sum', descending=True)

print(result)

The groupby('Category').agg().sort() sorts the groups by the sum of 'Values' in descending order. This is useful for ranking groups.

Best Practices for GroupBy

Source

Polars Documentation

In this article, we have explored how to use the groupby function in Polars.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.

List all Polars tutorials.