ZetCode

Polars Filtering Tutorial

last modified March 1, 2025

Polars is a fast and efficient DataFrame library for Python. It is designed for high-performance data manipulation and analysis. Filtering is a common operation in data analysis, allowing you to extract specific rows based on conditions.

This tutorial covers how to filter data in Polars DataFrames. We will explore basic and advanced filtering techniques with practical examples.

Basic Filtering

Filtering in Polars is done using the filter method. This method takes a condition and returns a new DataFrame with rows that satisfy the condition.

basic_filter.py
import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pl.DataFrame(data)
filtered_df = df.filter(pl.col('Age') > 30)

print(filtered_df)

The filter(pl.col('Age') > 30) filters rows where the 'Age' column is greater than 30. This is useful for extracting specific subsets of data.

Multiple Conditions

You can combine multiple conditions using logical operators like & (AND) and | (OR). This allows for more complex filtering.

multiple_conditions.py
import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pl.DataFrame(data)
filtered_df = df.filter((pl.col('Age') > 30) & (pl.col('City') == 'Chicago'))

print(filtered_df)

The (pl.col('Age') > 30) & (pl.col('City') == 'Chicago') filters rows where the 'Age' is greater than 30 and the 'City' is 'Chicago'. This is useful for precise data extraction.

Filtering with String Operations

Polars supports string operations for filtering. You can use methods like str.contains to filter rows based on string patterns.

string_filter.py
import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pl.DataFrame(data)
filtered_df = df.filter(pl.col('City').str.contains('New'))

print(filtered_df)

The str.contains('New') filters rows where the 'City' column contains the substring 'New'. This is useful for pattern-based filtering.

Filtering with Null Values

Polars provides methods to handle null values during filtering. You can use is_null and is_not_null to filter rows with null or non-null values.

null_filter.py
import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, None, 35, 40],
    'City': ['New York', 'Los Angeles', None, 'Houston']
}

df = pl.DataFrame(data)
filtered_df = df.filter(pl.col('Age').is_not_null())

print(filtered_df)

The is_not_null filters rows where the 'Age' column is not null. This is useful for cleaning data with missing values.

Filtering with Custom Functions

You can apply custom functions to filter rows. This allows for flexible and complex filtering logic.

custom_filter.py
import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}

df = pl.DataFrame(data)

def custom_filter(age):
    return age > 30 and age < 40

filtered_df = df.filter(pl.col('Age').apply(custom_filter))

print(filtered_df)

The apply(custom_filter) applies a custom function to filter rows where the 'Age' is between 30 and 40. This is useful for advanced filtering.

Best Practices for Filtering

Source

Polars Documentation

In this article, we have explored how to filter data in Polars DataFrames.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.

List all Polars tutorials.