ZetCode

Polars DataFrame Tutorial

last modified March 1, 2025

Polars is a fast, efficient DataFrame library for Python. It is designed for high-performance data manipulation and analysis. This tutorial covers how to use Polars DataFrames with practical examples.

Polars provides a DataFrame API similar to Pandas but with better performance. It is optimized for large datasets and supports lazy evaluation for efficient query execution.

Creating a DataFrame

This example shows how to create a Polars DataFrame from a dictionary.

create_df.py
import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pl.DataFrame(data)
print(df)

The pl.DataFrame() function creates a DataFrame from a dictionary. This is useful for small datasets or quick prototyping.

Reading a CSV File

This example demonstrates how to read a CSV file into a Polars DataFrame.

read_csv.py
import polars as pl

df = pl.read_csv('data.csv')
print(df)

The pl.read_csv() function reads a CSV file into a DataFrame. This is useful for loading large datasets from files.

Filtering Rows

This example shows how to filter rows in a Polars DataFrame.

filter_rows.py
import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pl.DataFrame(data)
filtered_df = df.filter(pl.col('Age') > 30)
print(filtered_df)

The filter() method filters rows based on a condition. This is useful for selecting specific subsets of data.

Selecting Columns

This example demonstrates how to select specific columns in a Polars DataFrame.

select_columns.py
import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pl.DataFrame(data)
selected_df = df.select(['Name', 'City'])
print(selected_df)

The select() method selects specific columns from the DataFrame. This is useful for focusing on relevant data.

Adding a New Column

This example shows how to add a new column to a Polars DataFrame.

add_column.py
import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pl.DataFrame(data)
df = df.with_column((pl.col('Age') * 2).alias('DoubleAge'))
print(df)

The with_column() method adds a new column to the DataFrame. This is useful for creating derived data.

Grouping and Aggregating

This example demonstrates how to group data and calculate aggregate statistics.

group_agg.py
import polars as pl

data = {
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago'],
    'Sales': [100, 200, 150, 300]
}

df = pl.DataFrame(data)
grouped_df = df.groupby('City').agg([
    pl.col('Sales').sum().alias('TotalSales')
])
print(grouped_df)

The groupby() and agg() methods group data and calculate aggregate statistics. This is useful for summarizing data.

Sorting Data

This example shows how to sort a Polars DataFrame by a column.

sort_data.py
import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pl.DataFrame(data)
sorted_df = df.sort('Age', reverse=True)
print(sorted_df)

The sort() method sorts the DataFrame by a column. This is useful for organizing data in a specific order.

Joining DataFrames

This example demonstrates how to join two Polars DataFrames.

join_df.py
import polars as pl

data1 = {
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
}

data2 = {
    'ID': [1, 2, 4],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df1 = pl.DataFrame(data1)
df2 = pl.DataFrame(data2)
joined_df = df1.join(df2, on='ID', how='inner')
print(joined_df)

The join() method joins two DataFrames based on a common column. This is useful for combining related datasets.

Best Practices for Polars DataFrames

Source

Polars Documentation

In this article, we have explored how to use Polars DataFrames for data analysis.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.

List all Polars tutorials.