Python Polars Tutorial
last modified March 1, 2025
Polars is a fast DataFrame library in Python designed for efficient data manipulation and analysis. It is built for performance, leveraging Rust under the hood. This tutorial introduces Polars with practical examples.
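Polars supports both eager and lazy execution. In lazy mode it optimizes the whole query before running it, which makes it well suited to large datasets. Its DataFrame API will feel familiar to pandas users, but it is built around composable expressions.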
Creating a DataFrame
This example shows how to create a Polars DataFrame from a dictionary.
```python
import polars as pl

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pl.DataFrame(data)
print(df)
```
The pl.DataFrame constructor creates a DataFrame from a dictionary that maps column names to lists of values. This is the simplest way to initialize a Polars DataFrame.
Reading a CSV File
This example demonstrates reading a CSV file into a Polars DataFrame.
```python
import polars as pl

df = pl.read_csv('data.csv')
print(df)
```
The pl.read_csv function reads a CSV file into a DataFrame. Polars also reads other formats, such as Parquet with pl.read_parquet and JSON with pl.read_json.
Filtering Rows
This example shows how to filter rows based on a condition.
```python
import polars as pl

df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

filtered_df = df.filter(pl.col('Age') > 30)
print(filtered_df)
```
The filter method keeps only the rows for which the expression pl.col('Age') > 30 is true; here, only Charlie remains. Polars evaluates such expressions in its Rust engine rather than row by row in Python.
Selecting Columns
This example demonstrates selecting specific columns from a DataFrame.
```python
import polars as pl

df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
})

selected_df = df.select(['Name', 'City'])
print(selected_df)
```
The select method returns a new DataFrame containing only the listed columns. This is useful for focusing on relevant data.
Adding a New Column
This example shows how to add a new column to a DataFrame.
```python
import polars as pl

df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# with_columns returns a new DataFrame with the added column
df = df.with_columns((pl.col('Age') * 2).alias('DoubleAge'))
print(df)
```
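The with_columns method returns a new DataFrame with a 'DoubleAge' column, which is twice the 'Age' column. Older Polars releases also had a with_column method for a single column; current releases use with_columns for one or more columns.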
Grouping and Aggregating
This example demonstrates grouping data and calculating aggregate statistics.
```python
import polars as pl

df = pl.DataFrame({
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago'],
    'Sales': [100, 200, 150, 300]
})

# group_by was called groupby in Polars releases before 0.19
grouped_df = df.group_by('City').agg(
    pl.col('Sales').sum().alias('TotalSales')
)
print(grouped_df)
```
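The group_by and agg methods group the data by 'City' and calculate the total sales for each city. By default the order of the groups in the result is not guaranteed; pass maintain_order=True to group_by to keep the order in which the groups first appear.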
Sorting Data
This example shows how to sort a DataFrame by a specific column.
```python
import polars as pl

df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# descending=True sorts from largest to smallest
sorted_df = df.sort('Age', descending=True)
print(sorted_df)
```
The sort method sorts the DataFrame by the 'Age' column; descending=True puts the oldest person first. Older Polars releases called this parameter reverse.
Lazy Execution
This example demonstrates lazy execution for optimizing performance.
```python
import polars as pl

df = pl.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
})

# lazy() builds a query plan; collect() runs it and returns a DataFrame
result = df.lazy().filter(pl.col('Age') > 30).collect()
print(result)
```
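The lazy method turns the DataFrame into a LazyFrame. Operations on a LazyFrame only build a query plan; Polars optimizes that plan (for example, by pushing filters down toward the data source) and executes it when you call collect. For files, pl.scan_csv and pl.scan_parquet start a lazy query directly, so the optimizer can avoid reading data the query does not need.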
Best Practices for Using Polars
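- Use Lazy Execution: For large datasets, build queries with lazy, pl.scan_csv, or pl.scan_parquet so the optimizer can skip unneeded columns and push filters down before data is read.
- Leverage Expressions: Prefer Polars expressions such as pl.col over row-by-row Python functions; expressions run in the Rust engine and can be parallelized.
- Choose Appropriate Data Types: Cast low-cardinality string columns to pl.Categorical and use smaller numeric types such as pl.Int32 when the value range allows it, to reduce memory use.
- Profile Queries: Inspect the optimized plan of a lazy query with LazyFrame.explain and measure per-step timings with LazyFrame.profile to find bottlenecks.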
In this article, we have explored the basics of Polars with practical examples.