Polars DataFrame Tutorial
last modified March 1, 2025
Polars is a fast, efficient DataFrame library for Python. It is designed for high-performance data manipulation and analysis. This tutorial covers how to use Polars DataFrames with practical examples.
Polars provides a DataFrame API similar to Pandas but with better performance. It is optimized for large datasets and supports lazy evaluation for efficient query execution.
Creating a DataFrame
This example shows how to create a Polars DataFrame from a dictionary.
import polars as pl data = { 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago'] } df = pl.DataFrame(data) print(df)
The pl.DataFrame()
function creates a DataFrame from a dictionary.
This is useful for small datasets or quick prototyping.
Reading a CSV File
This example demonstrates how to read a CSV file into a Polars DataFrame.
import polars as pl df = pl.read_csv('data.csv') print(df)
The pl.read_csv()
function reads a CSV file into a DataFrame. This
is useful for loading large datasets from files.
Filtering Rows
This example shows how to filter rows in a Polars DataFrame.
import polars as pl data = { 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago'] } df = pl.DataFrame(data) filtered_df = df.filter(pl.col('Age') > 30) print(filtered_df)
The filter()
method filters rows based on a condition. This is
useful for selecting specific subsets of data.
Selecting Columns
This example demonstrates how to select specific columns in a Polars DataFrame.
import polars as pl data = { 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago'] } df = pl.DataFrame(data) selected_df = df.select(['Name', 'City']) print(selected_df)
The select()
method selects specific columns from the DataFrame.
This is useful for focusing on relevant data.
Adding a New Column
This example shows how to add a new column to a Polars DataFrame.
import polars as pl data = { 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago'] } df = pl.DataFrame(data) df = df.with_column((pl.col('Age') * 2).alias('DoubleAge')) print(df)
The with_column()
method adds a new column to the DataFrame. This
is useful for creating derived data.
Grouping and Aggregating
This example demonstrates how to group data and calculate aggregate statistics.
import polars as pl data = { 'City': ['New York', 'Los Angeles', 'New York', 'Chicago'], 'Sales': [100, 200, 150, 300] } df = pl.DataFrame(data) grouped_df = df.groupby('City').agg([ pl.col('Sales').sum().alias('TotalSales') ]) print(grouped_df)
The groupby()
and agg()
methods group data and
calculate aggregate statistics. This is useful for summarizing data.
Sorting Data
This example shows how to sort a Polars DataFrame by a column.
import polars as pl data = { 'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'Los Angeles', 'Chicago'] } df = pl.DataFrame(data) sorted_df = df.sort('Age', reverse=True) print(sorted_df)
The sort()
method sorts the DataFrame by a column. This is useful
for organizing data in a specific order.
Joining DataFrames
This example demonstrates how to join two Polars DataFrames.
import polars as pl data1 = { 'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie'] } data2 = { 'ID': [1, 2, 4], 'City': ['New York', 'Los Angeles', 'Chicago'] } df1 = pl.DataFrame(data1) df2 = pl.DataFrame(data2) joined_df = df1.join(df2, on='ID', how='inner') print(joined_df)
The join()
method joins two DataFrames based on a common column.
This is useful for combining related datasets.
Best Practices for Polars DataFrames
- Use Lazy Evaluation: Leverage lazy evaluation for efficient query execution.
- Optimize Data Types: Use appropriate data types to reduce memory usage.
- Batch Operations: Perform batch operations to minimize overhead.
- Profile Performance: Profile your code to identify bottlenecks.
Source
In this article, we have explored how to use Polars DataFrames for data analysis.
Author
List all Polars tutorials.