Polars Select Function
last modified March 1, 2025
Polars is a fast, efficient DataFrame library in Python. The select
function is used to choose specific columns from a DataFrame. This tutorial
covers how to use the select
function with practical examples.
The select
function is essential for data manipulation tasks like
filtering columns, renaming, and applying transformations. Polars provides a
simple and intuitive API for these operations.
Basic Column Selection
This example shows how to select specific columns from a DataFrame.
import polars as pl df = pl.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9] }) selected = df.select(['A', 'B']) print(selected)
The select(['A', 'B'])
selects columns 'A' and 'B' from the
DataFrame. This is useful for focusing on specific data.
Select with Renaming
This example demonstrates renaming columns during selection.
import polars as pl df = pl.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9] }) selected = df.select([ pl.col('A').alias('Column1'), pl.col('B').alias('Column2') ]) print(selected)
The alias
function renames columns during selection. This is useful
for creating more readable column names.
Select with Expression
This example shows how to apply an expression during column selection.
import polars as pl df = pl.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9] }) selected = df.select([ (pl.col('A') + pl.col('B')).alias('Sum') ]) print(selected)
The pl.col('A') + pl.col('B')
expression calculates the sum of
columns 'A' and 'B'. This is useful for creating derived columns.
Select with Filter
This example demonstrates filtering rows during column selection.
import polars as pl df = pl.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9] }) selected = df.select([ pl.col('A').filter(pl.col('A') > 1) ]) print(selected)
The filter(pl.col('A') > 1)
filters rows where column 'A' is greater
than 1. This is useful for conditional data selection.
Select with Aggregation
This example shows how to aggregate data during column selection.
import polars as pl df = pl.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9] }) selected = df.select([ pl.col('A').sum().alias('Total') ]) print(selected)
The sum
function calculates the total of column 'A'. This is useful
for summarizing data.
Select with Multiple Expressions
This example demonstrates selecting multiple columns with different expressions.
import polars as pl df = pl.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9] }) selected = df.select([ pl.col('A').alias('Column1'), (pl.col('B') * 2).alias('DoubleB') ]) print(selected)
The pl.col('B') * 2
expression doubles the values in column 'B'. This
is useful for applying multiple transformations.
Select with Conditional Logic
This example shows how to use conditional logic during column selection.
import polars as pl df = pl.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9] }) selected = df.select([ pl.when(pl.col('A') > 1).then(pl.col('B')).otherwise(0).alias('Result') ]) print(selected)
The when
and then
functions apply conditional logic.
This is useful for creating dynamic columns.
Select with String Operations
This example demonstrates string operations during column selection.
import polars as pl df = pl.DataFrame({ 'Name': ['Alice', 'Bob', 'Charlie'] }) selected = df.select([ pl.col('Name').str.to_uppercase().alias('UppercaseName') ]) print(selected)
The str.to_uppercase
function converts names to uppercase. This is
useful for text manipulation.
Select with Date Operations
This example shows how to perform date operations during column selection.
import polars as pl df = pl.DataFrame({ 'Date': ['2023-01-01', '2023-02-01', '2023-03-01'] }) selected = df.select([ pl.col('Date').str.strptime(pl.Date, '%Y-%m-%d').dt.month().alias('Month') ]) print(selected)
The dt.month
function extracts the month from the date. This is
useful for time-based analysis.
Select with Nested Data
This example demonstrates selecting nested data from a DataFrame.
import polars as pl df = pl.DataFrame({ 'A': [1, 2, 3], 'B': [[4, 5], [6, 7], [8, 9]] }) selected = df.select([ pl.col('B').arr.get(0).alias('FirstElement') ]) print(selected)
The arr.get(0)
function extracts the first element from nested lists.
This is useful for working with complex data structures.
Best Practices for Using Select
- Plan Column Selection: Identify required columns before applying
select
. - Use Aliases: Rename columns for better readability.
- Combine Expressions: Apply multiple transformations in a single
select
call. - Optimize Performance: Avoid unnecessary column selections to improve speed.
Source
In this article, we have explored how to use the select
function in Polars.
Author
List all Polars tutorials.