Polars String Operations
last modified March 1, 2025
Polars is a fast DataFrame library in Rust with Python bindings. It is designed for efficient data manipulation and analysis. String operations are essential for cleaning and transforming text data in DataFrames. This tutorial covers common string operations in Polars with practical examples.
String operations include tasks like substring extraction, case conversion, and pattern matching. Polars provides a rich set of string methods for these tasks, making it a powerful tool for text data processing.
Convert to Lowercase
This example shows how to convert a column of strings to lowercase.
import polars as pl df = pl.DataFrame({ "text": ["Hello", "WORLD", "Polars", "Tutorial"] }) df = df.with_column(pl.col("text").str.to_lowercase().alias("lowercase_text")) print(df)
The str.to_lowercase
method converts all strings in the 'text'
column to lowercase. This is useful for standardizing text data.
Extract Substrings
This example demonstrates how to extract substrings from a column.
import polars as pl df = pl.DataFrame({ "text": ["Hello World", "Polars Tutorial", "Data Science"] }) df = df.with_column(pl.col("text").str.slice(0, 5).alias("substring")) print(df)
The str.slice(0, 5)
method extracts the first 5 characters from
each string in the 'text' column. This is useful for extracting fixed-length
data.
Replace Substrings
This example shows how to replace substrings in a column.
import polars as pl df = pl.DataFrame({ "text": ["Hello World", "Polars Tutorial", "Data Science"] }) df = df.with_column(pl.col("text").str.replace("World", "Universe").alias("replaced_text")) print(df)
The str.replace("World", "Universe")
method replaces 'World' with
'Universe' in the 'text' column. This is useful for correcting or updating text.
Split Strings
This example demonstrates how to split strings into lists based on a delimiter.
import polars as pl df = pl.DataFrame({ "text": ["Hello,World", "Polars,Tutorial", "Data,Science"] }) df = df.with_column(pl.col("text").str.split(",").alias("split_text")) print(df)
The str.split(",")
method splits each string in the 'text' column
by commas. This is useful for parsing CSV-like data.
Concatenate Strings
This example shows how to concatenate strings from multiple columns.
import polars as pl df = pl.DataFrame({ "first_name": ["John", "Jane", "Alice"], "last_name": ["Doe", "Smith", "Johnson"] }) df = df.with_column((pl.col("first_name") + " " + pl.col("last_name")).alias("full_name")) print(df)
The +
operator concatenates 'first_name' and 'last_name' columns
with a space in between. This is useful for creating full names or combined text.
Check for Substrings
This example demonstrates how to check if a substring exists in a column.
import polars as pl df = pl.DataFrame({ "text": ["Hello World", "Polars Tutorial", "Data Science"] }) df = df.with_column(pl.col("text").str.contains("World").alias("contains_world")) print(df)
The str.contains("World")
method checks if 'World' exists in each
string of the 'text' column. This is useful for filtering or flagging data.
Trim Whitespace
This example shows how to trim leading and trailing whitespace from strings.
import polars as pl df = pl.DataFrame({ "text": [" Hello ", " Polars ", " Data Science "] }) df = df.with_column(pl.col("text").str.strip().alias("trimmed_text")) print(df)
The str.strip
method removes leading and trailing whitespace
from each string in the 'text' column. This is useful for cleaning messy data.
Regular Expression Matching
This example demonstrates how to use regular expressions to extract patterns.
import polars as pl df = pl.DataFrame({ "text": ["Hello123", "Polars456", "Data789"] }) df = df.with_column(pl.col("text").str.extract(r"\d+").alias("extracted_numbers")) print(df)
The str.extract(r"\d+")
method extracts numeric sequences from
each string in the 'text' column. This is useful for pattern-based extraction.
Best Practices for String Operations
- Standardize Text: Use lowercase or uppercase for consistency.
- Handle Missing Data: Check for null values before operations.
- Use Regular Expressions Wisely: Test regex patterns thoroughly.
- Optimize Performance: Use vectorized operations for large datasets.
Source
In this article, we have explored how to perform string operations in Polars.
Author
List all Polars tutorials.