Python DataFrame
last modified February 25, 2025
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure in Python. It is one of the most commonly used data structures in data analysis and is provided by the Pandas library. This tutorial covers the basics of creating, manipulating, and analyzing DataFrames with practical examples.
Installing Pandas
You can install the Pandas library using pip:
$ pip install pandas
Creating a DataFrame
This example demonstrates how to create a DataFrame from a dictionary.
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)
print(df)
The DataFrame is created from a dictionary where keys are column names and values are lists of data. The output is a tabular representation of the data.
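To make the "two-dimensional" and "heterogeneous" properties concrete, the following sketch inspects the same DataFrame after it is built. It uses only the sample data from the example above and prints the shape, the per-column data types, and the labels.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# shape is a (rows, columns) tuple
print(df.shape)

# Each column has its own dtype; Age holds integers, the others hold strings
print(df.dtypes)

# Column labels come from the dictionary keys; rows get a default integer index
print(list(df.columns))
print(list(df.index))
```

Checking shape and dtypes right after construction is a cheap way to confirm the data was interpreted as intended.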
Reading Data from a CSV File
This example demonstrates how to read data from a CSV file into a DataFrame.
import pandas as pd

df = pd.read_csv('data.csv')
print(df)
The read_csv function reads data from a CSV file and creates a DataFrame. This is one of the most common ways to load data into a DataFrame.
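The example above needs a data.csv file on disk. As a self-contained sketch, the code below feeds read_csv an in-memory text buffer instead. The CSV content is hypothetical and only mirrors the sample data used elsewhere in this tutorial.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as data.csv
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts a path or a file-like object;
# by default the first row is used as the column header
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# usecols restricts loading to the listed columns
ages = pd.read_csv(io.StringIO(csv_text), usecols=['Name', 'Age'])
print(ages)
```

Loading only the columns you need with usecols can help when the source file is wide.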
Selecting Columns
This example demonstrates how to select specific columns from a DataFrame.
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

print(df['Name'])            # Select a single column
print(df[['Name', 'City']])  # Select multiple columns
Columns can be selected using their names. Single columns are returned as Series, while multiple columns are returned as a DataFrame.
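The difference between the two return types depends on whether you pass a single label or a list of labels, not on how many columns you end up with. A small sketch, using the same sample data, shows that a one-element list still yields a DataFrame.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

name_series = df['Name']    # single label: one-dimensional Series
name_frame = df[['Name']]   # list with one label: one-column DataFrame

print(type(name_series).__name__)
print(type(name_frame).__name__)
print(name_frame.shape)
```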
Filtering Rows
This example demonstrates how to filter rows based on a condition.
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

filtered_df = df[df['Age'] > 30]
print(filtered_df)
Rows can be filtered using a boolean condition. In this example, only rows where the age is greater than 30 are selected.
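Conditions can also be combined. The sketch below, again on the tutorial's sample data, uses & for "and" and | for "or". Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)

# The condition itself is a boolean Series, one value per row
mask = df['Age'] > 30
print(mask)

# Both conditions must hold
both = df[(df['Age'] >= 30) & (df['City'] != 'Chicago')]
print(both)

# At least one condition must hold
either = df[(df['Age'] < 30) | (df['City'] == 'Chicago')]
print(either)
```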
Adding a New Column
This example demonstrates how to add a new column to a DataFrame.
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

df['Salary'] = [70000, 80000, 90000]
print(df)
A new column can be added to a DataFrame by assigning a list of values to a new column name. The list must contain exactly one value per row.
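A column can also be computed from existing columns rather than typed in by hand. In the sketch below, the SalaryK column name is hypothetical and chosen only for illustration. The second part shows what happens when the assigned list does not match the number of rows.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
df['Salary'] = [70000, 80000, 90000]

# Derived column (hypothetical name): salary in whole thousands,
# computed for every row in one expression
df['SalaryK'] = df['Salary'] // 1000
print(df)

# A list whose length differs from the row count is rejected
mismatch_raised = False
try:
    df['Bonus'] = [1000, 2000]  # only two values for three rows
except ValueError as exc:
    mismatch_raised = True
    print('Could not add column:', exc)
```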
Grouping and Aggregating Data
This example demonstrates how to group data and perform aggregations.
import pandas as pd

data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Chicago'],
    'Sales': [200, 150, 300, 250, 400]
}

df = pd.DataFrame(data)

grouped_df = df.groupby('City').sum()
print(grouped_df)
The groupby function groups data by a specific column, and the sum function aggregates the grouped data. In this example, sales are summed by city.
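Several aggregations can be computed in one pass. The following sketch reuses the sales data from the example above, selects the Sales column explicitly, and passes a list of aggregation names to agg.

```python
import pandas as pd

data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York', 'Chicago'],
    'Sales': [200, 150, 300, 250, 400]
}
df = pd.DataFrame(data)

# Selecting the column first gives a Series of totals indexed by city
totals = df.groupby('City')['Sales'].sum()
print(totals)

# agg with a list of names produces one result column per aggregation
summary = df.groupby('City')['Sales'].agg(['sum', 'mean', 'count'])
print(summary)
```

By default the group keys are sorted, so the cities appear in alphabetical order in the result.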
Best Practices for Working with DataFrames
- Use Vectorized Operations: Avoid loops and use built-in Pandas functions for better performance.
- Handle Missing Data: Use methods like dropna or fillna to handle missing values.
- Optimize Memory Usage: Use appropriate data types to reduce memory consumption.
- Leverage Indexing: Use indexes for faster data retrieval and manipulation.
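The practices listed above can be sketched in a few lines. The data here is hypothetical, with one missing sales figure added on purpose, and the DoubleSales column is only an illustration of a vectorized computation.

```python
import pandas as pd

# Hypothetical data with one missing sales value
df = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York'],
    'Sales': [200.0, None, 300.0, 250.0],
})

# Handle missing data: drop incomplete rows, or fill the gaps with a default
dropped = df.dropna()
filled = df.fillna({'Sales': 0.0})

# Vectorized operation: one expression applied to the whole column, no loop
filled['DoubleSales'] = filled['Sales'] * 2

# Memory usage: a repetitive text column can be stored as a category
filled['City'] = filled['City'].astype('category')
print(filled.dtypes)

# Indexing: look rows up by label after setting an index
by_city = df.set_index('City')
print(by_city.loc['Chicago', 'Sales'])
```

Whether a category dtype actually saves memory depends on how many distinct values the column holds, so it is worth comparing memory_usage(deep=True) before and after on real data.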
Source
Pandas DataFrame Documentation
In this article, we have explored Python DataFrames and demonstrated their use cases through practical examples.