C# DataFrame
last modified August 23, 2023
In this article we work with a DataFrame
in C#. The
DataFrame
is located in the Microsoft.Data.Analysis namespace.
A DataFrame is a data structure that organizes data into a 2-dimensional table of rows and columns, much like a spreadsheet.
$ dotnet add package Microsoft.Data.Analysis
We add the library to the project.
id,product_name,category,unit_price,units_in_stock 1,Chai,Beverages,18.0,39 2,Chang,Beverages,19.0,17 3,Aniseed Syrup,Condiments,10.0,13 4,Chef Anton's Cajun Seasoning,Condiments,22.0,53 5,Chef Anton's Gumbo Mix,Condiments,21.35,0 6,Grandma's Boysenberry Spread,Condiments,25.0,120 7,Uncle Bob's Organic Dried Pears,Produce,30.0,15 8,Northwoods Cranberry Sauce,Condiments,40.0,6 9,Mishi Kobe Niku,Meat/Poultry,97.0,29 10,Ikura,Seafood,31.0,31 11,Queso Cabrales,Dairy Products,21.0,22 12,Queso Manchego La Pastora,Dairy Products,38.0,86 13,Konbu,Seafood,6.0,24 14,Tofu,Produce,23.25,35 15,Genen Shouyu,Condiments,15.5,39 16,Pavlova,Confections,17.45,29 17,Alice Mutton,Meat/Poultry,39.0,0 18,Carnarvon Tigers,Seafood,62.5,42 19,Teatime Chocolate Biscuits,Confections,9.2,25 20,Sir Rodney's Marmalade,Confections,81.0,40 21,Sir Rodney's Scones,Confections,10.0,3 22,Gustaf's Knäckebröd,Grains/Cereals,21.0,104 23,Tunnbröd,Grains/Cereals,9.0,61 24,Guaraná Fantástica,Beverages,4.5,20 25,NuNuCa Nuß-Nougat-Creme,Confections,14.0,76 26,Gumbär Gummibärchen,Confections,31.23,15 27,Schoggi Schokolade,Confections,43.9,49 28,Rössle Sauerkraut,Produce,45.6,26 29,Thüringer Rostbratwurst,Meat/Poultry,123.79,0 30,Nord-Ost Matjeshering,Seafood,25.89,10 31,Gorgonzola Telino,Dairy Products,12.5,0 32,Mascarpone Fabioli,Dairy Products,32.0,9 33,Geitost,Dairy Products,2.5,112 34,Sasquatch Ale,Beverages,14.0,111 35,Steeleye Stout,Beverages,18.0,20 36,Inlagd Sill,Seafood,19.0,112 37,Gravad lax,Seafood,26.0,11 38,Côte de Blaye,Beverages,263.5,17 39,Chartreuse verte,Beverages,18.0,69 40,Boston Crab Meat,Seafood,18.4,123 41,Jack'England Clam Chowder,Seafood,9.65,85 42,Singaporean Hokkien Fried Mee,Grains/Cereals,14.0,26 43,Ipoh Coffee,Beverages,46.0,17 44,Gula Malacca,Condiments,19.45,27 45,Rogede sild,Seafood,9.5,5 46,Spegesild,Seafood,12.0,95 47,Zaanse koeken,Confections,9.5,36 48,Chocolade,Confections,12.75,15 49,Maxilaku,Confections,20.0,10 50,Valkoinen suklaa,Confections,16.25,65 51,Manjimup Dried Apples,Produce,53.0,20 52,Filo Mix,Grains/Cereals,7.0,38 53,Perth Pasties,Meat/Poultry,32.8,0 54,Tourtière,Meat/Poultry,7.45,21 55,Pâté chinois,Meat/Poultry,24.0,115 56,Gnocchi di nonna Alice,Grains/Cereals,38.0,21 57,Ravioli Angelo,Grains/Cereals,19.5,36 58,Escargots de Bourgogne,Seafood,13.25,62 59,Raclette Courdavault,Dairy Products,55.0,79 60,Camembert Pierrot,Dairy Products,34.0,19 61,Sirop d'érable,Condiments,28.5,113 62,Tarte au sucre,Confections,49.3,17 63,Vegie-spread,Condiments,43.9,24 64,Wimmers gute Semmelknödel,Grains/Cereals,33.25,22 65,Louisiana Fiery Hot Pepper Sauce,Condiments,21.05,76 66,Louisiana Hot Spiced Okra,Condiments,17.0,4 67,Laughing Lumberjack Lager,Beverages,14.0,52 68,Scottish Longbreads,Confections,12.5,6 69,Gudbrandsdalsost,Dairy Products,36.0,26 70,Outback Lager,Beverages,15.0,15 71,Flotemysost,Dairy Products,21.5,26 72,Mozzarella di Giovanni,Dairy Products,34.8,14 73,Röd Kaviar,Seafood,15.0,101 74,Longlife Tofu,Produce,10.0,4 75,Rhönbräu Klosterbier,Beverages,7.75,125 76,Lakkalikööri,Beverages,18.0,57 77,Original Frankfurter grüne Soße,Condiments,13.0,32
In the examples, we use this CSV file.
C# DataFrame simple example
The following is a simple DataFrame example.
using Microsoft.Data.Analysis; var fname = Path.GetFullPath("products.csv"); var df = DataFrame.LoadCsv(fname); Console.WriteLine(df.Info()); Console.WriteLine(df.Description());
We load data from the products.csv
file and call Info
and Description
methods on the dataframe.
var df = DataFrame.LoadCsv(fname);
The data is loaded from the CSV file using DataFrame.LoadCsv
method.
The DataFrame.LoadCsv
infers the column types automatically.
We can explicitly specify the column data types with the dataTypes
option.
using Microsoft.Data.Analysis; var fname = Path.GetFullPath("products.csv"); var df = DataFrame.LoadCsv(fname, dataTypes: new Type[] { typeof(int), typeof(string), typeof(string), typeof(decimal), typeof(decimal) }); Console.WriteLine(df.Info());
In the example, we specify the data types for the five columns.
C# DataFrame Head/Tail
The Head
method returns the first n rows, while the
Tail
the last n rows.
using Microsoft.Data.Analysis; var fname = Path.GetFullPath("products.csv"); var df = DataFrame.LoadCsv(fname); Console.WriteLine(df.Head(4)); Console.WriteLine(df.Tail(4));
The example shows the first four and the last four rows from the dataframe.
C# DataFrame Filter
We can use the Filter
method to filter data.
using Microsoft.Data.Analysis; var fname = Path.GetFullPath("products.csv"); var df = DataFrame.LoadCsv(fname); PrimitiveDataFrameColumn<bool> fil = df["unit_price"].ElementwiseGreaterThan(100); Console.WriteLine(df.Filter(fil)); Console.WriteLine(df.Filter(df.Columns[4].ElementwiseLessThan(10)));
In the program, we apply two filters.
PrimitiveDataFrameColumn<bool> fil = df["unit_price"].ElementwiseGreaterThan(100); Console.WriteLine(df.Filter(fil));
In the first filter, we show all products whose price is greater than 100.
Console.WriteLine(df.Filter(df.Columns[4].ElementwiseLessThan(10)));
The second filter is used to display products with stocks supply less than 10.
Console output with Spectre.Console
In the following example, we use the Spectre.Console
library to
output the data in a nice table.
$ dotnet add project Spectre.Console
We add the library to the project.
using Microsoft.Data.Analysis; using Spectre.Console; var fname = Path.GetFullPath("products.csv"); var df = DataFrame.LoadCsv(fname); var table = new Table() .Border(TableBorder.Ascii) .BorderColor(Color.SteelBlue) .AddColumn(new TableColumn("Id").RightAligned()) .AddColumn(new TableColumn("Product name")) .AddColumn(new TableColumn("Category").LeftAligned()) .AddColumn(new TableColumn("Unit price").RightAligned()) .AddColumn(new TableColumn("Units in stock").RightAligned()); foreach (var e in df.Rows) { string[] row = { $"{e[0]}", $"{e[1]}", $"{e[2]}", $"{e[3]:0.00}", $"{e[4]:0.00}" }; table.AddRow(row); } AnsiConsole.Write(table);
In the program we display all products in a neat console table.
var table = new Table() .Border(TableBorder.Ascii) .BorderColor(Color.SteelBlue) .AddColumn(new TableColumn("Id").RightAligned()) .AddColumn(new TableColumn("Product name")) .AddColumn(new TableColumn("Category").LeftAligned()) .AddColumn(new TableColumn("Unit price").RightAligned()) .AddColumn(new TableColumn("Units in stock").RightAligned());
We build up the table. We set the border and the columns.
foreach (var e in df.Rows) { string[] row = { $"{e[0]}", $"{e[1]}", $"{e[2]}", $"{e[3]:0.00}", $"{e[4]:0.00}" }; table.AddRow(row); }
We go over the rows of the dataframe and add each row to the console table.
AnsiConsole.Write(table);
We output the table with AnsiConsole.Write
.
$ dotnet run +--------------------------------------------------------------------------------------+ | Id | Product name | Category | Unit price | Units in stock | |----+----------------------------------+----------------+------------+----------------| | 1 | Chai | Beverages | 18.00 | 39.00 | | 2 | Chang | Beverages | 19.00 | 17.00 | | 3 | Aniseed Syrup | Condiments | 10.00 | 13.00 | | 4 | Chef Anton's Cajun Seasoning | Condiments | 22.00 | 53.00 | | 5 | Chef Anton's Gumbo Mix | Condiments | 21.35 | 0.00 | | 6 | Grandma's Boysenberry Spread | Condiments | 25.00 | 120.00 | | 7 | Uncle Bob's Organic Dried Pears | Produce | 30.00 | 15.00 | | 8 | Northwoods Cranberry Sauce | Condiments | 40.00 | 6.00 | | 9 | Mishi Kobe Niku | Meat/Poultry | 97.00 | 29.00 | ... | 71 | Flotemysost | Dairy Products | 21.50 | 26.00 | | 72 | Mozzarella di Giovanni | Dairy Products | 34.80 | 14.00 | | 73 | Röd Kaviar | Seafood | 15.00 | 101.00 | | 74 | Longlife Tofu | Produce | 10.00 | 4.00 | | 75 | Rhönbräu Klosterbier | Beverages | 7.75 | 125.00 | | 76 | Lakkalikööri | Beverages | 18.00 | 57.00 | | 77 | Original Frankfurter grüne Soße | Condiments | 13.00 | 32.00 | +--------------------------------------------------------------------------------------+
C# DataFrame sort
We sort data with OrderBy
.
using Microsoft.Data.Analysis; using Spectre.Console; var file = Path.GetFullPath("products.csv"); var df = DataFrame.LoadCsv(file); var table = new Table() .Border(TableBorder.Ascii) .BorderColor(Color.SteelBlue) .AddColumn(new TableColumn("Id").RightAligned()) .AddColumn(new TableColumn("Product name")) .AddColumn(new TableColumn("Category").LeftAligned()) .AddColumn(new TableColumn("Unit price").RightAligned()) .AddColumn(new TableColumn("Units in stock").RightAligned()); foreach (var e in df.OrderBy("unit_price").Rows) { string[] row = { $"{e[0]}", $"{e[1]}", $"{e[2]}", $"{e[3]:0.00}", $"{e[4]:0.00}" }; table.AddRow(row); } AnsiConsole.Write(table);
In the program, we sort data by unit price.
$ dotnet run +--------------------------------------------------------------------------------------+ | Id | Product name | Category | Unit price | Units in stock | |----+----------------------------------+----------------+------------+----------------| | 33 | Geitost | Dairy Products | 2.50 | 112.00 | | 24 | Guaraná Fantástica | Beverages | 4.50 | 20.00 | | 13 | Konbu | Seafood | 6.00 | 24.00 | | 52 | Filo Mix | Grains/Cereals | 7.00 | 38.00 | | 54 | Tourtière | Meat/Poultry | 7.45 | 21.00 | | 75 | Rhönbräu Klosterbier | Beverages | 7.75 | 125.00 | | 23 | Tunnbröd | Grains/Cereals | 9.00 | 61.00 | ... | 9 | Mishi Kobe Niku | Meat/Poultry | 97.00 | 29.00 | | 29 | Thüringer Rostbratwurst | Meat/Poultry | 123.79 | 0.00 | | 38 | Côte de Blaye | Beverages | 263.50 | 17.00 | +--------------------------------------------------------------------------------------+
C# DataFrame GroupBy
The GroupBy
method groups the rows of the DataFrame
by
unique values in the column name. It returns a GroupBy
object that
stores the group information.
using Microsoft.Data.Analysis; using Spectre.Console; var file = Path.GetFullPath("products.csv"); var df = DataFrame.LoadCsv(file); var table = new Table() .Border(TableBorder.Ascii) .BorderColor(Color.SteelBlue) .AddColumn(new TableColumn("Id").RightAligned()) .AddColumn(new TableColumn("Product name")) .AddColumn(new TableColumn("Category")) .AddColumn(new TableColumn("Unit price").RightAligned()) .AddColumn(new TableColumn("Units in stock").RightAligned()); var g = df.GroupBy("category"); foreach (var e in g.Head(100).Rows) { string[] row = { $"{e[1]}", $"{e[2]}", $"{e[0]}", $"{e[3]:0.00}", $"{e[4]:0.00}" }; table.AddRow(row); } AnsiConsole.Write(table);
In the example, we group products by category and display them in a table.
$ dotnet run +--------------------------------------------------------------------------------------+ | Id | Product name | Category | Unit price | Units in stock | |----+----------------------------------+----------------+------------+----------------| | 1 | Chai | Beverages | 18.00 | 39.00 | | 2 | Chang | Beverages | 19.00 | 17.00 | | 24 | Guaraná Fantástica | Beverages | 4.50 | 20.00 | | 34 | Sasquatch Ale | Beverages | 14.00 | 111.00 | | 35 | Steeleye Stout | Beverages | 18.00 | 20.00 | | 38 | Côte de Blaye | Beverages | 263.50 | 17.00 | | 39 | Chartreuse verte | Beverages | 18.00 | 69.00 | | 43 | Ipoh Coffee | Beverages | 46.00 | 17.00 | | 67 | Laughing Lumberjack Lager | Beverages | 14.00 | 52.00 | | 70 | Outback Lager | Beverages | 15.00 | 15.00 | | 75 | Rhönbräu Klosterbier | Beverages | 7.75 | 125.00 | | 76 | Lakkalikööri | Beverages | 18.00 | 57.00 | | 3 | Aniseed Syrup | Condiments | 10.00 | 13.00 | | 4 | Chef Anton's Cajun Seasoning | Condiments | 22.00 | 53.00 | | 5 | Chef Anton's Gumbo Mix | Condiments | 21.35 | 0.00 | | 6 | Grandma's Boysenberry Spread | Condiments | 25.00 | 120.00 | | 8 | Northwoods Cranberry Sauce | Condiments | 40.00 | 6.00 | ... | 52 | Filo Mix | Grains/Cereals | 7.00 | 38.00 | | 56 | Gnocchi di nonna Alice | Grains/Cereals | 38.00 | 21.00 | | 57 | Ravioli Angelo | Grains/Cereals | 19.50 | 36.00 | | 64 | Wimmers gute Semmelknödel | Grains/Cereals | 33.25 | 22.00 | +--------------------------------------------------------------------------------------+
On the grouped data, we can calculate aggregate operations such as max, min, or sum.
using Microsoft.Data.Analysis; using Spectre.Console; var file = Path.GetFullPath("products.csv"); var df = DataFrame.LoadCsv(file); var table = new Table() .Border(TableBorder.Ascii) .BorderColor(Color.SteelBlue) .AddColumn(new TableColumn("Category")) .AddColumn(new TableColumn("Max price").RightAligned()); var g = df.GroupBy("category"); var df2 = g.Max("unit_price"); foreach (var e in df2.Rows) { string[] row = { $"{e[0]}", $"{e[1]:0.00}" }; table.AddRow(row); } AnsiConsole.Write(table);
In the example, we calculate the max price for each category.
var g = df.GroupBy("category");
We group the data by category.
var df2 = g.Max("unit_price");
We calculate the max price for each category.
$ dotnet run +----------------------------+ | Category | Max price | |----------------+-----------| | Beverages | 263.50 | | Condiments | 43.90 | | Produce | 53.00 | | Meat/Poultry | 123.79 | | Seafood | 62.50 | | Dairy Products | 55.00 | | Confections | 81.00 | | Grains/Cereals | 38.00 | +----------------------------+
Source
Getting started with DataFrames
In this article we have worked with Microsoft Data Analysis DataFrame in C#.
Author
List all C# tutorials.