C# Encoding
last modified July 5, 2023
In this article we show how to encode and decode data in C#.
Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.
In C#, a string is a sequence of UTF-16 code units that represents Unicode text. Each element is a 16-bit char value; characters outside the Basic Multilingual Plane, such as most emoji, take two char values (a surrogate pair). C# strings use the UTF-16 encoding internally.
Encoding is the process of transforming a set of Unicode characters into a sequence of bytes. Decoding is the opposite process: it transforms a sequence of encoded bytes into a set of Unicode characters.
The following standard character encodings are available in .NET: ASCII, UTF-7 (deprecated), UTF-8, UTF-16, and UTF-32.
The System.Text.Encoding class is used in .NET for encoding and decoding. .NET internally uses the UTF-16 character encoding; its little-endian form is available as Encoding.Unicode.
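Because strings are stored as UTF-16, the Length property counts 16-bit code units rather than user-visible characters. The following is a minimal sketch of that difference, using the sample string from the examples in this article; the variable names are illustrative only.

```csharp
using System;
using System.Linq;
using System.Text;

string text = "one 🐘 and three 🐋";

// Length counts UTF-16 code units; each emoji takes two (a surrogate pair).
int codeUnits = text.Length;

// EnumerateRunes yields one Rune per Unicode scalar value.
int scalars = text.EnumerateRunes().Count();

Console.WriteLine($"UTF-16 code units: {codeUnits}");
Console.WriteLine($"Unicode scalar values: {scalars}");

// The first emoji starts at index 4, right after "one ".
Console.WriteLine($"Surrogate pair at index 4: {char.IsSurrogatePair(text, 4)}");
```

For this string, the sketch reports 19 code units but only 17 scalar values, since each of the two emoji occupies a surrogate pair.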
C# Encoding GetByteCount
The GetByteCount method returns the number of bytes produced by encoding the specified characters.
using System.Text;

string text = "one 🐘 and three 🐋";

int n = Encoding.UTF8.GetByteCount(text);
Console.WriteLine($"UTF-8: {n}");

n = Encoding.UTF32.GetByteCount(text);
Console.WriteLine($"UTF-32: {n}");

n = Encoding.Unicode.GetByteCount(text);
Console.WriteLine($"UTF-16: {n}");

n = Encoding.BigEndianUnicode.GetByteCount(text);
Console.WriteLine($"UTF-16BE: {n}");

n = Encoding.Latin1.GetByteCount(text);
Console.WriteLine($"Latin1: {n}");

n = Encoding.ASCII.GetByteCount(text);
Console.WriteLine($"ASCII: {n}");
The example prints the number of bytes produced when the given string is encoded in each of the specified encodings.
$ dotnet run
UTF-8: 23
UTF-32: 68
UTF-16: 38
UTF-16BE: 38
Latin1: 19
ASCII: 19
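A common use of GetByteCount is to size a buffer before encoding into it. The sketch below assumes the GetBytes overload that writes into an existing byte array; the names enc, buffer, and written are illustrative.

```csharp
using System;
using System.Text;

string text = "one 🐘 and three 🐋";
Encoding enc = Encoding.UTF8;

// Ask the encoding how many bytes are needed, then allocate exactly that much.
byte[] buffer = new byte[enc.GetByteCount(text)];

// Encode the whole string into the buffer, starting at offset 0.
// The return value is the number of bytes actually written.
int written = enc.GetBytes(text, 0, text.Length, buffer, 0);

Console.WriteLine($"Allocated: {buffer.Length}, written: {written}");
```

Both numbers are 23 for this string, which matches the UTF-8 count printed by the example above.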
C# Encoding GetBytes
The GetBytes method returns a byte array containing the results of encoding the specified set of characters.
using System.Text;

string text = "one 🐘 and three 🐋";

Console.WriteLine("UTF-8 bytes");
byte[] utf8Data = Encoding.UTF8.GetBytes(text);
showBytes(utf8Data);

Console.WriteLine("UTF-16 bytes");
byte[] utf16Data = Encoding.Unicode.GetBytes(text);
showBytes(utf16Data);

Console.WriteLine("UTF-16BE bytes");
byte[] utf16BEData = Encoding.BigEndianUnicode.GetBytes(text);
showBytes(utf16BEData);

Console.WriteLine("Latin1 bytes");
byte[] latin1Data = Encoding.Latin1.GetBytes(text);
showBytes(latin1Data);

// Prints the bytes in hexadecimal, ten values per line.
void showBytes(byte[] data)
{
    int i = 0;

    foreach (var e in data)
    {
        Console.Write($"{e:X4} ");

        i++;

        if (i % 10 == 0)
        {
            Console.WriteLine();
        }
    }

    Console.WriteLine();
}
The example encodes the given string into bytes of UTF-8, UTF-16, UTF-16BE, and Latin1 encodings.
$ dotnet run
UTF-8 bytes
006F 006E 0065 0020 00F0 009F 0090 0098 0020 0061
006E 0064 0020 0074 0068 0072 0065 0065 0020 00F0
009F 0090 008B
UTF-16 bytes
006F 0000 006E 0000 0065 0000 0020 0000 003D 00D8
0018 00DC 0020 0000 0061 0000 006E 0000 0064 0000
0020 0000 0074 0000 0068 0000 0072 0000 0065 0000
0065 0000 0020 0000 003D 00D8 000B 00DC
UTF-16BE bytes
0000 006F 0000 006E 0000 0065 0000 0020 00D8 003D
00DC 0018 0000 0020 0000 0061 0000 006E 0000 0064
0000 0020 0000 0074 0000 0068 0000 0072 0000 0065
0000 0065 0000 0020 00D8 003D 00DC 000B
Latin1 bytes
006F 006E 0065 0020 003F 003F 0020 0061 006E 0064
0020 0074 0068 0072 0065 0065 0020 003F 003F
C# Encoding GetString
The GetString method builds a string that contains the results of decoding the specified sequence of bytes.
using System.Text;

string text = "one 🐘 and three 🐋";

Console.WriteLine("UTF-8 bytes");
byte[] utf8Data = Encoding.UTF8.GetBytes(text);
string output = Encoding.UTF8.GetString(utf8Data);
Console.WriteLine(output);

Console.WriteLine("UTF-16 bytes");
byte[] utf16Data = Encoding.Unicode.GetBytes(text);
output = Encoding.Unicode.GetString(utf16Data);
Console.WriteLine(output);

Console.WriteLine("UTF-16BE bytes");
byte[] utf16BEData = Encoding.BigEndianUnicode.GetBytes(text);
output = Encoding.BigEndianUnicode.GetString(utf16BEData);
Console.WriteLine(output);

Console.WriteLine("Latin1 bytes");
byte[] latin1Data = Encoding.Latin1.GetBytes(text);
output = Encoding.Latin1.GetString(latin1Data);
Console.WriteLine(output);
In the example, we first encode the given string into an array of bytes with GetBytes. Later, we decode the bytes back into strings with GetString. We use four different encodings.
$ dotnet run
UTF-8 bytes
one 🐘 and three 🐋
UTF-16 bytes
one 🐘 and three 🐋
UTF-16BE bytes
one 🐘 and three 🐋
Latin1 bytes
one ?? and three ??
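The Latin1 encoding cannot represent emoji. It covers only the code points U+0000 through U+00FF, so each UTF-16 code unit that it cannot encode is replaced with a question mark; an emoji stored as a surrogate pair therefore becomes two question marks.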
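When silent replacement with question marks is not acceptable, an encoding can be configured to throw instead. The following sketch assumes that "iso-8859-1" is the registered name of Latin1 and uses the exception fallbacks from System.Text; the names strictLatin1 and failed are illustrative.

```csharp
using System;
using System.Text;

string text = "one 🐘 and three 🐋";

// Assumption: "iso-8859-1" is the registered name of the Latin1 encoding.
Encoding strictLatin1 = Encoding.GetEncoding(
    "iso-8859-1",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

bool failed = false;

try
{
    strictLatin1.GetBytes(text);
}
catch (EncoderFallbackException e)
{
    // Index is the position of the character that could not be encoded.
    failed = true;
    Console.WriteLine($"Cannot encode character at index {e.Index}");
}

Console.WriteLine($"Encoding failed: {failed}");
```

With this setup, data loss shows up as an exception at encoding time rather than as question marks in the output.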
C# Encoding.Convert
The Encoding.Convert method converts an entire byte array from one encoding to another.
using System.Text;

string text = "one 🐘 and three 🐋";

byte[] utf16Data = Encoding.Unicode.GetBytes(text);
byte[] utf8Data = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Data);

Console.WriteLine("UTF-16 bytes");
showBytes(utf16Data);
Console.WriteLine();

Console.WriteLine("UTF-8 bytes");
showBytes(utf8Data);
Console.WriteLine();

string output = Encoding.UTF8.GetString(utf8Data);
Console.WriteLine(output);

// Prints the bytes in hexadecimal, ten values per line.
void showBytes(byte[] data)
{
    int i = 0;

    foreach (var e in data)
    {
        Console.Write($"{e:X4} ");

        i++;

        if (i % 10 == 0)
        {
            Console.WriteLine();
        }
    }

    Console.WriteLine();
}
In the example, we convert UTF-16 bytes into UTF-8 bytes.
$ dotnet run
UTF-16 bytes
006F 0000 006E 0000 0065 0000 0020 0000 003D 00D8
0018 00DC 0020 0000 0061 0000 006E 0000 0064 0000
0020 0000 0074 0000 0068 0000 0072 0000 0065 0000
0065 0000 0020 0000 003D 00D8 000B 00DC

UTF-8 bytes
006F 006E 0065 0020 00F0 009F 0090 0098 0020 0061
006E 0064 0020 0074 0068 0072 0065 0065 0020 00F0
009F 0090 008B

one 🐘 and three 🐋
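Conceptually, converting between encodings amounts to decoding with the source encoding and encoding again with the target. The sketch below compares that two-step approach with Encoding.Convert for the sample string; it is an illustration of the idea, not a description of how the method is implemented.

```csharp
using System;
using System.Linq;
using System.Text;

string text = "one 🐘 and three 🐋";
byte[] utf16Data = Encoding.Unicode.GetBytes(text);

// One-step conversion.
byte[] converted = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Data);

// Two-step conversion: decode with the source encoding,
// then encode with the target encoding.
string decoded = Encoding.Unicode.GetString(utf16Data);
byte[] reencoded = Encoding.UTF8.GetBytes(decoded);

bool same = converted.SequenceEqual(reencoded);
Console.WriteLine($"Same bytes: {same}");
```

For this input both approaches yield the same 23 UTF-8 bytes.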
C# read/write data with Encoding
Next, we write data to a file and read it back using a specified encoding.
using System.Text;

string text = "one 🐘 and three 🐋";

// FileMode.Create truncates an existing file, so no stale bytes are left behind.
using var fs = new FileStream("data.txt", FileMode.Create);
using var sw = new StreamWriter(fs, Encoding.UTF8);

sw.Write(text);
In the example, we write text into a file using Encoding.UTF8.
using var sw = new StreamWriter(fs, Encoding.UTF8);
The second parameter of StreamWriter is the character encoding to use.
$ dotnet run
$ file data.txt
data.txt: Unicode text, UTF-8 (with BOM) text, with no line terminators
$ cat data.txt
one 🐘 and three 🐋
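The file command reports a byte order mark (BOM) because Encoding.UTF8 provides a three-byte preamble that StreamWriter writes at the start of a new file. If the BOM is not wanted, one option is a UTF8Encoding instance created without it. The following is a sketch under that assumption; the file name data-nobom.txt and the variable names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

string text = "one 🐘 and three 🐋";

// Encoding.UTF8 advertises a three-byte preamble (EF BB BF).
int bomLength = Encoding.UTF8.GetPreamble().Length;
Console.WriteLine($"Encoding.UTF8 preamble: {bomLength} bytes");

// A UTF8Encoding created with 'false' emits no preamble.
var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
Console.WriteLine($"utf8NoBom preamble: {utf8NoBom.GetPreamble().Length} bytes");

// Hypothetical file name, chosen for this sketch only.
string path = "data-nobom.txt";

using (var sw = new StreamWriter(path, false, utf8NoBom))
{
    sw.Write(text);
}

// Without a BOM, the file holds only the encoded text.
long size = new FileInfo(path).Length;
Console.WriteLine($"File size: {size} bytes");
```

The resulting file is 23 bytes long, exactly the UTF-8 byte count of the text.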
Next, we read the data from the file.
using System.Text;

using var fs = new FileStream("data.txt", FileMode.Open);
using var sr = new StreamReader(fs, Encoding.UTF8);

string? text = sr.ReadLine();
Console.WriteLine(text);
We use StreamReader to read the data; we specify the character encoding in the second parameter.
$ dotnet run
one 🐘 and three 🐋
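The encoding passed to StreamReader does not have to be the final word: the reader can also inspect a byte order mark and switch to the encoding it indicates. The following sketch assumes the constructor overload with the detectEncodingFromByteOrderMarks parameter; the file name data-utf16.txt is hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

string text = "one 🐘 and three 🐋";

// Hypothetical file name, used only in this sketch.
string path = "data-utf16.txt";

// Write the text as UTF-16; File.WriteAllText also writes the encoding's BOM.
File.WriteAllText(path, text, Encoding.Unicode);

// Ask for UTF-8, but let the reader override it when a BOM is present.
using var sr = new StreamReader(path, Encoding.UTF8, detectEncodingFromByteOrderMarks: true);
string readBack = sr.ReadToEnd();

Console.WriteLine(readBack);

// CurrentEncoding reflects the detected encoding once data has been read.
Console.WriteLine($"Detected: {sr.CurrentEncoding.WebName}");
```

Although the reader was created with Encoding.UTF8, the text is read back intact because the BOM identifies the file as UTF-16.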
Source
Encoding class - language reference
In this article we have encoded and decoded data in C#.