C# Encoding - Encode and Decode Text
last modified May 11, 2025
This article provides a comprehensive guide on encoding and decoding data in C#. Understanding how text is processed and stored is essential for handling different character sets and ensuring seamless data exchange across applications.
Unicode is a universal computing industry standard designed to provide consistent encoding, representation, and handling of text in most of the world's writing systems. It ensures that characters from different languages and symbols are properly represented across platforms, making globalized applications more reliable.
In C#, a string is an immutable sequence of char values, and each char is a 16-bit UTF-16 code unit. In other words, strings are stored internally as UTF-16, not as bytes in some arbitrary encoding. Most characters fit in a single char. Characters outside the Basic Multilingual Plane, such as most emoji, need two char values, called a surrogate pair.
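The following short sketch shows the difference between char values and characters. It assumes .NET Core 3.0 or later for the Rune type and the string.EnumerateRunes method. The Length property counts UTF-16 code units, so a single emoji contributes two to it.

```csharp
using System;
using System.Text;

string s = "one 🐘";

// Length counts UTF-16 code units (char values), not characters
Console.WriteLine($"Length: {s.Length}");        // Length: 6

// The elephant emoji (U+1F418) lies outside the Basic Multilingual Plane,
// so it is stored as a high surrogate followed by a low surrogate
Console.WriteLine(char.IsHighSurrogate(s[4]));   // True
Console.WriteLine(char.IsLowSurrogate(s[5]));    // True

// EnumerateRunes walks Unicode scalar values (code points) instead of chars
int runes = 0;
foreach (Rune r in s.EnumerateRunes())
{
    runes++;
    Console.WriteLine($"U+{r.Value:X4}");
}
Console.WriteLine($"Runes: {runes}");            // Runes: 5
```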
Encoding is the process of converting a set of Unicode characters into a sequence of bytes, allowing data to be stored or transmitted efficiently. Decoding reverses this process, transforming encoded bytes back into readable Unicode characters. These processes play a critical role in handling text input, file operations, network communications, and data serialization.
.NET provides several standard character encodings: ASCII, Latin1 (ISO-8859-1), UTF-7 (obsolete since .NET 5), UTF-8, UTF-16, and UTF-32. UTF-8 is the most widely used for files and network data. It stores ASCII characters in a single byte but can still represent every Unicode character. UTF-16 is the in-memory representation of C# strings. For I/O, APIs such as StreamWriter and File.WriteAllText default to UTF-8 when no encoding is given.
The System.Text.Encoding class provides methods for converting between strings and byte sequences in a given character encoding. Using the right encoding preserves text when exchanging data with other systems. Its static properties expose the built-in encodings. For example, Encoding.Unicode returns UTF-16 in little-endian byte order, and Encoding.BigEndianUnicode returns the big-endian variant.
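A brief sketch of the two UTF-16 variants and of how an Encoding instance can be identified or looked up by name. It assumes .NET 5 or later for Convert.ToHexString. Only the properties shown (WebName, CodePage) and Encoding.GetEncoding are used.

```csharp
using System;
using System.Text;

// Encoding.Unicode is UTF-16 little-endian; BigEndianUnicode is UTF-16 big-endian
byte[] le = Encoding.Unicode.GetBytes("A");
byte[] be = Encoding.BigEndianUnicode.GetBytes("A");

Console.WriteLine(Convert.ToHexString(le));   // 4100
Console.WriteLine(Convert.ToHexString(be));   // 0041

// Each Encoding instance exposes its IANA name and its code page number
Console.WriteLine($"{Encoding.Unicode.WebName} {Encoding.Unicode.CodePage}");   // utf-16 1200
Console.WriteLine($"{Encoding.UTF8.WebName} {Encoding.UTF8.CodePage}");         // utf-8 65001

// Encodings can also be looked up by name
Encoding utf8 = Encoding.GetEncoding("utf-8");
Console.WriteLine(utf8.CodePage);                                                // 65001
```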
C# Encoding GetByteCount
The GetByteCount method returns the number of bytes produced by encoding the specified characters.
```csharp
using System.Text;

string text = "one 🐘 and three 🐋";

int n = Encoding.UTF8.GetByteCount(text);
Console.WriteLine($"UTF-8: {n}");

n = Encoding.UTF32.GetByteCount(text);
Console.WriteLine($"UTF-32: {n}");

n = Encoding.Unicode.GetByteCount(text);
Console.WriteLine($"UTF-16: {n}");

n = Encoding.BigEndianUnicode.GetByteCount(text);
Console.WriteLine($"UTF-16BE: {n}");

n = Encoding.Latin1.GetByteCount(text);
Console.WriteLine($"Latin1: {n}");

n = Encoding.ASCII.GetByteCount(text);
Console.WriteLine($"ASCII: {n}");
```
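The example prints the number of bytes needed to encode the same string with each encoding. The string has 15 ASCII characters and two emoji, and the encodings handle them differently:

- UTF-8 is variable-length. Each ASCII character takes one byte and each emoji takes four, giving 15 + 8 = 23.
- UTF-16 uses two bytes per char. Each emoji is a surrogate pair of two chars, so 19 chars take 38 bytes.
- UTF-32 uses a fixed four bytes per code point, so 17 code points take 68 bytes.

The Latin1 and ASCII counts are smaller only because these encodings cannot represent the emoji at all: each emoji is replaced by two question marks. Comparing byte counts like this matters when you optimize for storage or need compatibility with other systems.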
```
$ dotnet run
UTF-8: 23
UTF-32: 68
UTF-16: 38
UTF-16BE: 38
Latin1: 19
ASCII: 19
```
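When encoding into a preallocated buffer, GetByteCount gives the exact size. GetMaxByteCount, by contrast, gives a cheaper worst-case bound that depends only on the number of chars. This is a minimal sketch that assumes .NET Core 2.1 or later for the span-based GetBytes(ReadOnlySpan&lt;char&gt;, Span&lt;byte&gt;) overload.

```csharp
using System;
using System.Text;

string text = "one 🐘 and three 🐋";

// GetByteCount scans the text and returns the exact encoded size
int exact = Encoding.UTF8.GetByteCount(text);

// GetMaxByteCount looks only at the char count and returns an upper bound,
// which avoids a scan but over-allocates
int worst = Encoding.UTF8.GetMaxByteCount(text.Length);

Console.WriteLine($"exact: {exact}, worst case: {worst}");

// Encode into a buffer sized with the exact count
byte[] buffer = new byte[exact];
int written = Encoding.UTF8.GetBytes(text, buffer);
Console.WriteLine($"written: {written}");
```

The exact count costs an extra pass over the text. The worst-case bound is useful when that pass is too expensive and some wasted memory is acceptable.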
C# Encoding GetBytes
The GetBytes method returns a byte array containing the results of encoding the specified set of characters.
```csharp
using System.Text;

string text = "one 🐘 and three 🐋";

Console.WriteLine("UTF-8 bytes");
byte[] utf8Data = Encoding.UTF8.GetBytes(text);
showBytes(utf8Data);

Console.WriteLine("UTF-16 bytes");
byte[] utf16Data = Encoding.Unicode.GetBytes(text);
showBytes(utf16Data);

Console.WriteLine("UTF-16BE bytes");
byte[] utf16BEData = Encoding.BigEndianUnicode.GetBytes(text);
showBytes(utf16BEData);

Console.WriteLine("Latin1 bytes");
byte[] latin1Data = Encoding.Latin1.GetBytes(text);
showBytes(latin1Data);

// Prints each byte as two hex digits, ten bytes per line
void showBytes(byte[] data)
{
    int i = 0;
    foreach (var e in data)
    {
        Console.Write($"{e:X2} ");
        i++;
        if (i % 10 == 0)
        {
            Console.WriteLine();
        }
    }
    Console.WriteLine();
}
```

The example encodes the string with UTF-8, UTF-16, UTF-16BE, and Latin1 and prints the resulting bytes in hexadecimal.

ASCII characters take one byte in UTF-8 and Latin1 and two bytes in UTF-16. The two UTF-16 variants differ only in byte order: UTF-16 (little-endian) writes 6F 00 for the letter o, while UTF-16BE writes 00 6F.

The elephant emoji (U+1F418) becomes four bytes in UTF-8 (F0 9F 90 98) and the surrogate pair D83D DC18 in UTF-16. Latin1 cannot represent the emoji and replaces each one with two question marks (3F 3F), so that information is lost. This is why choosing the right encoding is crucial for preserving data.

```
$ dotnet run
UTF-8 bytes
6F 6E 65 20 F0 9F 90 98 20 61
6E 64 20 74 68 72 65 65 20 F0
9F 90 8B
UTF-16 bytes
6F 00 6E 00 65 00 20 00 3D D8
18 DC 20 00 61 00 6E 00 64 00
20 00 74 00 68 00 72 00 65 00
65 00 20 00 3D D8 0B DC
UTF-16BE bytes
00 6F 00 6E 00 65 00 20 D8 3D
DC 18 00 20 00 61 00 6E 00 64
00 20 00 74 00 68 00 72 00 65
00 65 00 20 D8 3D DC 0B
Latin1 bytes
6F 6E 65 20 3F 3F 20 61 6E 64
20 74 68 72 65 65 20 3F 3F
```
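Silent replacement with question marks is the default behaviour. If losing data should be an error instead, an encoding can be obtained with an exception fallback. The sketch below assumes .NET 5 or later (for Encoding.Latin1) and uses Encoding.GetEncoding(string, EncoderFallback, DecoderFallback) to build a strict Latin1 encoder.

```csharp
using System;
using System.Text;

string text = "one 🐘 and three 🐋";

// A Latin1 encoding configured to throw instead of substituting '?'
Encoding strictLatin1 = Encoding.GetEncoding("iso-8859-1",
    EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

bool failed = false;
try
{
    strictLatin1.GetBytes(text);
}
catch (EncoderFallbackException ex)
{
    failed = true;
    // Index is the position in the input of the first char that could not be encoded
    Console.WriteLine($"Encoding failed at index {ex.Index}: {ex.Message}");
}

// The default Encoding.Latin1 uses a replacement fallback and does not throw
Console.WriteLine(Encoding.Latin1.GetString(Encoding.Latin1.GetBytes(text)));
```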
C# Encoding GetString
The GetString method builds a string by decoding a given sequence of bytes.
```csharp
using System.Text;

string text = "one 🐘 and three 🐋";

Console.WriteLine("UTF-8 bytes");
byte[] utf8Data = Encoding.UTF8.GetBytes(text);
string output = Encoding.UTF8.GetString(utf8Data);
Console.WriteLine(output);

Console.WriteLine("UTF-16 bytes");
byte[] utf16Data = Encoding.Unicode.GetBytes(text);
output = Encoding.Unicode.GetString(utf16Data);
Console.WriteLine(output);

Console.WriteLine("UTF-16BE bytes");
byte[] utf16BEData = Encoding.BigEndianUnicode.GetBytes(text);
output = Encoding.BigEndianUnicode.GetString(utf16BEData);
Console.WriteLine(output);

Console.WriteLine("Latin1 bytes");
byte[] latin1Data = Encoding.Latin1.GetBytes(text);
output = Encoding.Latin1.GetString(latin1Data);
Console.WriteLine(output);
```
In this example, we first encode the string into a byte array using GetBytes. We then decode the bytes back into a string with GetString. This is done for each of four encodings.

Decoding with the same encoding that was used for encoding restores the original string, provided that encoding can represent every character in it. Latin1 cannot represent emoji, so they are already replaced during encoding and cannot be recovered when decoding. This highlights the importance of using matching, sufficiently capable encodings when transferring or storing text.
```
$ dotnet run
UTF-8 bytes
one 🐘 and three 🐋
UTF-16 bytes
one 🐘 and three 🐋
UTF-16BE bytes
one 🐘 and three 🐋
Latin1 bytes
one ?? and three ??
```
The Latin1 encoding cannot represent emoji, so each one is replaced by two question marks. In the original string, each emoji occupies two char values (a surrogate pair).
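Decoding with the wrong encoding usually does not throw; it silently produces garbled text, often called mojibake. The following is a small sketch, assuming .NET 5 or later for Encoding.Latin1 and Convert.ToHexString. It decodes UTF-8 bytes as Latin1. The é is written as the escape \u00E9 so the source file encoding does not matter.

```csharp
using System;
using System.Text;

string original = "caf\u00E9";   // "café"

// In UTF-8, 'é' (U+00E9) becomes the two bytes C3 A9
byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
Console.WriteLine(Convert.ToHexString(utf8Bytes));   // 636166C3A9

// Latin1 maps every byte to one character, so C3 A9 decodes as "Ã©"
string wrong = Encoding.Latin1.GetString(utf8Bytes);
Console.WriteLine(wrong);                            // cafÃ©

// Decoding with the matching encoding restores the text
string right = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(right);                            // café
```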
C# Base64 Encoding and Decoding
Base64 encoding is commonly used to represent binary data as text, making it safe for transmission over protocols that only support text. In C#, you can use the Convert.ToBase64String and Convert.FromBase64String methods to encode and decode data.
```csharp
using System.Text;

string text = "Encode this string to Base64!";

// Encode to Base64
byte[] bytes = Encoding.UTF8.GetBytes(text);
string base64 = Convert.ToBase64String(bytes);
Console.WriteLine($"Base64: {base64}");

// Decode from Base64
byte[] decodedBytes = Convert.FromBase64String(base64);
string decodedText = Encoding.UTF8.GetString(decodedBytes);
Console.WriteLine($"Decoded: {decodedText}");
```
This example shows how to encode a string to Base64 and then decode it back to the original string. Base64 is useful for encoding data that needs to be stored or transferred as plain text, such as in XML or JSON files, or when embedding binary data in text-based formats.
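Base64 maps every 3 input bytes to 4 output characters and pads the last group with = signs. The encoded form is therefore about a third larger than the input. This sketch shows the padding for a few short inputs and that FromBase64String rejects malformed input with a FormatException. The byte values are arbitrary illustration data.

```csharp
using System;

// 1, 2, 3 and 4 input bytes: note the '=' padding and the 4-character groups
foreach (int n in new[] { 1, 2, 3, 4 })
{
    byte[] data = new byte[n];
    for (int i = 0; i < n; i++) data[i] = (byte)(i + 1);

    string encoded = Convert.ToBase64String(data);
    Console.WriteLine($"{n} bytes -> {encoded} ({encoded.Length} chars)");
}

// '!' is not in the Base64 alphabet, so decoding fails
bool rejected = false;
try
{
    Convert.FromBase64String("notbase64!");
}
catch (FormatException)
{
    rejected = true;
}
Console.WriteLine($"Invalid input rejected: {rejected}");
```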
C# Encoding.Convert
The Encoding.Convert method converts a byte array from one encoding to another.
```csharp
using System.Text;

string text = "one 🐘 and three 🐋";

byte[] utf16Data = Encoding.Unicode.GetBytes(text);
byte[] utf8Data = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Data);

Console.WriteLine("UTF-16 bytes");
showBytes(utf16Data);
Console.WriteLine();

Console.WriteLine("UTF-8 bytes");
showBytes(utf8Data);
Console.WriteLine();

string output = Encoding.UTF8.GetString(utf8Data);
Console.WriteLine(output);

// Prints each byte as two hex digits, ten bytes per line
void showBytes(byte[] data)
{
    int i = 0;
    foreach (var e in data)
    {
        Console.Write($"{e:X2} ");
        i++;
        if (i % 10 == 0)
        {
            Console.WriteLine();
        }
    }
    Console.WriteLine();
}
```

In this example, we convert UTF-16 bytes to UTF-8 bytes. Encoding.Convert decodes the input with the source encoding and re-encodes the resulting characters with the destination encoding. This is useful when you need to change the encoding of data, such as when reading a file in one encoding and saving it in another. The conversion is lossless only if the destination encoding can represent every character. UTF-8 covers all of Unicode, so nothing is lost here.

```
$ dotnet run
UTF-16 bytes
6F 00 6E 00 65 00 20 00 3D D8
18 DC 20 00 61 00 6E 00 64 00
20 00 74 00 68 00 72 00 65 00
65 00 20 00 3D D8 0B DC

UTF-8 bytes
6F 6E 65 20 F0 9F 90 98 20 61
6E 64 20 74 68 72 65 65 20 F0
9F 90 8B

one 🐘 and three 🐋
```
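The sketch below checks that Encoding.Convert produces the same bytes as decoding with the source encoding and re-encoding with the destination. It also shows that converting to an encoding that lacks some characters is lossy. ASCII is used as the lossy target purely for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

string text = "one 🐘 and three 🐋";
byte[] utf16Data = Encoding.Unicode.GetBytes(text);

// Convert is equivalent to GetString with the source, then GetBytes with the destination
byte[] viaConvert = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Data);
byte[] viaString = Encoding.UTF8.GetBytes(Encoding.Unicode.GetString(utf16Data));
Console.WriteLine(viaConvert.SequenceEqual(viaString));   // True

// ASCII cannot represent the emoji, so the replacement fallback inserts '?'
byte[] asciiData = Encoding.Convert(Encoding.Unicode, Encoding.ASCII, utf16Data);
Console.WriteLine(Encoding.ASCII.GetString(asciiData));   // one ?? and three ??
```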
C# read/write data with Encoding
Next, we write data to a file and read it back using a specified encoding.
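```csharp
using System.Text;

string text = "one 🐘 and three 🐋";

// FileMode.Create truncates an existing file; OpenOrCreate would leave
// stale bytes at the end if the old file was longer than the new text
using var fs = new FileStream("data.txt", FileMode.Create);
using var sw = new StreamWriter(fs, Encoding.UTF8);
sw.Write(text);
```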
In this example, we write text to a file using Encoding.UTF8.

Specifying the encoding when writing to a file ensures that the text is stored correctly and can be read by other applications expecting the same encoding. UTF-8 is a common choice because it is compact for mostly ASCII text and can represent any Unicode character. Note that Encoding.UTF8 also writes a three-byte byte order mark (BOM) at the start of the file. A StreamWriter created without an explicit encoding, or with new UTF8Encoding(false), writes UTF-8 without a BOM.
```csharp
using var sw = new StreamWriter(fs, Encoding.UTF8);
```
The second parameter of StreamWriter specifies the character encoding to use.
```
$ dotnet run
$ file data.txt
data.txt: Unicode text, UTF-8 (with BOM) text, with no line terminators
$ cat data.txt
one 🐘 and three 🐋
```
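To see the BOM directly, the sketch below writes the same text twice, with and without a BOM, and compares the raw bytes. It assumes .NET 5 or later for Convert.ToHexString. It also writes to temporary files rather than data.txt so the earlier example is not overwritten.

```csharp
using System;
using System.IO;
using System.Text;

string text = "one 🐘 and three 🐋";
string withBom = Path.GetTempFileName();
string withoutBom = Path.GetTempFileName();

// Encoding.UTF8 writes the BOM EF BB BF before the text
File.WriteAllText(withBom, text, Encoding.UTF8);

// new UTF8Encoding(false) is UTF-8 that does not emit a BOM
File.WriteAllText(withoutBom, text, new UTF8Encoding(false));

byte[] a = File.ReadAllBytes(withBom);
byte[] b = File.ReadAllBytes(withoutBom);
Console.WriteLine($"with BOM: {a.Length} bytes, starts with {Convert.ToHexString(a, 0, 3)}");
Console.WriteLine($"without BOM: {b.Length} bytes");

File.Delete(withBom);
File.Delete(withoutBom);
```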
Next, we read the data from the file.
```csharp
using System.Text;

using var fs = new FileStream("data.txt", FileMode.Open);
using var sr = new StreamReader(fs, Encoding.UTF8);

string? text = sr.ReadLine();
Console.WriteLine(text);
```
We use StreamReader to read the data, specifying the encoding as the second parameter. Providing the correct encoding ensures that the bytes in the file are interpreted as the intended characters. This is essential when reading files created with different encodings, and helps avoid misinterpreted or corrupted text. This constructor also detects a byte order mark by default, so the BOM written in the previous step is consumed and does not appear in the output.
```
$ dotnet run
one 🐘 and three 🐋
```
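Because StreamReader honours a byte order mark, a file can be read correctly even when the encoding passed to the constructor is not the one it was written with. In the sketch below, the file is written as UTF-16 with a BOM. The reader is told UTF-8, and CurrentEncoding is inspected after reading. A temporary file is used as an illustrative assumption.

```csharp
using System;
using System.IO;
using System.Text;

string text = "one 🐘 and three 🐋";
string path = Path.GetTempFileName();

// Encoding.Unicode writes the UTF-16 LE BOM FF FE before the text
File.WriteAllText(path, text, Encoding.Unicode);

string read;
Encoding detected;
// The reader is given UTF-8, but with detectEncodingFromByteOrderMarks set
// it switches to UTF-16 after seeing the BOM
using (var sr = new StreamReader(path, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
{
    read = sr.ReadToEnd();
    detected = sr.CurrentEncoding;
}

Console.WriteLine(read);
Console.WriteLine(detected.WebName);   // utf-16

File.Delete(path);
```

BOM detection only helps when a BOM is present. Files written without one, such as UTF-8 from most tools, are decoded with whatever encoding you pass in.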
In this article, we explored encoding and decoding data in C#.