Reading a web page in C#
last modified July 5, 2020
In this article, we show how to read a web page in C#. C# tutorial is a comprehensive tutorial on the C# language.
The tutorial shows how to read a page using HttpWebRequest, WebClient, HttpClient, Flurl.Http, and RestSharp.
The examples in this tutorial read a small web page from webcode.me.
C# read web page with HttpClient
HttpClient is a class for sending HTTP requests and receiving HTTP responses from a resource identified by a URI.
using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace DownloadPageHttpClient
{
    class Program
    {
        static async Task Main(string[] args)
        {
            using var client = new HttpClient();
            client.DefaultRequestHeaders.Add("User-Agent", "C# console program");

            var content = await client.GetStringAsync("http://webcode.me");

            Console.WriteLine(content);
        }
    }
}
The code example reads a web page asynchronously using HttpClient.
var content = await client.GetStringAsync("http://webcode.me");
The await operator takes an awaitable as its operand. If the awaitable has already completed, the method simply continues running; otherwise, the method is suspended until the awaitable completes, and control returns to the caller in the meantime. The GetStringAsync() method sends a GET request and reads the response content to a string as an asynchronous operation.
$ dotnet run
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My html page</title>
</head>
<body>
    <p>
        Today is a beautiful day. We go swimming and fishing.
    </p>
    <p>
        Hello there. How are you?
    </p>
</body>
</html>
This is the output.
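GetStringAsync() returns only the body, so the response status is not visible to the caller. As a rough sketch of how one might check the status before reading the content, the variant below uses GetAsync() together with EnsureSuccessStatusCode(). The FetchAsync helper and the namespace name are illustrative and not part of the original example.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace DownloadPageHttpClientChecked
{
    class Program
    {
        // Hypothetical helper: returns the page body or throws on failure.
        public static async Task<string> FetchAsync(HttpClient client, string url)
        {
            using HttpResponseMessage response = await client.GetAsync(url);

            // Throws HttpRequestException when the status code is not 2xx.
            response.EnsureSuccessStatusCode();

            return await response.Content.ReadAsStringAsync();
        }

        static async Task Main(string[] args)
        {
            using var client = new HttpClient();
            client.DefaultRequestHeaders.Add("User-Agent", "C# console program");

            try
            {
                string content = await FetchAsync(client, "http://webcode.me");
                Console.WriteLine(content);
            }
            catch (HttpRequestException e)
            {
                // Covers connection problems as well as error status codes.
                Console.Error.WriteLine($"Request failed: {e.Message}");
            }
        }
    }
}
```

Passing the HttpClient in as a parameter lets a program reuse a single client for many requests, which is the usage the .NET documentation recommends.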
C# read web page with WebClient
WebClient provides common methods for sending data to and receiving data from a resource identified by a URI. In .NET 6 and later, WebClient is marked as obsolete in favor of HttpClient.
using System;
using System.Net;

namespace DownloadPageWebClient
{
    class Program
    {
        static void Main(string[] args)
        {
            using var client = new WebClient();
            client.Headers.Add("User-Agent", "C# console program");

            string url = "http://webcode.me";
            string content = client.DownloadString(url);

            Console.WriteLine(content);
        }
    }
}
The code example fetches a web page with WebClient.
string content = client.DownloadString(url);
The DownloadString() method downloads the specified resource as a string. It blocks the calling thread until the download finishes.
In the second example, we provide a non-blocking approach with WebClient.
using System;
using System.Net;

namespace DownloadPageWebClientAsync
{
    class Program
    {
        static void Main(string[] args)
        {
            using var client = new WebClient();

            // The handler runs once the download has finished.
            client.DownloadStringCompleted += (sender, e) =>
            {
                Console.WriteLine(e.Result);
            };

            string url = "http://www.webcode.me";
            client.DownloadStringAsync(new Uri(url));

            // Keep the program alive until the download completes.
            Console.ReadLine();
        }
    }
}
The code example gets the HTML code of a web page with WebClient. This time the operation is asynchronous.
client.DownloadStringCompleted += (sender, e) =>
{
    Console.WriteLine(e.Result);
};
The DownloadStringCompleted event occurs when an asynchronous resource-download operation completes.
client.DownloadStringAsync(new Uri(url));
The DownloadStringAsync() method downloads the resource specified as a Uri. The method does not block the calling thread.
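WebClient also offers a task-based method, DownloadStringTaskAsync(), which can be awaited directly instead of subscribing to an event. The following is a minimal sketch of that approach; the DownloadAsync helper and the namespace name are illustrative. On .NET 6 and later the compiler reports an obsolescence warning for WebClient.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

namespace DownloadPageWebClientTask
{
    class Program
    {
        // Hypothetical helper: downloads the page at the given URL as a string.
        public static async Task<string> DownloadAsync(string url)
        {
            using var client = new WebClient();
            client.Headers.Add("User-Agent", "C# console program");

            // DownloadStringTaskAsync() returns a Task<string>, so it can be awaited.
            return await client.DownloadStringTaskAsync(new Uri(url));
        }

        static async Task Main(string[] args)
        {
            string content = await DownloadAsync("http://webcode.me");
            Console.WriteLine(content);
        }
    }
}
```

Because Main awaits the download, the program no longer needs Console.ReadLine() to stay alive until the result arrives.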
C# read web page with HttpWebRequest
The HttpWebRequest class provides support for the properties and methods that enable the user to interact directly with servers using HTTP. This API is now marked as obsolete.
using System;
using System.Net;
using System.IO;

namespace DownloadPageHttpWebRequest
{
    class Program
    {
        static void Main(string[] args)
        {
            string html = string.Empty;
            string url = "http://webcode.me";

            HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
            request.UserAgent = "C# console client";

            using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                html = reader.ReadToEnd();
            }

            Console.WriteLine(html);
        }
    }
}
The example reads the contents of a web page and prints them to the console.
HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
An HttpWebRequest is created with the WebRequest.Create() method, which takes a URL as a parameter.
using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
From the request, we get an HttpWebResponse with the GetResponse() method.
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
    html = reader.ReadToEnd();
}
We read the contents of the web page into a string.
Console.WriteLine(html);
The data is printed to the console.
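The response object carries more than the body. As a hedged sketch, the variant below prints the status and content type before reading the page, and catches the WebException that GetResponse() throws for connection failures and error status codes. The ReadPage helper and the namespace name are illustrative; on .NET 6 and later the compiler reports an obsolescence warning for WebRequest.Create().

```csharp
using System;
using System.IO;
using System.Net;

namespace DownloadPageHttpWebRequestChecked
{
    class Program
    {
        // Hypothetical helper: prints response metadata and returns the body.
        public static string ReadPage(string url)
        {
            var request = (HttpWebRequest) WebRequest.Create(url);
            request.UserAgent = "C# console client";

            using var response = (HttpWebResponse) request.GetResponse();
            Console.WriteLine($"Status: {(int) response.StatusCode} {response.StatusDescription}");
            Console.WriteLine($"Content type: {response.ContentType}");

            using var reader = new StreamReader(response.GetResponseStream());
            return reader.ReadToEnd();
        }

        static void Main(string[] args)
        {
            try
            {
                Console.WriteLine(ReadPage("http://webcode.me"));
            }
            catch (WebException e)
            {
                // e.Status tells apart, for example, name resolution and protocol errors.
                Console.Error.WriteLine($"Request failed ({e.Status}): {e.Message}");
            }
        }
    }
}
```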
C# read web page with Flurl.Http
Flurl.Http is a fluent, portable, testable, third-party HTTP client library for the C# language.
$ dotnet add package Flurl.Http
We install the Flurl.Http package.
using System;
using System.Threading.Tasks;
using Flurl.Http;

namespace DownloadPageFlurl
{
    class Program
    {
        static async Task Main(string[] args)
        {
            string result = await "http://webcode.me".GetStringAsync();
            Console.WriteLine(result);
        }
    }
}
The example reads a small web page and prints its contents to the terminal.
string result = await "http://webcode.me".GetStringAsync();
The await operator is applied to a task in an asynchronous method to suspend the execution of the method until the awaited task completes. The task represents ongoing work. The data is retrieved with the GetStringAsync() extension method.
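The fluent style also allows request options to be chained before the call. The sketch below sets the same User-Agent header as the earlier examples with WithHeader() and catches FlurlHttpException, which Flurl.Http throws by default for error status codes. The FetchAsync helper and the namespace name are illustrative.

```csharp
using System;
using System.Threading.Tasks;
using Flurl.Http;

namespace DownloadPageFlurlHeader
{
    class Program
    {
        // Hypothetical helper: reads the page at the given URL with a custom header.
        public static async Task<string> FetchAsync(string url)
        {
            return await url
                .WithHeader("User-Agent", "C# console program")
                .GetStringAsync();
        }

        static async Task Main(string[] args)
        {
            try
            {
                string result = await FetchAsync("http://webcode.me");
                Console.WriteLine(result);
            }
            catch (FlurlHttpException e)
            {
                Console.Error.WriteLine($"Request failed: {e.Message}");
            }
        }
    }
}
```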
C# read web page with RestSharp
RestSharp is a simple REST and HTTP API client for .NET. It is a third-party library.
$ dotnet add package RestSharp
We install the RestSharp package.
using System;
using RestSharp;

namespace DownloadPageRestSharp
{
    class Program
    {
        static void Main(string[] args)
        {
            var client = new RestClient("http://webcode.me");
            var request = new RestRequest("", Method.GET);

            client.ExecuteAsync(request, response =>
            {
                Console.WriteLine(response.Content);
            });

            Console.ReadLine();
        }
    }
}
The code example gets the contents of a web page using the RestSharp library. The web page is downloaded asynchronously.
var client = new RestClient("http://webcode.me");
A REST client is created with the RestClient class.
var request = new RestRequest("", Method.GET);
A GET request is created with RestRequest.
client.ExecuteAsync(request, response =>
{
    Console.WriteLine(response.Content);
});
The request is executed asynchronously using the ExecuteAsync() method.
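The callback overload of ExecuteAsync() and the Method.GET constant belong to older RestSharp releases; version 107 and later removed the callback overload and renamed the enum members (Method.Get). As a sketch that assumes a recent RestSharp version, the request can be awaited instead. The FetchAsync helper and the namespace name are illustrative.

```csharp
using System;
using System.Threading.Tasks;
using RestSharp;

namespace DownloadPageRestSharpAwait
{
    class Program
    {
        // Hypothetical helper: returns the page body or throws on failure.
        public static async Task<string> FetchAsync(string url)
        {
            var client = new RestClient(url);

            // GET is the default method when none is specified.
            var request = new RestRequest("");

            var response = await client.ExecuteAsync(request);

            // ExecuteAsync() reports failures on the response instead of throwing.
            if (!response.IsSuccessful)
            {
                throw new InvalidOperationException(
                    response.ErrorMessage ?? $"HTTP {(int) response.StatusCode}");
            }

            return response.Content ?? string.Empty;
        }

        static async Task Main(string[] args)
        {
            Console.WriteLine(await FetchAsync("http://webcode.me"));
        }
    }
}
```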
In this article, we have shown how to read a web page in C#.