Reading a web page in C#

In this article, we show how to read a web page in C#. C# tutorial is a comprehensive tutorial on the C# language.

The tutorial shows how to read a page using HttpWebRequest, WebClient, HttpClient, Flurl.Http, and RestSharp.

In the examples of this tutorial, we read a web page from a tiny website called something.com.

Reading a web page with HttpWebRequest

The HttpWebRequest class provides properties and methods that let the user interact directly with servers using HTTP. This API is now marked as obsolete; Microsoft recommends HttpClient for new code.

DownloadPageHttpWebRequest.cs
using System;
using System.Net;
using System.IO;

class DownloadPageHttpWebRequest
{
    static void Main()
    {
        string html = string.Empty;
        string url = "http://www.something.com";

        HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);

        using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            html = reader.ReadToEnd();
        }

        Console.WriteLine(html);
    }
}

The example reads the contents of a web page and prints them to the console.

HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);

An HttpWebRequest is created with the WebRequest.Create() method. It takes a URL as a parameter.

using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())

From the request, we get an HttpWebResponse with the GetResponse() method.

using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
    html = reader.ReadToEnd();
}

We read the contents of the web page into a string.

Console.WriteLine(html);

The data is printed to the console.

$ dmcs DownloadPageHttpWebRequest.cs
$ ./DownloadPageHttpWebRequest.exe
<html><head><title>Something.</title></head>
<body>Something.</body>
</html>

We compile the example using the Mono compiler and run it.
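
The stream-reading step of the example does not depend on the network; it works on any Stream. As a minimal sketch, the helper below reads a stream into a string and is exercised with a MemoryStream, so it can be tried offline. StreamText.ReadAll is a hypothetical name, not part of .NET, and the sketch assumes the content is UTF-8 encoded.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamText
{
    // Hypothetical helper: reads the whole stream as UTF-8 text.
    public static string ReadAll(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}

class StreamTextDemo
{
    static void Main()
    {
        // An in-memory stream stands in for response.GetResponseStream().
        byte[] bytes = Encoding.UTF8.GetBytes("<html><body>Something.</body></html>");

        using (var stream = new MemoryStream(bytes))
        {
            Console.WriteLine(StreamText.ReadAll(stream));
        }
    }
}
```

With a real response, the same helper could be called as StreamText.ReadAll(response.GetResponseStream()).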

Reading a web page with WebClient

WebClient provides common methods for sending data to and receiving data from a resource identified by a URI.

DownloadPageWebClient.cs
using System;
using System.Net;

class DownloadPageWebClient
{
    static void Main()
    {
        using (WebClient client = new WebClient()) {
        
            string url = "http://www.something.com";
            string content = client.DownloadString(url);
            Console.WriteLine(content);
        }
    }
}

The code example fetches a web page with the WebClient.

string content = client.DownloadString(url);

The DownloadString() method retrieves the specified resource. This method blocks while downloading the resource.

In the second example, we provide a non-blocking approach with the WebClient.

DownloadPageWebClientAsync.cs
using System;
using System.Net;

class DownloadPageWebClientAsync
{
    static void Main()
    {
        using (WebClient client = new WebClient()) {

            client.DownloadStringCompleted += (sender, e) => 
            {
                Console.WriteLine(e.Result);
            };

            string url = "http://www.something.com";
            client.DownloadStringAsync(new Uri(url));

            // Wait inside the using block so that the client is not
            // disposed while the download is still in progress.
            Console.ReadLine();
        }
    }
}

The code example gets the HTML code of a web page with the WebClient. This time the operation is asynchronous.

client.DownloadStringCompleted += (sender, e) => 
{
    Console.WriteLine(e.Result);
};        

The DownloadStringCompleted event occurs when an asynchronous resource-download operation completes.

client.DownloadStringAsync(new Uri(url));

The DownloadStringAsync() method downloads the resource at the given Uri as a string. The method does not block the calling thread.
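
Both WebClient examples assume that the download succeeds. The following is a minimal sketch of the blocking variant with basic error handling; SafeDownload.TryDownload is a hypothetical helper name, and the sketch handles only WebException, which WebClient raises for failures such as an unresolvable host or an HTTP error status.

```csharp
using System;
using System.Net;

static class SafeDownload
{
    // Hypothetical helper: returns true and the page text on success,
    // or false and the error message when a WebException occurs.
    public static bool TryDownload(string url, out string result)
    {
        try
        {
            using (var client = new WebClient())
            {
                result = client.DownloadString(url);
                return true;
            }
        }
        catch (WebException e)
        {
            result = e.Message;
            return false;
        }
    }
}

class DownloadPageWebClientSafe
{
    static void Main()
    {
        string text;
        bool ok = SafeDownload.TryDownload("http://www.something.com", out text);

        Console.WriteLine(ok ? text : "Download failed: " + text);
    }
}
```

Other exceptions, such as those caused by a malformed URL, are deliberately left unhandled in this sketch.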

Reading a web page with HttpClient

HttpClient is a class for sending HTTP requests and receiving HTTP responses from a resource identified by a URI.

DownloadPageHttpClient.cs
using System;
using System.Net.Http;
using System.Threading.Tasks;

class DownloadPageHttpClient
{
    static void Main()
    {
        Task t = new Task(DownloadPageAsync);
        t.Start();
        Console.WriteLine("Downloading page...");
        Console.ReadLine();
    }

    static async void DownloadPageAsync()
    {
        string page = "http://www.something.com";

        using (HttpClient client = new HttpClient())
        using (HttpResponseMessage response = await client.GetAsync(page))
        using (HttpContent content = response.Content)
        {
            string result = await content.ReadAsStringAsync();

            Console.WriteLine(result);
        }
    }
}

The code example scrapes a web page asynchronously using the HttpClient.

Task t = new Task(DownloadPageAsync);
t.Start();

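The DownloadPageAsync() method is started on a Task. Because the method is declared as async void, Main() cannot wait for it to finish; the Console.ReadLine() call keeps the program alive until the download completes.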

static async void DownloadPageAsync()

Methods that use the await operator must be declared with the async modifier.

string result = await content.ReadAsStringAsync();

The await operator takes an awaitable as an operand and checks whether it has already completed. If it has, the method continues running; otherwise, the method is suspended and control returns to the caller until the awaitable completes. The ReadAsStringAsync() method reads the content into a string as an asynchronous operation.
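
The combination of a Task and Console.ReadLine() can be avoided when the compiler supports an asynchronous Main() method (C# 7.1 or later). The following is a minimal sketch under that assumption. The names CannedHandler and PageDownloader are hypothetical; CannedHandler stands in for the network by returning a fixed page, so the sketch runs offline.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the network: always answers with a canned page.
class CannedHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("<html><body>Something.</body></html>")
        };

        return Task.FromResult(response);
    }
}

static class PageDownloader
{
    // Returns the page body; throws HttpRequestException on a non-success status.
    public static async Task<string> DownloadAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}

class DownloadPageAsyncMain
{
    static async Task Main()
    {
        // Use new HttpClient() without a handler to contact the real site.
        using (var client = new HttpClient(new CannedHandler()))
        {
            string html = await PageDownloader.DownloadAsync(client, "http://www.something.com");
            Console.WriteLine(html);
        }
    }
}
```

Since DownloadAsync() returns a Task<string> instead of being async void, the caller can await it and any exception reaches the caller.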

Reading a web page with Flurl.Http

Flurl.Http is a fluent, portable, testable third-party HTTP client library for the C# language.

DownloadPageFlurl.cs
using System;
using System.Threading.Tasks;
using Flurl.Http;

class DownloadPageFlurl
{
    static void Main(string[] args)
    {
        Task task = new Task(DownloadPageAsync);
        task.Start();

        Console.WriteLine("Downloading page...");
        Console.ReadLine();
    }

    static async void DownloadPageAsync()
    {
        string result = await "http://www.something.com".GetStringAsync();
        Console.WriteLine(result);
    }
}

The example reads a small web page and prints its contents to the terminal.

Task task = new Task(DownloadPageAsync);
task.Start();

The page is downloaded asynchronously. The Task class represents an asynchronous operation.

string result = await "http://www.something.com".GetStringAsync();

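The await operator is applied to a task in an asynchronous method to suspend the execution of the method until the awaited task completes. The task represents ongoing work. GetStringAsync() is an extension method that Flurl.Http adds to strings, which is why it can be called directly on the URL literal.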

Reading a web page with RestSharp

RestSharp is a simple REST and HTTP API client for .NET. It is a third-party library.

DownloadPageRestSharp.cs
using System;
using RestSharp;

class DownloadPageRestSharp
{
    static void Main(string[] args)
    {
        var client = new RestClient("http://www.something.com");
        var request = new RestRequest("", Method.GET);

        client.ExecuteAsync(request, response => {
            Console.WriteLine(response.Content);
        });

        Console.ReadLine();
    }
}

The code example gets the contents of a web page using the RestSharp library. The web page is downloaded asynchronously.

var client = new RestClient("http://www.something.com");

A REST client is created with the RestClient class.

var request = new RestRequest("", Method.GET);

A GET request is created with RestRequest.

client.ExecuteAsync(request, response => {
    Console.WriteLine(response.Content);
});

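The request is executed asynchronously using the ExecuteAsync() method. The callback overload shown here and the Method.GET constant belong to older RestSharp releases; version 107 and later use Method.Get and an awaitable ExecuteAsync() instead.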

In this article, we have shown how to read a web page in C#. You might also be interested in the following related tutorials: MySQL C# tutorial, Date and time in C#, Reading text files in C#, or C# Winforms tutorial.