
Reading a web page in C#

In this article, we show how to read a web page in C#.

The tutorial shows how to read a page using HttpClient, WebClient, HttpWebRequest, Flurl.Http, and RestSharp.

In the examples of this tutorial, we read a small web page hosted at webcode.me.

C# read web page with HttpClient

HttpClient provides a base class for sending HTTP requests and receiving HTTP responses from a resource identified by a URI.

Program.cs
using System;
using System.Net.Http;
using System.Threading.Tasks;

namespace DownloadPageHttpClient
{
    class Program
    {
        static async Task Main(string[] args)
        {
            using var client = new HttpClient();
            client.DefaultRequestHeaders.Add("User-Agent", "C# console program");

            var content = await client.GetStringAsync("http://webcode.me");

            Console.WriteLine(content);
        }
    }
}

The code example reads a web page asynchronously with HttpClient.

var content = await client.GetStringAsync("http://webcode.me");

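The await operator takes an awaitable as its operand. If the awaitable has already completed, the method continues running; otherwise, the method is suspended until the awaitable completes, and control returns to the caller in the meantime. The GetStringAsync() method sends a GET request and returns the response body as a string in an asynchronous operation. It throws an HttpRequestException if the response status code does not indicate success.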

$ dotnet run
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My html page</title>
</head>
<body>

    <p>
        Today is a beautiful day. We go swimming and fishing.
    </p>

    <p>
            Hello there. How are you?
    </p>

</body>
</html>

This is the output.
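GetStringAsync() hides the HTTP response and returns only the body. When the status code or headers matter, one option is to call GetAsync() and read the body yourself; GetStringAsync() behaves roughly like this sequence. The minimal sketch below uses top-level statements and, as an assumption, starts a local HttpListener on port 8085 as a stand-in for webcode.me so that it runs without internet access. The port and the HTML body are illustrative choices, not part of the original example.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// Assumption: port 8085 is free. This local server stands in for
// http://webcode.me so the sketch runs offline.
const string url = "http://localhost:8085/";

using var listener = new HttpListener();
listener.Prefixes.Add(url);
listener.Start();

// Answer exactly one request with a tiny HTML page.
Task server = Task.Run(async () =>
{
    HttpListenerContext ctx = await listener.GetContextAsync();
    byte[] body = Encoding.UTF8.GetBytes("<p>Hello there. How are you?</p>");
    ctx.Response.ContentType = "text/html; charset=utf-8";
    ctx.Response.ContentLength64 = body.Length;
    await ctx.Response.OutputStream.WriteAsync(body, 0, body.Length);
    ctx.Response.Close();
});

using var client = new HttpClient();
client.DefaultRequestHeaders.Add("User-Agent", "C# console program");

// GetAsync returns the whole response message, not just the body.
using HttpResponseMessage response = await client.GetAsync(url);

// Throws HttpRequestException for non-success status codes such as 404 or 500.
response.EnsureSuccessStatusCode();

string content = await response.Content.ReadAsStringAsync();
await server;

Console.WriteLine($"{(int)response.StatusCode} {response.Content.Headers.ContentType}");
Console.WriteLine(content);
```

In a real program, the listener part is dropped and the URL points at the actual site.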

C# read web page with WebClient

WebClient provides common methods for sending data to and receiving data from a resource identified by a URI. Since .NET 6, WebClient is marked obsolete, and HttpClient is recommended for new code.

Program.cs
using System;
using System.Net;

namespace DownloadPageWebClient
{
    class Program
    {
        static void Main(string[] args)
        {
            using var client = new WebClient();
            client.Headers.Add("User-Agent", "C# console program");
            
            string url = "http://webcode.me";
            string content = client.DownloadString(url);

            Console.WriteLine(content);
        }
    }
}

The code example fetches a web page with the WebClient.

string content = client.DownloadString(url);

The DownloadString() method retrieves the specified resource. This method blocks while downloading the resource.

The next example shows a non-blocking approach with WebClient.

Program.cs
using System;
using System.Net;

namespace DownloadPageWebClientAsync
{
    class Program
    {
        static void Main(string[] args)
        {
            using var client = new WebClient();

            client.DownloadStringCompleted += (sender, e) =>
            {
                Console.WriteLine(e.Result);
            };

            string url = "http://www.webcode.me";
            client.DownloadStringAsync(new Uri(url));

            Console.ReadLine();
        }
    }
}

The code example gets the HTML code of a web page with the WebClient. This time the operation is asynchronous.

client.DownloadStringCompleted += (sender, e) =>
{
    Console.WriteLine(e.Result);
};

The DownloadStringCompleted event occurs when an asynchronous resource-download operation completes.

client.DownloadStringAsync(new Uri(url));

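The DownloadStringAsync() method downloads the resource identified by a Uri. The method does not block the calling thread; when the download finishes, the result is delivered through the DownloadStringCompleted event. The Console.ReadLine() call keeps the program from exiting before the event fires.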
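Reading e.Result inside the handler throws when the download failed, and Console.ReadLine() is only a crude way to keep the process alive. A common alternative is to bridge the event to a Task with TaskCompletionSource and await it. In the sketch below, DownloadStringViaEventAsync is a hypothetical helper name, and the request deliberately targets localhost port 1, assumed to have no listener, so the error branch runs without network access.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

#pragma warning disable SYSLIB0014 // WebClient is obsolete since .NET 6; used only to mirror the tutorial

// Assumption: nothing listens on localhost port 1, so the download fails fast.
string outcome;
try
{
    outcome = await DownloadStringViaEventAsync(new Uri("http://localhost:1/"));
}
catch (WebException ex)
{
    outcome = $"Download failed: {ex.Status}";
}
Console.WriteLine(outcome);

// Hypothetical helper: turns the DownloadStringAsync/DownloadStringCompleted
// pair into a Task that callers can await.
static async Task<string> DownloadStringViaEventAsync(Uri uri)
{
    using var client = new WebClient();
    var tcs = new TaskCompletionSource<string>();

    client.DownloadStringCompleted += (sender, e) =>
    {
        if (e.Error != null)
            tcs.TrySetException(e.Error);   // reading e.Result here would throw
        else if (e.Cancelled)
            tcs.TrySetCanceled();           // CancelAsync() was called
        else
            tcs.TrySetResult(e.Result);
    };

    client.DownloadStringAsync(uri);

    // The client is disposed only after the download has completed.
    return await tcs.Task;
}
```

WebClient also provides DownloadStringTaskAsync(), which already returns a Task<string>; the hand-written bridge mainly shows how the event-based pattern maps onto tasks.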

C# read web page with HttpWebRequest

The HttpWebRequest class provides properties and methods that let the user interact directly with servers using HTTP. Since .NET 6, WebRequest.Create() and the related APIs are marked obsolete; HttpClient is the recommended replacement.

Program.cs
using System;
using System.Net;
using System.IO;

namespace DownloadPageHttpWebRequest
{
    class Program
    {
        static void Main(string[] args)
        {
            string html = string.Empty;
            string url = "http://webcode.me";

            HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
            request.UserAgent = "C# console client";

            using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
            using (Stream stream = response.GetResponseStream())
            using (StreamReader reader = new StreamReader(stream))
            {
                html = reader.ReadToEnd();
            }

            Console.WriteLine(html);
        }
    }
}

The example reads the contents of a site and prints them to the console.

HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);

An HttpWebRequest is created with the WebRequest.Create() method, which takes a URL as a parameter. The cast is needed because Create() returns the base WebRequest type.

using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())

From the request, we get an HttpWebResponse with the GetResponse() method.

using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
    html = reader.ReadToEnd();
}

We read the contents of the web page into a string.

Console.WriteLine(html);

The data is printed to the console.
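GetResponse() throws a WebException not only for network failures but also when the server answers with an error status such as 404 or 500; in the latter case, the exception's Response property holds the HttpWebResponse. The sketch below separates the two cases with exception filters. Fetch is a hypothetical helper, and the localhost port 1 URL is an assumption chosen so that the transport-error branch runs offline.

```csharp
using System;
using System.IO;
using System.Net;

#pragma warning disable SYSLIB0014 // WebRequest.Create is obsolete since .NET 6; used only to mirror the tutorial

// Assumption: nothing listens on localhost port 1, so the request fails offline.
string outcome = Fetch("http://localhost:1/");
Console.WriteLine(outcome);

// Hypothetical helper: returns the page body, or a short description of the failure.
static string Fetch(string url)
{
    var request = (HttpWebRequest)WebRequest.Create(url);
    request.UserAgent = "C# console client";

    try
    {
        using var response = (HttpWebResponse)request.GetResponse();
        using var reader = new StreamReader(response.GetResponseStream());
        return reader.ReadToEnd();
    }
    catch (WebException ex) when (ex.Response is HttpWebResponse errorResponse)
    {
        // The server answered, but with an error status code.
        using (errorResponse)
        {
            return $"HTTP error {(int)errorResponse.StatusCode}";
        }
    }
    catch (WebException ex)
    {
        // No HTTP response at all: DNS failure, refused connection, timeout.
        return $"Request failed: {ex.Status}";
    }
}
```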

C# read web page with Flurl.Http

Flurl.Http is a third-party, fluent, portable, and testable HTTP client library for .NET.

$ dotnet add package Flurl.Http

We install the Flurl.Http package.

DownloadPageFlurl.cs
using System;
using System.Threading.Tasks;
using Flurl.Http;

namespace DownloadPageFlurl
{
    class Program
    {
        static async Task Main(string[] args)
        {
            string result = await "http://webcode.me".GetStringAsync();
            Console.WriteLine(result);
        }
    }
}

The example reads a small web page and prints its contents to the terminal.

string result = await "http://webcode.me".GetStringAsync();

The await operator is applied to a task in an asynchronous method to suspend execution of the method until the awaited task completes. The task represents ongoing work. The data is retrieved with the GetStringAsync() extension method, which Flurl.Http adds to the string type.
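Because Flurl.Http is designed to be testable, a page download can be exercised without touching the network. The sketch below assumes the Flurl.Http package is installed and uses its HttpTest class from Flurl.Http.Testing, which intercepts Flurl calls made while it is alive and returns a canned response; the HTML body is illustrative. It also sets a User-Agent header with WithHeader(), mirroring the HttpClient example.

```csharp
using System;
using Flurl.Http;
using Flurl.Http.Testing;

// While httpTest is alive, Flurl calls are faked; nothing reaches webcode.me.
using var httpTest = new HttpTest();
httpTest.RespondWith("<p>Hello there. How are you?</p>", 200);

// WithHeader returns a request object; GetStringAsync is then awaited as before.
string result = await "http://webcode.me"
    .WithHeader("User-Agent", "C# console program")
    .GetStringAsync();

Console.WriteLine(result);

// Verify that exactly one call matched the URL pattern.
httpTest.ShouldHaveCalled("http://webcode.me*").Times(1);
```

Without the HttpTest object, the same call chain sends a real request to webcode.me.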

C# read web page with RestSharp

RestSharp is a simple REST and HTTP API client for .NET. It is a third-party library.

$ dotnet add package RestSharp

We install the RestSharp package.

Program.cs
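using System;
using System.Threading.Tasks;
using RestSharp;

namespace DownloadPageRestSharp
{
    class Program
    {
        static async Task Main(string[] args)
        {
            var client = new RestClient("http://webcode.me");
            var request = new RestRequest("", Method.Get);

            // RestSharp 107+: ExecuteAsync returns Task<RestResponse> instead of taking a callback
            RestResponse response = await client.ExecuteAsync(request);

            Console.WriteLine(response.Content);
        }
    }
}

The code example gets the contents of a web page using the RestSharp library. The web page is downloaded asynchronously. The code targets RestSharp 107 or later; earlier releases spelled the enum member Method.GET and offered a callback-based ExecuteAsync() overload. Because Main awaits the response, the program does not exit before the download completes.

var client = new RestClient("http://webcode.me");

A REST client is created with the RestClient class.

var request = new RestRequest("", Method.Get);

A GET request is created with RestRequest. The empty resource string means that the request targets the client's base URL.

RestResponse response = await client.ExecuteAsync(request);

The request is executed asynchronously with the ExecuteAsync() method; awaiting the returned task yields a RestResponse whose Content property holds the page.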
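ExecuteAsync() does not throw when the request fails; the outcome has to be read from the returned RestResponse. The minimal sketch below, which assumes RestSharp 107 or later, checks IsSuccessful and prints ErrorMessage otherwise. It deliberately targets localhost port 1, assumed to have no listener, so the failure branch runs without network access.

```csharp
using System;
using RestSharp;

// Assumption: nothing listens on localhost port 1, so the request fails offline.
var client = new RestClient("http://localhost:1");
var request = new RestRequest("", Method.Get);

// No exception here: transport and HTTP errors are reported on the response.
RestResponse response = await client.ExecuteAsync(request);

if (response.IsSuccessful)
{
    Console.WriteLine(response.Content);
}
else
{
    // ResponseStatus.Error means no HTTP response was received at all;
    // otherwise StatusCode holds the failing HTTP status.
    Console.WriteLine($"{response.ResponseStatus}: {response.ErrorMessage}");
}
```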

In this article, we have shown how to read a web page in C#. You might also be interested in the following related tutorials: MySQL C# tutorial, Date and time in C#, Reading text files in C#, or C# Winforms tutorial.