ZetCode

Perl LWP programming

last modified July 8, 2021

This Perl LWP tutorial shows how to do WWW programming in Perl with the LWP module.

LWP is a set of Perl modules which provides a simple and consistent application programming interface (API) to the World-Wide Web. The main focus of the library is to provide classes and functions to write WWW clients. LWP is short for Library for WWW in Perl.

LWP::Simple

LWP::Simple is a simple procedural interface to LWP. It provides a few functions for working with web pages. The LWP::Simple module is handy for simple cases, but it does not support more advanced features such as cookies or authentication.

The get function

The get function fetches the document identified by the given URL and returns it. It returns undef if it fails. The $url argument can be either a string or a reference to a URI object.

simple_get.pl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;

my $cont = get('http://webcode.me') or die 'Unable to get page';

say $cont;

The script grabs the content of the http://webcode.me web page.

$ ./simple_get.pl
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My html page</title>
</head>
<body>

    <p>
        Today is a beautiful day. We go swimming and fishing.
    </p>

    <p>
         Hello there. How are you?
    </p>

</body>
</html>

This is the output of the simple_get.pl script.

The following program gets a small web page and strips its HTML tags.

strip_tags.pl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;
 
my $cont = get('http://webcode.me') or die 'Unable to get page';

# Remove everything between angle brackets, then print the remainder.
$cont =~ s/<[^>]*>//g;
print $cont;

The script strips the HTML tags of the http://webcode.me web page.
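
The substitution can also be tried without a network connection. The following sketch applies the same regular expression to a hard-coded HTML fragment; the $html string is made up for illustration. A regular expression like this is only suitable for simple, well-formed pages; it does not handle comments, scripts, or a > character inside an attribute value.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;

# Hypothetical sample input; it is not fetched from any server.
my $html = '<p>Today is a <b>beautiful</b> day.</p>';

# The /r modifier returns the modified copy and leaves $html untouched.
my $text = $html =~ s/<[^>]*>//gr;

say $text;
```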

The head function

The head function retrieves document headers. On success, it returns the following five values: the content type, document length, modification time, expiration time, and server. It returns an empty list if it fails.

head_fun.pl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;

my ($content_type, $doc_length, 
    $mod_time, $expires, $server) = head("http://webcode.me");

say "Content type: $content_type";
say "Document length: $doc_length";
say "Modification time: $mod_time";
say "Server: $server";

The example prints the content type, document length, modification time, and server of the http://webcode.me web page.

$ ./head_fun.pl
Content type: text/html
Document length: 348
Modification time: 1563623365
Server: nginx/1.6.2
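
The modification and expiration times are given in seconds since the Unix epoch. As a small sketch, the modification time from the output above can be turned into a readable UTC timestamp with strftime from the POSIX module. The hard-coded list stands in for a real head call so that the snippet runs offline; the expiration time is assumed to be missing.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use POSIX qw(strftime);

# Stand-in for: my @headers = head('http://webcode.me');
# The values are copied from the sample output; the expiration time
# is assumed to be undefined here.
my @headers = ('text/html', 348, 1563623365, undef, 'nginx/1.6.2');

# head returns an empty list on failure, so check before unpacking.
die 'Unable to get headers' unless @headers;

my ($content_type, $doc_length, $mod_time, $expires, $server) = @headers;

# Format the epoch value as an ISO 8601 timestamp in UTC.
my $modified = strftime('%Y-%m-%dT%H:%M:%SZ', gmtime $mod_time);

say "Modification time: $modified";
say 'Expires: ', $expires // 'not set';
```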

The getstore function

The getstore function retrieves a document identified by a URL and stores it in a file. The return value is the HTTP response code.

get_store.pl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;

# getstore returns the HTTP status code even on failure, so the
# result must be inspected rather than tested for truth.
my $r = getstore('http://webcode.me', 'webcode.html');

say "Response code: $r"; 

The script grabs the contents of the http://webcode.me web page and stores them in the webcode.html file.

$ ./get_store.pl
Response code: 200
$ cat webcode.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My html page</title>
</head>
<body>

    <p>
        Today is a beautiful day. We go swimming and fishing.
    </p>

    <p>
         Hello there. How are you?
    </p>

</body>
</html>

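The return code can be checked with the is_success function. Because getstore returns a status code even when the request fails, testing its return value for truth is not enough.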

check_return_code.pl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;

my $url = 'http://webcode.mee';

my $r = getstore($url, 'webcode.html');

die "Error $r on $url" unless is_success($r);

In the example, we intentionally misspell the web page URL.

$ ./check_return_code.pl
Error 500 on http://webcode.mee at ./check_return_code.pl line 11.
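
The 500 code in this output does not come from a server; LWP generates it internally when it cannot connect. The is_success function itself comes from the HTTP::Status module, whose functions LWP::Simple re-exports. The following sketch needs no network access and classifies a few status codes; the list of codes is chosen only for illustration.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use HTTP::Status qw(is_success is_error status_message);

# Illustrative status codes.
for my $code (200, 404, 500) {

    my $kind = is_success($code) ? 'success'
             : is_error($code)   ? 'error'
             :                     'other';

    say "$code ", status_message($code), " ($kind)";
}
```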

The LWP Class Model

The LWP Class Model contains classes for more complex work with the World-Wide Web.

LWP::UserAgent is a class implementing a web user agent. In the application, we create and configure an LWP::UserAgent object. Then we create an instance of HTTP::Request for the request that needs to be performed. This request is then passed to one of the request methods of the user agent, which dispatches it using the relevant protocol and returns an HTTP::Response object. There are convenience methods for sending the most common request types: get, head, post, put, and delete.
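
As a rough sketch of the first two steps, the snippet below configures a user agent and builds a request object without sending it, so it runs without a server. The agent name, timeout, and Accept header are arbitrary values chosen for illustration.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Configure the user agent; the values are illustrative.
my $ua = LWP::UserAgent->new(
    agent   => 'Perl script',
    timeout => 10,
);

# Build the request object; nothing is sent at this point.
my $req = HTTP::Request->new(GET => 'http://localhost:3000/');
$req->header(Accept => 'text/html');

say 'Agent:   ', $ua->agent;
say 'Timeout: ', $ua->timeout;
say 'Method:  ', $req->method;
say 'URI:     ', $req->uri;

# Dispatching it would be: my $res = $ua->request($req);
```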

User agent

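The LWP::UserAgent class represents a web user agent. Its agent method sets the value sent in the User-Agent request header; the following example uses a small Mojolicious server to echo that value back.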

$ cpanm Mojolicious::Lite

We install the Mojolicious framework.

server.pl
#!/usr/bin/perl

use Mojolicious::Lite -signatures;

get '/' => sub ($c) {

    my $ua = $c->req->headers->user_agent;

    $c->render(text => $ua);
};

app->start;

The server reads the User-Agent header of the client request and returns its value to the client.

$ perl server.pl daemon
[2021-07-08 13:02:55.63239] [49095] [info] Listening at "http://*:3000"
Web application available at http://127.0.0.1:3000

We start our server; it listens on port 3000.

agent.pl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->agent('Perl script');

my $req = HTTP::Request->new(GET => 'http://localhost:3000');
my $res = $ua->request($req);

if ($res->is_success) {

    say $res->content;
} else {

    say $res->status_line;
}

This script creates a simple GET request to localhost.

my $ua = LWP::UserAgent->new;

An instance of LWP::UserAgent is created.

$ua->agent('Perl script');

With the agent method, we set the name of the agent.

my $req = HTTP::Request->new(GET => 'http://localhost:3000');

A GET request to localhost is created.

my $res = $ua->request($req);

The request method dispatches the request object. The return value is a response object.

if ($res->is_success) {

    say $res->content;
} else {

    say $res->status_line;
}

The is_success method checks whether the response has a success status code. The content method returns the raw content. The status_line method returns the status code and message of the response.

$ ./agent.pl
Perl script

The server responded with the name of the agent that we have sent with the request.
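
The response methods can also be explored offline. In this sketch, two HTTP::Response objects are built by hand rather than received from a server; the status codes and the body text are made up for illustration.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use HTTP::Response;

# Hand-built responses: code, message, headers, content.
my @responses = (
    HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/plain'], 'Perl script'),
    HTTP::Response->new(404, 'Not Found'),
);

for my $res (@responses) {

    if ($res->is_success) {

        say $res->content;
    } else {

        say $res->status_line;
    }
}
```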

The get method

The user agent's get method is a convenience method to execute an HTTP request. It saves some typing.

get_page.pl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;
      
my $ua = LWP::UserAgent->new;
$ua->agent("Perl script");

my $res = $ua->get('http://webcode.me');

if ($res->is_success) {

    say $res->content;
} else {

    say $res->status_line;
}

The script gets the contents of the webcode.me page. We utilize the convenience get method.

In the following example, we look up definitions of a term on urbandictionary.com.

get_definition.pl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;
use URI;

my $word = shift || 'dog';

my %parameters = (term => $word);
my $url = URI->new('https://www.urbandictionary.com/define.php');
$url->query_form(%parameters);

my $ua = LWP::UserAgent->new;
my $res = $ua->get($url);

# Stop before parsing if the request failed.
die "Error: ", $res->status_line unless $res->is_success;

my $tree = HTML::TreeBuilder->new_from_content($res->decoded_content);
my @meanings = $tree->look_down(_tag => q{div}, 'class' => 'meaning');

foreach my $el (@meanings) {

    say $el->as_text;
}

The script looks up the term given on the command line, or dog if none is given, on urbandictionary.com and displays the definitions from the first page. HTML::TreeBuilder is used to parse the HTML code.
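
The query string is built by the query_form method of the URI object. The following sketch shows that step in isolation and needs no network access; the term with a space is an arbitrary example picked to show that the value gets escaped.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use URI;

my $url = URI->new('https://www.urbandictionary.com/define.php');

# query_form escapes the values; the term is an arbitrary example.
$url->query_form(term => 'hot dog');

say $url;

# Called without arguments, query_form returns the decoded pairs.
my %params = $url->query_form;
say $params{term};
```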

Posting a value

The post method dispatches a POST request to the given URL, providing the key/value pairs for the fill-in form content.

server2.pl
#!/usr/bin/perl

use Mojolicious::Lite -signatures;

post '/' => sub ($c) {

    my $name = $c->param('name');

    $c->render(text => "Hello $name!");
};

app->start;

In the handler, we get the name parameter. From the parameter, we build a message and send it to the client.

post_value.pl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

my $res = $ua->post('http://localhost:3000/',
    ['name'  =>  'Jan']);

if ($res->is_success) {

    say $res->content;
} else {

    say $res->status_line;
}

The script sends a request with a name key that has the value Jan.

$ ./post_value.pl
Hello Jan!
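
To see what such a request looks like on the client side, the following sketch builds a POST request with the POST function from HTTP::Request::Common and prints its parts without sending it. It is meant as an illustration of the form encoding, not as a replacement for post_value.pl.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use HTTP::Request::Common qw(POST);

# Build a form POST request; nothing is sent to the server.
my $req = POST('http://localhost:3000/', [name => 'Jan']);

say $req->method;
say $req->header('Content-Type');
say $req->content;
```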

Credentials

The user agent's credentials method sets the name and password to be used for a realm. A security realm is a mechanism used for protecting web application resources.

$ sudo apt-get install apache2-utils
$ sudo htpasswd -c /etc/nginx/.htpasswd user7
New password:
Re-type new password:
Adding password for user user7

We use the htpasswd tool to create a user name and a password for basic HTTP authentication.

location /secure {

    auth_basic "Restricted Area";
    auth_basic_user_file /etc/nginx/.htpasswd;
}

Inside the nginx /etc/nginx/sites-available/default configuration file, we create a secured page. The name of the realm is "Restricted Area".

index.html
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>

</body>

</html>

Inside the /usr/share/nginx/html/secure directory, we have this HTML file.

credentials.pl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->agent("Perl script");

$ua->credentials('localhost:80', 'Restricted Area', 'user7' => 's$cret');

my $res = $ua->get('http://localhost/secure/');

if ($res->is_success) {

    say $res->content;
} else {

    say $res->status_line;
}

The script connects to the secure webpage; it provides the user name and the password necessary to access the page.

$ ./credentials.pl
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>

</body>

</html>

With the right credentials, the credentials.pl script returns the secured page.
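
With basic authentication, the credentials travel in an Authorization request header. The sketch below sets such a header by hand with the authorization_basic method, without contacting a server, to illustrate what is sent for the protected page. It reuses the user name and password from credentials.pl. Because the value is only Base64-encoded, basic authentication is normally combined with HTTPS.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use HTTP::Request;
use MIME::Base64 qw(decode_base64);

my $req = HTTP::Request->new(GET => 'http://localhost/secure/');

# Set the Authorization header for HTTP basic authentication.
$req->authorization_basic('user7', 's$cret');

my $header = $req->header('Authorization');
say $header;

# The part after "Basic " is Base64-encoded "user:password", not encrypted.
my ($encoded) = $header =~ /^Basic (.+)$/;
say decode_base64($encoded);
```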

In this tutorial, we have worked with the Perl LWP module.
