Perl LWP programming

In this article, we show how to work with the Perl LWP module. We grab data, post data, and connect to secure web pages.

LWP is a set of Perl modules that provides a simple and consistent application programming interface (API) to the World Wide Web. The main focus of the library is to provide classes and functions for writing WWW clients. LWP is short for Library for WWW in Perl; the distribution is also known as libwww-perl.

LWP::Simple

LWP::Simple is a simple procedural interface to LWP. It contains a few functions for common tasks such as fetching a web page or saving it to a file. The LWP::Simple module is handy for simple cases, but it does not support more advanced features such as cookies or authorization.

The get function

The get function fetches the document identified by the given URL and returns it. It returns undef if it fails. The $url argument can be either a string or a reference to a URI object.

simple_get.pl
#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my $cont = get('http://www.something.com') or die 'Unable to get page';

print $cont;

The script grabs the content of the www.something.com web page.

$ ./simple_get.pl 
<html><head><title>Something.</title></head>
<body>Something.</body>
</html>

This is the output of the simple_get.pl script.
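Since get accepts any URL that LWP understands, it can also read local files through the file scheme. The following sketch uses this to run without network access: it writes a small, hypothetical HTML file to a temporary location, fetches it through a URI object (the second argument type that get accepts), and then asks for a file that does not exist to show the undef return value. It assumes only that LWP and File::Temp are installed.

```perl
#!/usr/bin/perl -w

use strict;
use LWP::Simple;
use File::Temp qw(tempfile);
use URI::file;

# Create a small local HTML file as a stand-in for a real web page.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print $fh "<html><body>Local page.</body></html>\n";
close $fh;

# get accepts a URI object as well as a plain string.
my $uri  = URI::file->new_abs($path);
my $cont = get($uri);
print defined $cont ? $cont : "Unable to get page\n";

# A document that cannot be retrieved makes get return undef.
my $missing = get("$uri.missing");
print defined $missing ? "Unexpected content\n" : "get returned undef\n";
```

The first print shows the markup of the temporary file; the second reports that get returned undef for the missing file.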

The following program gets a small web page and strips its HTML tags.

simple_strip_html.pl
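#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my $cont = get("http://www.something.com") or die 'Unable to get page';

# Remove everything between < and > (a rough way to drop HTML tags)
(my $text = $cont) =~ s/<[^>]*>//g;

print $text;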

The script strips the HTML tags from the www.something.com web page.

$ ./simple_strip_html.pl 
Something.
Something.

The script prints the web page's title and content.
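The s/<[^>]*>//g substitution is a quick heuristic rather than an HTML parser. The sketch below applies the same substitution to a hypothetical fragment that has a > character inside an attribute value; no network access is needed.

```perl
#!/usr/bin/perl -w

use strict;

# A hypothetical fragment with a '>' inside an attribute value.
my $html = '<p title="a > b">Hello <b>world</b></p>';

# The same substitution as in simple_strip_html.pl.
(my $text = $html) =~ s/<[^>]*>//g;

print "$text\n";
```

The script prints ` b">Hello world`: because [^>]* stops at the first >, part of the attribute value leaks into the text. For real-world pages, an HTML parser such as HTML::TreeBuilder, used later in this article, is more robust.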

The head function

The head function retrieves document headers. On success, it returns the following five values: the content type, document length, modification time, expiration time, and server. It returns an empty list if it fails.

simple_head.pl
#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my ($content_type, $doc_length, 
    $mod_time, $expires, $server) = head("http://www.something.com")
    or die 'Unable to get page headers';

print "Content type: $content_type\n";
print "Document length: $doc_length\n";
print "Modification time: $mod_time\n";
print "Server: $server\n";

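The example prints the content type, document length, modification time, and server of the www.something.com web page. The modification time is given in seconds since the epoch; scalar localtime($mod_time) turns it into a readable date.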

$ ./simple_head.pl 
Content type: text/html
Document length: 77
Modification time: 940865762
Server: Apache/2.4.12 (FreeBSD) OpenSSL/1.0.1l-freebsd mod_fastcgi/mod_fastcgi-SNAP-0910052141

This is the output of the simple_head.pl program.
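The head function can also be tried offline against a local file, because LWP answers HEAD requests for file URLs with Content-Type, Content-Length, and Last-Modified headers. In this sketch the temporary HTML file is hypothetical; the list assignment is false when head returns an empty list, which lets us die on failure.

```perl
#!/usr/bin/perl -w

use strict;
use LWP::Simple;
use File::Temp qw(tempfile);
use URI::file;

# A local stand-in for the remote page.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print $fh "<html><body>Local page.</body></html>\n";
close $fh;

my $url = URI::file->new_abs($path)->as_string;

# head returns an empty list on failure, so the assignment is then false.
my ($content_type, $doc_length, $mod_time, $expires, $server) = head($url)
    or die "Unable to get headers for $url\n";

print "Content type: $content_type\n";
print "Document length: $doc_length\n";
# Convert seconds since the epoch into a readable date.
print "Modification time: ", scalar localtime($mod_time), "\n";
```

No Expires or Server header is produced for a local file, so $expires and $server stay undefined here.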

The getstore function

The getstore function retrieves the document identified by a URL and stores it in the given file. The return value is the HTTP response code.

simple_getstore.pl
#!/usr/bin/perl -w

use strict;
use LWP::Simple;

# getstore returns the HTTP status code, which is always a true value
my $r = getstore('http://www.something.com', 'something.html');

print "Response code: $r\n"; 

The script retrieves the www.something.com web page and stores it in the something.html file.

$ ./simple_getstore.pl 
Response code: 200
$ cat something.html 
<html><head><title>Something.</title></head>
<body>Something.</body>
</html>

We run the code example and check the something.html file.

It is possible to check the return code with the is_success function.

simple_getstore2.pl
#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my $url = 'http://www.something.comm';

# A failed request still returns a status code, so test it with is_success
my $r = getstore($url, 'something.html');

die "Error $r on $url" unless is_success($r);

In the example, we intentionally misspell the web page URL.

$ ./simple_getstore2.pl 
Error 500 on http://www.something.comm at ./simple_getstore2.pl line 11.

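The script dies with error 500. The host does not exist, so LWP cannot connect and reports the failure as an internally generated 500 response.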
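Because getstore always returns a status code rather than a false value, the code has to be classified with is_success (or is_error), which LWP::Simple exports from HTTP::Status. This sketch shows both outcomes without network access: it copies a hypothetical local page.html through a file URL and then tries a missing.html that does not exist.

```perl
#!/usr/bin/perl -w

use strict;
use LWP::Simple;
use File::Temp qw(tempdir);
use URI::file;

# Work in a temporary directory; page.html exists, missing.html does not.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/page.html" or die "Cannot write page.html: $!";
print $fh "<html><body>Local page.</body></html>\n";
close $fh;

my %rc;
for my $name ('page.html', 'missing.html') {
    my $url = URI::file->new_abs("$dir/$name")->as_string;

    # getstore returns the status code (200, 404, ...), never a false value.
    $rc{$name} = getstore($url, "$dir/copy-$name");
    printf "%-12s %d %s\n", $name, $rc{$name},
        is_success($rc{$name}) ? 'success' : 'error';
}
```

The existing file yields 200 and is copied to copy-page.html; the missing one yields 404, which is_success classifies as an error.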

The LWP Class Model

The LWP class model contains classes for more complex work with the World Wide Web.

LWP::UserAgent is a class implementing a web user agent. In an application, we create and configure an LWP::UserAgent object. Then we create an HTTP::Request object for the request that needs to be performed. The request is passed to one of the request methods of the user agent, which dispatches it using the relevant protocol and returns an HTTP::Response object. There are convenience methods for sending the most common request types: get, head, post, put, and delete.
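The cycle described above (user agent, request object, response object) can be sketched without a web server. As an assumption of this sketch only, a request_send handler registered with add_handler answers every request locally; LWP uses a response returned by such a handler instead of contacting the network.

```perl
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

# 1. Create and configure the user agent.
my $ua = LWP::UserAgent->new;
$ua->agent('My Perl script');

# Sketch assumption: this handler answers every request itself,
# so nothing is sent over the network.
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    return HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/plain'],
        'Got ' . $request->method . ' ' . $request->uri);
});

# 2. Describe the request.
my $req = HTTP::Request->new(GET => 'http://www.something.com/');

# 3. Dispatch it; the user agent returns an HTTP::Response object.
my $res = $ua->request($req);

print ref($res), "\n";
print $res->status_line, "\n";
print $res->content, "\n";
```

Removing the add_handler call turns this into a real request to www.something.com.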

User agent

LWP::UserAgent is a web user agent class. In this example, we set the name of the agent and let a PHP script on the server report it back.

index.php
<?php 

echo $_SERVER['HTTP_USER_AGENT'];

?>

On our local machine, we have this simple PHP file. It returns the name of the user agent, as sent in the User-Agent request header. With nginx, the file can be placed in /usr/share/nginx/html/.

agent.pl
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
      
my $ua = LWP::UserAgent->new;
$ua->agent("My Perl script");

my $req = HTTP::Request->new(GET => 'http://localhost/');
my $res = $ua->request($req);

if ($res->is_success) {

    print $res->content . "\n";
} else {

    print $res->status_line . "\n";
}

This script sends a simple GET request to localhost.

my $ua = LWP::UserAgent->new;

An instance of LWP::UserAgent is created.

$ua->agent("My Perl script");

With the agent method, we set the name of the agent.

my $req = HTTP::Request->new(GET => 'http://localhost/');

A GET request to localhost is created.

my $res = $ua->request($req);

The request method dispatches the request object. The return value is a response object.

if ($res->is_success) {

    print $res->content . "\n";
} else {

    print $res->status_line . "\n";
}

The is_success method checks whether the response has a success (2xx) status code. The content method returns the raw content of the response body. The status_line method returns the status code and message of the response.

$ ./agent.pl 
My Perl script

The server responded with the name of the agent that we have sent with the request.
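To see the User-Agent header without setting up nginx and PHP, the server side can be replaced by a request_send handler that echoes the header back, much like index.php does. The handler is an assumption of this sketch, not part of the original setup.

```perl
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
use HTTP::Response;

my $ua = LWP::UserAgent->new;
$ua->agent('My Perl script');

# Sketch assumption: instead of index.php, this local handler echoes
# the User-Agent request header, so no web server is needed.
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    return HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/plain'],
        $request->header('User-Agent'));
});

my $res = $ua->get('http://localhost/');
print $res->content, "\n";
```

If the string passed to agent ends with a space, LWP appends its default libwww-perl identifier to it.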

The get method

The user agent's get method is a convenience method for executing a GET request. It builds the HTTP::Request object for us, which saves some typing.

get_page.pl
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
      
my $ua = LWP::UserAgent->new;
$ua->agent("My Perl script");

my $res = $ua->get('http://www.something.com');

if ($res->is_success) {

    print $res->content . "\n";
} else {

    print $res->status_line . "\n";
}

The script fetches the contents of the www.something.com page with the convenience get method.
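The get method also accepts header field/value pairs after the URL, and every response keeps a reference to the request that produced it. The sketch below inspects that request; a local request_send handler (an assumption of the sketch) stands in for www.something.com so that nothing is sent over the network.

```perl
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
use HTTP::Response;

my $ua = LWP::UserAgent->new;
$ua->agent('My Perl script');

# Sketch assumption: a local handler stands in for www.something.com.
$ua->add_handler(request_send => sub {
    return HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/plain'],
        'stub body');
});

# Field/value pairs after the URL become request headers.
my $res = $ua->get('http://www.something.com/', Accept => 'text/html');

# The response keeps a reference to the request that produced it.
my $sent = $res->request;
print $sent->method, ' ', $sent->uri, "\n";
print 'Accept: ', $sent->header('Accept'), "\n";
print 'User-Agent: ', $sent->header('User-Agent'), "\n";
```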

Posting a value

The post method dispatches a POST request to the given URL. The key/value pairs are sent as fill-in form content, encoded as application/x-www-form-urlencoded.

target.php
<?php

echo "Hello " . htmlspecialchars($_POST['name']);

?>

On our local web server, we have this target.php file. It simply prints the posted value back to the client. The htmlspecialchars function converts special characters to HTML entities, which prevents the posted value from injecting markup into the page (cross-site scripting).

post_value.pl
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent; 

my $ua = LWP::UserAgent->new;
my $res = $ua->post('http://localhost/target.php', 
    ['name'  =>  'Jan']);
    
if ($res->is_success) {    

    print $res->content . "\n";
} else {

    print $res->status_line . "\n";
}

The script sends a POST request in which the name key has the value Jan.

$ ./post_value.pl 
Hello Jan

This is the output of the post_value.pl script.
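The post method builds its request with the POST function of HTTP::Request::Common. Calling that function directly lets us look at the request body without sending anything. The note field below is hypothetical; it only shows how spaces and the & character are encoded.

```perl
#!/usr/bin/perl -w

use strict;
use HTTP::Request::Common qw(POST);

# Build, but do not send, a form POST like the one in post_value.pl.
# The 'note' field is hypothetical, added to show the encoding.
my $req = POST 'http://localhost/target.php',
    ['name' => 'Jan', 'note' => 'a & b'];

print $req->method, "\n";
print $req->header('Content-Type'), "\n";
print $req->content, "\n";
```

The body is name=Jan&note=a+%26+b: spaces become + and & is percent-encoded, so PHP can split the pairs safely.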

In the following example, we find definitions of a term on urbandictionary.com.

post_value2.pl
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
use HTML::TreeBuilder;

my $ua = LWP::UserAgent->new;
my $res = $ua->post('http://www.urbandictionary.com/define.php',
    ['term'  =>  'dog'] );

# Check the response before parsing its content
die "Error: ", $res->status_line unless $res->is_success;

my $tree = HTML::TreeBuilder->new_from_content($res->decoded_content);
my @meanings = $tree->look_down(_tag => q{div}, 'class' => 'meaning');

foreach my $el (@meanings) {
    print $el->as_text . "\n";
}

$tree->delete;

In this script, we find the definitions of the term dog on urbandictionary.com and display the definitions from the first page. The HTML::TreeBuilder module parses the HTML code, and its look_down method finds the div elements whose class is meaning.

$ ./post_value2.pl 
 Not a cat. Gotta love Blackadder. 
 Man's best friend, next to TV. 
...

This is a partial output of the post_value2.pl script.
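Since the markup of a live site can change at any time, it helps to test the parsing step on its own. This sketch runs look_down on a hypothetical HTML snippet shaped like the page the script expects; it needs HTML::TreeBuilder but no network access.

```perl
#!/usr/bin/perl -w

use strict;
use HTML::TreeBuilder;

# Hypothetical markup shaped like the definitions page.
my $html = <<'HTML';
<html><body>
<div class="meaning">Not a cat.</div>
<div class="example">My dog ate it.</div>
<div class="meaning">Man's best friend.</div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# look_down returns every element that matches all the given criteria.
my @meanings = map { $_->as_text }
    $tree->look_down(_tag => 'div', class => 'meaning');
print "$_\n" for @meanings;

$tree->delete;    # free the parse tree
```

Only the two div elements with the meaning class are printed; the example div is skipped.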

Credentials

The user agent's credentials method sets the user name and password to be used for a realm on a given network location (host:port). A security realm is a mechanism used for protecting web application resources.

$ sudo apt-get install apache2-utils
$ sudo htpasswd -c /etc/nginx/.htpasswd user7
New password: 
Re-type new password: 
Adding password for user user7

We use the htpasswd tool to create a user name and a password for basic HTTP authentication.
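With basic HTTP authentication, the client sends the user name and password base64-encoded in the Authorization header. They are encoded, not encrypted, so basic authentication should be used over HTTPS on real sites. The sketch builds, but does not send, a request carrying the user7 credentials; the 7user password is the one used later in credentials.pl.

```perl
#!/usr/bin/perl -w

use strict;
use HTTP::Request;
use MIME::Base64 qw(encode_base64);

my $req = HTTP::Request->new(GET => 'http://localhost/secure/');

# Sets "Authorization: Basic <base64 of user:password>".
$req->authorization_basic('user7', '7user');

print $req->header('Authorization'), "\n";

# The same value computed by hand for comparison.
print 'Basic ', encode_base64('user7:7user', ''), "\n";
```

Both lines print Basic dXNlcjc6N3VzZXI=. LWP's credentials mechanism sends such a header after the server answers with a 401 challenge naming the realm.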

location /secure {

        auth_basic "Restricted Area";
        auth_basic_user_file /etc/nginx/.htpasswd;
}

Inside the nginx /etc/nginx/sites-available/default configuration file, we protect the /secure location with basic authentication. The name of the realm is "Restricted Area".

index.html
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>

</body>

</html>

Inside the /usr/share/nginx/html/secure directory, we have this HTML file.

credentials.pl
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
      
my $ua = LWP::UserAgent->new;
$ua->agent("My Perl script");

$ua->credentials('localhost:80', 'Restricted Area', 'user7' => '7user');

my $res = $ua->get('http://localhost/secure/');

if ($res->is_success) {

    print $res->content . "\n";
} else {

    print $res->status_line . "\n";
}

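The script connects to the secure web page and provides the user name and the password necessary to access it. The first argument of credentials is the host and port of the server; the realm name must match the one set with auth_basic.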

$ ./credentials.pl 
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>

</body>

</html>

With the right credentials, the credentials.pl script returns the secured page.
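The exchange behind credentials.pl can be reproduced without nginx. In this sketch, a request_send handler (an assumption of the sketch) plays the server: it answers a request without credentials with 401 and a Basic realm="Restricted Area" challenge, and accepts the user7/7user credentials. LWP reacts to the challenge by looking up the credentials for localhost:80 and that realm, then repeating the request.

```perl
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
use HTTP::Response;

my $ua = LWP::UserAgent->new;
$ua->credentials('localhost:80', 'Restricted Area', 'user7' => '7user');

# Sketch assumption: this handler plays the role of the nginx server.
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    my $auth = $request->header('Authorization') || '';

    # base64 of "user7:7user"
    if ($auth eq 'Basic dXNlcjc6N3VzZXI=') {
        return HTTP::Response->new(200, 'OK',
            ['Content-Type' => 'text/plain'], 'This is a secure page.');
    }
    return HTTP::Response->new(401, 'Unauthorized',
        ['WWW-Authenticate' => 'Basic realm="Restricted Area"']);
});

my $res = $ua->get('http://localhost/secure/');

print $res->status_line, "\n";
print $res->content, "\n";

# The rejected first attempt is kept as the previous response.
print 'First attempt: ', $res->previous->status_line, "\n" if $res->previous;
```

The final response is 200 OK, and previous holds the initial 401 Unauthorized reply.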

In this article, we have worked with the Perl LWP module.