Perl LWP programming
last modified August 24, 2023
In this article we show how to do WWW programming in Perl with the LWP module.
LWP is a set of Perl modules that provide a simple and consistent application programming interface (API) to the World-Wide Web. The main focus of the library is to provide classes and functions for writing WWW clients. LWP is short for Library for WWW in Perl.
LWP::Simple
LWP::Simple is a simple procedural interface to LWP. It contains a few functions for easy working with web pages. The LWP::Simple module is handy for simple cases but it does not support more advanced features such as cookies or authorization.
The get function
The get function fetches the document identified by the given URL and returns it. It returns undef if it fails. The $url argument can be either a string or a reference to a URI object.
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;

my $cont = get('http://webcode.me') or die 'Unable to get page';

say $cont;
The script grabs the content of the http://webcode.me web page.
$ ./simple_get.pl
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My html page</title>
</head>
<body>
    <p>
        Today is a beautiful day. We go swimming and fishing.
    </p>
    <p>
        Hello there. How are you?
    </p>
</body>
</html>
This is the output of the simple_get.pl script.
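The two argument forms and the undef return value can be tried without network access. The following is a minimal sketch, assuming that a temporary local file served through a file: URL is an acceptable stand-in for a web page; the temporary file and its content are made up for illustration.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;
use URI::file;
use File::Temp qw(tempfile);

# Write a small page to a temporary file so the sketch runs offline.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print $fh "<p>Hello there.</p>\n";
close $fh;

# get accepts either a URI object or a plain string.
my $url = URI::file->new_abs($path);

my $from_object = get($url);
my $from_string = get($url->as_string);

# A document that cannot be fetched yields undef, not an exception.
my $missing = get(URI::file->new_abs("$path.missing"));

print $from_object // "Unable to get page\n";
say defined $missing ? 'unexpected content' : 'missing page returned undef';
```

Testing the result with defined (or the // operator) is a little safer than `or die`, since an empty document is also false in Perl.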
The following program gets a small web page and strips its HTML tags.
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;

my $cont = get('http://webcode.me');

foreach ($cont) {
    s/<[^>]*>//g;
    print;
}
The script strips the HTML tags of the http://webcode.me web page.
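The same substitution can be wrapped in a small function and tried on a literal string. This is only a sketch: strip_tags is a hypothetical helper name, and the regular expression is the one from the script above, which does not account for comments, script blocks, or a > character inside an attribute value.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;

# Return a copy of the input with everything between < and > removed.
sub strip_tags {
    my ($html) = @_;

    (my $text = $html) =~ s/<[^>]*>//g;
    return $text;
}

my $html = '<p>Hello <b>there</b>. How are you?</p>';

say strip_tags($html);
```

For anything beyond quick experiments, an HTML parser is usually the more robust choice.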
The head function
The head function retrieves document headers. On success, it returns the following five values: the content type, document length, modification time, expiration time, and server. It returns an empty list if it fails.
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;

my ($content_type, $doc_length, $mod_time, $expires, $server) =
    head("http://webcode.me");

say "Content type: $content_type";
say "Document length: $doc_length";
say "Modification time: $mod_time";
say "Server: $server";
The example prints the content type, document length, modification time, and server of the http://webcode.me web page.
$ ./simple_head.pl
Content type: text/html
Document length: 348
Modification time: 1563623365
Server: nginx/1.6.2
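Since head returns an empty list on failure, and individual values may be undefined when a header is absent, it can help to guard against both cases. Below is a minimal sketch; describe_headers is a hypothetical helper, and a temporary local file accessed through a file: URL is assumed as a stand-in for a web server so that the code runs offline.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;
use URI::file;
use File::Temp qw(tempfile);

# Summarize the values returned by head, tolerating failures
# and missing headers.
sub describe_headers {
    my ($url) = @_;

    my @fields = head($url);
    return 'No headers (request failed)' unless @fields;

    my ($content_type, $doc_length, $mod_time, $expires, $server) = @fields;

    return join ', ',
        'type: '   . ($content_type // 'unknown'),
        'length: ' . ($doc_length   // 'unknown'),
        'server: ' . ($server       // 'unknown');
}

# A temporary file stands in for a remote document.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print $fh "<p>Hello there.</p>\n";
close $fh;

my $found   = describe_headers(URI::file->new_abs($path));
my $missing = describe_headers(URI::file->new_abs("$path.missing"));

say $found;
say $missing;
```

A local file has no Server header, so that field is expected to fall back to unknown here.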
The getstore function
The getstore function retrieves a document identified by a URL and stores it in a file. The return value is the HTTP response code.
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;

# getstore returns the HTTP status code, which is always a true value,
# so an 'or die' guard on this call would never fire.
my $r = getstore('http://webcode.me', 'webcode.html');

say "Response code: $r";
The script grabs the contents of the http://webcode.me web page and stores it in the webcode.html file.
$ ./get_store.pl
Response code: 200
$ cat webcode.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My html page</title>
</head>
<body>
    <p>
        Today is a beautiful day. We go swimming and fishing.
    </p>
    <p>
        Hello there. How are you?
    </p>
</body>
</html>
It is possible to check the return code with the is_success function.
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;

my $url = 'http://webcode.mee';

my $r = getstore($url, 'webcode.html');

die "Error $r on $url" unless is_success($r);
In the example, we intentionally misspell the web page URL.
$ ./check_return_code.pl
Error 500 on http://webcode.mee at ./check_return_code.pl line 11.
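The check can be folded into a small wrapper that returns both the status code and a success flag. This is a sketch under stated assumptions: store_checked is a hypothetical helper, and local files accessed through file: URLs stand in for a web server so that both the success and the failure path run offline.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::Simple;
use URI::file;
use File::Temp qw(tempdir);

# Store $url in $file; return the status code and a success flag.
sub store_checked {
    my ($url, $file) = @_;

    my $code = getstore($url, $file);
    return ($code, is_success($code) ? 1 : 0);
}

# Prepare a source document in a scratch directory.
my $dir    = tempdir(CLEANUP => 1);
my $source = "$dir/source.html";

open my $out, '>', $source or die "Cannot write $source: $!";
print $out "<p>Today is a beautiful day.</p>\n";
close $out;

my ($ok_code, $ok) =
    store_checked(URI::file->new_abs($source), "$dir/copy.html");
my ($bad_code, $bad_ok) =
    store_checked(URI::file->new_abs("$dir/nope.html"), "$dir/nope-copy.html");

say "Existing source: code $ok_code, success flag $ok";
say "Missing source: code $bad_code, success flag $bad_ok";
```

Returning the code alongside the flag lets the caller report the exact status in an error message.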
The LWP Class Model
The LWP Class Model contains classes for more complex work with the World-Wide Web.
LWP::UserAgent is a class implementing a web user agent. In the application, we create and configure an LWP::UserAgent object. Then we create an instance of HTTP::Request for the request that needs to be performed. This request is then passed to one of the request methods of the user agent, which dispatches it using the relevant protocol and returns an HTTP::Response object. There are convenience methods for sending the most common request types: get, head, post, put, and delete.
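A request object can be built and inspected before it is handed to a user agent. The following minimal sketch sends nothing over the network; the URL and the header value are only illustrative.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use HTTP::Request;

# Build a GET request and attach a header to it.
my $req = HTTP::Request->new(GET => 'http://localhost:3000/');
$req->header('User-Agent' => 'Perl script');

say 'Method: ', $req->method;
say 'URI: ', $req->uri;
say 'User-Agent: ', $req->header('User-Agent');

# as_string shows the request line and headers in textual form.
print $req->as_string;
```

Inspecting the request this way can be useful when debugging what a script is about to send.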
User agent
LWP::UserAgent is a web user agent class.
$ cpanm Mojolicious::Lite
We install the Mojolicious framework.
#!/usr/bin/perl

use Mojolicious::Lite -signatures;

get '/' => sub ($c) {

    my $ua = $c->req->headers->user_agent;
    $c->render(text => $ua);
};

app->start;
The server processes the client request, determines the user agent, and returns the user agent back to the client.
$ perl server.pl daemon
[2021-07-08 13:02:55.63239] [49095] [info] Listening at "http://*:3000"
Web application available at http://127.0.0.1:3000
We start our server; it listens on port 3000.
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;

my $ua = new LWP::UserAgent;
$ua->agent('Perl script');

my $req = new HTTP::Request 'GET' => 'http://localhost:3000';
my $res = $ua->request($req);

if ($res->is_success) {

    say $res->content;
} else {

    say $res->status_line;
}
This script creates a simple GET request to the localhost.
my $ua = new LWP::UserAgent;
An instance of LWP::UserAgent is created.
$ua->agent('Perl script');
With the agent method, we set the name of the agent.
my $req = new HTTP::Request 'GET' => 'http://localhost:3000';
A GET request to the localhost is created.
my $res = $ua->request($req);
The request method dispatches the request object. The return value is a response object.
if ($res->is_success) {

    say $res->content;
} else {

    say $res->status_line;
}
The is_success method checks if the response has a success return code. The content method returns the raw content. The status_line method returns the status code and message of the response.
$ ./agent.pl
Perl script
The server responded with the name of the agent that we have sent with the request.
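The response methods can also be explored on response objects constructed by hand, with no server involved. This is a minimal sketch; report is a hypothetical helper that mirrors the if/else branch of the script, and the two responses are made up for illustration.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use HTTP::Response;

# Return the body of a successful response, or the status line otherwise.
sub report {
    my ($res) = @_;

    return $res->is_success ? $res->content : $res->status_line;
}

# Hand-built responses: one success, one client error.
my $ok = HTTP::Response->new(200, 'OK',
    ['Content-Type' => 'text/plain'], 'Perl script');
my $not_found = HTTP::Response->new(404, 'Not Found');

say report($ok);
say report($not_found);
```

The first call prints the body and the second prints the status line, since a 404 code is not in the success range.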
The get method
The user agent's get method is a convenience method to execute an HTTP request. It saves some typing.
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;

my $ua = new LWP::UserAgent;
$ua->agent("Perl script");

my $res = $ua->get('http://webcode.me');

if ($res->is_success) {

    say $res->content;
} else {

    say $res->status_line;
}
The script gets the contents of the webcode.me page. We utilize the convenience get method.
In the following example, we find definitions of a term on urbandictionary.com.
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;
use URI;

my $word = shift || 'dog';
my %parameters = (term => $word);

my $url = URI->new('https://www.urbandictionary.com/define.php');
$url->query_form(%parameters);

my $ua = LWP::UserAgent->new;
my $res = $ua->get($url);

# Stop before parsing if the request failed.
die "Error: ", $res->status_line unless $res->is_success;

my $tree = HTML::TreeBuilder->new_from_content($res->decoded_content);
my @meanings = $tree->look_down(_tag => q{div}, 'class' => 'meaning');

foreach my $el (@meanings) {
    say $el->as_text;
}
In this script, we find the definitions of the term dog on urbandictionary.com. We display the definitions from the first page. HTML::TreeBuilder is used to parse the HTML code.
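The two building blocks of the script, query_form and look_down, can be tried in isolation. The sketch below assumes a made-up HTML snippet in place of the real page, so it runs without network access; the actual markup of the site may differ.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use URI;
use HTML::TreeBuilder;

# query_form appends escaped key/value pairs to the URL.
my $url = URI->new('https://www.urbandictionary.com/define.php');
$url->query_form(term => 'hot dog');

say $url;

# Illustrative markup only, not the real page structure.
my $html = <<'HTML';
<html><body>
<div class="meaning">First sample definition.</div>
<div class="other">Not a definition.</div>
<div class="meaning">Second sample definition.</div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Collect the text of every div whose class attribute is exactly 'meaning'.
my @meanings = map { $_->as_text }
    $tree->look_down(_tag => 'div', class => 'meaning');

$tree->delete;

say for @meanings;
```

Note that a plain string criterion in look_down is compared against the whole attribute value, so an element carrying several classes would need a regular expression or a code reference instead.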
Posting a value
The post method dispatches a POST request on the given URL, providing the key/value pairs for the fill-in form content.
#!/usr/bin/perl

use Mojolicious::Lite -signatures;

post '/' => sub ($c) {

    my $name = $c->param('name');
    $c->render(text => "Hello $name!");
};

app->start;
In the handler, we get the name parameter. From the parameter, we build a message and send it to the client.
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

my $res = $ua->post('http://localhost:3000/', ['name' => 'Jan']);

if ($res->is_success) {

    say $res->content;
} else {

    say $res->status_line;
}
The script sends a POST request with the name key set to Jan.
$ ./post_value.pl
Hello Jan!
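To see how the key/value pairs are encoded, a POST request can be built with HTTP::Request::Common and inspected without sending it. This is a minimal sketch; the city field is an extra, made-up parameter added to show how a space is escaped.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use HTTP::Request::Common qw(POST);

# Build the request object only; nothing is sent over the network.
my $req = POST 'http://localhost:3000/',
    [name => 'Jan', city => 'New York'];

say 'Method: ', $req->method;
say 'Content type: ', $req->header('Content-Type');
say 'Body: ', $req->content;
```

The pairs are URL-encoded and placed in the request body rather than in the URL.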
Credentials
The user agent's credentials method sets the name and password to be used for a realm. A security realm is a mechanism used for protecting web application resources.
$ sudo apt-get install apache2-utils
$ sudo htpasswd -c /etc/nginx/.htpasswd user7
New password:
Re-type new password:
Adding password for user user7
We use the htpasswd tool to create a user name and a password for basic HTTP authentication.
location /secure {
    auth_basic "Restricted Area";
    auth_basic_user_file /etc/nginx/.htpasswd;
}
Inside the nginx /etc/nginx/sites-available/default configuration file, we create a secured page. The name of the realm is "Restricted Area".
<!DOCTYPE html>
<html lang="en">
<head>
    <title>Secure page</title>
</head>
<body>
    <p>
        This is a secure page.
    </p>
</body>
</html>
Inside the /usr/share/nginx/html/secure directory, we have this HTML file.
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;

my $ua = new LWP::UserAgent;
$ua->agent("Perl script");

$ua->credentials('localhost:80', 'Restricted Area', 'user7' => 's$cret');
my $res = $ua->get('http://localhost/secure/');

if ($res->is_success) {

    say $res->content;
} else {

    say $res->status_line;
}
The script connects to the secure webpage; it provides the user name and the password necessary to access the page.
$ ./credentials.pl
<!DOCTYPE html>
<html lang="en">
<head>
    <title>Secure page</title>
</head>
<body>
    <p>
        This is a secure page.
    </p>
</body>
</html>
With the right credentials, the credentials.pl script returns the secured page.
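Credentials are stored per network location and realm, and the location includes the port. The sketch below, which needs no server, stores the pair from the script and reads it back by calling credentials with only the location and realm; the localhost:8080 lookup is a made-up example of a location that was never configured.

```perl
#!/usr/bin/perl

use 5.30.0;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Store a user name and password for one location and realm.
$ua->credentials('localhost:80', 'Restricted Area', 'user7', 's$cret');

# Called without a name and password, the method returns the stored pair.
my ($user, $pass) = $ua->credentials('localhost:80', 'Restricted Area');

# A different port is a different location; nothing is stored for it.
my @other = $ua->credentials('localhost:8080', 'Restricted Area');

say "Stored user: $user";
say 'Entries for localhost:8080: ', scalar @other;
```

If the port or the realm string does not match what the server announces, the user agent has no credentials to offer and the request is expected to end with a 401 status.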
In this article we have worked with the Perl LWP module.