Perl LWP programming
last modified September 13, 2020
In this article, we show how to work with the Perl LWP module. We grab data, post data, and connect to secure web pages.
LWP is a set of Perl modules that provide a simple and consistent application programming interface (API) to the World-Wide Web. The main focus of the library is to provide classes and functions for writing WWW clients. LWP is short for Library for WWW in Perl.
LWP::Simple
LWP::Simple is a simple procedural interface to LWP. It contains a few functions that make it easy to work with web pages. The LWP::Simple module is handy for simple cases, but it does not support more advanced features such as cookies or authorization.
The get function
The get function fetches the document identified by the given URL and returns it. It returns undef if it fails. The $url argument can be either a string or a reference to a URI object.
#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my $cont = get('http://webcode.me') or die 'Unable to get page';

print $cont;

The script grabs the content of the http://webcode.me web page.

$ ./simple_get.pl
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My html page</title>
</head>
<body>

    <p>
        Today is a beautiful day. We go swimming and fishing.
    </p>

    <p>
        Hello there. How are you?
    </p>

</body>
</html>

This is the output of the simple_get.pl script.
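Because get signals failure with undef, a plain truth test such as or die would also fire for a page whose content is an empty string or the single character 0. The following is a minimal sketch of a stricter check. The describe_page helper is a hypothetical name, not part of LWP::Simple, and the URL is assumed to come from the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# describe_page is a hypothetical helper, not an LWP::Simple function.
# It reports whether the fetch worked and how much content came back.
sub describe_page {
    my ($url) = @_;

    my $content = get($url);

    # Test for definedness, not truth: '' and '0' are valid page
    # content, but both are false in a boolean test.
    return "failed to fetch $url" unless defined $content;

    return sprintf 'fetched %d characters from %s', length $content, $url;
}

# Only touch the network when a URL is given on the command line.
print describe_page($ARGV[0]), "\n" if @ARGV;
```

It could be run as ./describe_page.pl http://webcode.me, where the script name is again only an example.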
The following program gets a small web page and strips its HTML tags.
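#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my $cont = get('http://webcode.me');
die 'Unable to get page' unless defined $cont;

# remove everything that looks like a tag; this simple
# regex is sufficient for a small, well-formed page
$cont =~ s/<[^>]*>//g;

print $cont;

The script strips the HTML tags of the http://webcode.me web page.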
The head function
The head function retrieves document headers. On success, it returns the following five values: the content type, document length, modification time, expiration time, and server. It returns an empty list if it fails.
#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my ($content_type, $doc_length, $mod_time, $expires, $server) =
    head('http://webcode.me');

print "Content type: $content_type\n";
print "Document length: $doc_length\n";
print "Modification time: $mod_time\n";
print "Server: $server\n";

The example prints the content type, document length, modification time, and server of the http://webcode.me web page.

$ ./simple_head.pl
Content type: text/html
Document length: 348
Modification time: 1563623365
Server: nginx/1.6.2

This is the output of the simple_head.pl program.
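The modification time is returned as seconds since the Unix epoch, and any of the five values may be undefined when the server does not send the corresponding header. The sketch below shows one possible way to deal with both. The format_epoch helper is a hypothetical name, and the URL is assumed to be passed on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(head);
use POSIX qw(strftime);

# format_epoch is a hypothetical helper: it renders an epoch timestamp
# as a UTC date string, or 'unknown' when the header was missing.
sub format_epoch {
    my ($epoch) = @_;

    return 'unknown' unless defined $epoch;
    return strftime('%Y-%m-%d %H:%M:%S UTC', gmtime($epoch));
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my @headers = head($ARGV[0]);

    # an empty list signals failure
    die "Unable to get headers for $ARGV[0]\n" unless @headers;

    my ($content_type, $doc_length, $mod_time, $expires, $server) = @headers;

    print 'Content type: ', $content_type // 'unknown', "\n";
    print 'Modification time: ', format_epoch($mod_time), "\n";
    print 'Expiration time: ', format_epoch($expires), "\n";
}
```

For the timestamp 1563623365 from the output above, format_epoch yields 2019-07-20 11:49:25 UTC.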
The getstore function
The getstore function retrieves a document identified by a URL and stores it in a file. The return value is the HTTP response code.
#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my $r = getstore('http://webcode.me', 'webcode.html')
    or die 'Unable to get page';

print "Response code: $r\n";

The script grabs the contents of the http://webcode.me web page and stores it in the webcode.html file.

$ ./simple_get_store.pl
Response code: 200
$ cat webcode.html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>My html page</title>
</head>
<body>

    <p>
        Today is a beautiful day. We go swimming and fishing.
    </p>

    <p>
        Hello there. How are you?
    </p>

</body>
</html>

We run the code example and check the webcode.html file.
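The getstore function returns a status code even when the request fails, so the or die clause alone does not catch failed requests. The return code can be checked with the is_success function.

#!/usr/bin/perl -w

use strict;
use LWP::Simple;

my $url = 'http://webcode.mee';

my $r = getstore($url, 'webcode.html')
    or die 'Unable to get page';

die "Error $r on $url" unless is_success($r);

In the example, we intentionally misspell the web page URL.

$ ./check_return_code.pl
Error 500 on http://webcode.mee at ./check_return_code.pl line 11.

The script ends with error 500. LWP generates this status code itself when it cannot connect to the server, for instance when the host name cannot be resolved.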
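Along with is_success, LWP::Simple exports further helpers from HTTP::Status, such as is_error and status_message. The following sketch uses them to turn a status code into a short report. The describe_status helper is a hypothetical name, and the URL and the target file are assumed to be given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;    # exports getstore, is_success, is_error, status_message

# describe_status is a hypothetical helper that classifies a status code.
sub describe_status {
    my ($code) = @_;

    my $kind = is_success($code) ? 'success'
             : is_error($code)   ? 'error'
             :                     'other';

    return sprintf '%d %s (%s)', $code, status_message($code) // 'Unknown', $kind;
}

# Only touch the network when a URL and a file name are given.
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;

    # getstore returns the status code for successes and failures alike
    my $code = getstore($url, $file);

    print describe_status($code), "\n";
    exit 1 unless is_success($code);
}
```

For example, describe_status(200) returns 200 OK (success) and describe_status(500) returns 500 Internal Server Error (error).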
The LWP Class Model
The LWP Class Model contains classes for more complex work with the World-Wide Web.

LWP::UserAgent is a class implementing a web user agent. In the application, we create and configure an LWP::UserAgent object. Then we create an instance of HTTP::Request for the request that needs to be performed. This request is passed to one of the request methods of the user agent, which dispatches it using the relevant protocol and returns an HTTP::Response object. There are convenience methods for sending the most common request types: get, head, post, put, and delete.
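To get a feeling for the two message classes, they can be created and inspected without any network traffic. In the sketch below, the response is built by hand for illustration only; in a real program it would come back from one of the user agent's request methods.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# A request object consists of a method, a URI, headers, and optional content.
my $req = HTTP::Request->new(GET => 'http://localhost/');
$req->header(Accept => 'text/html');

print $req->method, ' ', $req->uri, "\n";

# Hand-built response, for illustration only; normally the user agent
# creates it from the server's reply.
my $res = HTTP::Response->new(404, 'Not Found');
$res->request($req);

print $res->is_success ? "ok\n" : 'failed: ' . $res->status_line . "\n";
```

The same is_success and status_line methods are used on real responses in the examples that follow.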
User agent
LWP::UserAgent is a web user agent class.

<?php

echo $_SERVER['HTTP_USER_AGENT'];

On our local machine, we have this simple PHP file. It returns the name of the user agent. For the nginx server, the location of the file can be /usr/share/nginx/html/.

#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;

my $ua = new LWP::UserAgent;
$ua->agent("My Perl script");

my $req = new HTTP::Request 'GET' => 'http://localhost/';
my $res = $ua->request($req);

if ($res->is_success) {
    print $res->content . "\n";
} else {
    print $res->status_line . "\n";
}

This script creates a simple GET request to localhost.
my $ua = new LWP::UserAgent;
An instance of LWP::UserAgent is created.

$ua->agent("My Perl script");

With the agent method, we set the name of the agent.
my $req = new HTTP::Request 'GET' => 'http://localhost/';
A GET request to the localhost is created.
my $res = $ua->request($req);
The request method dispatches the request object. The return value is a response object.

if ($res->is_success) {
    print $res->content . "\n";
} else {
    print $res->status_line . "\n";
}

The is_success method checks if the response has a success return code. The content method returns the raw content. The status_line method returns the status code and message of the response.

$ ./agent.pl
My Perl script

The server responded with the name of the agent that we have sent with the request.
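The user agent object holds further settings besides the agent name. The sketch below sets a timeout in the constructor and relies on the documented behaviour that LWP appends its own libwww-perl identifier when the agent string ends with a space. The name MyScript/1.0 is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# give up on a request after 10 seconds
my $ua = LWP::UserAgent->new(timeout => 10);

# The trailing space asks LWP to append its default
# libwww-perl/x.y identifier to our (made-up) product name.
$ua->agent('MyScript/1.0 ');

print 'Agent: ', $ua->agent, "\n";
print 'Timeout: ', $ua->timeout, "\n";
```

Called without arguments, the agent and timeout methods return the current values, so the script prints what will be used for subsequent requests.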
The get method
The user agent's get method is a convenience method to execute an HTTP request. It saves some typing.

#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;

my $ua = new LWP::UserAgent;
$ua->agent("My Perl script");

my $res = $ua->get('http://webcode.me');

if ($res->is_success) {
    print $res->content . "\n";
} else {
    print $res->status_line . "\n";
}

The script gets the contents of the webcode.me page. We utilize the convenience get method.
In the following example, we find definitions of a term on urbandictionary.com.

#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;
use HTML::TreeBuilder;
use URI;

my %parameters = (term => 'dog');

my $url = URI->new('https://www.urbandictionary.com/define.php');
$url->query_form(%parameters);

my $ua = LWP::UserAgent->new;
my $res = $ua->get($url);

# check the response before trying to parse it
die "Error: ", $res->status_line unless $res->is_success;

my $tree = HTML::TreeBuilder->new_from_content($res->decoded_content);

my @meanings = $tree->look_down(_tag => q{div}, 'class' => 'meaning');

foreach my $el (@meanings) {
    print $el->as_text . "\n";
}

In this script, we find the definitions of the term dog on urbandictionary.com. We display the definitions from the first page. The response status is checked before the content is parsed. HTML::TreeBuilder is used to parse the HTML code.
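The query_form method of the URI class takes care of escaping the query parameters, which matters as soon as a term contains spaces or other special characters. A small sketch, using the two-word term hot dog as an example value:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

my $url = URI->new('https://www.urbandictionary.com/define.php');

# query_form escapes keys and values; a space becomes a plus sign
$url->query_form(term => 'hot dog');

print "$url\n";

# Called without arguments, query_form returns the decoded key/value pairs.
my %params = $url->query_form;
print "term: $params{term}\n";
```

The first print shows the escaped URL ending in ?term=hot+dog, the second one the decoded value hot dog.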
Posting a value
The post method dispatches a POST request on the given URL, providing the key/value pairs for the fill-in form content.

<?php

echo "Hello " . htmlspecialchars($_POST['name']);

On our local web server, we have this target.php file. It simply prints the posted value back to the client. The htmlspecialchars() function converts special characters to HTML entities. This is for security reasons.

#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

my $res = $ua->post('http://localhost/target.php', ['name' => 'Jan']);

if ($res->is_success) {
    print $res->content . "\n";
} else {
    print $res->status_line . "\n";
}

The script sends a request with a name key having the value Jan.

$ ./post_value.pl
Hello Jan

This is the output of the post_value.pl script.
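To see what such a form post looks like on the wire, the request can be built with the POST function from HTTP::Request::Common and printed instead of sent. This is only a sketch for inspection; it does not contact the server.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Build the request object without sending it.
my $req = POST 'http://localhost/target.php', [name => 'Jan'];

# The key/value pairs end up URL-encoded in the request body.
print $req->as_string;
```

The printed request contains the Content-Type header application/x-www-form-urlencoded and the body name=Jan.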
Credentials
The user agent's credentials method sets the name and password to be used for a realm. A security realm is a mechanism used for protecting web application resources.

$ sudo apt-get install apache2-utils
$ sudo htpasswd -c /etc/nginx/.htpasswd user7
New password:
Re-type new password:
Adding password for user user7

We use the htpasswd tool to create a user name and a password for basic HTTP authentication.

location /secure {

    auth_basic "Restricted Area";
    auth_basic_user_file /etc/nginx/.htpasswd;
}

Inside the nginx /etc/nginx/sites-available/default configuration file, we create a secured page. The name of the realm is "Restricted Area".

<!DOCTYPE html>
<html lang="en">
<head>
    <title>Secure page</title>
</head>
<body>

    <p>
        This is a secure page.
    </p>

</body>
</html>

Inside the /usr/share/nginx/html/secure directory, we have this HTML file.
#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;

my $ua = new LWP::UserAgent;
$ua->agent("My Perl script");

$ua->credentials('localhost:80', 'Restricted Area', 'user7' => 's$cret');
my $res = $ua->get('http://localhost/secure/');

if ($res->is_success) {
    print $res->content . "\n";
} else {
    print $res->status_line . "\n";
}

The script connects to the secure webpage; it provides the user name and the password necessary to access the page.

$ ./credentials.pl
<!DOCTYPE html>
<html lang="en">
<head>
    <title>Secure page</title>
</head>
<body>

    <p>
        This is a secure page.
    </p>

</body>
</html>

With the right credentials, the credentials.pl script returns the secured page.
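With basic HTTP authentication, the user name and the password travel in an Authorization header as a Base64-encoded pair. The sketch below builds such a header by hand with the authorization_basic method, only to show what is transmitted; with the credentials method, the user agent is expected to add the header itself once the server asks for authentication.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;
use MIME::Base64 qw(decode_base64);

my $req = HTTP::Request->new(GET => 'http://localhost/secure/');

# Set the Authorization header for basic authentication by hand.
$req->authorization_basic('user7', 's$cret');

my $header = $req->header('Authorization');
print "$header\n";

# Base64 is an encoding, not encryption; it is trivially reversed.
my ($encoded) = $header =~ /^Basic (.+)$/;
print decode_base64($encoded), "\n";    # user7:s$cret
```

Since the password can be decoded by anyone who sees the request, basic authentication is usually combined with HTTPS.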
In this article, we have worked with the Perl LWP module.