Ruby HTTPClient tutorial

In this tutorial, we show how to work with the Ruby HTTPClient module. We grab data, post data, work with cookies, and connect to secure web pages. ZetCode also has a concise Ruby tutorial.

The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.

Ruby HTTPClient provides methods for accessing web resources via HTTP. It offers the functionality of libwww-perl (LWP) in Ruby. (See ZetCode's article for Perl LWP.) The gem was created by Hiroshi NAKAMURA.

$ sudo gem install httpclient

The module is installed with the sudo gem install httpclient command.

$ service nginx status
 * nginx is running

We run the nginx web server on localhost. Some of our examples will connect to PHP scripts on a locally running nginx server.

Version

The first program prints the version of the library and of the Ruby language.

version.rb
#!/usr/bin/ruby

require 'httpclient'

puts HTTPClient::LIB_NAME
puts HTTPClient::RUBY_VERSION_STRING
puts HTTPClient::VERSION

These three constants provide the library and Ruby version numbers.

$ ./version.rb 
(2.8.0, ruby 1.9.3 (2013-11-22))
ruby 1.9.3 (2013-11-22)
2.8.0

This is a sample output of the example.

The get_content method

The get_content is a high-level method for fetching documents identified by the given URL.

get_content.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new
cont = client.get_content 'http://www.something.com'

puts cont

The script grabs the content of the www.something.com web page.

cont = client.get_content 'http://www.something.com'

The get_content method returns the result as one string.

$ ./get_content.rb 
<html><head><title>Something.</title></head>
<body>Something.</body>
</html>

This is the output of the get_content.rb script.

The following program gets a small web page and strips its HTML tags.

strip_tags.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

client.get_content('http://www.something.com') do |chunk|
    puts chunk.gsub(%r{</?[^>]+?>}, '')
end

The script strips the HTML tags of the www.something.com web page.

client.get_content('http://www.something.com') do |chunk|
    puts chunk.gsub(%r{</?[^>]+?>}, '')
end

A simple regular expression is used to strip the HTML tags. When called with a block, the get_content method passes the content to the block in chunks of strings.

$ ./strip_tags.rb 
Something.
Something.

The script prints the web page's title and content.
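
The tag-stripping expression can also be tried on an in-memory string, without a network connection. The following is a minimal sketch; the strip_tags helper and the sample strings are assumptions made for illustration and are not part of HTTPClient. It also hints at a caveat of stripping chunk by chunk: a tag that happens to be split between two chunks is not matched.

```ruby
# Hypothetical helper; it applies the regular expression from strip_tags.rb.
def strip_tags(html)
    html.gsub(%r{</?[^>]+?>}, '')
end

page = "<html><head><title>Something.</title></head>\n<body>Something.</body>\n</html>"
text = strip_tags(page)
puts text

# A tag that is split across two chunks survives per-chunk stripping.
chunks = ["<ht", "ml>Hi</html>"]
per_chunk = chunks.map { |chunk| strip_tags(chunk) }.join
buffered = strip_tags(chunks.join)

p per_chunk
p buffered
```

For small pages such as the one used here this is unlikely to matter; for larger documents, collecting the chunks before stripping is the safer choice.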

Request

An HTTP request is a message sent from the client to the server to retrieve some information or to perform some action.

HTTPClient's request method creates a new request. Note that the HTTPClient class has methods, such as get, post, or put, which save some typing for us.

create_request.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new
method = 'GET'
url = URI.parse 'http://www.something.com'

res = client.request method, url
puts res.body

The example creates a GET request and sends it to http://www.something.com.

method = 'GET'
url = URI.parse 'http://www.something.com'

We create a request method and URL.

res = client.request method, url

A request is made with the request method.

puts res.body

The body attribute of the response message contains the message body.

$ ./create_request.rb 
<html><head><title>Something.</title></head>
<body>Something.</body>
</html>

This is the output of the example.

Status

HTTP::Message represents an HTTP request or response. Its status method returns the HTTP status code of the response.

status.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

res = client.get 'http://www.something.com'
puts res.status
puts HTTP::Status::successful? res.status

res = client.get 'http://www.something.com/news/'
puts res.status
puts HTTP::Status::successful? res.status

res = client.get 'http://www.urbandicionary.com/define.php?term=Dog'
puts res.status
puts HTTP::Status::successful? res.status

We perform three HTTP requests with the get method and check for the returned status.

puts HTTP::Status::successful? res.status

The HTTP::Status::successful? method tells whether the status code was successful.

$ ./status.rb 
200
true
404
false
302
false

200 is a standard response for successful HTTP requests, 404 tells that the requested resource could not be found, and 302 tells that the resource was temporarily redirected.
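
The meaning of a status code is determined by its first digit. The sketch below groups the three codes from the output by class; status_class is a hypothetical helper written for this illustration, not a method of HTTPClient, and HTTP::Status::successful? may apply a stricter check than a plain range test.

```ruby
# Hypothetical helper: maps a status code to its class by numeric range.
def status_class(code)
    case code
    when 100..199 then :informational
    when 200..299 then :successful
    when 300..399 then :redirection
    when 400..499 then :client_error
    when 500..599 then :server_error
    else :unknown
    end
end

[200, 404, 302].each do |code|
    puts "#{code}: #{status_class(code)}"
end
```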

The head method

The head method retrieves document headers. The headers consist of fields, including date, server, content type, or last modification time.

head.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

res = client.head 'http://www.something.com'

puts "Server: " + res.header['Server'][0]
puts "Last modified: " + res.header['Last-Modified'][0]
puts "Content type: " + res.header['Content-Type'][0]
puts "Content length: " + res.header['Content-Length'][0]

The example prints the server, last modification time, content type, and content length of the www.something.com web page.

$ ./head.rb 
Server: Apache/2.4.12 (FreeBSD) OpenSSL/1.0.1l-freebsd mod_fastcgi/mod_fastcgi-SNAP-0910052141
Last modified: Mon, 25 Oct 1999 15:36:02 GMT
Content type: text/html
Content length: 77

This is the output of the head.rb program.
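
On the wire, the headers are plain lines of the form Name: value. The following sketch parses such lines into a hash of arrays, which resembles the way the example above indexes res.header['Server'][0]. The raw string is sample data shortened from the output above, and the parsing code is an illustration only; it is not how HTTPClient is implemented.

```ruby
# Sample header block; the values are taken from the output above.
raw = "Server: Apache/2.4.12 (FreeBSD)\r\n" \
      "Last-Modified: Mon, 25 Oct 1999 15:36:02 GMT\r\n" \
      "Content-Type: text/html\r\n" \
      "Content-Length: 77\r\n"

# Field names are case-insensitive, so the keys are stored in lower case.
# A field may occur more than once, so each key maps to an array.
headers = Hash.new { |hash, key| hash[key] = [] }

raw.each_line do |line|
    # Split on the first colon only; the date value contains colons too.
    name, value = line.chomp.split(/:\s*/, 2)
    headers[name.downcase] << value
end

puts "Last modified: " + headers['last-modified'][0]
puts "Content length: " + headers['content-length'][0]
```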

The get method

The get method issues a GET request to the server. The GET method requests a representation of the specified resource.

greet.php
<?php

echo "Hello " . htmlspecialchars($_GET['name']);

?>

Inside the /usr/share/nginx/html/ directory, we have this greet.php file. The script returns the value of the name variable, which was retrieved from the client. The htmlspecialchars() function converts special characters to HTML entities; e.g. & to &amp;.

mget.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

res = client.get 'http://localhost/greet.php?name=Jan'

puts res.body

The script sends a variable with a value to the PHP script on the server. The variable is specified directly in the URL.

$ ./mget.rb 
Hello Jan

This is the output of the example.

$ tail -1 /var/log/nginx/access.log
127.0.0.1 - - [08/May/2016:13:15:31 +0200] "GET /greet.php?name=Jan HTTP/1.1" 200 19 "-" 
    "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))"

We examine the nginx access log.

The get method takes a second parameter where we can specify the query parameters.

mget2.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

query = {'name' => 'Jan'}
res = client.get 'http://localhost/greet.php', query

puts res.body

The example is essentially the same as the previous one.

$ ./mget2.rb 
Hello Jan

This is the output of the example.
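
To see what a query hash looks like once it is part of a URL, we can build the query string ourselves with Ruby's standard uri library. This is a sketch of the encoding only, not HTTPClient's own code; the second hash is a made-up example with characters that need escaping.

```ruby
require 'uri'

query = {'name' => 'Jan'}

uri = URI.parse 'http://localhost/greet.php'
uri.query = URI.encode_www_form(query)
puts uri

# Characters with a special meaning in URLs are escaped.
escaped = URI.encode_www_form('name' => 'Jan Novak', 'lang' => 'c&c++')
puts escaped
```

The first line printed is the same URL that was typed by hand in mget.rb.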

Redirection

Redirection is the process of forwarding one URL to a different URL. The HTTP response status code 301 Moved Permanently is used for permanent URL redirection.

location = /oldpage.html {
        
        return 301 /files/newpage.html;
}

Add these lines to the nginx configuration file, which is located at /etc/nginx/sites-available/default on Debian.

$ sudo service nginx restart

After the file has been edited, we must restart nginx to apply the changes.

newpage.html
<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
<p>
This is a new page
</p>
</body>
</html>

This is the newpage.html file located in the nginx document root.

redirect.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

res = client.get 'http://localhost/oldpage.html', :follow_redirect => true
puts res.body

This script accesses the old page and follows the redirect.

res = client.get 'http://localhost/oldpage.html', :follow_redirect => true

The :follow_redirect option is used to follow the redirects.

$ ./redirect.rb 
<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
<p>
This is a new page
</p>
</body>
</html>

This is the output of the example.

$ tail -2 /var/log/nginx/access.log
127.0.0.1 - - [09/May/2016:14:08:50 +0200] "GET /oldpage.html HTTP/1.1" 301 193 "-" 
    "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))"
127.0.0.1 - - [09/May/2016:14:08:50 +0200] "GET /files/newpage.html HTTP/1.1" 200 113 "-" 
    "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))"

As we can see from the access.log file, the request was redirected to a new file name. The communication consisted of two GET messages.
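
The two-step exchange can be imitated without a server. In the sketch below, the responses hash is a hypothetical stand-in for nginx and follow is a helper written for this illustration; neither is part of HTTPClient. The code shows the general idea behind following a redirect: read the Location value, resolve it against the current URL, and request again until a non-redirect status arrives.

```ruby
require 'uri'

# Hypothetical stand-in for the server: URL => [status, Location or body].
responses = {
    'http://localhost/oldpage.html' => [301, '/files/newpage.html'],
    'http://localhost/files/newpage.html' => [200, 'This is a new page']
}

# Follows redirects up to the given limit. Returns the visited URLs
# and the body of the final response.
def follow(responses, url, limit = 5)
    visited = []

    limit.times do
        visited << url
        status, payload = responses.fetch(url)
        return visited, payload unless (300..399).cover?(status)

        # The Location value may be relative; resolve it against the current URL.
        url = URI.join(url, payload).to_s
    end

    raise 'too many redirects'
end

visited, body = follow(responses, 'http://localhost/oldpage.html')
puts visited
puts body
```

The limit guards against redirect loops, where two URLs keep pointing at each other.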

User agent

In this section, we specify the name of the user agent.

agent.php
<?php 

echo $_SERVER['HTTP_USER_AGENT'];

?>

Inside the nginx document root, we have this simple PHP file. It returns the name of the user agent.

agent.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new default_header: {"User-Agent" => "Ruby script"}

res = client.get 'http://localhost/agent.php'
puts res.body

This script creates a simple GET request to the agent.php script.

client = HTTPClient.new default_header: {"User-Agent" => "Ruby script"}

In the constructor of the HTTPClient, we specify the user agent.

$ ./agent.rb 
Ruby script

The server responded with the name of the agent that we have sent with the request.

Posting a value

The post method dispatches a POST request to the given URL, providing the key/value pairs for the fill-in form content.

target.php
<?php

echo "Hello " . htmlspecialchars($_POST['name']);

?>

On our local web server, we have this target.php file. It simply prints the posted value back to the client.

post_value.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

query = {"name" => "Jan"}
res = client.post 'http://localhost/target.php', query

puts res.body

The script sends a request with a name key that has the value Jan. The POST request is issued with the post method.

$ ./post_value.rb 
Hello Jan

This is the output of the post_value.rb script.

$ tail -1 /var/log/nginx/access.log
127.0.0.1 - - [08/May/2016:13:38:57 +0200] "POST /target.php HTTP/1.1" 200 19 "-" 
    "HTTPClient/1.0 (2.8.0, ruby 1.9.3 (2013-11-22))"

With the POST method, the value is not sent in the request URL.
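
Instead, the value travels in the body of the request. The following sketch assembles an approximate form POST by hand to show where the data ends up. The exact set of headers that HTTPClient sends is not shown in this tutorial, so the header lines here are an illustrative assumption.

```ruby
require 'uri'

# Form data is encoded the same way as a query string.
body = URI.encode_www_form('name' => 'Jan')

request = [
    'POST /target.php HTTP/1.1',
    'Host: localhost',
    'Content-Type: application/x-www-form-urlencoded',
    "Content-Length: #{body.bytesize}",
    '',
    body
].join("\r\n")

puts request
```

The request line contains only /target.php, which agrees with the entry in the access log.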

Retrieving definitions from a dictionary

In the following example, we find definitions of a term on www.dictionary.com. To parse HTML, we use the nokogiri gem. It can be installed with the sudo gem install nokogiri command.

get_term.rb
#!/usr/bin/ruby

require 'httpclient'
require 'nokogiri'

client = HTTPClient.new

term = 'dog'
res = client.get 'http://www.dictionary.com/browse/'+term

doc = Nokogiri::HTML res.body
doc.css("div.def-content").each do |node|
    puts node.text.strip.gsub(/\s{3,}/, " ")
end

In this script, we find the definitions of the term dog on www.dictionary.com. Nokogiri::HTML is used to parse the HTML code.

res = client.get 'http://www.dictionary.com/browse/'+term

To perform a search, we append the term at the end of the URL.

doc = Nokogiri::HTML res.body
doc.css("div.def-content").each do |node|
    puts node.text.strip.gsub(/\s{3,}/, " ")
end

We parse the content with the Nokogiri::HTML class. The definitions are located inside the <div class="def-content"> tag. We improve the formatting by removing excessive white space.
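
The white space cleanup can be tried on a plain string. The text below is a made-up sample, not real output from the site. The sketch also shows why the non-destructive strip is the safer choice in a method chain: strip! returns nil when the string has nothing to strip, and calling gsub on nil raises an error.

```ruby
# Made-up sample text with leading, trailing and inner white space.
raw = "   a domesticated canid,\n        Canis familiaris.   "

# Trim both ends, then squeeze runs of three or more white space characters.
clean = raw.strip.gsub(/\s{3,}/, " ")
puts clean

# strip! returns nil if no change was made.
tidy = "already clean"
p tidy.strip!
p tidy.strip
```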

Cookies

An HTTP cookie is a small piece of data sent from a website and stored in the user's web browser or program data subfolder while the user is browsing. When the user accesses a web page, the browser/program sends the cookie back to the server to notify it of the user's previous activity. Cookies have expiration dates, until which they are valid.

When receiving an HTTP request, a server can send a Set-Cookie header with the response. Afterward, the cookie value is sent along with every request made to the same server in the form of a Cookie HTTP header.
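
The relation between the two headers can be sketched with plain string handling. The Set-Cookie value below is the one that cookies.php produces later in this section; the parsing code is a simplified illustration and not the logic of HTTPClient's cookie manager.

```ruby
set_cookie = 'theme=black-and-white; expires=Sun, 15-May-2016 16:00:08 GMT; Max-Age=604800'

# The first segment is the name/value pair; the rest are attributes.
pair, *attributes = set_cookie.split(/;\s*/)
name, value = pair.split('=', 2)

attrs = {}
attributes.each do |attribute|
    key, val = attribute.split('=', 2)
    attrs[key.downcase] = val
end

# Only the name and the value are sent back to the server.
cookie_header = "Cookie: #{name}=#{value}"

puts cookie_header
puts attrs['max-age']
```

The attributes, such as the expiration date, stay on the client; they tell it for how long and to which requests the cookie applies.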

cookies.php
<?php

if (isset($_COOKIE['theme'])) {

    $theme = $_COOKIE['theme'];
    echo "Your theme is $theme";
} else {

    setcookie('theme', 'black-and-white', time() + (86400 * 7));
    echo "You are using default theme";
}

?>

This PHP file reads a cookie. If the cookie does not exist, it is created. The cookie stores a theme for a user.

send_cookie.rb
#!/usr/bin/ruby

require 'httpclient'

url = URI.parse "http://localhost/cookies.php"

cookie = WebAgent::Cookie.new
cookie.name = "theme"
cookie.value = "green-and-black"
cookie.url = url

client = HTTPClient.new
client.cookie_manager.add cookie

res = client.get url
puts res.body

We create a custom cookie and send it to the cookies.php page.

cookie = WebAgent::Cookie.new
cookie.name = "theme"
cookie.value = "green-and-black"
cookie.url = url

A cookie is created with the WebAgent::Cookie class.

client = HTTPClient.new
client.cookie_manager.add cookie

The cookie is added to the cookie manager.

$ ./send_cookie.rb 
Your theme is green-and-black

This is the output of the example.

Next, we are going to read a cookie and store it locally in a file.

read_cookie.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

res = client.get 'http://localhost/cookies.php'

client.set_cookie_store 'cookie.dat'
p res.header["Set-Cookie"]

client.save_cookie_store

This script reads a cookie from the PHP file and stores it locally in the cookie.dat file.

Finally, we read the stored cookie and send it to the same PHP file.

send_cookie2.rb
#!/usr/bin/ruby

require 'httpclient'

client = HTTPClient.new

cm = HTTPClient::CookieManager.new 'cookie.dat'
cm.load_cookies
client.cookie_manager = cm

res = client.get 'http://localhost/cookies.php'
p res.body

The HTTPClient::CookieManager is used to read the cookie.

$ ./send_cookie2.rb 
Unknown key: Max-Age = 604800
"You are using default theme"
$ ./read_cookie.rb 
Unknown key: Max-Age = 604800
["theme=black-and-white; expires=Sun, 15-May-2016 16:00:08 GMT; Max-Age=604800"]
$ ./send_cookie2.rb 
"Your theme is black-and-white"

We run the scripts. According to the author, the warning message can be ignored.

Credentials

The client's set_auth method sets the name and password to be used for a realm. A security realm is a mechanism used for protecting web application resources.

$ sudo apt-get install apache2-utils
$ sudo htpasswd -c /etc/nginx/.htpasswd user7
New password: 
Re-type new password: 
Adding password for user user7

We use the htpasswd tool to create a user name and a password for basic HTTP authentication.

location /secure {

        auth_basic "Restricted Area";
        auth_basic_user_file /etc/nginx/.htpasswd;
}

Inside the nginx /etc/nginx/sites-available/default configuration file, we create a secured page. The name of the realm is "Restricted Area".

index.html
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>

</body>

</html>

Inside the /usr/share/nginx/html/secure directory, we have this HTML file.

credentials.rb
#!/usr/bin/ruby

require 'httpclient'

user = 'user7'
passwd = '7user'

client = HTTPClient.new
client.set_auth 'http://localhost/secure/', user, passwd
cont = client.get_content 'http://localhost/secure/'

puts cont

The script connects to the secure webpage; it provides the user name and the password necessary to access the page.

$ ./credentials.rb 
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>

</body>

</html>

With the right credentials, the credentials.rb script returns the secured page.
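
With basic HTTP authentication, the user name and the password are joined with a colon, encoded in Base64, and sent in the Authorization header. HTTPClient takes care of this for us; the sketch below only illustrates the scheme with the credentials from credentials.rb.

```ruby
user = 'user7'
passwd = '7user'

# pack('m0') produces Base64 without line breaks.
token = ["#{user}:#{passwd}"].pack('m0')
header = "Authorization: Basic #{token}"
puts header

# Base64 is an encoding, not encryption; it is trivially reversed.
decoded = token.unpack('m0').first
puts decoded
```

Because the credentials can be decoded by anyone who sees the request, basic authentication should be combined with HTTPS outside of a local test setup.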

In this tutorial, we have worked with the Ruby HTTPClient module. There are similar tutorials on ZetCode: the Ruby Faraday tutorial and the Ruby Net::HTTP tutorial.