Python Requests tutorial

In this tutorial, we show how to work with the Python Requests module. We grab data, post data, stream data, and connect to secure web pages. ZetCode also has a concise Python tutorial.

The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.

Python requests

Requests is a simple and elegant Python HTTP library. It provides methods for accessing Web resources via HTTP. Requests is a third-party package, not part of the Python standard library; it is typically installed with pip (pip install requests).

$ sudo service nginx start

We run the nginx web server on localhost. Some of our examples connect to PHP scripts on this locally running nginx server.

Python requests version

The first program prints the version of the Requests library.

version.py
#!/usr/bin/python3

import requests

print(requests.__version__)
print(requests.__copyright__)

The program prints the version and copyright of Requests.

$ ./version.py 
2.2.1
Copyright 2014 Kenneth Reitz

This is a sample output of the example.

Python requests reading a web page

The get() method issues a GET request; it fetches documents identified by the given URL.

read_webpage.py
#!/usr/bin/python3

import requests as req

resp = req.get("http://www.something.com")

print(resp.text)

The script grabs the content of the www.something.com web page.

resp = req.get("http://www.something.com")

The get() method returns a response object.

print(resp.text)

The text attribute contains the content of the response, in Unicode.

$ ./read_webpage.py 
<html><head><title>Something.</title></head>
<body>Something.</body>
</html>

This is the output of the read_webpage.py script.

The following program gets a small web page and strips its HTML tags.

strip_tags.py
#!/usr/bin/python3

import requests as req
import re

resp = req.get("http://www.something.com")

content = resp.text

stripped = re.sub('<[^<]+?>', '', content)
print(stripped)

The script strips the HTML tags of the www.something.com web page.

stripped = re.sub('<[^<]+?>', '', content)

A simple regular expression is used to strip the HTML tags.

$ ./strip_tags.py 
Something.
Something.

The script prints the web page title and content.
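The regular expression is adequate for a tiny page like this one, but it can misfire on comments, scripts, or attribute values that contain angle brackets. As a hedged alternative, the following sketch uses the html.parser module from the standard library. The TextExtractor class and the strip_tags() helper are illustrative names of our own, and the page string is a hard-coded copy of the markup shown earlier so that the example runs without a network connection.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of a document (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags
        self.parts.append(data)


def strip_tags(markup):
    """Return the text content of markup with all tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


# Hard-coded stand-in for resp.text, so no request is needed
page = ("<html><head><title>Something.</title></head>\n"
        "<body>Something.</body>\n"
        "</html>")

print(strip_tags(page))
```

In a real script, we would pass resp.text to strip_tags() instead of the hard-coded string.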

Request

An HTTP request is a message sent from the client to the server to retrieve some information or to perform some action.

The request() function of Requests creates and sends a new request. Note that the requests module also has higher-level functions, such as get(), post(), or put(), which save some typing for us.

create_request.py
#!/usr/bin/python3

import requests as req

resp = req.request(method='GET', url="http://www.something.com")
print(resp.text)

The example creates a GET request and sends it to http://www.something.com.

Python requests getting status

The Response object contains a server's response to an HTTP request. Its status_code attribute holds the HTTP status code of the response, such as 200 or 404.

get_status.py
#!/usr/bin/python3

import requests as req

resp = req.get("http://www.something.com")
print(resp.status_code)

resp = req.get("http://www.something.com/news/")
print(resp.status_code)

We perform two HTTP requests with the get() method and check for the returned status.

$ ./get_status.py 
200
404

200 is the standard response for successful HTTP requests, and 404 indicates that the requested resource could not be found.
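Instead of comparing status codes by hand, we can let Requests turn error codes into exceptions. The sketch below relies on Response.raise_for_status(), which raises requests.HTTPError for 4xx and 5xx codes, and on the Response.ok attribute. The fetch_status() and fake_response() helpers are names made up for this illustration; fake_response() builds Response objects by hand only so that the example runs offline, which is not how responses are normally obtained.

```python
import requests


def fetch_status(resp):
    """Return 'ok' for a successful response, an error message otherwise."""
    try:
        # Raises requests.HTTPError for 4xx and 5xx status codes
        resp.raise_for_status()
    except requests.HTTPError as err:
        return "failed: {}".format(err)
    return "ok"


def fake_response(code, reason, url):
    """Hand-built Response so the demo needs no network (illustration only)."""
    resp = requests.Response()
    resp.status_code = code
    resp.reason = reason
    resp.url = url
    return resp


found = fake_response(200, "OK", "http://www.something.com")
missing = fake_response(404, "Not Found", "http://www.something.com/news/")

print(fetch_status(found))
print(fetch_status(missing))

# resp.ok is False for 4xx and 5xx codes
print(found.ok, missing.ok)

# Named constants are available through requests.codes
print(requests.codes.not_found)
```

With a live server, we would pass the result of req.get() to fetch_status().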

Python requests head method

The head() method retrieves document headers. The headers consist of fields, including date, server, content type, or last modification time.

head_request.py
#!/usr/bin/python3

import requests as req

resp = req.head("http://www.something.com")

print("Server: " + resp.headers['server'])
print("Last modified: " + resp.headers['last-modified'])
print("Content type: " + resp.headers['content-type'])
print("Content length: " + resp.headers['content-length'])

The example prints the server, last modification time, content type, and content length of the www.something.com web page.

$ ./head_request.py 
Server: Apache/2.4.12 (FreeBSD) OpenSSL/1.0.1l-freebsd mod_fastcgi/mod_fastcgi-SNAP-0910052141
Last modified: Mon, 25 Oct 1999 15:36:02 GMT
Content type: text/html
Content length: 72

This is the output of the head_request.py program.
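Not every server sends all four of these fields, and indexing a missing header raises a KeyError. The headers attribute behaves like a case-insensitive dictionary, so its get() method can supply a default. The following sketch shows this with a CaseInsensitiveDict built by hand; the describe() helper and the sample values are assumptions made for the illustration, chosen so that no request is needed.

```python
from requests.structures import CaseInsensitiveDict


def describe(headers):
    """Format selected header fields, tolerating missing ones."""
    fields = ["Server", "Last-Modified", "Content-Type", "Content-Length"]
    # get() returns the default instead of raising KeyError
    return ["{}: {}".format(name, headers.get(name, "(not sent)"))
            for name in fields]


# Stand-in for resp.headers; note the lowercase keys and the
# missing last-modified field
sample = CaseInsensitiveDict({
    "server": "Apache/2.4.12",
    "content-type": "text/html",
    "content-length": "72",
})

for line in describe(sample):
    print(line)
```

In head_request.py, we would call describe(resp.headers) in place of the four print statements.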

Python requests get method

The get() method issues a GET request to the server. The GET method requests a representation of the specified resource.

greet.php
<?php

echo "Hello " . htmlspecialchars($_GET['name']);

?>

Inside the /usr/share/nginx/html/ directory, we have this greet.php file. The script returns the value of the name variable, which was retrieved from the client. The htmlspecialchars() function converts special characters to HTML entities; e.g. & to &amp;.

mget.py
#!/usr/bin/python3

import requests as req

resp = req.get("http://localhost/greet.php?name=Peter")
print(resp.text)

The script sends a variable with a value to the PHP script on the server. The variable is specified directly in the URL.

$ ./mget.py 
Hello Peter

This is the output of the example.

$ tail -1 /var/log/nginx/access.log
127.0.0.1 - - [16/Oct/2016:21:12:16 +0200] "GET /greet.php?name=Peter HTTP/1.1" 200 42 "-" 
"python-requests/2.2.1 CPython/3.4.3 Linux/3.13.0-98-generic"

We examine the nginx access log.

The get() method takes a params parameter where we can specify the query parameters.

mget2.py
#!/usr/bin/python3

import requests as req

payload = {'name': 'Peter', 'age': 23}
resp = req.get("http://httpbin.org/get", params=payload)

print(resp.url)
print(resp.text)

httpbin.org is a freely available HTTP request and response service.

payload = {'name': 'Peter', 'age': 23}

The data is sent in a Python dictionary.

resp = req.get("http://httpbin.org/get", params=payload)

We send a GET request to the httpbin.org site and pass the data, which is specified in the params parameter.

print(resp.url)
print(resp.text)

We print the URL and the response content to the console.

$ ./mget2.py 
http://httpbin.org/get?age=23&name=Peter
{
  "args": {
    "age": "23", 
    "name": "Peter"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate, compress", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.2.1 CPython/3.4.3 Linux/3.13.0-98-generic"
  }, 
  "origin": "89.173.201.81", 
  "url": "http://httpbin.org/get?age=23&name=Peter"
}

This is the output of the example.
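To see how the params dictionary becomes a query string without contacting any server, we can build the request and prepare it without sending it. This is a minimal sketch using requests.Request and its prepare() method; the second payload, with a value containing spaces and an ampersand, is our own addition to show that Requests takes care of the URL encoding.

```python
import requests
from urllib.parse import urlsplit, parse_qs

payload = {'name': 'Peter', 'age': 23}

# prepare() builds the final URL, headers, and body; nothing is sent
prepared = requests.Request('GET', 'http://httpbin.org/get',
                            params=payload).prepare()
print(prepared.url)

# Special characters in values are percent-encoded automatically
tricky = requests.Request('GET', 'http://httpbin.org/get',
                          params={'name': 'Peter & Paul'}).prepare()
print(tricky.url)

# Decoding the query string gives back the original value
print(parse_qs(urlsplit(tricky.url).query))
```

Building the query string by hand, as in mget.py, leaves this encoding to us; the params parameter is therefore the safer choice for values that come from user input.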

Python requests redirection

Redirection is a process of forwarding one URL to a different URL. The HTTP response status code 301 Moved Permanently is used for permanent URL redirection; 302 Found for a temporary redirection.

redirect.py
#!/usr/bin/python3

import requests as req

resp = req.get("http://www.google.com")

print(resp.status_code)
print(resp.history)
print(resp.url)

In the example, we issue a GET request to the www.google.com page. This page redirects to another page; redirect responses are stored in the history attribute of the response.

$ ./redirect.py 
200
(<Response [302]>,)
http://www.google.sk/?gfe_rd=cr&ei=y7cEWJPAFbPb8AfeqLSAAg

A GET request to www.google.com was redirected with a 302 status to another web page, whose URL we see in the last line.

In the second example, we do not follow a redirect.

redirect2.py
#!/usr/bin/python3


import requests as req

resp = req.get("http://www.google.com", allow_redirects=False)

print(resp.status_code)
print(resp.url)

The allow_redirects parameter specifies whether the redirect is followed; the redirects are followed by default.

$ ./redirect2.py 
302
http://www.google.com/

This is the output of the example.
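When redirects are not followed, the target of the redirect is available in the Location header of the response. The sketch below reads it with the is_redirect attribute of the response. The redirect_target() helper is a name invented for this example, and the response is built by hand with a placeholder Location value so that the code runs offline.

```python
import requests


def redirect_target(resp):
    """Return the Location header of a redirect response, else None."""
    # is_redirect is True for a redirect status with a Location header
    if resp.is_redirect:
        return resp.headers["Location"]
    return None


# Hand-built stand-in for req.get(url, allow_redirects=False);
# the Location value is a placeholder
fake = requests.Response()
fake.status_code = 302
fake.headers["Location"] = "http://www.google.sk/"

print(redirect_target(fake))

# 302 is a temporary redirect, so this prints False
print(fake.is_permanent_redirect)
```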

In the third example, we show how to set up a page redirect in the nginx server.

location = /oldpage.html {
        
        return 301 /files/newpage.html;
}

Add these lines to the nginx configuration file, which is located at /etc/nginx/sites-available/default on Debian.

$ sudo service nginx restart

After the file has been edited, we must restart nginx to apply the changes.

newpage.html
<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
<p>
This is a new page
</p>
</body>
</html>

This is the newpage.html file located in the nginx document root.

redirect3.py
#!/usr/bin/python3

import requests as req

resp = req.get("http://localhost/oldpage.html")

print(resp.status_code)
print(resp.history)
print(resp.url)

print(resp.text)

This script accesses the old page and follows the redirect. As we already mentioned, Requests follows redirects by default.

$ ./redirect3.py 
200
(<Response [301]>,)
http://localhost/files/newpage.html
<!DOCTYPE html>
<html>
<head>
<title>New page</title>
</head>
<body>
<p>
This is a new page
</p>
</body>
</html>

This is the output of the example.

$ tail -2 /var/log/nginx/access.log
127.0.0.1 - - [17/Oct/2016:13:45:39 +0200] "GET /oldpage.html HTTP/1.1" 301 193 "-" 
"python-requests/2.2.1 CPython/3.4.3 Linux/3.13.0-98-generic"
127.0.0.1 - - [17/Oct/2016:13:45:39 +0200] "GET /files/newpage.html HTTP/1.1" 200 109 "-" 
"python-requests/2.2.1 CPython/3.4.3 Linux/3.13.0-98-generic"

As we can see from the access.log file, the request was redirected to a new file name. The communication consisted of two GET messages.

User agent

In this section, we specify the name of the user agent.

agent.php
<?php 

echo $_SERVER['HTTP_USER_AGENT'];

?>

Inside the nginx document root, we have this simple PHP file. It returns the name of the user agent.

user_agent.py
#!/usr/bin/python3

import requests as req

h = {'user-agent': 'Python script'}

resp = req.get("http://localhost/agent.php", headers=h)
print(resp.text)

This script creates a simple GET request to the agent.php script. To add HTTP headers to a request, we pass in a dictionary to the headers parameter.

h = {'user-agent': 'Python script'}

The header values are placed in a Python dictionary.

resp = req.get("http://localhost/agent.php", headers=h)

The values are passed to the headers parameter.

$ ./user_agent.py 
Python script

The server responded with the name of the agent that we have sent with the request.
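If several requests should carry the same header, passing the dictionary each time becomes repetitive. One possible approach, sketched here under the assumption that a requests.Session suits the script, is to set the header once on the session. The example prepares the request without sending it, so it runs without the local nginx server.

```python
import requests

session = requests.Session()

# Headers set on the session are merged into every request it makes
session.headers.update({'user-agent': 'Python script'})

# Build the request through the session, but do not send it
request = requests.Request('GET', 'http://localhost/agent.php')
prepared = session.prepare_request(request)

# Header lookup is case-insensitive
print(prepared.headers['User-Agent'])

# The value Requests sends when we do not override the header
print(requests.utils.default_user_agent())
```

With the server running, session.get("http://localhost/agent.php") would send the same header.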

Posting a value

The post() method sends a POST request to the given URL, passing the key/value pairs as form data.

target.php
<?php

echo "Hello " . htmlspecialchars($_POST['name']);

?>

On our local web server, we have this target.php file. It simply prints the posted value back to the client.

mpost.py
#!/usr/bin/python3

import requests as req

d = {'name': 'Peter'}

resp = req.post("http://localhost/target.php", d)
print(resp.text)

The script sends a request with a name key whose value is Peter. The POST request is issued with the post() method.

$ ./mpost.py 
Hello Peter

This is the output of the mpost.py script.

$ tail -1 /var/log/nginx/access.log
127.0.0.1 - - [16/Oct/2016:21:22:50 +0200] "POST /target.php HTTP/1.1" 200 42 "-" 
"python-requests/2.2.1 CPython/3.4.3 Linux/3.13.0-98-generic"

With the POST method, the value is not sent in the request URL; it travels in the body of the request.

JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write and for machines to parse and generate.

A JSON object is a collection of key/value pairs; in Python, it maps to a dictionary.

send_json.php
<?php

$data = [ 'name' => 'Jane', 'age' => 17 ];
header('Content-Type: application/json');

echo json_encode($data);

?>

The PHP script sends JSON data. It uses the json_encode() function to do the job.

read_json.py
#!/usr/bin/python3

import requests as req

resp = req.get("http://localhost/send_json.php")
print(resp.json())

The read_json.py reads JSON data sent by the PHP script.

print(resp.json())

The json() method parses the JSON-encoded body of the response and returns it as a Python object, here a dictionary.

$ ./read_json.py 
{'age': 17, 'name': 'Jane'}

This is the output of the example.

Next, we send JSON data to a PHP script from a Python script.

parse_json.php
<?php

$data = file_get_contents("php://input");

$json = json_decode($data , true);

foreach ($json as $key => $value) {

    if (!is_array($value)) {
        echo "The $key is $value\n";
    } else {
        foreach ($value as $key => $val) {
            echo "The $key is $val\n";
        }
    }
}
?>

This PHP script reads JSON data and sends back a message with the parsed values.

send_json.py
#!/usr/bin/python3

import requests as req

data = {'name': 'Jane', 'age': 17}

resp = req.post("http://localhost/parse_json.php", json=data)
print(resp.text)

This script sends JSON data to the PHP application and reads its response.

data = {'name': 'Jane', 'age': 17}

This is the data to be sent.

resp = req.post("http://localhost/parse_json.php", json=data)

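The dictionary is passed to the json parameter; Requests serializes it to JSON and sets the Content-Type header to application/json. The json parameter requires Requests 2.4.2 or newer.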

$ ./send_json.py 
The name is Jane
The age is 17

This is the example output.
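The difference between posting a dictionary as form data and posting it as JSON is easiest to see in the prepared request. The following sketch builds both variants with requests.Request and prepare(), without sending anything, and prints the Content-Type header and the body of each.

```python
import json
import requests

data = {'name': 'Jane', 'age': 17}
url = "http://localhost/parse_json.php"

# data= encodes the dictionary as an HTML form
as_form = requests.Request('POST', url, data=data).prepare()

# json= serializes the dictionary to a JSON document
as_json = requests.Request('POST', url, json=data).prepare()

print(as_form.headers['Content-Type'])
print(as_form.body)

print(as_json.headers['Content-Type'])
print(as_json.body)

# The JSON body decodes back to the original dictionary
print(json.loads(as_json.body) == data)
```

A script such as parse_json.php, which reads the raw request body, needs the second form; a script that reads $_POST, such as target.php, needs the first.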

Retrieving definitions from a dictionary

In the following example, we find definitions of a term on www.dictionary.com. To parse HTML, we use the lxml module. It can be installed with the sudo apt-get install python3-lxml command, or with the Python pip tool.

get_term.py
#!/usr/bin/python3

import requests as req
from lxml import html
import textwrap

term = "dog"

resp = req.get("http://www.dictionary.com/browse/" + term)
root = html.fromstring(resp.content)

for sel in root.xpath("//div[@class='def-content']"):
    
    s = sel.text.strip()
    
    if (len(s) > 3):
        
        print(textwrap.fill(s, width=50))   

In this script, we find the definitions of the term dog on www.dictionary.com. The lxml module is used to parse the HTML code.

from lxml import html

The lxml module can be used to parse HTML.

import textwrap

The textwrap module is used to wrap text to a certain width.

resp = req.get("http://www.dictionary.com/browse/" + term)

To perform a search, we append the term at the end of the URL.

root = html.fromstring(resp.content)

We need to use resp.content rather than resp.text because html.fromstring() implicitly expects bytes as input. (The resp.content attribute returns the content as bytes, whereas resp.text returns it as Unicode text.)

for sel in root.xpath("//div[@class='def-content']"):
    
    s = sel.text.strip()
    
    if (len(s) > 3):
        
        print(textwrap.fill(s, width=50))    

We parse the content. The main definitions are located inside the <div class="def-content"> tag. We improve the formatting by removing excessive white space and stray characters. The text is wrapped to a maximum width of 50 characters. Note that such parsing depends on the markup of the page, which is subject to change.

$ ./get_term.py 
a domesticated canid,
any carnivore of the dog family Canidae, having
prominent canine teeth and, in the wild state, a
long and slender muzzle, a deep-chested muscular
body, a bushy tail, and large, erect ears.
...

This is a partial list of the definitions.

Python requests streaming requests

Streaming is transmitting a continuous flow of audio and/or video data while earlier parts are being used. The Response.iter_lines() method iterates over the response data one line at a time, and Response.iter_content() iterates over it in chunks of bytes. Setting stream=True on the request avoids reading the whole content into memory at once for large responses.

streaming.py
#!/usr/bin/python3

import requests as req

url = "https://docs.oracle.com/javase/specs/jls/se8/jls8.pdf"

local_filename = url.split('/')[-1]

r = req.get(url, stream=True)

with open(local_filename, 'wb') as f:

    for chunk in r.iter_content(chunk_size=1024): 
    
        f.write(chunk)  

The example streams a PDF file and writes it to disk.

r = req.get(url, stream=True)

When stream is set to True, Requests cannot release the connection back to the pool unless we consume all the data or call Response.close().

with open(local_filename, 'wb') as f:

    for chunk in r.iter_content(chunk_size=1024): 
    
        f.write(chunk) 

We read the resource in 1 KB chunks and write them to a local file.
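The chunked copy can be wrapped in a small function that also reports how many bytes were written. The sketch below is one possible arrangement; save_stream() is a hypothetical helper, and the response is a hand-built stand-in whose raw attribute is an in-memory buffer, used only so that the example runs without downloading anything. The commented lines at the end show how the helper might be used with a real request, where the with statement closes the response and releases the connection.

```python
import io
import os
import tempfile

import requests


def save_stream(resp, path, chunk_size=1024):
    """Write the response body to path in chunks; return the byte count."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in resp.iter_content(chunk_size=chunk_size):
            written += f.write(chunk)
    return written


# Offline stand-in for req.get(url, stream=True): 2500 bytes of data
fake = requests.Response()
fake.status_code = 200
fake.raw = io.BytesIO(b"x" * 2500)

target = os.path.join(tempfile.mkdtemp(), "download.bin")
print(save_stream(fake, target))

# Possible usage against a real server (not executed here):
#
# with requests.get(url, stream=True) as r:
#     r.raise_for_status()
#     save_stream(r, local_filename)
```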

Python requests credentials

The auth parameter enables basic HTTP authentication; it takes a tuple of a user name and a password to be used for a realm. A security realm is a mechanism used for protecting web application resources.

$ sudo apt-get install apache2-utils
$ sudo htpasswd -c /etc/nginx/.htpasswd user7
New password: 
Re-type new password: 
Adding password for user user7

We use the htpasswd tool to create a user name and a password for basic HTTP authentication.

location /secure {

        auth_basic "Restricted Area";
        auth_basic_user_file /etc/nginx/.htpasswd;
}

Inside the nginx /etc/nginx/sites-available/default configuration file, we create a secured page. The name of the realm is "Restricted Area".

index.html
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>

</body>

</html>

Inside the /usr/share/nginx/html/secure directory, we have this HTML file.

credentials.py
#!/usr/bin/python3


import requests as req

user = 'user7'
passwd = '7user'

resp = req.get("http://localhost/secure/", auth=(user, passwd))
print(resp.text)

The script connects to the secure webpage; it provides the user name and the password necessary to access the page.

$ ./credentials.py
<!DOCTYPE html>
<html lang="en">
<head>
<title>Secure page</title>
</head>

<body>

<p>
This is a secure page.
</p>

</body>

</html>

With the right credentials, the credentials.py script returns the secured page.
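It is worth seeing what the auth tuple actually puts on the wire. The following sketch prepares the same request without sending it and decodes the resulting Authorization header; it uses the user name and password from credentials.py. The header carries the credentials in Base64 encoding, which is an encoding and not encryption, so basic authentication should be combined with HTTPS outside of a local test setup.

```python
import base64
import requests

user = 'user7'
passwd = '7user'

# Build the request without sending it
prepared = requests.Request('GET', 'http://localhost/secure/',
                            auth=(user, passwd)).prepare()

header = prepared.headers['Authorization']
print(header)

# The header has the form "Basic <base64 of user:password>"
scheme, token = header.split(' ', 1)
decoded = base64.b64decode(token).decode('latin1')

print(scheme)
print(decoded)
```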

In this tutorial, we have worked with the Python Requests module.