Python pickle
last modified January 29, 2024
Python pickle tutorial shows how to do data serialization in Python with the pickle module.
The pickle module
The pickle module implements binary protocols for serializing and deserializing a Python object structure. Serialization is the process of converting an object in memory to a byte stream that can be stored on disk or sent over a network. Deserialization is the process of converting a byte stream back to a Python object.
This process is also called pickling/unpickling or marshalling/unmarshalling.
The pickletools module contains tools for analyzing data streams generated by pickle.
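The following is a minimal sketch of pickletools at work. The sample dictionary is an illustrative assumption. dis() prints one opcode per line. genops() yields (opcode, argument, position) triples. optimize() returns an equivalent pickle with unused PUT opcodes removed.

```python
import pickle
import pickletools

# Illustrative sample data (an assumption, not from the examples below)
data = {'a': [1, 2, 3], 'b': "a red fox"}
pickled = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# Print a human-readable disassembly of the stream, one opcode per line
pickletools.dis(pickled)

# genops() yields (opcode, argument, position) triples
names = [op.name for op, arg, pos in pickletools.genops(pickled)]
print(names[0], names[-1])    # PROTO STOP

# optimize() returns an equivalent pickle that still loads correctly
optimized = pickletools.optimize(pickled)
print(pickle.loads(optimized) == data)    # True
```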
Python pickle serialize
The following example serializes data into a binary file.
#!/usr/bin/python

import pickle

data = {
    'a': [1, 4.0, 3, 4+6j],
    'b': ("a red fox", b"and old falcon"),
    'c': {None, True, False}
}

with open('data.bin', 'wb') as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
We have a dictionary of different data types. The data is pickled into a binary file.
with open('data.bin', 'wb') as f:
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
The dump function writes the pickled representation of the object to the file object. Over time, several protocols have been developed, and the protocol version determines the capabilities of the serialization process. In our code, we choose the highest protocol version available, pickle.HIGHEST_PROTOCOL.
$ hexdump -C data.bin
00000000 80 05 95 75 00 00 00 00 00 00 00 7d 94 28 8c 01 |...u.......}.(..|
00000010 61 94 5d 94 28 4b 01 47 40 10 00 00 00 00 00 00 |a.].(K.G@.......|
00000020 4b 03 8c 08 62 75 69 6c 74 69 6e 73 94 8c 07 63 |K...builtins...c|
00000030 6f 6d 70 6c 65 78 94 93 94 47 40 10 00 00 00 00 |omplex...G@.....|
00000040 00 00 47 40 18 00 00 00 00 00 00 86 94 52 94 65 |..G@.........R.e|
00000050 8c 01 62 94 8c 09 61 20 72 65 64 20 66 6f 78 94 |..b...a red fox.|
00000060 43 0e 61 6e 64 20 6f 6c 64 20 66 61 6c 63 6f 6e |C.and old falcon|
00000070 94 86 94 8c 01 63 94 8f 94 28 89 88 4e 90 75 2e |.....c...(..N.u.|
00000080
Binary files cannot be read with simple text editors; we need tools such as hexdump that display the raw bytes. The first two bytes, 80 05, are the PROTO opcode followed by the protocol version, 5.
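Every protocol restores an equal object; the versions differ only in how the bytes are encoded. The following sketch serializes the same data with each protocol and reads back the version header. The helper header_protocol() is hypothetical. It relies on the fact that protocol 2 and later pickles start with the PROTO opcode (0x80) followed by the version number, while protocols 0 and 1 carry no such header.

```python
import pickle

# Largest and default protocol numbers depend on the Python version
print(pickle.HIGHEST_PROTOCOL, pickle.DEFAULT_PROTOCOL)


def header_protocol(blob):
    """Hypothetical helper: return the version stored in the PROTO opcode
    (byte 0x80) that starts every protocol 2+ pickle, else None."""
    if len(blob) >= 2 and blob[0] == 0x80:
        return blob[1]
    return None    # protocols 0 and 1 have no version header


data = {'a': [1, 4.0, 3], 'b': ('a red fox',), 'c': {None, True}}

for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=proto)
    # each protocol round-trips to an equal object; only the encoding differs
    assert pickle.loads(blob) == data
    print(proto, header_protocol(blob), len(blob))
```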
Python pickle deserialize
In the next example, we unpickle data from a binary file.
#!/usr/bin/python

import pickle

with open('data.bin', 'rb') as f:
    data = pickle.load(f)

print(data)
The load function reads the pickled representation of an object from the file object and returns the reconstituted object.
$ ./simple_read.py
{'a': [1, 4.0, 3, (4+6j)], 'b': ('a red fox', b'and old falcon'), 'c': {False, True, None}}
We have successfully recreated the dictionary object.
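Each load call reads exactly one pickled object. This means several objects can be appended to one file and read back in a loop until pickle.load raises EOFError. The sketch below assumes a hypothetical file records.bin placed in a temporary directory, and uses illustrative records.

```python
import os
import pickle
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'records.bin')
records = [{'id': 1, 'bird': 'falcon'}, {'id': 2, 'bird': 'hawk'}]

# Each dump() call writes one complete pickle after the previous one
with open(path, 'wb') as f:
    for rec in records:
        pickle.dump(rec, f)

# Each load() call reads one pickle; EOFError signals the end of the file
loaded = []
with open(path, 'rb') as f:
    while True:
        try:
            loaded.append(pickle.load(f))
        except EOFError:
            break

print(loaded)
```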
Python pickle dumps/loads
The dumps function returns the pickled representation of the object as a bytes object instead of writing it to a file. The loads function reconstructs the object hierarchy from the pickled representation; the data must be a bytes-like object.
#!/usr/bin/python

import pickle

data = [1, 2, 3, 4, 5]

dumped = pickle.dumps(data)
print(dumped)

loaded = pickle.loads(dumped)
print(loaded)
In the example, we serialize and deserialize a Python list with dumps and loads.
$ ./dumps_loads.py
b'\x80\x04\x95\x0f\x00\x00\x00\x00\x00\x00\x00]\x94(K\x01K\x02K\x03K\x04K\x05e.'
[1, 2, 3, 4, 5]
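A dumps/loads round trip produces a new, independent object. The sketch below uses illustrative nested data to show three properties. Modifying the copy leaves the original untouched. References shared inside one pickle are preserved. Finally, loads accepts any bytes-like object, such as a bytearray.

```python
import pickle

data = [1, [2, 3], {'k': [4, 5]}]

blob = pickle.dumps(data)
print(type(blob))            # <class 'bytes'>

clone = pickle.loads(blob)
clone[1].append(99)          # the clone shares no objects with data
print(data)                  # [1, [2, 3], {'k': [4, 5]}]
print(clone)                 # [1, [2, 3, 99], {'k': [4, 5]}]

# The same object referenced twice stays one object after unpickling
inner = [0]
pair = pickle.loads(pickle.dumps([inner, inner]))
print(pair[0] is pair[1])    # True

# loads() accepts any bytes-like object, not only bytes
print(pickle.loads(bytearray(blob)) == data)    # True
```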
Python pickle __getstate__/__setstate__
The process of pickling and unpickling can be influenced with the __getstate__ and __setstate__ methods. The __getstate__ method is called upon pickling and returns the object's state. The __setstate__ method is called upon unpickling and receives that state.
blue, rock, water, sky, cloud, forest, hawk, falcon
This is the words.txt file.
red, green, blue, pink, orange
This is the colours.txt file.
#!/usr/bin/python

import pickle


class MyData:

    def __init__(self, filename):

        self.name = filename
        self.fh = open(filename)

    def __getstate__(self):

        odict = self.__dict__.copy()
        print(odict)
        # file objects cannot be pickled, so drop the handle
        del odict['fh']
        return odict

    def __setstate__(self, state):

        # reopen the file by name when unpickling
        fh = open(state['name'])
        self.name = state['name']
        self.fh = fh


obj = MyData('words.txt')

res = pickle.loads(pickle.dumps(obj))
print(res.fh.read())

obj2 = MyData('colours.txt')

res = pickle.loads(pickle.dumps(obj2))
print(res.fh.read())
In the example, __getstate__ removes the file handle from the state, because file objects cannot be pickled. __setstate__ then reopens the file by its name when the object is unpickled.
$ ./state.py
{'name': 'words.txt', 'fh': <_io.TextIOWrapper name='words.txt' mode='r' encoding='UTF-8'>}
blue, rock, water, sky, cloud, forest, hawk, falcon
{'name': 'colours.txt', 'fh': <_io.TextIOWrapper name='colours.txt' mode='r' encoding='UTF-8'>}
red, green, blue, pink, orange
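The same pattern applies to other unpicklable attributes, such as locks. The sketch below uses a hypothetical Counter class and needs no files. __getstate__ drops the threading.Lock, which pickle refuses with a TypeError. __setstate__ restores the remaining attributes and gives the copy a fresh lock.

```python
import pickle
import threading


class Counter:
    """Hypothetical thread-safe counter; its lock cannot be pickled."""

    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        with self.lock:
            self.value += 1

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['lock']                # drop the unpicklable lock
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.lock = threading.Lock()     # give the copy a fresh lock


try:
    pickle.dumps(threading.Lock())
except TypeError as e:
    print(e)                             # a bare lock cannot be pickled

c = Counter()
c.increment()
c.increment()

restored = pickle.loads(pickle.dumps(c))
print(restored.value)                    # 2
restored.increment()
print(restored.value)                    # 3
```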
Python pickle is insecure
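The pickle module is not secure. Unpickling runs a small virtual machine that executes the opcodes found in the data stream, and some of these opcodes import and call arbitrary Python callables. By using specially crafted binary strings, an attacker can run system commands that damage data or open reverse shells. Never unpickle data from an untrusted or unauthenticated source.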
#!/usr/bin/python

import pickle

pickle.loads(b"cos\nsystem\n(S'ls -l'\ntR.")
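This example launches the Linux ls -l command. The c (GLOBAL) opcode loads os.system, and the R (REDUCE) opcode calls it with the argument tuple ('ls -l',).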
$ ./insec.py
total 36
drwxr-xr-x 2 user2 user2 4096 Aug 13 16:16 Desktop
drwxr-xr-x 2 user2 user2 4096 Aug 13 16:18 Documents
drwxr-xr-x 2 user2 user2 4096 Aug 13 16:16 Downloads
-rwxr-xr-x 1 user2 user2 79 Aug 29 11:08 insec.py
drwxr-xr-x 2 user2 user2 4096 Aug 13 16:16 Music
drwxr-xr-x 2 user2 user2 4096 Aug 13 16:16 Pictures
drwxr-xr-x 2 user2 user2 4096 Aug 13 16:16 Public
drwxr-xr-x 2 user2 user2 4096 Aug 13 16:16 Templates
drwxr-xr-x 2 user2 user2 4096 Aug 13 16:16 Videos
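When unpickling cannot be avoided, you can narrow the attack surface by overriding Unpickler.find_class. This method is called for every global that the stream references. The sketch below is modelled on the "Restricting Globals" example in the pickle documentation. The allow-list and the helper name restricted_loads are assumptions to adapt to your own data. The approach reduces the risk but does not make untrusted pickles safe; for untrusted input, a data-only format such as JSON is the safer choice.

```python
import builtins
import io
import pickle

# Assumed allow-list of harmless builtins; adjust for your own data
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}


class RestrictedUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        # Called for every global the stream references (e.g. os.system)
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    """Hypothetical helper: like pickle.loads(), but globals are filtered."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Plain data, including an allowed complex number, still loads
print(restricted_loads(pickle.dumps([1, 2, {3, 4}, 4+6j])))

# The malicious payload is rejected before os.system can be called
try:
    restricted_loads(b"cos\nsystem\n(S'ls -l'\ntR.")
except pickle.UnpicklingError as e:
    print(e)    # global 'os.system' is forbidden
```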
Source
pickle — Python object serialization
In this article we have worked with the Python pickle module.