Python hashing
last modified January 29, 2024
Python hashing tutorial explains the hashing concept in Python. We explain hash tables and Python hashable objects.
Hash table
Hash tables are used to implement map and set data structures in many common programming languages, such as C++, Java, and Python. Python uses hash tables for dictionaries and sets. A hash table is an unordered collection of key-value pairs, where each key is unique. Hash tables offer a combination of efficient lookup, insert and delete operations. These are the best properties of arrays and linked lists.
Hashing
Hashing is the process of using an algorithm to map data of any size to a fixed length. This is called a hash value. Hashing is used to create high performance, direct access data structures where large amount of data is to be stored and accessed quickly. Hash values are computed with hash functions.
Python hashable
An object is hashable if it has a hash value which never changes during its
lifetime. (It can have different values during multiple invocations of Python
programs.) A hashable object needs a __hash__
method. In order to
perform comparisons, a hashable needs an __eq__
method.
Hashability makes an object usable as a dictionary key and a set member, because
these data structures use the hash value internally. Python immutable built-in
objects are hashable; mutable containers (such as lists or dictionaries) are
not. Objects which are instances of user-defined classes are hashable by
default. They all compare unequal (except with themselves), and their hash value
is derived from their id
.
__eq__
method
it should not define a __hash__
operation either; if it defines __eq__
but not __hash__
, its instances will not be usable as items in hashable collections.
Python hash function
The hash
function returns the hash value of the object if it has one.
Hash values are integers. They are used to quickly compare dictionary keys during
a dictionary lookup. Objects can implement the __hash__
method.
Python immutable builtins are hashable
Python immutable builtins, such as integers, strings, or tuples, are hashable.
#!/usr/bin/python val = 100 print(val.__hash__()) print("falcon".__hash__()) print((1,).__hash__())
The example prints the values of three hashables: an integer, a string, and a tuple.
Python custom hashable example I
Python custom objects are hashable by default. Their hash is derived from their Id.
#!/usr/bin/python class User: def __init__(self, name, occupation): self.name = name self.occupation = occupation u1 = User('John Doe', 'gardener') u2 = User('John Doe', 'gardener') print('hash of user 1') print(hash(u1)) print('hash of user 2') print(hash(u2)) if (u1 == u2): print('same user') else: print('different users')
In the example, we have two instances of a User
.
u1 = User('John Doe', 'gardener') u2 = User('John Doe', 'gardener')
We have two instances with the same data.
print('hash of user 1') print(hash(u1))
The hash
function returns the hash value of the
object. The default implementation is derived from the Id of the object.
$ python custom_object.py hash of user 1 -9223371894419573195 hash of user 2 142435202673 different users
Even though the user details are the same, the comparison yields differet objects.
In order to change it, we need to implement the __eq__
method.
Python custom hashable example II
In the second example, we implement a custom __eq__
method.
#!/usr/bin/python class User: def __init__(self, name, occupation): self.name = name self.occupation = occupation def __eq__(self, other): return self.name == other.name \ and self.occupation == other.occupation def __str__(self): return f'{self.name} {self.occupation}' u1 = User('John Doe', 'gardener') u2 = User('John Doe', 'gardener') if (u1 == u2): print('same user') print(f'{u1} == {u2}') else: print('different users') # users = {u1, u2} # print(len(users))
Now the comparison returns the expected output for us; however, we cannot
insert the objects into a Python set; it would result in
TypeError: unhashable type: 'User'
. In order to change this,
we implement the __hash__
method.
Python custom hashable example III
In the third example, we implement the __eq__
and
the __hash__
methods.
#!/usr/bin/python class User: def __init__(self, name, occupation): self.name = name self.occupation = occupation def __eq__(self, other): return self.name == other.name \ and self.occupation == other.occupation def __hash__(self): return hash((self.name, self.occupation)) def __str__(self): return f'{self.name} {self.occupation}' u1 = User('John Doe', 'gardener') u2 = User('John Doe', 'gardener') users = {u1, u2} print(len(users)) if (u1 == u2): print('same user') print(f'{u1} == {u2}') else: print('different users') print('------------------------------------') u1.occupation = 'programmer' users = {u1, u2} print(len(users)) if (u1 == u2): print('same user') print(f'{u1} == {u2}') else: print('different users')
The example compares two objects that have custom implementation
of the __eq__
and __hash__
methods.
The objects can be inserted into a Python set and when an attribute is later
changed, we get the expected output.
def __hash__(self): return hash((self.name, self.occupation))
The implementation of the __hash__
function returns a hash
value computed with the hash
function from a tuple of attributes.
$ python custom_object3.py 1 same user John Doe gardener == John Doe gardener ------------------------------------ 2 different users
Python @dataclass decorator
Since Python 3.7, we have the dataclass
decorator, which
automatically generates some boilerplate code.
The dataclass decorator has a frozen argument (False
by default).
If specified, the fields will be frozen (i.e. read-only). If
eq
is set to True
, which it is by default then the
__hash__
method is implemented and object instances will be
hashable.
#!/usr/bin/python from dataclasses import dataclass @dataclass(frozen=True) class User: name: str occupation: str u1 = User('John Doe', 'gardener') u2 = User('John Doe', 'gardener') if (u1 == u2): print('same user') print(f'{u1} == {u2}') else: print('different users') users = {u1, u2} print(len(users))
The example uses the @dataclass
decorator.
$ python decorator.py same user User(name='John Doe', occupation='gardener') == User(name='John Doe', occupation='gardener') 1
Source
Python hash function - language reference
In this article we have covered hashing in Python.
Author
List all Python tutorials.