Python pathlib tutorial

Python pathlib tutorial shows how to work with files and directories in Python with the pathlib module.

The pathlib module is part of the Python standard library. It provides an object-oriented API for working with files and directories.

Path is the core class for working with files and directories.

$ pip install prettytable
$ pip install more_itertools

In this tutorial, we also use the third-party prettytable and more_itertools packages.

words.txt
blue
forest
sky
ocean
rabbit
clue

Some examples use this simple text file.

Path cwd and home

We get the current working directory with cwd() and the home directory with home().

cwd_home.py
#!/usr/bin/env python

from pathlib import Path

print(f"Current directory: {Path.cwd()}")
print(f"Home directory: {Path.home()}")

The example prints the current working directory and the home directory.

$ cwd_home.py
Current directory: C:\Users\Jano\Documents\pyprogs\pathlib
Home directory: C:\Users\Jano

This is a sample output.

Change directory

We change the working directory with the chdir() function of the os module.

change_dir.py
#!/usr/bin/env python

from pathlib import Path
from os import chdir

path = Path('..')

print(f'Current working directory: {Path.cwd()}')

chdir(path)

print(f'Current working directory: {Path.cwd()}')

chdir('..')

We change the current working directory. Note that the directory is changed only inside the Python program.

$ change_dir.py
Current working directory: C:\Users\Jano\Documents\pyprogs\pathlib
Current working directory: C:\Users\Jano\Documents\pyprogs

This is a sample output.
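
Since chdir() affects the whole process, it is often convenient to restore the previous directory afterwards. The following is a minimal sketch of one way to do that; the working_directory() helper is our own hypothetical name, not part of pathlib, and the temporary directory only stands in for a real target.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily switch the current working directory (hypothetical helper)."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # always return to the starting directory, even if the body raises
        os.chdir(previous)


before = Path.cwd()

with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp) as inside:
        print(f'Inside: {inside}')

after = Path.cwd()
print(f'Restored: {after == before}')
```

The try/finally block ensures that the original directory is restored even when the code inside the with block fails.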

Path mkdir

A new directory is created with mkdir().

mkdir.py
#!/usr/bin/env python

from pathlib import Path

path = Path.cwd() / 'new'

path.mkdir()

The example creates a new directory inside the current working directory.
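
By default, mkdir() fails when the directory already exists or when a parent directory is missing. The following sketch shows the parents and exist_ok parameters; it works inside a temporary directory (an assumption made only to keep the example self-contained).

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / 'a' / 'b' / 'c'

    # parents=True also creates the missing 'a' and 'b' directories
    nested.mkdir(parents=True)

    # exist_ok=True turns a repeated call into a no-op
    nested.mkdir(parents=True, exist_ok=True)

    # without exist_ok, creating an existing directory raises an error
    try:
        nested.mkdir()
        raised = False
    except FileExistsError:
        raised = True

    created = nested.is_dir()

print(f'Created: {created}')
print(f'FileExistsError raised: {raised}')
```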

Copy file

With the help of the shutil module, we copy a file.

copy_file.py
#!/usr/bin/env python

from pathlib import Path
from shutil import copyfile

source = Path('words.txt')
destination = Path('words_bck.txt')

copyfile(source, destination)

The example makes a copy of the words.txt file.

source = Path('words.txt')

A Path object is created by passing the file name to the Path constructor.

Joining paths

Paths can be joined with the / operator or the joinpath() method.

join_path.py
#!/usr/bin/env python

from pathlib import Path

path = Path.home()

docs = path / 'Documents'
pictures = path / 'Pictures'

print(docs)
print(pictures)

In the example, we join two paths with /.

$ join_path.py
C:\Users\Jano\Documents
C:\Users\Jano\Pictures

This is the output.
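
The joinpath() method gives the same result as the / operator. The sketch below uses PurePosixPath and a made-up /home/jano directory so that the output does not depend on the operating system.

```python
from pathlib import PurePosixPath

base = PurePosixPath('/home/jano')

with_operator = base / 'Documents' / 'notes.txt'
with_method = base.joinpath('Documents', 'notes.txt')

print(with_operator)
print(with_method == with_operator)

# an absolute segment replaces everything that came before it
replaced = base / '/etc/hosts'
print(replaced)
```

Note the last case: joining with an absolute path discards the left-hand side.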

Path touch

The touch() method creates a new empty file; it is the equivalent of the Linux touch command.

touch.py
#!/usr/bin/env python

from pathlib import Path

Path('myfile.txt').touch()

We create a new empty myfile.txt.

Path rename

The rename() renames a file or directory.

rename.py
#!/usr/bin/env python

from pathlib import Path

path = Path('names.txt')

path.rename('mynames.txt')

The example renames the names.txt to mynames.txt in the current working directory.
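
A relative target passed to rename() is interpreted relative to the current working directory, not relative to the directory of the Path object. To rename a file in place, we can build the target with with_name(). This is a minimal sketch that assumes Python 3.8 or later, where rename() returns the new path; the file content is made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / 'names.txt'
    old.write_text('Jane\nPeter\n')

    # keep the file in the same directory, change only its name
    new = old.rename(old.with_name('mynames.txt'))

    old_exists = old.exists()
    new_exists = new.exists()
    new_name = new.name

print(f'Old path exists: {old_exists}')
print(f'New path exists: {new_exists}')
print(f'New name: {new_name}')
```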

Path names

We refer to files with their absolute file paths or relative paths. The paths have different representations; Windows uses different file paths than Linux.

path_names.py
#!/usr/bin/env python

from pathlib import Path

path = Path('C:/Users/Jano/Downloads/wordpress-5.1.tar.gz')

print(path)
print(path.as_uri())
print(path.as_posix())

The example shows three different file path structures.

$ path_names.py
C:\Users\Jano\Downloads\wordpress-5.1.tar.gz
file:///C:/Users/Jano/Downloads/wordpress-5.1.tar.gz
C:/Users/Jano/Downloads/wordpress-5.1.tar.gz

The first one is the Windows file path. The second one is a URI style. The third one is the POSIX style.
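
The output above comes from a Windows machine. To work with the path syntax of another system, pathlib offers the PureWindowsPath and PurePosixPath classes, which manipulate paths without touching the file system. A minimal sketch; the POSIX path is a made-up counterpart of the Windows one.

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath('C:/Users/Jano/Downloads/wordpress-5.1.tar.gz')
posix = PurePosixPath('/home/jano/Downloads/wordpress-5.1.tar.gz')

# Windows flavour: printed with backslashes on any system
print(win)

# the same path with forward slashes
print(win.as_posix())

# POSIX flavour: printed with forward slashes on any system
print(posix)
```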

Path relative_to

A relative path starts from some given working directory, avoiding the need to provide the full absolute path. For instance, data.txt is a relative path to the /home/users/jano/data.txt from the perspective of the /home/users/jano/ directory.

In other words, when we are located in the /home/users/jano/ directory, we can refer to the file simply by its name data.txt, without the need to specify the full path.

relative_path.py
#!/usr/bin/env python

from pathlib import Path

path = Path('C:/Users/Jano/Downloads/wordpress-5.1.tar.gz')

home = Path.home()

relative = path.relative_to(home)
print(relative)

The example prints the relative path of an archive file given the home directory.

$ relative_path.py
Downloads\wordpress-5.1.tar.gz

This is the output.
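
The relative_to() method raises a ValueError when the path does not lie inside the given directory. The sketch below handles that case; relative_or_none() is a hypothetical helper, and PurePosixPath is used so that the example behaves the same on every system.

```python
from pathlib import PurePosixPath


def relative_or_none(path, base):
    """Return path relative to base, or None if path is not inside base."""
    try:
        return path.relative_to(base)
    except ValueError:
        return None


home = PurePosixPath('/home/users/jano')
inside = PurePosixPath('/home/users/jano/data.txt')
outside = PurePosixPath('/etc/hosts')

print(relative_or_none(inside, home))
print(relative_or_none(outside, home))
```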

Path parents

With the parent and parents attributes, we can get the logical parents of a path.

parents.py
#!/usr/bin/env python

from pathlib import Path

path = Path('C:/Users/Jano/Documents')

print(f"The parent directory of {path} is {path.parent}")
print(f"The parent of the parent of {path} is {path.parent.parent}")

print(f"All the parents of {path}: ")

print(list(path.parents))

The example prints the parents of a path.

print(f"The parent of the parent of {path} is {path.parent.parent}")

We can get a parent of a parent.

$ parents.py
The parent directory of C:\Users\Jano\Documents is C:\Users\Jano
The parent of the parent of C:\Users\Jano\Documents is C:\Users
All the parents of C:\Users\Jano\Documents:
[WindowsPath('C:/Users/Jano'), WindowsPath('C:/Users'), WindowsPath('C:/')]

This is the output.

Path parts

A path consists of components, such as the drive or the root.

parts.py
#!/usr/bin/env python

from pathlib import Path

path = Path('C:/Users/Jano/Documents')

print(path.parts)
print(path.drive)
print(path.root)

The program prints some parts of a path.

print(path.parts)

The parts attribute gives access to the path’s various components.

print(path.drive)

The drive attribute gives a string representing the drive letter or name, if any.

print(path.root)

The root attribute gives a string representing the (local or global) root, if any.

$ parts.py
('C:\\', 'Users', 'Jano', 'Documents')
C:
\

This is the output.

The following program gives other parts of a path.

parts2.py
#!/usr/bin/env python

from pathlib import Path
import os

path = Path('C:/Users/Jano/Downloads/wordpress-5.1.tar.gz')

print(f"The stem is: {path.stem}")
print(f"The name is: {path.name}")
print(f"The suffix is: {path.suffix}")
print(f"The anchor is: {path.anchor}")

print(f"File name: {os.path.splitext(path.stem)[0]}")

print("The suffixes: ")
print(path.suffixes)

The program prints the stem, name, suffix(es), and the anchor.

$ parts2.py
The stem is: wordpress-5.1.tar
The name is: wordpress-5.1.tar.gz
The suffix is: .gz
The anchor is: C:\
File name: wordpress-5.1
The suffixes:
['.1', '.tar', '.gz']

This is the output.
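
The name and the suffix can also be replaced. The with_suffix() and with_name() methods return a new path and leave the original untouched. A short sketch; PureWindowsPath is used so that it runs on any system, and latest.tar.gz is a made-up file name.

```python
from pathlib import PureWindowsPath

path = PureWindowsPath('C:/Users/Jano/Downloads/wordpress-5.1.tar.gz')

# with_suffix() replaces only the last suffix
zipped = path.with_suffix('.zip')

# with_name() replaces the whole final component
renamed = path.with_name('latest.tar.gz')

print(zipped.name)
print(renamed.name)
print(renamed.parent == path.parent)
```

Note that with_suffix() replaces only .gz here, so the result is wordpress-5.1.tar.zip.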

Path list directories

The iterdir() method yields Path objects for the entries of the directory.

list_dirs.py
#!/usr/bin/env python

from pathlib import Path

path = Path('C:/Users/Jano/Documents')

dirs = [e for e in path.iterdir() if e.is_dir()]
print(dirs)

The example prints the subdirectories of the specified directory. We check if the path object is a directory with is_dir().

The following example prints the files inside the specified directory.

list_files.py
#!/usr/bin/env python

from pathlib import Path

path = Path('C:/Users/Jano/Documents')

files = [e for e in path.iterdir() if e.is_file()]
print(files)

We check if a path object is a file with is_file().
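
The iterdir() method yields the entries in arbitrary order. When a stable listing is needed, we can sort the result ourselves. The following sketch lists directories first and then files; the temporary directory and the file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'pics').mkdir()
    (root / 'b.txt').touch()
    (root / 'a.txt').touch()

    # is_file() is False for directories, so they sort before files;
    # entries within each group are ordered by name
    entries = sorted(root.iterdir(), key=lambda e: (e.is_file(), e.name))
    names = [e.name for e in entries]

print(names)
```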

Path globbing

Glob patterns specify sets of filenames with wildcard characters. For example, the *.txt represents all files with names ending in .txt. The * is a wildcard standing for any string of characters. The other common wildcard is the question mark (?), which stands for one character.

Path provides glob() and rglob(). The latter is used for recursive globbing. It adds **/ in front of the given pattern.

globbing.py
#!/usr/bin/env python

from pathlib import Path

path = Path('C:/Users/Jano/Documents/pyprogs')

for e in path.rglob('*.py'):
    print(e)
    
# for e in path.glob('**/*.py'):
#     print(e)    

The example prints all Python files in the specified directory and all its subdirectories. Note that such operations can be very time-consuming on large directory trees.

for e in path.rglob('*.py'):
    print(e)

# for e in path.glob('**/*.py'):
#     print(e)

Both operations are equivalent.
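
The difference between glob() and rglob(), and the ? wildcard, can be checked on a small directory tree. This is a sketch on a temporary directory with made-up file names.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()

    for name in ('a.py', 'b.py', 'notes.txt', 'sub/c.py'):
        (root / name).touch()

    # glob() looks only at the given directory
    top_level = sorted(p.name for p in root.glob('*.py'))

    # rglob() also descends into subdirectories
    recursive = sorted(p.name for p in root.rglob('*.py'))

    # rglob('*.py') finds the same files as glob('**/*.py')
    same = sorted(root.rglob('*.py')) == sorted(root.glob('**/*.py'))

    # ? stands for exactly one character
    single_char = sorted(p.name for p in root.glob('?.py'))

print(top_level)
print(recursive)
print(same)
print(single_char)
```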

Path tree

The following example is a practical program which outputs the contents of the specified directory in a hierarchical tree structure.

tree.py
#!/usr/bin/env python

from pathlib import Path

def tree(directory):

    print(f'+ {directory}')

    for path in sorted(directory.rglob('*')):

        depth = len(path.relative_to(directory).parts)
        spacer = '    ' * depth

        # print(f'{spacer}+ {path.name}')

        if path.is_file():
            print(f'{spacer}f {path.name}')
        else:
            print(f'{spacer}d {path.name}')

path = Path.home() / 'Downloads'

tree(path)

The program outputs the contents of the Downloads directory in a tree structure.

Counting files by extension

In the following example, we count all files by their extension. We use the Counter class from the collections module for the task.

count_files.py
#!/usr/bin/env python

import collections
from pathlib import Path

docs = Path.home() / 'Documents'

files = [path.suffix for path in docs.iterdir() if path.is_file() and path.suffix]
data = collections.Counter(files)

print(data)

for key, val in data.items(): 
    print(f'{key}: {val}')

The example counts files grouped by their extension in the Documents directory.

files = [path.suffix for path in docs.iterdir() if path.is_file() and path.suffix]

In the list comprehension, we ensure that the path object is a file with is_file() and that the file has an extension. Files may not have extensions, especially on Unix systems.

$ count_files.py
Counter({'.txt': 7, '.pdf': 3, '.ini': 1, '.zip': 1, '.rtf': 1})
.pdf: 3
.txt: 7
.ini: 1
.zip: 1
.rtf: 1

This is a sample output.
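
The example above looks only at the top level of the directory and treats .TXT and .txt as different extensions. A possible variation, shown here as a sketch, walks the subdirectories with rglob() and lowercases the suffix; count_extensions() is a hypothetical helper and the files are created in a temporary directory.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_extensions(directory):
    """Count files by lowercased extension, including subdirectories."""
    return Counter(
        p.suffix.lower()
        for p in directory.rglob('*')
        if p.is_file() and p.suffix
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'old').mkdir()

    for name in ('a.txt', 'b.TXT', 'c.pdf', 'README', 'old/d.txt'):
        (root / name).touch()

    counts = count_extensions(root)

# most_common() orders the extensions from the most to the least frequent
for ext, n in counts.most_common():
    print(f'{ext}: {n}')
```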

Path read_text

The read_text() reads the contents of the file as a string. The file is opened and then closed. The optional parameters have the same meaning as in open().

read_text.py
#!/usr/bin/env python

from pathlib import Path

path = Path('words.txt')

content = path.read_text()
print(content)

The example reads the contents of the words.txt file with read_text().

$ read_text.py
blue
forest
sky
ocean
rabbit
clue

This is the output.

Path read file with open

The open() opens the file pointed to by the path, like the built-in open() function does.

read_with_open.py
#!/usr/bin/env python

from pathlib import Path

path = Path('words.txt')

with path.open() as f: 
    lines = f.readlines()
    print(lines)

for line in lines:
    print(line.rstrip())

The example opens the words.txt file with open() and reads the contents with readlines().
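
The readlines() method loads all lines into memory at once. For larger files, we can iterate over the file object instead, which reads one line at a time. The sketch below recreates words.txt in a temporary directory; the length filter is only an illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'words.txt'
    path.write_text('blue\nforest\nsky\nocean\nrabbit\nclue\n', encoding='utf-8')

    # iterating over the file object yields one line at a time
    with path.open(encoding='utf-8') as f:
        long_words = [w for w in (line.rstrip('\n') for line in f) if len(w) > 4]

    # for small files, read_text() with splitlines() is a short alternative
    words = path.read_text(encoding='utf-8').splitlines()

print(long_words)
print(words)
```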

Path read binary file

Binary files, such as images, can be read with read_bytes().

read_bytes.py
#!/usr/bin/env python

from pathlib import Path
import binascii
from more_itertools import sliced

path = Path('sid.jpg')

hexed = binascii.hexlify(path.read_bytes())
mybytes = list(sliced(hexed, 2))

for i, b in enumerate(mybytes, start=1):

    print(b.decode("utf-8"), end=' ')

    # start a new line after every 30 bytes
    if i % 30 == 0:
        print()

The example reads a JPEG picture and prints it to the terminal in hexadecimal representation.
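
A similar dump can be produced with the standard library alone. This is a sketch that assumes Python 3.8 or later, where bytes.hex() accepts a separator; hex_dump() is a hypothetical helper and the ten sample bytes are made up in place of a real picture.

```python
import tempfile
from pathlib import Path


def hex_dump(data, per_line=30):
    """Format bytes as space-separated hex pairs, per_line bytes per row."""
    rows = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        rows.append(chunk.hex(' '))
    return '\n'.join(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'sample.bin'
    path.write_bytes(bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x4A, 0x46, 0x49, 0x46]))

    dump = hex_dump(path.read_bytes(), per_line=4)

print(dump)
```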

Path write_text

The write_text() method opens the file in text mode, writes data to it, and closes the file.

write_text.py
#!/usr/bin/env python

from pathlib import Path

path = Path('myfile.txt')
path.touch()

path.write_text('This is myfile.txt')

The example creates a new empty file with touch() and writes some text data into the file with write_text(). The touch() call is optional, since write_text() creates the file if it does not exist.
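
Note that write_text() replaces any existing content. To append to a file, we can open it in the 'a' mode instead. A minimal sketch on a temporary file:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'myfile.txt'

    # write_text() returns the number of characters written
    written = path.write_text('first\n', encoding='utf-8')

    # a second call overwrites the previous content
    path.write_text('second\n', encoding='utf-8')

    # the 'a' mode appends to the end of the file
    with path.open('a', encoding='utf-8') as f:
        f.write('third\n')

    content = path.read_text(encoding='utf-8')

print(written)
print(content)
```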

New article

Content management systems often put their newly created articles in a directory structure based on the current year and month. The next example demonstrates this.

new_article.py
#!/usr/bin/env python

from pathlib import Path
import datetime

now = datetime.datetime.now()
year = now.year
month = now.month

name = input('Enter article name: ')

path1 = Path('articles') / str(year) / str(month)
# exist_ok=True keeps the script from failing when the directory already exists
path1.mkdir(parents=True, exist_ok=True)

path2 = path1 / f'{name}.txt'

path2.touch()

print(f'Article created at: {path2}')

The program asks the user for an article name. It creates a new text file in a directory based on the current year and month.
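
With str(month), the directories are named 1 to 12, so 10 sorts before 2 in an alphabetical listing. One possible variation pads the month with a zero. The article_path() helper below is hypothetical; it only builds the path and does not create anything on disk.

```python
import datetime
from pathlib import PurePosixPath


def article_path(name, when, root='articles'):
    """Build root/YYYY/MM/name.txt for the given date."""
    # %Y and %m give the four-digit year and the zero-padded month
    return PurePosixPath(root) / f'{when:%Y}' / f'{when:%m}' / f'{name}.txt'


path = article_path('pathlib-notes', datetime.date(2019, 3, 4))
print(path)
```

Keeping the path construction in a pure function makes it easy to check before any directory is created.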

PrettyTable example

When we work with files and directories, we can use the PrettyTable module for nicer output.

simple_table.py
#!/usr/bin/env python

from pathlib import Path
import datetime
from prettytable import PrettyTable

path = Path('C:/Users/Jano/Documents/')

pt = PrettyTable()
pt.field_names = ["File name", "Size", "Created"]

pt.align["File name"] = "l"
pt.align["Size"] = "r"
pt.align["Created"] = "l"

for e in path.glob('**/*.txt'):

    created = datetime.datetime.fromtimestamp(e.stat().st_ctime)
    size = e.stat().st_size
    pt.add_row([e.name, size, f"{created:%Y-%m-%d}"])
    
print(pt)

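The example displays all text files inside Documents in a nice table. The table contains three columns: file name, size, and creation date. Note that st_ctime is the creation time on Windows; on Unix systems, it is the time of the last metadata change.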

$ simple_table.py
+-------------------------------------------------------+-------+------------+
| File name                                             |  Size | Created    |
+-------------------------------------------------------+-------+------------+
| data.txt                                              |     0 | 2019-02-27 |
| eternal_return.txt                                    | 10467 | 2019-03-03 |
| potvrdenie.txt                                        |   773 | 2019-01-14 |
| text_processing.txt                                   |   750 | 2019-02-18 |
| website-inspire.txt                                   |    62 | 2019-03-03 |
| words.txt                                             |    31 | 2018-12-30 |
| Úvod do Symfony.txt                                   |  7613 | 2019-03-04 |
| robots.txt                                            |   240 | 2019-01-01 |
| robots.txt                                            |   240 | 2019-02-03 |
...

This is a sample partial output.

In this tutorial, we have covered the standard Python pathlib module.
