ZetCode

Python split string

last modified November 29, 2021

Python split string tutorial shows how to split strings in Python.

We can split strings in Python with the str.split, str.rsplit, str.splitlines, re.split, str.partition, and str.rpartition functions.

Python split/rsplit methods

The split methods cut a string into parts based on the given separator parameter. With the optional second parameter we can control how many times the string is cut.

str.split([sep[, maxsplit]])

The str.split method returns a list of the words in the string, separated by the delimiter string.

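The parameters are:

sep - optional; the delimiter string. If it is omitted or None, runs of consecutive whitespace act as a single separator.
maxsplit - optional; the maximum number of splits to do. The default value of -1 means no limit.
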
str.rsplit([sep[, maxsplit]])

The str.rsplit method returns a list of the words in the string, separated by the delimiter string; the cutting starts from the right end of the string.
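
The effect of the sep parameter can be illustrated with a minimal sketch. The sample strings below are made up for illustration and are not part of the other examples.

```python
#!/usr/bin/python

# Illustrative data, not part of the tutorial's examples.
line = "  sky   club\tcpu\ncloud  "

# No separator: runs of whitespace count as one separator, and
# leading and trailing whitespace is ignored.
words = line.split()
print(words)

# Explicit separator: every single space is a cut point, so an
# empty string appears between two consecutive spaces.
words2 = "sky  club".split(' ')
print(words2)
```

The first call returns four clean words, while the second one returns ['sky', '', 'club'].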

Python split examples

In the following examples, we cut strings into parts with the previously mentioned methods.

splitting.py
#!/usr/bin/python

line = "sky, club, cpu, cloud, war, pot, rock, water"

words = line.split(',')
print(words)

words2 = line.split(', ')
print(words2)

words3 = line.split(',')
words4 = [e.strip() for e in words3]
print(words4)

In the example, we cut the line of words delimited with a comma into a list of words.

words = line.split(',')

The string is cut by the comma character; however, the words keep their leading spaces.

words2 = line.split(', ')

One way to get rid of the spaces is to include a space character in the separator parameter.

words3 = line.split(',')
words4 = [e.strip() for e in words3]

Another solution is to remove the surrounding whitespace from each word with the strip method.

$ ./splitting.py 
['sky', ' club', ' cpu', ' cloud', ' war', ' pot', ' rock', ' water']
['sky', 'club', 'cpu', 'cloud', 'war', 'pot', 'rock', 'water']
['sky', 'club', 'cpu', 'cloud', 'war', 'pot', 'rock', 'water']

With the maxsplit parameter we can set how many splits will be done.

maxsplit.py
#!/usr/bin/python

line = "sky, club, cpu, cloud, war, pot, rock, water"

words = line.split(', ', 3)

for word in words:
    print(word)

print('-------------------------')

words2 = line.split(', ', 4)

for word in words2:
    print(word)

The string is cut at most maxsplit times, counting from the left. The rest of the words form one string.

$ ./maxsplit.py
sky
club
cpu
cloud, war, pot, rock, water
-------------------------
sky
club
cpu
cloud
war, pot, rock, water
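
A common use of maxsplit is to cut a string only at the first separator. The following is a minimal sketch; the entry string is a hypothetical configuration line chosen for illustration.

```python
#!/usr/bin/python

# Hypothetical configuration line; the value itself contains '='.
entry = "url=https://example.com/?q=python"

# Cut only at the first '=', so the rest of the string stays intact.
key, value = entry.split('=', 1)

print(key)
print(value)
```

Because at most one split is done, the result always has two elements here and can be unpacked directly.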

In the next example, we get words from the end of the string.

split_right.py
#!/usr/bin/python

line = "sky, club, cpu, cloud, war, pot, rock, water"

words = line.rsplit(', ', 3)[-3:]
print(words)

Using the rsplit method, we get the last three words.

$ ./split_right.py 
['pot', 'rock', 'water']
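
The slice in the example is needed because rsplit keeps the uncut remainder as the first element of the list. A short sketch with the same line shows this:

```python
#!/usr/bin/python

line = "sky, club, cpu, cloud, war, pot, rock, water"

# Three cuts from the right produce four elements; the first one
# is the part of the string that was left uncut.
parts = line.rsplit(', ', 3)
print(parts)

# With maxsplit set to 1, we separate only the last word.
rest, last = line.rsplit(', ', 1)
print(last)
```

The first element of parts is 'sky, club, cpu, cloud, war', and last is 'water'.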

Python splitlines

The str.splitlines method returns a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is set to True.

The line boundaries are characters including line feed \n, carriage return \r, and carriage return/line feed \r\n.

str.splitlines([keepends])

It is a convenience method to quickly split lines into lists from files.

split_lines.py
#!/usr/bin/python

line = "sky\nclub\ncpu\rcloud\r\nwar\npot\nrock\nwater"

words = line.splitlines()
print(words)

The example turns the string into a list of words.

$ ./split_lines.py
['sky', 'club', 'cpu', 'cloud', 'war', 'pot', 'rock', 'water']
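
The keepends parameter and the difference from split('\n') can be seen in the following sketch. The sample string is made up for illustration.

```python
#!/usr/bin/python

# Illustrative data with mixed line endings and a trailing newline.
text = "sky\nclub\r\ncpu\n"

# Line breaks are dropped by default.
lines = text.splitlines()
print(lines)

# With keepends=True, each line keeps its original line break.
lines_kept = text.splitlines(keepends=True)
print(lines_kept)

# split('\n') leaves the carriage return in place and adds an
# empty string after the trailing newline.
lines_split = text.split('\n')
print(lines_split)
```

Unlike split('\n'), the splitlines method recognizes all the listed line boundaries and does not return a trailing empty string.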

In the next example, we read words from a file.

words.txt
sky
cup
blue
bear
rock
pen
chair
lamp
bowl
rock
falcon

We have a file of words.

split_lines2.py
#!/usr/bin/python

filename = 'words.txt'

with open(filename, 'r') as f:
    
    data = f.read()
    words = data.splitlines()
    
    print(words)

The read method reads the whole file into a string. The string is then split into lines with splitlines.

$ ./split_lines2.py 
['sky', 'cup', 'blue', 'bear', 'rock', 'pen', 'chair', 'lamp', 'bowl', 'rock', 'falcon']

Python re.split

With re.split, we can split strings by using regular expressions.

re.split(pattern, string, maxsplit=0, flags=0)

The method gives us more powerful options to cut strings.

reg_split.py
#!/usr/bin/python

import re

line = "sky, \nclub, \tcpu; cloud,  \n\n\nwar; pot, rock, water"

words = re.split(r"[;,]\s+", line)
print(words)

In the example, we split the string into a list of words with re.split. The words can be separated by a comma or a semicolon, followed by one or more white space characters.

$ ./reg_split.py
['sky', 'club', 'cpu', 'cloud', 'war', 'pot', 'rock', 'water']
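
Two further options of re.split are the maxsplit parameter and capturing groups in the pattern. The following is a minimal sketch; the sample line is made up for illustration.

```python
#!/usr/bin/python

import re

# Illustrative data, shorter than the line in the example above.
line = "sky, club; cpu, cloud"

# maxsplit limits the number of cuts, as with str.split.
limited = re.split(r"[;,]\s+", line, maxsplit=2)
print(limited)

# When the pattern contains a capturing group, the text matched by
# the group is also returned as part of the list.
with_seps = re.split(r"([;,])\s+", line)
print(with_seps)
```

The first call returns ['sky', 'club', 'cpu, cloud']; the second one returns the words interleaved with the matched separators.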

Python word frequency

In the following example, we count the word frequency.

$ wget https://raw.githubusercontent.com/janbodnar/data/main/the-king-james-bible.txt

We use the King James Bible.

word_freq.py
#!/usr/bin/python

import collections
import re

filename = 'the-king-james-bible.txt'

def get_words():

    words = []

    with open(filename) as f:

        for line in f:

            fields = re.split(r"\W+", line)

            for w in fields:

                if w and not w.isdigit():
                    words.append(w)

    return words

words = get_words()

c = collections.Counter(words)
common = c.most_common(10)

for e, i in common:
    print(f'{e}: {i}')

The example prints the ten most common words from the the-king-james-bible.txt file.

fields = re.split(r"\W+", line)

We split the line into words. The \W character class matches any character which is not a word character.

for w in fields:

    if w and not w.isdigit():
        words.append(w)

We skip empty fields and verse numbers (fields that consist only of digits).

c = collections.Counter(words)
common = c.most_common(10)

We count the occurrences and print the ten most frequent words.

$ ./word_freq.py 
the: 62103
and: 38848
of: 34478
to: 13400
And: 12846
that: 12576
in: 12331
shall: 9760
he: 9665
unto: 8942
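
The output lists the and And separately, because the counting is case sensitive. One possible way to merge such words is to lowercase them before counting. The following is a minimal sketch on a made-up sentence rather than on the downloaded file.

```python
#!/usr/bin/python

import collections
import re

# Illustrative text, not taken from the-king-james-bible.txt.
text = ("And the rock fell. And the water rose, "
        "and the wind blew over the rock.")

# Split on non-word characters and skip empty and numeric fields.
fields = [w for w in re.split(r"\W+", text) if w and not w.isdigit()]

# Case-sensitive counting keeps 'And' and 'and' apart.
c_raw = collections.Counter(fields)
print(c_raw['And'], c_raw['and'])

# Lowercasing the words first merges them.
c = collections.Counter(w.lower() for w in fields)
common = c.most_common(2)
print(common)
```

In the sketch, the lowercased counter reports four occurrences of the and three of and.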

Python string partition

The partition method splits the sequence at the first occurrence of the given separator and returns a 3-tuple containing the part before the separator, the separator itself, and the part after the separator.

The rpartition method splits the sequence at the last occurrence of the given separator and returns a 3-tuple containing the part before the separator, the separator itself, and the part after the separator.

partition.py
#!/usr/bin/python

import os 

files = os.listdir('.')

for file in files:
    
    data = file.partition('.')
    print(f'{data[0]} has extension {data[2]}')

The example lists the current working directory and cuts each file name into its name and extension. It uses partition, which cuts the name at the first dot.

$ ./partition.py 
words has extension txt
split_lines2 has extension py
splitting has extension py
split_lines has extension py
word_freq2 has extension py
split_right has extension py
the-king-james-bible has extension txt
reg_split has extension py
word_freq has extension py
partition has extension py
maxsplit has extension py
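
The difference between partition and rpartition shows up when the separator occurs more than once or not at all. The following is a minimal sketch; the file names are hypothetical.

```python
#!/usr/bin/python

# Hypothetical file name with two dots.
name = "archive.tar.gz"

# partition cuts at the first dot, rpartition at the last one.
first = name.partition('.')
last = name.rpartition('.')
print(first)
print(last)

# When the separator is not found, partition returns the whole
# string first, while rpartition returns it last.
missing = "README".partition('.')
rmissing = "README".rpartition('.')
print(missing)
print(rmissing)
```

For the name above, partition gives tar.gz as the last element, while rpartition gives gz.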

In this tutorial, we have shown how to split strings in Python.
