Python XML with SAX
last modified February 15, 2025
In this article, we show how to use SAX (Simple API for XML) in
Python for event-driven XML parsing. SAX is a memory-efficient approach to
parsing XML documents, making it suitable for large files. Unlike DOM (Document
Object Model), SAX does not load the entire XML document into memory. Instead,
it processes the document sequentially and reports events as it encounters
start tags (together with their attributes), end tags, and text.
The xml.sax module is part of Python's standard library, so no
additional installation is required.
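To make the event-driven model concrete, the following minimal sketch records the events in the order the parser reports them. The names TraceHandler and trace_events are hypothetical and chosen only for this illustration; the sketch uses xml.sax.parseString, a convenience function of the standard library.

```python
import xml.sax


class TraceHandler(xml.sax.ContentHandler):
    """Records every SAX event in the order the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Attributes arrive together with the start tag
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip whitespace-only text to keep the trace short
        if content.strip():
            self.events.append(("text", content))


def trace_events(xml_text):
    """Hypothetical helper: parse a string and return the recorded events."""
    handler = TraceHandler()
    # Encode to bytes, which parseString accepts in every Python 3 version
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in trace_events('<note id="1"><to>Ann</to></note>'):
        print(event)
```

The trace shows that the parser never builds a tree: each start tag, text run, and end tag is handed to the handler once and then forgotten.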
Basic SAX Parsing
The following example demonstrates how to parse an XML document using SAX. We create a custom handler class to handle events such as start elements, end elements, and character data.
import xml.sax
from io import StringIO


class MyHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.current_element = ""
        self.current_data = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.current_element = tag
        if tag == "book":
            print("Book Id:", attributes["id"])

    # Called when an element ends
    def endElement(self, tag):
        if tag == "title":
            print("Title:", self.current_data.strip())
        elif tag == "author":
            print("Author:", self.current_data.strip())
        elif tag == "year":
            print("Year:", self.current_data.strip())
        self.current_element = ""
        self.current_data = ""

    # Called when character data is found; the text may arrive in
    # several chunks, so it is accumulated and stripped when the element ends
    def characters(self, content):
        if self.current_element in ["title", "author", "year"]:
            self.current_data += content


# XML data
xml_data = """
<catalog>
    <book id="1">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
    </book>
    <book id="2">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
    </book>
</catalog>
"""

# Create a SAX parser
parser = xml.sax.make_parser()
handler = MyHandler()
parser.setContentHandler(handler)

# Parse the XML data
parser.parse(StringIO(xml_data))
In this program, the MyHandler class inherits from
xml.sax.ContentHandler and overrides the startElement,
endElement, and characters methods to handle XML
events.
parser.parse(StringIO(xml_data))
StringIO creates an in-memory file-like object from the xml_data
string. This allows the parser.parse method to read the XML data as
if it were reading from a file.
$ python main.py
Book Id: 1
Title: The Great Gatsby
Author: F. Scott Fitzgerald
Year: 1925
Book Id: 2
Title: 1984
Author: George Orwell
Year: 1949
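As an alternative to wrapping the string in StringIO, the standard library also offers xml.sax.parseString, which creates the parser internally. The sketch below runs both routes with the same handler; TitleCollector and the two helper functions are hypothetical names used only for this comparison.

```python
import xml.sax
from io import StringIO


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means we are outside a <title>

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer).strip())
            self._buffer = None


def titles_via_stringio(xml_text):
    # Route 1: explicit parser reading from a file-like object
    handler = TitleCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(StringIO(xml_text))
    return handler.titles


def titles_via_parse_string(xml_text):
    # Route 2: convenience function that builds the parser itself
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


if __name__ == "__main__":
    data = "<catalog><book><title>1984</title></book></catalog>"
    print(titles_via_stringio(data))
    print(titles_via_parse_string(data))
```

Both functions should return the same list; parseString is shorter, while an explicit parser object is useful when the same parser needs further configuration.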
Handling Attributes
The following example demonstrates how to handle attributes in XML elements using SAX.
import xml.sax
from io import StringIO


class MyHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.current_element = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.current_element = tag
        if tag == "book":
            print("Book Id:", attributes["id"])
            print("Category:", attributes["category"])

    # Called when an element ends
    def endElement(self, tag):
        pass

    # Called when character data is found
    def characters(self, content):
        pass


# XML data
xml_data = """
<catalog>
    <book id="1" category="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
    </book>
    <book id="2" category="dystopian">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
    </book>
    <book id="3" category="fiction">
        <title>War and Peace</title>
        <author>Leo Tolstoy</author>
        <year>1869</year>
    </book>
</catalog>
"""

# Create a SAX parser
parser = xml.sax.make_parser()
handler = MyHandler()
parser.setContentHandler(handler)

# Parse the XML data
parser.parse(StringIO(xml_data))
In this program, the startElement method is used to handle
attributes of the book element, such as id and
category.
$ python main.py
Book Id: 1
Category: fiction
Book Id: 2
Category: dystopian
Book Id: 3
Category: fiction
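Indexing the attributes object with square brackets raises a KeyError when an attribute is absent. The attributes object also supports dictionary-style methods such as get and items, so optional attributes can be read with a default. The following sketch assumes a hypothetical AttributeCollector handler and a made-up "unknown" default value.

```python
import xml.sax


class AttributeCollector(xml.sax.ContentHandler):
    """Gathers the attributes of every <book> element into dictionaries."""

    def __init__(self):
        super().__init__()
        self.books = []

    def startElement(self, name, attrs):
        if name == "book":
            self.books.append({
                # get returns None (or the given default) instead of raising
                "id": attrs.get("id"),
                "category": attrs.get("category", "unknown"),
                # items yields (name, value) pairs for all attributes
                "all": dict(attrs.items()),
            })


def collect_book_attributes(xml_text):
    """Hypothetical helper: return one dict per <book> element."""
    handler = AttributeCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.books


if __name__ == "__main__":
    sample = '<catalog><book id="1" category="fiction"/><book id="2"/></catalog>'
    for book in collect_book_attributes(sample):
        print(book)
```

The second book in the sample has no category attribute, so the handler falls back to the default rather than stopping the parse with an exception.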
Parsing XML Files
The following example demonstrates how to parse an XML file using SAX. This approach is memory-efficient because it processes the file sequentially without loading it entirely into memory.
<products>
    <product>
        <id>1</id>
        <name>Product 1</name>
        <price>10.99</price>
        <quantity>30</quantity>
    </product>
    <product>
        <id>2</id>
        <name>Product 2</name>
        <price>20.99</price>
        <quantity>130</quantity>
    </product>
    <product>
        <id>4</id>
        <name>Product 4</name>
        <price>24.59</price>
        <quantity>350</quantity>
    </product>
    <product>
        <id>5</id>
        <name>Product 5</name>
        <price>9.9</price>
        <quantity>650</quantity>
    </product>
    <product>
        <id>6</id>
        <name>Product 6</name>
        <price>45</price>
        <quantity>290</quantity>
    </product>
</products>
This is the products.xml file.
from xml.sax import make_parser, ContentHandler


class ProductHandler(ContentHandler):
    def __init__(self):
        super().__init__()
        self.current_data = ""
        self.product = {}

    # Reset the text buffer; start a new record for each product
    def startElement(self, name, attrs):
        self.current_data = ""
        if name == "product":
            self.product = {}

    # Text may arrive in several chunks, so accumulate it unchanged
    def characters(self, content):
        self.current_data += content

    # Store each child element's text; print the record when a product ends
    def endElement(self, name):
        if name != "product":
            self.product[name] = self.current_data.strip()
        else:
            print(f"Id: {self.product['id']}, Name: {self.product['name']}")


parser = make_parser()
parser.setContentHandler(ProductHandler())
parser.parse("products.xml")
In this program, the parser.parse method is used to parse an XML
file named products.xml. The SAX parser processes the file
sequentially, making it suitable for large files.
$ python main.py
Id: 1, Name: Product 1
Id: 2, Name: Product 2
Id: 4, Name: Product 4
Id: 5, Name: Product 5
Id: 6, Name: Product 6
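Printing inside the handler is convenient for a demonstration, but a program usually needs the parsed records afterwards. The following sketch is one possible variation: it collects the products into a list and computes a total stock value from the price and quantity fields. ProductCollector, load_products, and stock_value are hypothetical names, and the sketch assumes every product has numeric price and quantity elements.

```python
from xml.sax import make_parser, ContentHandler


class ProductCollector(ContentHandler):
    """Builds a list of product dicts instead of printing each one."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._product = None  # dict for the product being read, if any
        self._text = []       # text chunks of the current element

    def startElement(self, name, attrs):
        self._text = []
        if name == "product":
            self._product = {}

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if name == "product":
            self.products.append(self._product)
            self._product = None
        elif self._product is not None:
            # A child of <product>: store its stripped text under its tag name
            self._product[name] = "".join(self._text).strip()


def load_products(source):
    """Parse a file name or file-like object and return the product list."""
    handler = ProductCollector()
    parser = make_parser()
    parser.setContentHandler(handler)
    parser.parse(source)
    return handler.products


def stock_value(products):
    """Sum price * quantity over all products (assumes numeric fields)."""
    return sum(float(p["price"]) * int(p["quantity"]) for p in products)
```

With the file shown earlier, the functions could be called as load_products("products.xml") and stock_value(...) on the result. Note that the list holds every record in memory, so for very large files it may be preferable to process each product inside endElement and discard it.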
In this article, we have shown how to use the SAX API in Python for event-driven XML parsing. The SAX approach is memory-efficient and suitable for large XML files.