© 2009 David Gustafsson A snipplet of code

SAX XML Parser with Java

Over the past day I have learned to use the SAX framework for parsing XML documents. SAX is less convenient than DOM (the Document Object Model) when it comes to adding data to a document, because it only reads. The parser scans the document sequentially and hence can't jump back and forth between children, siblings and parents. The SAX implementer therefore needs to be smart, and keep track of any context such as the current parent element, to be able to implement complex logic.

The strengths of the SAX parser are that it is efficient and does not store the document in memory. DOM, by contrast, has to parse the whole document and keep it in memory as a tree structure.

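SAX is also built into Java: the org.xml.sax and javax.xml.parsers packages ship with the JDK, so no external library is needed. The same is true of DOM, which is available through javax.xml.parsers.DocumentBuilder, although you can also import a parser from an external source. org.apache.xerces.parsers from Apache Xerces is one example of an open-source parser.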
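
To illustrate that nothing beyond the JDK is needed, here is a minimal sketch that checks whether a string is well-formed XML using only standard classes. The class name JdkSaxCheck and the method isWellFormed are my own names for this example, not part of any API.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: uses only packages that ship with the JDK.
public class JdkSaxCheck {

    // Returns true if the parser reads the whole string without a fatal error.
    static boolean isWellFormed(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // DefaultHandler ignores all content events and rethrows fatal errors.
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>")); // prints true
        System.out.println(isWellFormed("<a><b></a>"));  // prints false
    }
}
```

The parser here never builds a tree; it only reports events to the handler, which in this case does nothing with them.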

To parse a document with SAX you implement the ContentHandler interface. The parser calls back into your implementation as it reads, and your code decides what actions to take. The interface has methods such as startElement, endElement, startDocument, characters and a few more. The parser calls each of these methods when it finds the corresponding event in the document.
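
As a rough sketch of what such a handler can look like, the class below extends DefaultHandler (a JDK convenience class that implements ContentHandler with empty methods) and overrides only the callbacks it needs. The class name TitleCollector and the element names library, book and title are made up for this example. Because SAX cannot look back at a parent, the handler keeps its own stack of open elements and only collects a title when its parent is a book.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of <title> elements directly inside <book>.
public class TitleCollector extends DefaultHandler {
    private final Deque<String> openElements = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    private final List<String> titles = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openElements.push(qName); // remember where we are in the document
        text.setLength(0);        // start collecting text for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.pop();
        // After the pop, peek() is the parent of the element that just closed.
        if (qName.equals("title") && "book".equals(openElements.peek())) {
            titles.add(text.toString().trim());
        }
        text.setLength(0);
    }

    // Parses the given XML string and returns the collected book titles.
    static List<String> collect(String xml) throws Exception {
        TitleCollector handler = new TitleCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.titles;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><title>Catalog</title>"
                + "<book><title>SAX Basics</title></book></library>";
        System.out.println(collect(xml)); // prints [SAX Basics]
    }
}
```

The stack is the "smart" part: the title directly under library is skipped because its parent is not a book, even though the parser itself has no notion of parents.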