Weekend Projects of a Geek: Rants of a Snake Charmer

XmlParser

This is a minimal XML parser that does what it is supposed to. The implementation is based on regular expressions and is adapted from REX by Robert D. Cameron.

Background

Back when Google Maps sent the data objects for its AJAX framework as plain XML, I needed an XML parser to pull location and directions data from the Google Maps server. In an earlier post I mentioned that PyS60 can seamlessly take modules from desktop Python. There are exceptions, though, and Python's XML framework happens to be one of them. I tried importing the Python XML framework into PyS60 without any success, so I had to find my own solution. While searching for the best approach I found REX: Robert Cameron has a set of regular expressions that can be used, very easily and effectively, to parse XML content. This XML parser is based on them. The following copyright notice and license appear in the code.

# Robert D. Cameron "REX: XML Shallow Parsing with Regular Expressions",
# Technical Report TR 1998-17, School of Computing Science, Simon Fraser
# University, November, 1998.
# Copyright (c) 1998, Robert D. Cameron.
# The following code may be freely used and distributed provided that
# this copyright and citation notice remains intact and that modifications
# or additions are clearly identified.

The regular expressions for parsing XML are not modified in any way; they are only ported to Python. Regular expression support is included in the PyS60 package.

The technical report can be found here.
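
To give a feel for what "shallow parsing" with regular expressions means, here is a rough sketch. It is not the REX grammar: the pattern below is a drastically simplified stand-in, and the names TOKEN_RE and shallow_tokens are made up for this illustration. It only shows the general idea of cutting a document into markup tokens and text tokens in a single pass.

```python
import re

# Simplified token pattern, NOT Cameron's REX expressions.
# Alternatives are tried left to right: comments, processing
# instructions, ordinary tags, then runs of character data.
TOKEN_RE = re.compile(r'<!--.*?-->|<\?.*?\?>|<[^>]*>|[^<]+', re.DOTALL)


def shallow_tokens(xml_text):
    """Split xml_text into markup and text tokens, in document order."""
    return TOKEN_RE.findall(xml_text)


sample = '<a x="1">hi<b/></a>'
tokens = shallow_tokens(sample)
# tokens is ['<a x="1">', 'hi', '<b/>', '</a>']
```

For well-formed input the tokens join back into the original string, which is what makes this style of parsing "shallow": nothing is interpreted yet, the text is only cut into pieces that a later step can turn into a tree.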

Usage

XmlParser.py: This module can be used to

Parse an XML file
Parse an XML string

XMLNode Class

addProperty(property, value)

Adds a property and its value to the node. The property is added as an entry to the "properties" dictionary

addChild(tag, node)

Adds a child to the current node that can be accessed by its "tag". If the current node has more than one child with the same "tag", the children are stored as an array in the order in which they are encountered.

setContent

Sets the content of the current node. If the node already has content, the new content is appended to it.

properties

A dictionary mapping the current node's property names to their values.

childnodes

A dictionary mapping each tag to an array of the child nodes with that tag.

content

The actual content inside the tags
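
The description above can be sketched as a small class. This is an illustrative reconstruction from the documented members only, not the code in XmlParser.py; the constructor and the way setContent takes its argument are assumptions.

```python
class XMLNode:
    """Illustrative node type modelled on the member list above."""

    def __init__(self):
        self.properties = {}   # property name -> value
        self.childnodes = {}   # tag -> array of child nodes, in document order
        self.content = ''      # text found inside the element

    def addProperty(self, property, value):
        self.properties[property] = value

    def addChild(self, tag, node):
        # Children that share a tag are collected in one array.
        self.childnodes.setdefault(tag, []).append(node)

    def setContent(self, content):
        # Repeated calls append to the existing content (assumed signature).
        self.content = self.content + content


# Build the <address> fragment of a Google Maps style reply by hand.
address = XMLNode()
for text in ('Bombay', 'India'):
    line = XMLNode()
    line.setContent(text)
    address.addChild('line', line)

lines = [node.content for node in address.childnodes['line']]
# lines is ['Bombay', 'India']
```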

XMLParser Class

parseXMLFile(file)

Parses an XML file

parseXML(xmlBuffer)

Parses an XML buffer passed as a string

getElementsByTagName(tag)

Traverses the minimal DOM tree in pre-order and returns an array containing the node(s) named "tag"

root

Holds the root of the DOM tree
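
A pre-order search over such a tree could look roughly like this. The Node class and the find_by_tag function are hypothetical stand-ins written for this example, assuming only that each node keeps a "childnodes" dictionary of tag to array as described earlier; the module's own traversal may differ.

```python
class Node:
    """Minimal stand-in for a tree node (hypothetical, for illustration)."""

    def __init__(self, content=''):
        self.content = content
        self.childnodes = {}   # tag -> array of child nodes


def find_by_tag(node, tag, found=None):
    """Collect every descendant of node stored under tag, pre-order."""
    if found is None:
        found = []
    for child_tag, children in node.childnodes.items():
        for child in children:
            if child_tag == tag:
                found.append(child)          # visit the child first...
            find_by_tag(child, tag, found)   # ...then its own children
    return found


# <page><query>top</query><request><query>nested</query></request></page>
page = Node()
request = Node()
page.childnodes['query'] = [Node('top')]
page.childnodes['request'] = [request]
request.childnodes['query'] = [Node('nested')]

matches = find_by_tag(page, 'query')
```

Both "query" nodes are found, at either depth. One caveat of grouping children by tag: the relative order of siblings with different tags is only as good as the dictionary's own ordering.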

The user can decide to disregard some tags (not include them in the DOM tree) by adding them to the following array:


# Unsupported tags (HTML formatting tags for displaying info)
unSupportedTags = ['b', 'i', 'u']
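
One way such a list could be applied is to drop the matching tags from the token stream before the tree is built. The sketch below is a guess at the idea, not the module's logic: TAG_NAME_RE and is_unsupported are invented names, and it assumes that the text inside a disregarded tag is kept.

```python
import re

unSupportedTags = ['b', 'i', 'u']

# Captures the element name of a start, end, or empty-element tag.
TAG_NAME_RE = re.compile(r'</?\s*([A-Za-z_][\w.\-]*)')


def is_unsupported(token):
    """True if token is a tag whose name is listed in unSupportedTags."""
    match = TAG_NAME_RE.match(token)
    return match is not None and match.group(1) in unSupportedTags


tokens = ['<line>', '<b>', 'Bombay', '</b>', '</line>']
kept = [t for t in tokens if not is_unsupported(t)]
# kept is ['<line>', 'Bombay', '</line>']
```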

Example Usage

Example XML file:

<?xml version="1.0"?>
<page>
    <title>mumbai india</title>
    <query>mumbai india</query>
    <request>
        <url>http://maps.google.com/maps?q=mumbai+india&amp;num=10</url>
        <query>mumbai india</query>
    </request>
    <center lat="18.959999" lng="72.819999"/>
    <span lat="0.089989" lng="0.095151"/>
    <overlay panelStyle="/maps?file=gp&amp;hl=en">
        <location infoStyle="/maps?file=gi&amp;hl=en" id="A">
            <point lat="18.959999" lng="72.819999"/>
            <icon class="noicon"/>
            <info>
                <address>
                    <line>Bombay</line>
                    <line>India</line>
                </address>
            </info>
        </location>
    </overlay>
</page>

To find out the "lat" and "lng" properties of the "point" element:

...
    locxml = ... Above XML ...
    parser = XMLParser()
    parser.parseXML(locxml)
    # getElementsByTagName is documented above as returning an array of
    # matching nodes, so test for an empty result and take the first match.
    pointNodes = parser.getElementsByTagName('point')
    if not pointNodes:
        appuifw.note(u'Address not found', 'error')
    else:
        pointNode = pointNodes[0]
        lines = []
        addressNodes = parser.getElementsByTagName('address')
        if addressNodes:
            # childnodes maps each tag to an array of child nodes
            for node in addressNodes[0].childnodes['line']:
                lines.append(node.content)
        lat = float(pointNode.properties['lat'])
        lng = float(pointNode.properties['lng'])
...

Summary

This little module is evidence that, with minimal effort, complex things can be done with PyS60 on your handset.

5 Comments:

Unknown said...: Hi, I would like to use this code to parse GPX files for my GPLv2 python application ( http://tracker-py.googlecode.com )
Would that be ok for you?

Mark
(mark.hurenkamp [at] xs4all.nl); 2:11 PM
João Felipe said...: Man,

I used your XmlParser lib and i got an error. I, fearlessly, got the code and with an example the lib broken.

Keep contact with me and I'll show it.

Regards

João Felipe; 10:31 AM
Unknown said...: Good work. I just need a tiny parser for read some simple XML doc. I found your implementation has a limitation. It can't parse the document with single quote. My resolution is very simple. I do a global replace before any parsing.; 9:40 PM
thanida said...: Thanks for sharing these tips, you can bet that I’ll be sure to pass them along.Have a great day
scr888; 12:01 AM
thanida said...: Fantastic articles is post by you in this blog. You give a nice thing. Thank you for such a nice article.
gclub; 10:21 PM


Friday, October 20, 2006

Rants of a Snake Charmer - III
