Python, Creating XML, Recursion, and Order

I love being challenged every day. Today I ran across a challenge that has several solutions. However, most of them are hacks, and I never feel like I really solved the problem when I implement a hack. There’s also that eerie feeling that this hack is going to bite me later.

I have an API I need to talk to. It’s pure XML-over-HTTP (henceforth XHTTP). Actually, the first issue I had was just dealing with XHTTP. All of the APIs I’ve dealt with in the past were either XMLRPC, SOAP, which have ready-made libraries ready to use, or they used GET parameters and the like, which are pretty simple to deal with. I’ve never had to actually construct an HTTP request and just POST raw XML.

It’s as easy as it should be, really. I code in Python, so once you actually have an XML message ready to send, you can use urllib2 to send it in about 2 lines of code.

The more interesting part is putting the request together. I thought I had this beat. I decided to use the xml.dom.minidom module and create my Document object by going through the DOMImplementation object, because it was the only thing I found that allowed me to actually specify a DTD programmatically instead of hackishly tacking on hard-coded text. No big deal.

Now that I had a document, I needed to add elements to it. Some of these XML queries I’m sending can get too long to write a function that manually creates all the elements, then adds them to the document, then creates the text nodes, then adds them to the proper element… it’s amazingly tedious to do. I really wish Python had something like PHP’s XMLWriter, which lets you create an element and its text in one line of code.

Tedium drives me nuts, so rather than code this all out, I decided to create a dictionary that mirrored the structure of my query, with the data for the text nodes filled in by variables.

query_params = {'super_special_query':
                   {
                      'credentials': {'username': user, 'password': password, 'data_realm': realm},
                      'result_params': {'num_results': setsize, 'order_by': order_by},
                       query_type: query_dict
                    }
                }

def makeDoc():
    impl = getDOMImplementation()
    dt = impl.createDocumentType("super_special", None, 'super_special.dtd')
    doc = impl.createDocument(None, "super_special", dt)
    return doc

def makeQuery(doc, query_params, tag=None):
    """
        @doc is an xml.minidom.Document object
        @query_params is a dictionary structure that mirrors the structure of the xml.
        @tag used in recursion to keep track of the node to append things to next time through.

    """

    if tag is None:
        root = doc.documentElement
    else:
        root = tag

    for key, value in query_params.iteritems():
        tag = doc.createElement(key)
        root.appendChild(tag)
        if isinstance(value, dict):
            makeQuery(doc, value, tag)
        else:
            root.appendChild(tag)
            tag_txt = doc.createTextNode(value)
            tag.appendChild(tag_txt)

    return doc.toxml()

doc = makeDoc()
qxml = makeQuery(doc, query_params)

This is simplistic, really. I don’t need to deal with attributes in my queries, for example. But it is generic enough that if I need to send different types of queries, all that’s required is creating another dictionary to represent it, and passing that through the same makeQuery function to create the query.

Initial testing indicated success, but that’s why you can’t rely on only simple initial tests. Switching things up immediately exposed a problem: The API server validated my query against a DTD that enforced strict ordering of the elements, and Python dictionaries do not have the same notion of “order” that you and I do.

So there’s the code. If nothing else, it’s a less-contrived example of what you might actually use recursion for. Tomorrow I have to figure out how to enforce the ordering. One idea is to have a separate list to consult for the ordering of the elements. It requires an extra outer loop to go through the list, get the name of the next tag, then use that value to ask the dictionary for any related values. Seemed like a good way to go, but I had a bit of difficulty figuring out how to make that work at all. Maybe with fresh eyes in the AM it’ll be more obvious — that happens a lot, and is always a nice surprise.

Ideas, as always, are hereby solicited!

  • http://blog.marxy.org Peter Marks

    This is a great idea. Kind of reminds me of what you can do in the json/simplejson module where you just go json.dumps(myPythonStructure) or json.loads(bigStringOfJson).

    It would be lovely if there was xml.dumps(myPythonStructure), looks like you’re on the way.

  • raphael

    Why not using a special_query definition that is and object? Each class is responsible of generating its own part of the XML in the proper order.

    query = SuperSpecial(
    Credentials(username=user, password=password),
    ResultParams(num_result=set_size, order_by=order_by), …)
    query.to_xml()

    my 2 cents

  • Quikee

    Using tuples instead of dictionaries would solve your ordering problem without introducing much more code complexity.

  • masklinn

    For declarative XML creation in Python, there’s http://github.com/galvez/xmlwitch which is a port of Ruby’s Builder library. There’s a pure text and an elementtre backends (master branch is text). None of the branches handle DTD right now, but that should be easy to add.

    Using it, your example would look like this:

    doc = builder(version=’1.0′, encoding=’utf-8′)
    with doc.super_special_query:
    with doc.credentials:
    doc.username(user)
    doc.password(password)
    doc.data_realm(realm)
    with doc.result_params:
    doc.num_results(setsize)
    doc.order_by(order_by)
    doc.query_type(query_dict)

    # str(doc) to get your document

  • Juan Manuel

    PEP 372 describes the ordered dictionary data type (to be included in python 2.7 and 3.1). One of the references in the document is an implementation in python of this type.

    Hope this helps.

    Juan Manuel