Python Quirks in Cmd, urllib2, and decorators

So, if you haven’t been following along, Python programming now occupies the bulk of my work day.

While I really like writing code and like it more using Python, no language is without its quirks. Let me say up front that I don’t consider these quirks bugs or big hulking issues. I’m not trying to bash the language. I’m just trying to help folks who trip over some of these things that I found to be slightly less than obvious.

Python’s Cmd Module and Handling Arguments

Using the Python Cmd module lets you create a program that provides an interactive shell interface to your users. It’s really simple, too. You just create a class that inherits from cmd.Cmd, and define a bunch of methods named do_<something>, where <something> is the actual command your user will run in your custom shell.

So if you want users to be able to launch your app, be greeted with a prompt, type “hello”, and have something happen in response, you just define a method called “do_hello” and whatever code you put there will be run when a user types “hello” in your shell. Here’s what that would look like:

import cmd

class MyShell(cmd.Cmd):
    def do_hello(self, arg):
        # Cmd always passes the rest of the line (possibly ''), so accept it
        print("Hello!")

# Kick off the shell
shell = MyShell()
shell.cmdloop()

Of course, what’s a shell without command line options and arguments? For example, I created a shell-based app using Cmd that allowed users to run a ‘connect’ command with arguments for host, port, user, and password. Within the shell, the command would look something like this:

> connect -h mybox -u jonesy -p mypass

Note that the “>” is the prompt, not part of the command.

The idea here is that you pass the arguments to the option flags, and the application fills in sane defaults for anything missing. For example, I didn’t provide a port here, so I’m leaning on the default. I did provide a host, because the default (probably ‘localhost’) isn’t the box I want.

Passing just one, single-word argument with Cmd is dead easy, because every command method receives a string that contains *everything* on the line after the actual command (an empty string if nothing follows it). That also means every ‘do_something’ method has to accept that string, whether you plan to use it or not. So, to let users see what “hello” looks like in Spanish, we can accept “esp” as an argument to our command:

import cmd

class MyShell(cmd.Cmd):
    def do_hello(self, arg):
        # arg holds everything typed after "hello"
        if arg.strip() == 'esp':
            print("Hola!")
        else:
            print("Hello!")

The problems come when you want more than one argument, or when you want flags with arguments. For example, in the earlier “connect” example, my “do_connect” method is still only going to get one big, long string passed to it — not a list of arguments. So where in a normal program you might do something like:

class MyShell(cmd.Cmd):
    def do_connect(self, host='localhost', port='42', user='guest', password='guest'):
        pass  # ...connection code here...

In a Cmd method, you’re just going to define it like we did the do_hello method above: it takes ‘self’ and ‘args’, where ‘args’ is one long line.
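You can check this without starting an interactive session by feeding a line straight to Cmd.onecmd(). That is the method cmdloop() calls for each line the user types. Here is a small sketch, in which EchoShell is just an illustrative name:

```python
import cmd


class EchoShell(cmd.Cmd):
    def do_connect(self, line):
        # Cmd passes everything after "connect" as one single string
        self.received = line
        print(repr(line))


shell = EchoShell()
shell.onecmd('connect -h mybox -u jonesy -p mypass')
# prints: '-h mybox -u jonesy -p mypass'
```

Splitting that string into host, user, and password is entirely up to you.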

A couple of quick workarounds I’ve tried:

Parse the line yourself. I created a method in my Cmd app called ‘parseargs’ that just takes the big long line and returns a dictionary. My specific application only takes ‘name=value’ arguments, so I do this:

d = dict(arg.split('=', 1) for arg in args.split())

And return the dictionary to the calling method. My connect method can then check for keys in the dictionary and set things up. It’s longer and a little more arduous, but not too bad.
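Below is a minimal sketch of how such a parseargs helper might plug into a connect command that starts from the defaults in the do_connect signature shown earlier. It is not the exact code from the app. The CONNECT_DEFAULTS name is hypothetical, and the choice to silently skip tokens without an ‘=’ is an assumption; you might prefer to report them to the user.

```python
import cmd

# Hypothetical name; the values mirror the do_connect defaults above.
CONNECT_DEFAULTS = {'host': 'localhost', 'port': '42',
                    'user': 'guest', 'password': 'guest'}


class MyShell(cmd.Cmd):
    def parseargs(self, line):
        """Turn 'name=value name=value' into a dict.

        split('=', 1) keeps any '=' inside a value intact.
        Tokens without '=' are skipped (an assumption).
        """
        pairs = (tok.split('=', 1) for tok in line.split() if '=' in tok)
        return dict(pairs)

    def do_connect(self, line):
        # Start from the defaults, then overlay whatever the user typed
        settings = dict(CONNECT_DEFAULTS)
        settings.update(self.parseargs(line))
        self.settings = settings
        print('Connecting to %(host)s:%(port)s as %(user)s' % settings)
```

For example, `connect host=mybox user=jonesy` would keep the default port and password. The cost of this approach is that users must stick to the name=value convention.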

Use optparse. You can instantiate a parser right inside your do_x methods. If you have a lot of methods that all need to take several flags and args, this could become cumbersome, but for one or two it’s not so bad. The key to doing this is creating a list from the Big Long Line and passing it to the parse_args() method of your parser object. Here’s what it looks like:

import cmd
import optparse

class MyShell(cmd.Cmd):
    def do_touch(self, line):
        parser = optparse.OptionParser()
        parser.add_option('-f', '--file', dest='fname')
        parser.add_option('-d', '--dir', dest='dir')
        # parse_args() wants a list of tokens, not the raw line
        (options, args) = parser.parse_args(line.split())

        print("Directory: %s" % options.dir)
        print("File name: %s" % options.fname)

This method is just an example, so don’t scratch your head looking for “import os” or anything :)

This is probably the more elegant solution, since it doesn’t require you to restrict your users to passing args in a particular way, and doesn’t require you to come up with fancy CLI argument parsing algorithms.
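Applying this to the connect command from earlier turns up two optparse snags, handled in the sketch below. First, optparse reserves -h for help, so a -h host flag needs `add_help_option=False`. Second, on a bad option, OptionParser.error() prints usage and calls sys.exit(), which would end the whole shell, so the sketch overrides error() to raise instead. The sketch also swaps line.split() for shlex.split() (suggested in the comments below), which keeps a quoted value like -p "my pass" as one token. The -P port flag and the ArgError/ShellOptionParser names are hypothetical. The defaults come from the do_connect signature above.

```python
import cmd
import optparse
import shlex


class ArgError(Exception):
    """Raised instead of letting optparse call sys.exit()."""


class ShellOptionParser(optparse.OptionParser):
    # The default error() prints usage and exits the process; raise instead.
    def error(self, msg):
        raise ArgError(msg)


def make_connect_parser():
    # add_help_option=False frees up -h, which optparse reserves for --help
    parser = ShellOptionParser(add_help_option=False)
    parser.add_option('-h', '--host', dest='host', default='localhost')
    parser.add_option('-P', '--port', dest='port', default='42')  # hypothetical flag
    parser.add_option('-u', '--user', dest='user', default='guest')
    parser.add_option('-p', '--password', dest='password', default='guest')
    return parser


class MyShell(cmd.Cmd):
    prompt = '> '

    def do_connect(self, line):
        try:
            # shlex.split honors quotes: -p "my pass" stays one token
            options, args = make_connect_parser().parse_args(shlex.split(line))
        except ArgError as e:
            print('connect: %s' % e)
            return  # stay in the shell instead of exiting
        self.conn_opts = options  # remember settings for later commands
        print('Connecting to %s:%s as %s' % (options.host, options.port, options.user))
```

Building a fresh parser on every call is cheap. It also avoids stale state between commands.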

Using urllib2 for Pure XML Over HTTP

I wrote a web service client this week that does pure XML over HTTP to send queries to a service. I’ve written things like this before using Python, but it turns out, after looking back at my code, I was always either using XMLRPC, SOAP, or going through some wrapper that hid a lot from me in an effort to make my life easier (like the Google Data API). I’ve never had to try to send a pure XML payload over the wire to a web server.

I figured urllib2 was going to help me out here, and it did, but not before going through some pain due mainly to an odd pattern in various sources of documentation on the topic. I read docs at python.org, effbot.org, a couple of blogs, and did a Google search, and everything, everywhere, seems to indicate that the urllib2.Request object’s optional “data” argument expects a urlencoded string. From http://docs.python.org/library/urllib2.html?highlight=urllib2.request#urllib2.Request

data should be a buffer in the standard application/x-www-form-urlencoded format

The examples on every site I’ve found always pass whatever ‘data’ is through urllib.urlencode() before adding it to the request. I figured urllib2 was no longer my friend, and almost started looking at implementing an HTTPSClient object. Instead I decided to try just passing my unencoded data. What’s it gonna do, detect that my data wasn’t urlencoded? Maybe I’d learn something.

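I learned that all of the documentation I found assumes you’re posting form data and never mentions this case. Go ahead and pass whatever the heck you want in ‘data’. urllib2 sends it as-is as the body of a POST request and never checks whether it’s urlencoded. If it’s what the server on the other end expects, you’ll be fine. :)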
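Below is a minimal sketch of such a request. The endpoint URL is a placeholder, so the actual urlopen() call is left commented out. Two further assumptions: the service is taken to accept `text/xml`, and the try/except import lets the same code run on Python 3, where urllib2 became urllib.request. Setting Content-Type explicitly matters because, if you don’t, urllib2 labels any request that carries data as `application/x-www-form-urlencoded`.

```python
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen  # Python 2

# Placeholder endpoint; substitute the real service URL.
SERVICE_URL = 'http://example.com/service'


def build_xml_request(url, xml_text):
    # The body goes over the wire as-is: no urlencode() involved.
    payload = xml_text.encode('utf-8')
    # Without an explicit Content-Type, urllib2 would fill in
    # application/x-www-form-urlencoded for a request with data.
    return Request(url, data=payload,
                   headers={'Content-Type': 'text/xml; charset=utf-8'})


req = build_xml_request(SERVICE_URL, u'<query><term>python</term></query>')
print(req.get_method())  # POST, because data is present
# response = urlopen(req)
# print(response.read())
```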

Decorators

I found myself in dark, dusty corners when I had to decide how and where inside of a much larger piece of code to implement a feature. I really wanted to use a decorator, and still think that’s what I’ll wind up doing, but then how to implement the decorator isn’t as straightforward as I’d like either.

Decorators are used to alter how a decorated function operates. They’re amazingly useful, because instead of implementing some bit of code in a bunch of methods that themselves live inside a bunch of classes across various modules, or creating an entire class or mixin to inherit from when you only need the code overhead in a couple of edge cases, you can just create a decorator and apply it only to the proper methods or functions.
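For anyone who hasn’t written one, here is roughly the smallest useful function-based decorator. The timing behavior is an arbitrary illustration, and `timed` and `fetch` are made-up names. functools.wraps copies the wrapped function’s name and docstring onto the wrapper, so introspection still sees the original function.

```python
import functools
import time


def timed(func):
    """Report how long each call to func takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs after the call, even if func raised
            print('%s took %.3f s' % (func.__name__, time.time() - start))
    return wrapper


@timed
def fetch(n):
    return sum(range(n))
```

Only functions marked with `@timed` pay the overhead. Everything else in the codebase is untouched.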

The lesson I learned is to try very hard to make one solid decision about how your decorator will work up front. Will it be a class? That’s done somewhat differently than doing it with a function. Will the decorator take arguments? That’s handled differently in both implementations, and also requires changes to an existing decorator class that didn’t use to take arguments. I don’t know why I expected this to be more straightforward, but I totally did.
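To make those differences concrete, here is a side-by-side sketch of three shapes. The names `repeat`, `CountCalls`, and `Prefix` are made up for illustration. The three shapes are a function decorator that takes arguments, a class decorator without arguments, and a class decorator with arguments. Notice how adding arguments to the class moves the decorated function from `__init__` to `__call__`. That is exactly the kind of change that ripples through an existing class-based decorator.

```python
import functools


# 1. Function decorator with arguments: an extra outer layer takes them.
def repeat(times):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return [func(*args, **kwargs) for _ in range(times)]
        return wrapper
    return decorator


# 2. Class decorator without arguments: __init__ receives the function,
#    and __call__ runs every time the decorated function is called.
class CountCalls(object):
    def __init__(self, func):
        functools.update_wrapper(self, func)
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.func(*args, **kwargs)


# 3. Class decorator *with* arguments: now __init__ receives the arguments,
#    and __call__ receives the function once and must return a wrapper.
class Prefix(object):
    def __init__(self, text):
        self.text = text

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return self.text + func(*args, **kwargs)
        return wrapper


@repeat(3)
def ping():
    return 'ping'


@CountCalls
def add(a, b):
    return a + b


@Prefix('> ')
def greet(name):
    return 'hello ' + name
```

There is one more wrinkle if you decorate methods. A `CountCalls` instance is not a descriptor, so on a method inside a class it won’t receive `self` unless you also give it a `__get__`. The function-based wrappers bind like ordinary functions and don’t have that problem.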

If you’re new to decorators or haven’t had to dig into them too deeply, I highly recommend Bruce Eckel’s series introducing Python decorators, which walks you through all of the various ways to implement them. Part I (of 3) is the place to start.

  • Roger

    As you point out the interesting part is splitting the string into pieces. Using regular split is a bad idea since it won’t cope with quoted arguments.

    Instead you can use shlex.split, which does the job really well. It has two gotchas. The first is that it doesn’t process backslashes, so if someone did --password=abc\ def then you’ll still get that literally, i.e. you’ll need to turn backslash space into a space, \t into tab, \" into a quote, etc.

    The second problem is that shlex.split is completely broken in Python 2 (Py 3 is ok) if you supply a Unicode string. It works on the raw Unicode binary bytes giving a resulting mess. If you are using unicode (which you should) then you have to convert to utf8 first, call it and then convert the results back to Unicode.

  • http://agentultra.com j_king

    Roger is right, you should be using the shlex module. The unicode issue is kind of annoying but can be abstracted away.

    Decorators are really simple but I think the trip up is that people expect them to be complex. Just gotta remember that they’re only syntactic sugar. Instead of always writing f = x(f) after defining f, you can just put @x above the definition. They’re the same thing.

    You can get more complex than that, but the beginning Python programmer doesn’t need to worry about it; they’re pretty far edge cases, and a programmer should have enough experience with Python not to be fazed by them when they come across them.

  • m0j0

    Thanks for this — I haven’t actually implemented this in my code yet, so nice to know in advance, though I did expect that to some degree I was still going to have to restrict the user in some way to make parsing the args line predictable.

    Another option I forgot to mention that either a) alleviates the issue or b) trades one problem for another is to use raw_input, which Cmd will handle in a sane way. That’s actually what I’m using for my ‘connect’ method now. I imported getpass to handle the password :)

  • http://nearfar.org srid

    cmdln is a Python library that covers what Cmd and optparse combined miss. http://code.google.com/p/cmdln