Nose Hates Me

I easy_install’d nose on my iMac some time in the last month, and tried to use it with options for the first time today, and I’ve found that a good number of the ones shown in ‘nosetests --help’ are actually not recognized when I run nosetests. Meanwhile, running nosetests with no options still works fine. This is 0.11.3 on OS X. Google gives me only references to plugins not being found. These are allegedly “built in”! Nobody replied on Twitter either, which is pretty odd in my experience. So here I am. Wtf is going on here?

For sure, -x and -v aren’t recognized, and -p *is* recognized. Using ‘-w’ with ‘.’ as an argument results in ‘no option -w’, but feeding it a non-existent directory results in a Python ValueError (/foo not found, or not a directory). Wtf?

In checking out /Library/Python/2.6/site-packages/nose-0.11.3-py2.6.egg/nose/config.py, I can see that these options are defined, so I’m a bit confused. Here’s a small sampling of output:

Brian-Joness-iMac:tests bjones$ nosetests -x 
Usage: nosetests [options]

nosetests: error: no such option: -x
Brian-Joness-iMac:tests bjones$ nosetests -v 
Usage: nosetests [options]

nosetests: error: no such option: -v
Brian-Joness-iMac:tests bjones$ nosetests 
..
----------------------------------------------------------------------
Ran 2 tests in 0.360s

OK

Clues hereby solicited.

PyTPMOTW: PsycoPG2

What is this module for?

Interacting with a PostgreSQL database in Python.

What is PostgreSQL?

PostgreSQL is an open source relational database product. It has some more advanced features, like built-in networking-related and GIS-related datatypes, the ability to script stored functions in multiple languages (including Python), etc. If you have never heard of PostgreSQL, get out from under your rock!

Making Contact

Using the psycopg2 module to connect to a PostgreSQL database couldn’t be simpler. You can use the connect() method of the module, passing in either the individual arguments required to make contact (dbname, user, etc.), or one long “DSN” string, like this:

dsn = "host=localhost port=6000 dbname=testdb user=jonesy"
conn = psycopg2.connect(dsn)
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

The DSN value is a space-delimited collection of key=value pairs, which I construct before sending the dsn to the psycopg2.connect() method. Once we have a connection object, the very first thing I do is set the connection’s isolation level to ‘autocommit’, so that INSERT and UPDATE transactions are committed automatically without my having to call conn.commit() after each transaction. There are several isolation levels defined in the psycopg2.extensions package, and they’re defined in ‘extensions’ because they go beyond what is defined in the DB API 2.0 spec that is typically used as a reference in creating Python database modules.
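Since the DSN is just space-delimited key=value pairs, it’s easy to build one from a dictionary of settings. Here’s a minimal sketch, assuming a hypothetical db_settings dict whose keys follow the same libpq-style names used above:

# Hypothetical dict of connection settings; any libpq-style keys work here.
db_settings = {'host': 'localhost', 'port': 6000, 'dbname': 'testdb', 'user': 'jonesy'}
# Join the key=value pairs with spaces to form the DSN string.
dsn = " ".join("%s=%s" % (k, v) for k, v in db_settings.items())
conn = psycopg2.connect(dsn)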

Simple Queries and Type Conversion

In order to get anything out of the database, we have to know how to talk to it. Of course this means writing some SQL, but it also means sending query arguments in a format understood by the database. I’m happy to report that psycopg2 does a pretty good job of making things “just work” when it comes to converting your input into PostgreSQL types, and converting the output directly into Python types for easy manipulation in your code. That said, understanding how to properly use these features can be a bit confusing at first, so let me address the source of a lot of early confusion right away:

cur = conn.cursor()
cur.execute("""SELECT id, fname, lname, balance FROM accounts WHERE balance > %s""", (min_balance,))

Chances are, min_balance is an integer, but we’re using ‘%s’ anyway. Why? Because this isn’t really you telling Python to do a string formatting operation, it’s you telling psycopg2 to convert the incoming data using the default psycopg2 method, which converts integers into the PostgreSQL INT type. So, you can use “%s” in the ‘execute()’ method to properly convert integers, strings, dates, datetimes, timedeltas, lists, tuples and most other native Python types to a corresponding PostgreSQL type. There are adapters built into psycopg2 as well if you need more control over the type conversion process.
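For instance, here’s a rough sketch (the ‘opened’ and ‘tags’ columns are made up) showing the same ‘%s’ placeholder handling a string, a date, and a list, which psycopg2 adapts to a PostgreSQL array:

import datetime

# Every placeholder is '%s'; psycopg2 adapts each Python type to the matching PostgreSQL type.
cur.execute("""INSERT INTO accounts (fname, opened, tags) VALUES (%s, %s, %s)""",
            ('brian', datetime.date(2010, 3, 1), ['admin', 'staff']))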

Cursors

Psycopg2 makes it pretty easy to get your results back in a format that is easy for the receiving code to deal with. For example, the projects I work on tend to use the  RealDictCursor type, because the code tends to require accessing the parts of the resultset rows by name rather than by index (or just via blind looping). Here’s how to set up and use a RealDictCursor:

import psycopg2.extras

curs = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
curs.execute("SELECT id, name FROM users")
rs = curs.fetchall()
for row in rs:
    print row['id'], row['name']

It’s possible you have two sections of code that’ll rip apart a result set, and one needs by-name access, and the other just wants to loop blindly or access by index number. If that’s the case, just replace ‘RealDictCursor’ with ‘DictCursor’, and you can have it both ways!
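Here’s a quick sketch of that, assuming the same ‘users’ table as above; a DictCursor row can be indexed by position or by column name:

curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT id, name FROM users")
for row in curs.fetchall():
    # Same row, accessed by index and by name.
    print row[0], row['name']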

Another nice thing about psycopg2 is the cursor.query attribute and cursor.mogrify method. Mogrify allows you to test and see how a query will look after all input variables are bound, but before the query is sent to the server. Cursor.query holds the exact query that was actually sent over the wire. I use cursor.query in my logging output all the time to catch out-of-order parameters, mismatched input types, etc. Here’s an example:

try:
    curs.callproc('myschema.myprocedure', callproc_params)
except Exception as out:
    print out
    print curs.query
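And here’s roughly how mogrify can be used to preview the bound query without executing it (the query itself is just an example):

sql = "SELECT id FROM accounts WHERE balance > %s AND lname = %s"
# Returns the query string with the parameters bound, without sending it to the server.
print curs.mogrify(sql, (500, 'Jones'))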

Calling Stored Functions

Stored procedures or ‘functions’ in PostgreSQL-speak can be immensely useful in large complex applications where you want to enforce business rules in a single place outside the domain of the main application developers. It can also in some cases be more efficient to put functionality in the database than in the main application code. In addition, if you’re hiring developers, they should develop in the standard language for your environment, not SQL: SQL should be written by the database administrators and database developers and exposed to the application developers as needed, so all the application developers have to do is call the newly-exposed function. Here’s how to call a function using psycopg2:

callproc_params = [uname, fname, lname, uid]
cur.callproc("myschema.myproc", callproc_params)

The first argument to ‘callproc()’ is the name of the stored procedure, and the second argument is a sequence holding the input parameters to the function. The input parameters should be in the order that the stored procedure expects them, and I’ve found after quite a bit of usage that the module typically is able to convert the types perfectly well without my intervention, with one exception…
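If the stored function returns rows, you fetch them just as you would for any other query; a minimal sketch, assuming the procedure above returns something:

cur.callproc("myschema.myproc", callproc_params)
# Whatever the function returned is available through the usual fetch methods.
result = cur.fetchall()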

The UUID Array

PostgreSQL has built-in support for lots of interesting data types, like INET types for supporting IP addresses and CIDR network blocks, and GIS-related data types. In addition, PostgreSQL supports a type that is an array of UUIDs. This comes in handy if you use a UUID to identify items and want to store an array of them to associate with an order, or you use UUIDs to track messages and want to store an array of them together to represent a message thread or conversation. Getting a UUID array into the database is really not too difficult: if you have a list of UUID strings, you can do a quick conversion, call one function, and then use the array like any other input parameter:

import uuid
import psycopg2.extras

# Convert the strings to uuid.UUID objects, then register the UUID adapter once.
my_uuid_arr = [uuid.UUID(i) for i in my_uuid_arr]
psycopg2.extras.register_uuid()

callproc_params = [myvar1, myvar2, my_uuid_arr]
curs.callproc('myschema.myproc', callproc_params)

Connection Status

It’s not a given that your database connection lives on from query to query, and you shouldn’t really just assume that because you did a query a fraction of a second ago that it’s still around now. Actually, to speak about things more Pythonically, you *should* assume the connection is still there, but be ready for failure, and check the connection status to diagnose and help get things back on track. You can check the ‘status’ attribute of your connection object. Here’s one way you might do it:

    @property
    def active_dbconn(self):
        # True while the connection is idle (READY) or inside a transaction (BEGIN).
        return self.conn.status in [psycopg2.extensions.STATUS_READY, psycopg2.extensions.STATUS_BEGIN]

So, I’m assuming here that you have some object with a connection object that it refers to as ‘self.conn’. This one-liner method uses the built-in Python @property decorator, so the other methods in the class can either check the connection status before attempting a query:

if self.active_dbconn:
    try:
        curs.execute(...)
    except Exception as out:
         logging.error("Houston we have a problem")

Or you can flip that around like this:

try:
   curs.execute(...)
except Exception as out:
    if not self.active_dbconn:
        logging.error("Execution failed because your connection is dead")
    else:
         logging.error("Execution failed in spite of live connection: %s" % out)

Read On…

A database is a large, complex beast. There’s no way to cover the entirety of a database or a module that talks to it in a simple blog post, but I hope I’ve been able to show some of the more common features, and maybe one or two other items of interest. If you want to know more, I’m happy to report that, after a LONG time of being unmaintained, the project has recently sprung back to life and is pretty well-documented these days. Check it out!

PyTPMOTW: PyYAML

What’s This Module For?

Reading and writing files formatted using “YAML Ain’t Markup Language” (YAML), and converting YAML syntax into native Python objects and datatypes.

What is YAML?

According to the website which houses the YAML Specification:

YAML™ (rhymes with “camel”) is a human-friendly, cross language, Unicode
based data serialization language designed around the common native data
structures of agile programming languages. It is broadly useful for
programming needs ranging from configuration files to Internet messaging to
object persistence to data auditing.

My introduction to YAML came several years ago in the context of messaging, and I then had a run-in with YAML as a logging format (actually, I was trying to parse a MySQL slow query log by coaxing it into YAML format). However, when I started writing Python full time, working on several different initiatives, YAML quickly became the standard configuration format.

Why? Simplicity. Using YAML for our config files and PyYAML to parse them, any developer can figure out what’s happening in our application in a matter of minutes, even if Python is not their primary language. It’s also nice that the YAML syntax is parsed into native Python datatypes, so Python coders looking at a config file can start to get a pretty good picture of how the program basically works.

The other thing that makes it simpler than some other config-specific options is that there’s not a lot of underlying “stuff” to know about. YAML isn’t a configuration engine, it’s essentially just a way to deal with data structures without locking the format to a specific language.

I also happen to like that it’s not config-specific, because it means that if I later need a messaging format, I already know one, and am familiar with a certain Python module to work with it!

Basic Usage

Let’s write a very simple YAML configuration for the logging portion of an application:

%YAML 1.2
---
Logging:
   format: "%(levelname) -10s %(asctime)s %(module)s:%(funcName)s()  %(message)s"
   level: 10
...

I’ve put logging-related configuration in its own “section” (really data structure) here so when I want to configure other things in the application I can do so without shooting myself in the foot and having to be careful not to use the same key names, etc.

I’ve stored this configuration in a file called ‘log.conf’. From there you can easily play with it in an interpreter session:

>>> import yaml
>>> config_file = open('log.conf', 'r')
>>> config = yaml.load(config_file)
>>> config
{'Logging': {'format': '%(levelname) -10s %(asctime)s %(module)s:%(funcName)s()  %(message)s', 'level': 10}}
>>>

With the configuration out of the way, let’s look at the code that would use it:

#!/usr/bin/env python

import logging
import yaml

def doit(uid):
    logging.debug("Working with uid: %s" % uid)

if __name__ == "__main__":
    config_file = open('log.conf', 'r')
    config = yaml.load(config_file)
    config_file.close()
    logging.basicConfig(**config['Logging'])

    doit(22222)

logging.basicConfig() takes a keyword dictionary of optional configuration items. Here I’m just using the ‘format’ and ‘level’ options, but there are more.

The only thing I do inside the doit() function is use logging to output the value of ‘uid’ passed in. This is really a test that the format I’ve configured is actually being used.

The format is fairly intuitive: indentation defines a block, just like in Python. The ‘---’ and ‘...’ lines denote the beginning and end of the YAML document. You can have several documents in a file if you so choose. This might be done if you’re storing a feed or email threads in YAML format.
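Reading a multi-document file is done with yaml.load_all(), which gives you an iterator over the parsed documents. A quick sketch, using a hypothetical ‘threads.yaml’ file:

>>> for doc in yaml.load_all(open('threads.yaml')):
...     print doc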

Type Conversion

Type conversion to the built in Python primitives works very well and is very intuitive in my experience. The above would be parsed as a string for the ‘format’ key, and an ‘int’ for the ‘level’ key. The entire block above will become a dictionary, and there is YAML syntax you can use to create lists and lists of lists, etc., as well.

For example, let’s say I’m creating a Django-like web application framework and I’ve decided to store my URL-to-handler mappings in a YAML file. You could easily do it with a list of lists, which looks like this in YAML:

RequestHandlers:
- [/, framework.handlers.RootHandler]
- [/signup, framework.handlers.RegisterNow]
- [/login, framework.handlers.Login]
- [/faq, framework.handlers.FAQ]

This will form a list of lists that you can work with in your code; it looks like this in the config dictionary:

{'RequestHandlers': [['/', 'framework.handlers.RootHandler'], ['/signup',
'framework.handlers.RegisterNow'], ['/login', 'framework.handlers.Login'],
['/faq', 'framework.handlers.FAQ']]}

If for some reason type conversion doesn’t work as you expect, or you need to represent, say, a boolean using a string like “y” or “Yes” instead of “True”, you can explicitly tag your value using tags defined in the YAML specification for this very purpose. Here’s how you’d explicitly tag “Yes” as a boolean, to ensure it’s not parsed as a string:

verbose: !!bool "Yes"

When this is parsed by PyYAML, it will be a Python boolean, and the value when printed to the screen will be ‘True’ (without quotes). There are several other explicit type tags, including ‘!!int’, ‘!!float’, ‘!!null’, ‘!!timestamp’ and more.
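A quick interpreter check shows the tag doing its job (the key name here is made up):

>>> yaml.load("verbose: !!bool 'Yes'")
{'verbose': True}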

If you like, you could alter our URL mapper from above and create a list of tuples. Note the use of the !!omap tag, which is short for ‘ordered mapping’:

RequestHandlers: !!omap
- /: framework.handlers.RootHandler
- /signup: framework.handlers.RegisterNow
- /login: framework.handlers.Login
- /faq: framework.handlers.FAQ

The resulting config dictionary looks like this:

{'RequestHandlers': [('/', 'framework.handlers.RootHandler'), ('/signup',
'framework.handlers.RegisterNow'), ('/login', 'framework.handlers.Login'),
('/faq', 'framework.handlers.FAQ')]}

More than once I’ve gone back to my YAML configuration to alter the type of data structure returned to better suit the code that uses it. It’s pretty convenient, and making the changes to both the configuration file and the code are typically easy enough to be considered a non-event.

Beyond Basic Data Types

The ‘level’ option in logging.basicConfig can be specified either as a word or a numeric value (internally, logging.DEBUG maps to the integer value 10). But what if you didn’t know this, or you didn’t have the option of using an integer? Specifying ‘logging.DEBUG’ in the config file wouldn’t have worked, because it would’ve come in as a string, and not an exposed module name.

If you don’t care about locking your configuration file to a language, PyYAML will let you do what you need using language-specific tags. So, for the purposes of our program, the following two lines in YAML produce the same effect:

level: 10
level: !!python/name:logging.DEBUG

You might also choose to do this because reading ‘logging.DEBUG’, even with the added tag overhead, is probably easier to understand than trying to figure out what “10″ means.
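You can see the tag resolving to the actual module attribute right in the interpreter; logging.DEBUG is just the integer 10 underneath:

>>> import logging, yaml
>>> yaml.load("level: !!python/name:logging.DEBUG")
{'level': 10}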

If you’re developing code that allows users to write plugins, you can also let them register their plugins with a single line in a ‘plugin’ section of the YAML config file, in such a way that the config dictionary itself will contain an actual instance of the proper object:

Plugins:
- !!python/object/new:MyPlugin.Processor [logfile='foo.log']
- !!python/object/new:FooPluginModule.CementMixers.RotaryMixer [consistency='chunky']

The above will produce a list of plugin instances, with ‘args’ in the appended list fed to each class’s __init__ method. Don’t forget that if you want to access the plugins by name instead of looping over a list, you can easily make this a dictionary. Also, PyYAML supports passing more initialization info to the class constructor.

Anchors and Aliases

You can create a block in your YAML config file, and then reference it in other sections of the configuration, and it can save you a lot of lines in a more complex configuration. This is done using anchors and aliases. An anchor starts with “&” and an alias (a reference to the anchor) begins with a “*”. So, let’s say you have multiple plugins loaded (continuing on from the example), and they all need their own configuration, but they’ll all connect to the same exact database server, and use the same credentials and db name, etc. Just create the db config once, make it an anchor, and reference it as needed:

DB: &MainDB
   server: localhost
   port: 6000
   user: dbuser
   db: myappdb
Plugins:
   loghandler: !!python/object/new:MyLogHandler
      args: ['mylogfile.log']
      db: *MainDB

When this is read in, the dictionary defined in &MainDB will appear as the value for the dict key ['Plugins']['loghandler']['db']. If you wanted to pass the *entire* config structure to your plugin, you technically wouldn’t need this, but I typically pass only the portion of the config structure specifically dealing with the plugin, because configs can get large, and there could be lots of stuff in the rest of the config that has nothing to do with the plugin.
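Here’s a quick interpreter sketch of the alias resolving to the anchored mapping (the plugin tag is stripped out here just to keep the demonstration focused on the anchor):

>>> doc = """
... DB: &MainDB
...    server: localhost
...    port: 6000
... Plugins:
...    loghandler:
...       db: *MainDB
... """
>>> yaml.load(doc)['Plugins']['loghandler']['db']['server']
'localhost'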

Moving Ahead

Although 90% of your use of PyYAML might well consist of loading a YAML file or message and working with the resulting data structure, it’s nice to know that it does provide quite a bit of flexibility if you’re willing to look for it. Here are some links for further reading about PyYAML, including a couple of items not covered in this tutorial:

Pass more initialization data to classes specified with !!python/object/new

Create your own app-specific tags, a la ‘!!bool’ and ‘!!python’.

Dump Python Objects to YAML

Tornado’s Big Feature is Not ‘Async’

I’ve been working with the Tornado web server pretty much since its release by the Facebook people several months ago. If you’ve never heard of it, it’s a sort of hybrid Python web framework and web server. On the framework side of the equation, Tornado has almost nothing. It’s completely bare bones when compared to something like Django. On the web server side, it is also pretty bare bones in terms of hardcore features like Apache’s ability to be a proxy and set up virtual hosts and all of that stuff. It does have some good performance numbers though, and the feature that seems to drive people to Tornado is that it’s asynchronous, and pretty fast.

I think some people come away from their initial experiences with Tornado a little disheartened because only upon trying to benchmark their first real app do they come face to face with the reality of “asynchronous”: Tornado can be the best async framework out there, but the minute you need to talk to a resource for which there is no async driver, guess what? No async.

Some people might even leave the ring at this point, and that’s a shame, because to me the async features in Tornado aren’t what attract me to it at all.

Why Tornado, if Not For Async?

For me, there’s an enormous win in going with Tornado (or other things like it), and to get this benefit I’m willing to deal with some of Tornado’s warts and quirks. I’m willing to deal with the fact that the framework provides almost nothing I’m used to having after being completely spoiled by Django. What’s this magical feature you ask? It’s simply the knowledge that, in Tornado-land, there’s no such thing as mod_wsgi. And no mod_python either. There’s no mod_anything.

This means I don’t have to think about sys.path, relative vs. absolute paths, whether to use daemon or embedded mode, “Cannot be loaded as Python module” errors, “No such module” errors, permissions issues, subtle differences between Django’s dev server and Apache/mod_wsgi, reconciling all of these things when using/not using virtualenv, etc. It means I don’t have to metascript my way into a working application. I write the app. I run the app.

Wanna see how to create a Tornado app? Here’s one right here:

import tornado.httpserver
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("This is a Tornado app")

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8888)
    tornado.ioloop.IOLoop.instance().start()

Save this to whatever file you want, run it, and do ‘curl http://localhost:8888’ and you’ll see ‘This is a Tornado app’ on your console.

Simplistic? Yes, absolutely. But when you can just run this script, put it behind nginx, and have it working in under five minutes, you dig a little deeper and see what else you can do with this thing. Turns out, you can do quite a bit.

Can I Do Real Work With This?

I’ve actually been involved in a production launch of a non-trivial service running on Tornado, and it was mind-numbingly easy. It was several thousand lines of Python, all of which was written by two people, and the prototype was up and running inside of a month. Moving from prototype to production was a breeze, and the site has been solid since its launch a few months ago.

Do You Miss Django?

I miss *lots* of things about Django, sure. Most of all I miss Django’s documentation, but Tornado is *so* small that you actually can find what you need in the source code in 2 minutes or less, and since there aren’t a ton of moving parts, when you find what you’re looking for, you just read a few lines and you’re done: you’re not going to be backtracking across a bunch of files to figure out the process flow.

I also miss a lot of what I call Django’s ‘magic’. It sure does a lot to abstract away a lot of work. In place of that work, though, you’re forced to take on a learning curve that is steeper than most. I think it’s worth getting to know Django if you’re a web developer who hasn’t seen it before, because you’ll learn a lot about Python and how to architect a framework by digging in and getting your hands dirty. I’ve read seemingly most books about Django, and have done some development work in Django as well. I love it, but not for the ease of deployment.

I spent more time learning how to do really simple things with Django than it took to:

  1. Discover Tornado
  2. Download/install and run ‘hello world’
  3. Get a non-trivial, commercial application production-ready and launch it.

Deadlines, indeed!

Will You Still Work With (Django/Mingus/Pinax/Coltrane/Satchmo/etc)?

Sure. I’d rather not host it, but if I have to I’ll get by. These applications are all important, and I do like developing with them. It’s mainly deployment that I have issues with.

That’s not to say I wouldn’t like to see a more mature framework made available for Tornado either. I’ve worked on one, though it’s not really beyond the “app template” phase at this point. Once the app template is able to get out of its own way, I think more features will start to be added more quickly… but I digress.

In the end, the astute reader will note that my issue isn’t so much with Django-like frameworks (though I’ll note that they don’t suit every purpose), but rather with the current trend of using mod_wsgi for deployment. I’ll stop short of bashing mod_wsgi, because it too is an important project that has done wonders for the state of Python in web development. It really does *not* fit my brain at all, though, and I find when I step into a project that’s using it and it has mod_wsgi-related problems, identifying and fixing those problems is typically not a simple and straightforward affair.

So, if you’re like me and really want to develop on the web with Python, but mod_wsgi eludes you or just doesn’t fit your brain, I can recommend Tornado. It’s not perfect, and it doesn’t provide the breadth of features that Django does, but you can probably get most of your work done with it in the time it took you to get a mod_wsgi “Hello World!” app to not return a 500 error.

PyTPMOTW: py-amqplib

What’s This Module For?

To interact with a queue broker implementing version 0.8 of the Advanced Message Queueing Protocol (AMQP) standard. Copies of various versions of the specification can be found here. At time of writing, 0.10 is the latest version of the spec, but it seems that many popular implementations used in production environments today are still using 0.8, presumably awaiting a finalization of v.1.0 of the spec, which is a work in progress.

What is AMQP?

AMQP is a queuing/messaging protocol that is implemented by server daemons (called ‘brokers’) like RabbitMQ, ActiveMQ, Apache Qpid, Red Hat Enterprise MRG, and OpenAMQ. Though messaging protocols used in the enterprise are historically proprietary, AMQP takes a bold and vocal stance that it will be:

  • Broadly applicable for enterprise use
  • Totally open
  • Platform agnostic
  • Interoperable

The working group consists of several huge enterprises that have a vested interest in a protocol meeting these requirements. Most are either companies that are (or were) victims of the proprietary lock-in that came with what will now likely become ‘legacy’ protocols, or implementers of those protocols, who will sell products and services around their implementations. Here’s a brief list of those involved in the AMQP working group:

  • JPMorgan Chase (the initial developers of the protocol, along with iMatix)
  • Goldman Sachs
  • Red Hat Software
  • Cisco Systems
  • Novell

Message brokers can facilitate an awfully large amount of flexibility in an architecture. They can be used to integrate applications across platforms and languages, enable asynchronous operations for web front ends, and modularize and more easily distribute complex processing operations.

Basic Publishing

The first thing to know is that when you code against an AMQP broker, you’re dealing with a hierarchy: a ‘vhost’ contains one or more ‘exchanges’ which themselves can be bound to one or more ‘queues’. Here’s how you can programmatically create an exchange and queue, bind them together, and publish a message:

from amqplib import client_0_8 as amqp

conn = amqp.Connection(userid='guest', password='guest', host='localhost', virtual_host='/', ssl=False)

# Create a channel object, queue, exchange, and binding.
chan = conn.channel()
chan.queue_declare('myqueue', durable=True)
chan.exchange_declare('myexchange', type='direct', durable=True)
chan.queue_bind('myqueue', 'myexchange', routing_key='myq.myx')

# Create an AMQP message object

msg = amqp.Message('This is a test message')
chan.basic_publish(msg, 'myexchange', 'myq.myx')

As far as we know, we have one exchange and one queue on our server right now, and if that’s the case, then technically the routing key I’ve used isn’t required. However, I strongly suggest that you always use a routing key to avoid really odd (and implementation-specific) behavior like getting multiple copies of a message on the consumer side of the equation, or getting odd exceptions from the server. The routing key can be arbitrary text like I’ve used above, or you can use a common formula like joining the queue and exchange names with a ‘.’ (as in ‘myq.myx’ above). Just remember that without the routing key, the minute more than one queue is bound to an exchange, the exchange has no way of knowing which queue to route a message to. Remember: you don’t publish to a queue, you publish to an exchange and tell it which queue it goes in via the routing key.
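To see why the routing key matters, here’s a rough sketch (the second queue name is made up) of another queue bound to the same direct exchange; each message lands only in the queue whose binding key matches:

chan.queue_declare('otherqueue', durable=True)
chan.queue_bind('otherqueue', 'myexchange', routing_key='otherq.myx')

# Routed to 'myqueue' only -- the key matches that binding, not 'otherq.myx'.
chan.basic_publish(amqp.Message('another test'), 'myexchange', 'myq.myx')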

Basic Consumption

Now that we’ve published a message, how do we get our hands on it? There are two methods: basic_get, which will ‘get’ a single message from the queue, or ‘basic_consume’, which technically doesn’t get *any* messages: it registers a handler with the server and tells it to send messages along as they arrive, which is great for high-volume messaging operations.

Here’s the ‘basic_get’ version of a client to grab the message we just published:

msg = chan.basic_get(queue='myqueue', no_ack=False)
chan.basic_ack(msg.delivery_tag)

In the above, I’ve used the same channel I used to publish the message to get it back again using the basic_get operation. I then acknowledged receipt of the message by sending the server a ‘basic_ack’, passing along the delivery_tag the server included as part of the incoming message.
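One thing worth knowing: if the queue is empty, basic_get returns None rather than a message, so a small guard like this is typical (a minimal sketch):

msg = chan.basic_get(queue='myqueue', no_ack=False)
if msg is not None:
    print msg.body
    chan.basic_ack(msg.delivery_tag)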

Consuming Mass Quantities

Using basic_consume takes a little more thought than basic_get, because basic_consume does nothing more than register a method with the server to tell it to start sending messages down the pipe. Once that’s done, however, it’s up to you to do a chan.wait() to wait for messages to show up, and find some elegant way of breaking out of this wait() operation. I’ve seen and used different techniques myself, and the right thing will depend on the application.

The basic_consume method also requires a callback method which is called for each incoming message, and is passed the amqp.Message object when it arrives.

Here’s a bit of code that defines a callback method, calls basic_consume, and does a chan.wait():

consumer_tag = 'foo'
def process(msg):
   txt = msg.body
   if '-1' in txt:
      print 'Got -1'
      chan.basic_cancel(consumer_tag)
      chan.close()
   else: 
      print 'Got message!'

chan.basic_consume('messages', callback=process, consumer_tag=consumer_tag)
while True:
   print 'Message processed. Next?'
   try:
      chan.wait()
   except IOError as out:
      print "Got an IOError: %s" % out
      break
   if not chan.is_open:
      print "Done processing. Later"
      break

So, basic_consume tells the server ‘Start sending any and all messages!’. The server registers a method with a name given by the consumer_tag argument, or it assigns one and it becomes the return value of basic_consume(). I define one here because I don’t want to run into race conditions where I want to call basic_cancel() with a consumer_tag variable that doesn’t exist yet, or is out of scope, or whatever. In the callback, I look for a sentinel message whose body contains ‘-1’, and at that point I call basic_cancel (passing in the consumer_tag so the server knows who to stop sending messages to), and I close the channel. In the ‘while True’ loop, I check the channel’s status and exit if it’s no longer open.

The above example starts to uncover some issues with py-amqplib. It’s not clear how errors coming back from the server are handled, as opposed to errors caused by the processing code, for example. It’s also a little clumsy trying to determine the logic for breaking out of the loop. In this case there’s a sentinel message sent to the queue representing the final message on the stack, at which point our ‘process()’ callback closes the channel, but then the channel has to check its own status to move forward. Just returning False from process() doesn’t break out of the while loop, because it’s not looking for a return value from that function. We could have our process() function raise an error of its own as well, which might be a bit more elegant, if also a bit more work.
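Here’s roughly what that more elegant variation might look like; the custom exception name is made up, and the rest follows the example above:

class LastMessageSeen(Exception):
    """Raised by the callback when the sentinel message arrives."""

def process(msg):
    if '-1' in msg.body:
        chan.basic_cancel(consumer_tag)
        raise LastMessageSeen()
    print 'Got message!'

chan.basic_consume('messages', callback=process, consumer_tag=consumer_tag)
while True:
    try:
        chan.wait()   # an exception raised inside the callback propagates out here
    except LastMessageSeen:
        print "Done processing. Later"
        break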

Moving Ahead

What I’ve covered here actually covers perhaps 90% of the common cases for amqplib, but there’s plenty more you can do with it. There are various exchange types, including fanout exchanges and topic exchanges, which can facilitate more interesting messaging and pub/sub models. To learn more about them, here are a couple of places to go for information:

Broadcasting your logs with RabbitMQ and Python
Rabbits and Warrens
RabbitMQ FAQ section “Messaging Concepts: Exchanges”