Category Archives: Linux

Sending Alerts With Graphite Graphs From Nagios

Disclaimer

The way I’m doing this relies on a feature I wrote for Graphite that was only recently merged to trunk, so at time of writing that feature isn’t in a stable release. Hopefully it’ll be in 0.9.10. Until then, you can at least test this setup using Graphite’s trunk version.

Oh yeah, the new feature is the ability to send graph images (not links) via email. I surfaced it in Graphite through the menus that pop up when you click on a graph, but implemented it so that it’s pretty easy to call from a script (which I also wrote; you’ll see it if you read on).

Also, note that I assume you already know Nagios, how to install new command scripts, and all that. It’s really easy to figure this stuff out in Nagios, and it’s well-documented elsewhere, so I don’t cover anything here but the configuration of this new feature.

The Idea

I’m not a huge fan of Nagios, to be honest. As far as I know, nobody really is. We all just use it because it’s there, and the alternatives are either overkill, unstable, too complex, or just don’t provide much value for all the extra overhead that comes with them (whether that’s config overhead, administrative overhead, processing overhead, or whatever depends on the specific alternative you’re looking at). So… Nagios it is.

One thing that *is* pretty nice about Nagios is that configuration is really dead simple. Another thing is that you can do pretty much whatever you want with it, and write code in any language you want to get things done. We’ll take advantage of these two features to actually do a couple of things:

  • Monitor a metric by polling Graphite for it directly
  • Tell Nagios to fire off a script that’ll go get the graph for the problematic metric, and send email with the graph embedded in it to the configured contacts.
  • Record the fact that we sent the alert back in Graphite, so we can overlay those events on the corresponding metric graph and verify that alerts are going out when they should, that they’re hitting phones without delay, etc.

The Candy

Just to be clear, we’re going to set things up so you can get alert messages from Nagios with the graph for the problematic metric embedded right in the email.

And you’ll also be able to track those alert events in Graphite, overlaid on the corresponding metric’s graph as vertical lines.

Defining Contacts

In production, it’s possible that the proper contacts and contact groups already exist. For testing (and maybe production) you might find that you want to limit who receives graphite graphs in email notifications. To test things out, I defined:

  • A new contact template that’s configured specifically to receive the graphite graphs. Without this, no graphs.
  • A new contact that uses the template
  • A new contact group containing said contact.

For testing, you can create a test contact in templates.cfg:

define contact{
        name                            graphite-contact 
        service_notification_period     24x7            
        host_notification_period        24x7 
        service_notification_options    w,u,c,r,f,s 
        host_notification_options       d,u,r,f,s  
        service_notification_commands   notify-svcgraph-by-email
        host_notification_commands      notify-host-by-email
        register                        0
        }

You’ll notice a few things here:

  • This is not a contact, only a template.
  • Any contact defined using this template will be notified of service issues with the command ‘notify-svcgraph-by-email’, which we’ll define in a moment.

In contacts.cfg, you can now define an individual contact that uses the graphite-contact template we just assembled:

define contact{
        contact_name    graphiteuser
        use             graphite-contact 
        alias           Graphite User
        email           someone@example.com 
        }

Of course, you’ll want to change the ‘email’ attribute here, even for testing.

Once done, you also want to have a contact group set up that contains this new ‘graphiteuser’, so that you can add users to the group to expand the testing, or evolve things into production. This is also done in contacts.cfg:

define contactgroup{
        contactgroup_name       graphiteadmins
        alias                   Graphite Administrators
        members                 graphiteuser
        }

Defining a Service

Also for testing, you can set up a test service template, necessary in this case to bypass the default settings, which try not to bombard contacts with an email for every single aberrant check. Since the end result of this test is to see an email, we want to get an email for every check whose values are in any way out of bounds. In templates.cfg, put this:

define service{
    name                        test-service
    use                         generic-service
    passive_checks_enabled      0
    contact_groups              graphiteadmins
    check_interval              20
    retry_interval              2
    notification_options        w,u,c,r,f
    notification_interval       30
    first_notification_delay    0
    flap_detection_enabled      1
    max_check_attempts          2
    register                    0
    }

Again, the key point here is to ensure that no notifications are ever silenced, deferred, or delayed by Nagios in any way, for any reason. You probably don’t want this in production. The other point is that when you set up an alert for a service that uses ‘test-service’ in its definition, the alerts will go to our previously defined ‘graphiteadmins’.

To make use of this service, I’ve defined a service in ‘localhost.cfg’ that will require further explanation, but first let’s just look at the definition:

define service{
        use                             test-service
        host_name                       localhost
        service_description             Some Important Metric
        check_command                   check_graphite_data!24!36
        notifications_enabled           1
        # hypothetical example value: paste in a real graph URL from your dashboard
        _GRAPHURL                       "http://graphite.example.com/render?target=some.metric"
        }

There are two new things we need to understand when looking at this definition:

  • What is ‘check_graphite_data’?
  • What is ‘_GRAPHURL’?

These questions are answered in the following section.

In addition, you should know that the value for _GRAPHURL is intended to come straight from the Graphite dashboard. Go to your dashboard, pick a graph of a single metric, grab the URL for the graph, and paste it in (and double-quote it).

Defining the ‘check_graphite_data’ Command

This command relies on a small script written by the folks at Etsy, which can be found on github: https://github.com/etsy/nagios_tools/blob/master/check_graphite_data

Here’s the commands.cfg definition for the command:

# 'check_graphite_data' command definition
define command{
        command_name    check_graphite_data
        command_line    $USER1$/check_graphite_data -u $_SERVICEGRAPHURL$ -w $ARG1$ -c $ARG2$
        }

The ‘command_line’ attribute calls the check_graphite_data script we grabbed from GitHub earlier. The ‘-u’ flag takes a URL, which here is supplied by the custom object variable ‘_GRAPHURL’ from our service definition. You can see more about custom object variables here: http://nagios.sourceforge.net/docs/3_0/customobjectvars.html - the short story is that, since we defined _GRAPHURL in a service definition, ‘SERVICE’ gets inserted after the leading underscore, giving you the macro ‘$_SERVICEGRAPHURL$’. More on how that works at the link provided.

The ‘-w’ and ‘-c’ flags to check_graphite_data are ‘warning’ and ‘critical’ thresholds, respectively, and they correlate to the positions of the service definition’s ‘check_command’ arguments (so, check_graphite_data!24!36 maps to ‘check_graphite_data -u <url> -w 24 -c 36’).

Defining the ‘notify-svcgraph-by-email’ Command

This command relies on a script that I wrote in Python called ‘sendgraph.py’, which also lives in github: https://gist.github.com/1902478

The script does two things:

  • It emails the graph that corresponds to the metric being checked by Nagios, and
  • It pings back to Graphite to record the alert itself as an event, so if you define a graph for, say, ‘Apache Load’ and use this script to alert on that metric, you can overlay the alert events on top of the ‘Apache Load’ graph and verify that alerts go out when you expect. It’s also a good test that the alerts this script sends are actually arriving, and aren’t being dropped or seriously delayed. (A rough sketch of this ping-back follows the list.)
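
Just to make that ping-back concrete, here’s a rough sketch of what recording an alert as a Graphite event can look like. This is illustrative only (it’s not the actual sendgraph.py code, and the function name and arguments are made up for the example); it assumes your Graphite install exposes the /events/ endpoint:

# Illustrative sketch, not the real sendgraph.py.
import json
import urllib2

def record_alert_event(graphite_host, service, state):
    """POST an alert event to Graphite's /events/ endpoint."""
    event = {
        'what': 'nagios alert',
        'tags': 'nagios',
        'data': '%s is %s' % (service, state),
    }
    req = urllib2.Request('http://%s/events/' % graphite_host,
                          json.dumps(event),
                          {'Content-Type': 'application/json'})
    urllib2.urlopen(req)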

To make use of the script in Nagios, let’s define the command that actually sends the alert:

define command{
    command_name    notify-svcgraph-by-email
    command_line    /path/to/sendgraph.py -u "$_SERVICEGRAPHURL$" -t $CONTACTEMAIL$ -n "$SERVICEDESC$" -s $SERVICESTATE$
    }

A couple of quick notes:

  • Notice that you need to double-quote any variables in the ‘command_line’ that might contain spaces.
  • For a definition of the command line flags, see sendgraph.py’s –help output.
  • Just to close the loop, note that notify-svcgraph-by-email is the ‘service_notification_commands’ value in our initial contact template (the very first listing in this post)

Fire It Up

Fire up your Nagios daemon to take it for a spin. For testing, make sure you set the check_graphite_data thresholds to numbers that are pretty much guaranteed to trigger an alert when Graphite is polled. Hope this helps! If you have questions, first make sure you’re using Graphite’s ‘trunk’ branch, and not 0.9.9, and then give me a shout in the comments.

The Python User Group in Princeton (PUG-IP): 6 months in

In May, 2011, I started putting out feelers on Twitter and elsewhere to see if there might be some interest in having a Python user group that was not in Philadelphia or New York City. A single tweet resulted in 5 positive responses, which I took as a success, given the time-sensitivity of Twitter, my “reach” on Twitter (which I assume is far smaller than what might be the entire target audience for that tweet), etc.

Happy with the responses I received, I still wanted to take a baby step in getting the group started. Rather than set up a web site that I’d then have to maintain, a mailing list server, etc., I went to the cloud. I started a group on meetup.com, and started looking for places to hold our first meeting.

Meetup.com

Meetup.com, I’m convinced, gives you an enormous value if you’re looking to start a user group Right Now, Today™. For $12/mo., you get a place where you can announce future meetups, hold discussions, collect RSVPs so you have a head count for food or space or whatever, and vendors can also easily jump in to provide sponsorship or ‘perks’ in the form of discounts on services to user group members and the like. It’s a lot for a little, and it’s worked well enough. If we had to stick with it for another year, I’d have no real issue with that.

Google Groups

I set up a mailing list using Google Groups about 2-3 months ago now. I only waited so long because I thought meetup.com’s discussion forum might work for a while. After a few meetings, though, I noticed that there were always about five more people in attendance than had RSVP’d on meetup.com. Some people just aren’t going to be bothered with having yet another account on yet another web site I guess. If that’s the case, then I have two choices (maybe more, but these jumped to mind): force the issue by constantly trumpeting meetup.com’s service, or go where everyone already was. Most people have a Google account, and understand its services. Also, since the group is made up of technical people, they mostly like the passive nature of a mailing list as opposed to web forums.

If you’re setting up a group, I’d say that setting up a group on meetup.com and simultaneously setting up a Google group mailing list is the way to go if you want to get a fairly complete set of services for very little money and about an hour’s worth of time.

Meeting Space

Meeting space can come from a lot of different places, but I had a bit of trouble settling on a place at first. Princeton University is an awesome place and has a ton of fantastic places to meet with people, but if you’re not living on campus (almost no students are group members, btw), parking can be a bit troublesome, and Princeton University is famous for having little or no signage (including building names), so finding where to go even after you’ve found parking can be problematic. So, for now, the University is out.

The only sponsor I had that was willing to provide space was my employer, but we’re nowhere near Princeton, and don’t really have the space. Getting a sponsor for space can be a bit difficult when your group doesn’t exist yet, in part because none of them have engaged with you or your group until the first meeting, when the attendees, who all work for potential sponsors, show up.

I started looking at the web site for the Princeton Public Library. I’ve been involved in the local Linux user group for several years, and they use free meeting space made available by the public library in Lawrenceville, which borders Princeton. I wondered if the Princeton Public Library did this as well, but they don’t, actually. In fact, meeting space at that location can get pretty expensive, since they charge for the space and A/V equipment like projectors and stuff separately (or they did when I started the group – I believe it’s still the case).

I believe I tweeted my disappointment about the cost of meeting at the Princeton Public Library, and did a callout on Twitter for space sponsors and other ideas about meeting space in or near Princeton. The Princeton Public Library got in touch through their @PrincetonPL Twitter account, and we were able to work out a really awesome deal where they became a sponsor, and agreed to host our group for 6 months, free of charge. Awesome!

Now, six months in, we either had to come to some other agreement with the library, or move on to a new space. After six months, it’s way easier to find space, or sponsors who might provide space, but I felt if we could find some way to continue the relationship with the library, it’d be best not to relocate the group. We wound up finding a deal that does good things for the group, the library, the local Python user community, and the evangelism of the Python language….

Knowledge for Space

Our group got a few volunteers together to commit to providing a 5-week training course to the public, held at the Princeton Public Library. Adding public offerings like this adds value to the library, attracts potential new members (they’re a member-supported library, not a state/municipality-funded one), etc. In exchange for providing this service to the library, the library provides us with free meeting space, including the A/V equipment.

If you don’t happen to have a public library that offers courses, seminars, etc., to the general public, you might be able to cut a similar deal with a local community college, or even high school. If you know of a corporation locally that uses Python or some other technology the group can speak or train people in, you might be able to trade training for meeting space in their offices. Training is a valued perk to the employees of most corporations.

How To Get Talks (or “How we stopped caring about getting talks”)

Whether you’re running a publishing outfit, a training event, or user group, getting people to deliver content is a challenge. Some people don’t think they have any business talking to what they perceive as a roomful of geniuses about anything. Some just aren’t comfortable talking in front of audiences, but are otherwise convinced of their own genius. Our group is trying to attack this issue in various ways, and so far it seems to be working well enough, though more ideas are welcome!

Basically, the group isn’t necessarily locked into traditions like “Thou shalt provide a speaker, who shalt bequeath upon us the wisdom of the ages”. Once you’ve decided as a group that having cookie-cutter meetings isn’t necessary, you start to think of all sorts of things you could all be doing together.

Below are some ideas, some in the works, some in planning, that I hope help other would-be group starters to get the ball rolling, and keep it in motion!

Projects For the Group, By the Group

Some members of PUG-IP are working together on building the pugip.org website, which is housed in a GitHub repository under the ‘pugip’ GitHub organization. This one project will inevitably result in all kinds of home-grown presentations & events within the group. As new ideas come up and new features are implemented, people will give lightning talks about their implementation, or we’ll do a group peer review of the code, or we’ll have speakers give talks about third-party technologies we might use (we might have two speakers each give a 30-minute talk about two different NoSQL solutions, for example; we’ve already had a great overview of about 10 different Python micro-frameworks), etc.

We may also decide to break up into pairs, and then sprint together on a set of features, or a particularly large feature, or something like that.

As of now, we’ve made enough decisions as a group to get the ball rolling. If there’s any interest I can blog about the setup that allows the group to easily share, review, and test code, provide live demos of their work, etc. The tl;dr version is we use GitHub and free heroku accounts, but new ideas come into play all the time. Just today I was wondering if we could, as a group, make use of the cloud9 IDE (http://cloud9ide.com).

The website is a great idea, but other group projects are likely to come up.

Community Outreach

PUG-IP’s first official community outreach project will be the training we provide through the Princeton Public Library. A few of us will collaborate on delivering the training, but the rest of the group will be involved in providing feedback on various aspects of the material, so it’s a ‘whole group’ project, really. On top of increasing interactivity among the group members, outreach is also a great way to grow and diversify the group, and perhaps gain sponsorships as well!

There’s another area group called LUG-IP (a Linux user group) that also does some community outreach through a hardware SIG (special interest group), certification training sessions, and participating in local computing events and conferences. I’d like to see PUG-IP do this, too, maybe in collaboration with the LUG (they’re a good and passionate group of technologists).

Community outreach can also mean teaming up with various other technology groups, and one event I’m really looking forward to is a RedSnake meeting to be held next February. A RedSnake meeting is a combined meeting between PhillyPUG (the Philadelphia Python User Group) and Philly.rb (the Philadelphia Ruby Group). As a member of PhillyPUG I participated in last year’s RedSnake meeting, and it was a fantastic success: probably 70+ people in attendance, and perhaps 10 or so lightning talks given by members of both organizations. We tried to do a ‘matching’ talk agenda at the meeting, so if someone on the Ruby side did a testing talk, we followed that with a Python testing talk, etc. It was a ton of fun, and the audience was amazing.

Socials

Socials don’t have to be dedicated events, per se. For example, PUG-IP has a sort of mini-social after every single meetup. We’re lucky to have our meetings located about a block away from a brewpub, so after each meeting, perhaps half of us make it over for a couple of beers and some great conversations. After a few of these socials, I started noticing that more talk proposals started to spring up.

Of course, socials can also be dedicated events. Maybe some day PUG-IP will…. I dunno… go bowling? Or maybe we’ll go as a group to see the next big geeky movie that comes out. Maybe we’ll have some kind of all-inclusive, bring-the-kids BBQ next summer. Who knows?

As a sort of sideshow event to the main LUG meetings, LUG-IP has a regularly-scheduled ‘coffee klatch’. Some of the members meet up one Sunday per month at (if memory serves) 8-11AM at a local Panera for coffee, pastries, and geekery. It’s completely informal, but it’s a good time.

Why Not Having Talks Will Help You Get Talks

I have a theory that is perhaps half-proven through my experiences with technology user groups: increasing engagement among and between the members of the group in a way that doesn’t shine a huge floodlight on a single individual (like a talk would) eventually breaks down whatever fears or resistance there is to proposing and giving a talk. Sometimes it’s just a comfort level thing, and working on projects, or having a beer, or sprinting on code, etc. — together — turns a “talking in front of strangers” experience into more of a “sharing with my buddies” one.

I hope that’s true, anyway. It seems to be. :)

Thanks For Reading

I hope someone finds this useful. It’s early on in the life of PUG-IP, but I thought it would be valuable to get these ideas out into the ether early and often before they slip from my brain. Good luck with your groups!

pyrabbit Makes Testing and Managing RabbitMQ Easy

I have a lot of hobby projects, and as a result getting any one of them to a state where I wouldn’t be completely embarrassed to share it takes forever. I started working on pyrabbit around May or June of this year, and I’m happy to say that, while it’ll never be totally ‘done’ (it is software, after all), it’s now in a state where I’m not embarrassed to say I wrote it.

What is it?

It’s a Python module to help make managing and testing RabbitMQ servers easy. RabbitMQ has, for some time, made available a RESTful interface for programmatically performing all of the operations you would otherwise perform using their browser-based management interface.

So, pyrabbit lets you write code to manipulate resources like vhosts & exchanges, publish and get messages, set permissions, and get information on the running state of the broker instance. Note that it’s *not* suitable for writing AMQP consumer or producer applications; for that you want an *AMQP* module like pika.

PyRabbit is tested with Python versions 2.6-3.2. The testing is automated using tox. In fact, PyRabbit was a project I started in part because I wanted to play with tox.
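
Incidentally, the tox configuration for that kind of version matrix can be tiny. Here’s a generic sketch (not pyrabbit’s actual tox.ini, and it assumes nose as the test runner):

[tox]
envlist = py26, py27, py31, py32

[testenv]
deps = nose
commands = nosetests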

Here’s the example, ripped from the documentation (which is ripped right from my own terminal session):

>>> from pyrabbit.api import Client
>>> cl = Client('localhost:55672', 'guest', 'guest')
>>> cl.is_alive()
True
>>> cl.create_vhost('example_vhost')
True
>>> [i['name'] for i in cl.get_all_vhosts()]
[u'/', u'diabolica', u'example_vhost', u'testvhost']
>>> cl.get_vhost_names()
[u'/', u'diabolica', u'example_vhost', u'testvhost']
>>> cl.set_vhost_permissions('example_vhost', 'guest', '.*', '.*', '.*')
True
>>> cl.create_exchange('example_vhost', 'example_exchange', 'direct')
True
>>> cl.get_exchange('example_vhost', 'example_exchange')
{u'name': u'example_exchange', u'durable': True, u'vhost': u'example_vhost', u'internal': False, u'arguments': {}, u'type': u'direct', u'auto_delete': False}
>>> cl.create_queue('example_queue', 'example_vhost')
True
>>> cl.create_binding('example_vhost', 'example_exchange', 'example_queue', 'my.rtkey')
True
>>> cl.publish('example_vhost', 'example_exchange', 'my.rtkey', 'example message payload')
True
>>> cl.get_messages('example_vhost', 'example_queue')
[{u'payload': u'example message payload', u'exchange': u'example_exchange', u'routing_key': u'my.rtkey', u'payload_bytes': 23, u'message_count': 2, u'payload_encoding': u'string', u'redelivered': False, u'properties': []}]
>>> cl.delete_vhost('example_vhost')
True
>>> [i['name'] for i in cl.get_all_vhosts()]
[u'/', u'diabolica', u'testvhost']

Hopefully you’ll agree that this is simple enough to use in a Python interpreter to get information and do things with RabbitMQ ‘on the fly’.

How Can I Get It?

Well, there’s already a package on PyPI called ‘pyrabbit’, and it’s not mine. It’s some planning-stage project that has no actual software associated with it. I’m not sure when the project was created, but the PyPI page has a broken home page link, and what looks like a broken RST-formatted doc section. I’ve already pinged someone to see if it’s possible to take over the name, because I can’t think of a cool name to change it to.

Until that issue is cleared up, you can get downloadable packages or clone/fork the code at the pyrabbit github page (see the ‘Tags’ section for downloads), and the documentation is hosted on the (awesome) ReadTheDocs.org site.

Slides, an App, a Meetup, and More On the Way

I’ve been busy. Seriously. Here’s a short dump of what I’ve been up to with links and stuff. Hopefully it’ll do until I can get back to my regular blogging routine.

PICC ’11 Slides Posted

I gave a Python talk at PICC ’11. If you were there, then you have a suboptimal version of the slides, both because I’ve since caught a few bugs, and because they’re in a flattened, lifeless PDF file, which sort of mangles anything even slightly fancy. I’m not sure how much value you’ll get out of these because my presentation slides tend to present code that I then explain, and you won’t have the explanation, but people are asking, so here they are in all their glory. Enjoy!

I Made a Webapp Designed To Fail

No really, I did. WebStatusCodes is the product of necessity. I’m writing a Python module that provides an easy way for people to talk to a web API. I test my code, and for some of the tests I want to make sure my code reacts properly to certain HTTP errors (or in some cases, to *any* HTTP status code that’s not 200). In unit tests this isn’t hard, but when you’re starting to test the network layers and beyond, you need something on the network to provide the errors. That’s what WebStatusCodes does. It’s also a simple-but-handy reference for HTTP status codes, though it is incomplete (418 I’m a teapot is not supported). Still, worth checking out.
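
To illustrate, a network-level test can just ask the app for a particular error and assert that the client code reacts properly. The URL below is hypothetical (check the app itself for the real URL scheme):

# Hypothetical URL scheme; the real app's paths may differ.
import urllib2

try:
    urllib2.urlopen('http://example.appspot.com/500')
except urllib2.HTTPError as e:
    assert e.code == 500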

Interesting to note: this is my first AppEngine application, and I believe it took me 20 minutes to download the SDK, get something working, and get it deployed. It was like one of those ‘build a blog in 15 minutes’ moments. The speed at which you can create things on AppEngine is empowering, though I’d be slow to consider it for anything much more complex.

Systems and Devops People, Hack With Me!

I like systems-land, and a while back I was stuck writing some reporting code, which I really don’t like, so I started a side project to see just how much cool stuff I could do using the /proc filesystem and nothing but pure Python. I didn’t get too far because the reporting project ended and I jumped back into all kinds of other goodness, but there’s a github project called pyproc that’s just a single file with a few functions in it right now, and I’d like to see it grow, so fork it and send me pull requests. If you know Linux systems pretty well but are relatively new to Python, I’ll lend you a hand where I can, though time will be a little limited until the book is done (see further down).
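
To give you a flavor of what I mean by pure-Python /proc spelunking, here’s an illustrative snippet (not code from pyproc itself):

def loadavg():
    """Return the 1, 5, and 15 minute load averages from /proc/loadavg."""
    with open('/proc/loadavg') as f:
        one, five, fifteen = f.read().split()[:3]
    return float(one), float(five), float(fifteen)

print loadavg()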

The other projects I’m working on are sort of in pursuit of larger fish in the Devops waters, too, so be sure to check out the other projects I mention later in this post, and follow me on github.

Python Meetup Group in Princeton NJ

I started a Meetup group for Pythonistas that probably work in NYC or PA, but live in NJ. I work in PA, and before this group existed, the closest group was in Philly, an hour from home. I put my feelers out on Twitter, found some interest, put up a quick Meetup site, and we had 13 people at the first meetup (more than had RSVP’d). It’s a great group of folks, but more is always better, so check it out if you’re in the area. We hold meetings at the beautiful Princeton Public Library (who found us on twitter and now sponsors the group!), which is just a block or so from Triumph, the local microbrewery. I’m hoping to have a post-meeting impromptu happy hour there at some point.

Python Cookbook Progress

The Python Cookbook continues its march toward production. Lots of work has been done, lots of lessons have been learned, lots of teeth have been gnashed. The book is gonna rock, though. I had the great pleasure of porting all of the existing recipes that are likely to be kept over to Python 3. Great fun. It’s really amazing to see how a 20-line recipe is completely obviated by the addition of a single, simple language feature. It’s happened in almost every chapter I’ve looked at so far.

If you have a recipe, or stumble upon a good example of some language feature, module, or other useful tidbit, whether it runs in Python 3 or not, let me know (see ‘Contact Me’). The book is 100% Python 3, but I’ve gotten fairly adept at porting things over by now :) Send me your links, your code, or whatever. If we use the recipe, the author will be credited in the book, of course.

PyRabbit is Coming

In the next few days I’ll be releasing a Python module on github that will let you easily work with RabbitMQ servers using that product’s HTTP management API. It’s not nearly complete, which is why I’m releasing it. It does some cool stuff already, but I need another helper or two to add new features and help do some research into how RabbitMQ broker configuration affects JSON responses from the API. Follow me on github if you want to be the first to know when I get it released. You probably also want to follow myYearbook on github since that’s where I work, and I might release it through the myYearbook github organization (where we also release lots of other cool open source stuff).

Python Asynchronous AMQP Consumer Module

I’m also about 1/3 of the way through a project that lets you write AMQP consumers using the same basic model as you’d write a Tornado application: write your handler, import the server, link the two (like, one line of code), and call consume(). In fact, it uses the Tornado IOLoop, as well as Pika, an asynchronous AMQP module in Python (maintained by none other than my boss and myYearbook CTO, @crad), which also happens to support the Tornado IOLoop directly.

‘Grokking Python’ Going to PICC Conference!

In conjunction with my involvement as co-author of the upcoming Python Cookbook, 3rd Ed. (not yet released), a tutorial at this year’s PyCon in Atlanta, an internal (and ongoing) lunchtime seminar series entitled ‘Snakes On a Plate’, and other recent Python-related projects, I’ve also been refining and revising what I can now call a completely awesome 3-hour introduction to the Python programming language.

If you’re a sysadmin, operations engineer, devops engineer, or just want to get your hands dirty with Python, I can’t think of a better, more cost-effective way to do it than to attend the ‘Grokking Python’ tutorial at this year’s PICC conference, which is being held in New Brunswick, NJ, April 29-30.

While I do plan for the tutorial to run through the basics, I also assume attendees have programmed in some other language before. In addition, I firmly believe that, properly presented, most would find that Python is a very simple language to get to know and understand. That being the case, the most basic elements of the language (control statements, loops, etc) will be covered in the first hour (and the materials will be available for later reference).

Once we’re through that, it’s head first into what admin/ops engineers do for a living. Python was developed by a systems programmer for systems programming. As such, support for a huge swath of admin tasks (and far, far beyond) is baked into the language, and enormous tomes have been written covering third party tools and modules to do anything else you can possibly imagine.

We’re going to look at some of the more ho-hum parts of scripting, like accepting input from users, command line options and arguments, and file handling, but before it’s over we’re going to have a look at the basics of email, networking, multiprocessing, threading, coroutines, SSH, and more.

We’re also going to cover use of the Python interactive shell, which will not only help speed your mastery of the language and its standard library, but also holds promise as a sysadmin tool in its own right.

The blowing of minds is a goal of the tutorial. Bring a laptop, and bring some bandages ;-)

Brain Fried Over NoSQL

So, I’m working on a pet project. It’s in stealth mode. Just kidding — I don’t believe in stealth mode ;-)

It’s a twitter analytics dashboard that actually does useful things with the mountains of data available from the various Twitter APIs. I’m writing it in Python using Tornado. I did the first mockup for it just two nights ago.

It’s already a lot of fun. I’ve worked with Tornado before and like it a lot. I have most of the base infrastructure questions answered, because this is a pet project and they’re mostly easy and in some sense “don’t matter”. But that’s what has me stuck.

It Doesn’t Matter

It’s true. Past a certain point, belaboring choices of what tools to use where is pointless and is probably premature optimization. I’ve been working with startups for the past few years, and I’m painfully aware of what happens when a company takes too long to react to their popularity. I want to architect around that at the start, but I’m resisting. It’s a pet project.

But if it doesn’t matter, that means I can choose tools that are going to be fun to dig into and learn about. I’ve been so busy writing code to help avoid or buffer impact to the database that I haven’t played a whole lot with the NoSQL choices out there, and there are tons of them. And they all have a different world view and a unique approach to providing solutions to what I see as somewhat different problems.

Why NoSQL?

Why not? I’ve been working with relational database systems since 1998. I worked on large data reporting projects, a couple of huge data warehousing projects, financial transaction systems, I worked for Sybase as a consulting DBA and project manager for a while, I was into MySQL and PostgreSQL by 2000, used them in production environments starting around 2001-02… I understand them fairly well. I also understand BDB and other “flat-file” databases and object stores. SQLite has become unavoidable in the past few years as well. It’s not like I don’t understand the compromises I’m making going to a NoSQL system.

There’s a good bit of talk from the RDBMS camp (seriously, why do they need their own camp?) about why NoSQL is bad. Lots of people who know me would put me in the RDBMS camp, and I’m telling you not to cry yourself to sleep out of guilt over a desire to get to know these systems. They’re interesting, and they solve some huge issues surrounding scalability with greater ease than an RDBMS.

Like what? Well, cost for one. If I could afford Oracle I’d sooner use that than go NoSQL in all likelihood. I can’t afford it. Not even close. Oracle might as well charge me a small planet for their product. It’s great stuff, but out of reach. And what about sharding? Sharding a relational database sucks, and to try to hide the fact that it sucks requires you to pile on all kinds of other crap like query proxies, pools, and replication engines, all in an effort to make this beast do something it wasn’t meant to do: scale beyond a single box. All this stuff also attempts to mask the reality that you’ve also thrown your hands in the air with respect to at least 2 letters that make up the ACID acronym. What’s an RDBMS buying you at that point? Complexity.

And there’s another cost, by the way: no startup I know has the kind of enormous hardware that an enterprise has. They have access to commodity hardware. Pizza boxes. Don’t even get me started on storage. I’ve yet to see SSD or flash storage at a startup. I currently work at MyYearbook.com, and there are some pretty hefty database servers there, but it can hardly be called a startup anymore. Hell, they’re even profitable! ;-)

Where Do I Start?

One nice thing about relationland is I know the landscape pretty well. Going to NoSQL is like dropping me in a country I’ve never heard of where I don’t really speak the language. I have some familiarity with key-value stores from dealing with BDB and Memcache, and I’ve played with MongoDB a bit (using pymongo), but that’s just the tip of the iceberg.

I heard my boss mention Tokyo Tyrant a few times, so I looked into it. It seems to be one of the more obscure solutions out there from the standpoint of adoption, community, documentation, etc., but it does appear to be very capable on a technical level. However, my application is going to be number-heavy, and I’m not going to need to own all of the data required to provide the service. I can probably get away with just incrementing counters in Memcache for some of this work. For persistence I need something that will let me do aggregation *FAST* without having to create aggregation tables, ideally. Using a key/value store for counters really just seems like a no-brainer.
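
The counter approach really is as simple as it sounds. With the python-memcached module, it’d look something like this (the key name is invented for illustration):

import memcache

mc = memcache.Client(['127.0.0.1:11211'])
mc.add('mentions_count', 0)   # no-op if the key already exists
mc.incr('mentions_count')     # atomic, server-side increment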

That said, I think what I’ve decided to do, since it doesn’t matter, is punt on this decision in favor of getting a working application up quickly.

MySQL

Yup. I’m going to pick one or two features of the application to implement as a ‘first cut’, and back them with a MySQL database. I know it well, Tornado has a built-in interface for it, and it’s not going to be a permanent part of the infrastructure (otherwise I’d choose PostgreSQL in all likelihood).

To be honest, I don’t think the challenges in bringing this application to life are really related to the data model or the engine/interface used to access it (though if I’m lucky that’ll be a major part of keeping it alive). No, the real problem I’m faced with is completely unrelated to these considerations…

Twitter’s API Service

Not the API itself, per se, but the service providing access to it, and the way it’s administered, is going to be a huge challenge. It’s not just the Twitter website that’s inconsistent, the API service goes right along. Not only that, but the type of data I really need to make this application useful isn’t immediately available from the API as far as I can tell.

Twitter maintains rate limits on the API. You can only make so many calls over so short a period of time. That alone makes providing an application like this to a lot of people a bit of a challenge. Compounding the issue is that, when there are failwhales washing up on the shores, those limits can be dynamically decreased. Ugh.

I guess it’s not a project for the faint of heart, but it’ll drive home some golden rules that are easy to neglect in other projects, like planning for failure (of both my application, and Twitter). Also, it’ll be a lot of fun.

Why Open Shop In California?

DISCLAIMER: I live on the East Coast, so these are perceptions and opinions that I don’t put forth as facts. I’m more asking a question to start a dialog than professing knowledge.

So, I just heard a report claiming that there are more IT jobs than techs to fill them in Southern California. Anyone who ever reads a tech job board and/or TechCrunch has also no doubt taken note that a vast majority of startups seem to be starting up there, and that there are just a metric asston of jobs there anyway.

This boggles my mind. This is a place with an extremely high cost of living, making labor more expensive. At the same time, aren’t there rolling power outages in CA? Does that not affect corporations or something? Do they just move their datacenters across the border to another state?

Between what I would think is an amazingly high labor cost and what I would think is an unfavorable place in terms of simple things like availability of power, I would think more places would look elsewhere for expansion or startups.

I live within spitting distance of at least 5 universities with engineering departments that I think would rate at the very least “solid”, many would rate better. I would guess that I could get to any Ivy League school in 6 hours or less, driving (3 are within an hour of my NJ home). MIT and Stevens are very good non-Ivy schools, and lots of other ones like Rutgers, NJIT, Penn State, NYU, and lots more are here, and those are just a few of the ones between NYC and Philadelphia, which are less than 2 hours apart. So…. there’s a labor pool here.

Is it tax breaks? Some aspect of the political atmosphere? Transportation? Is San Francisco such a clean, safe, friendly city that you just deal with the nonsense to live there?

What’s your take on this?

Python IDE Frustration

I didn’t think I was looking for a lot in an IDE. Turns out what I want is impossibly hard to find.

In the past 6 months I’ve tried (or tried to try):

  • Komodo Edit
  • Eclipse w/ PyDev
  • PyCharm (from the first EAP build to… yesterday)
  • Wingware
  • Textmate

Wingware

First, let’s get Wingware out of the way. I’m on a Mac, and if you’re not going to develop for the Mac, I’m not going to pay you hundreds of dollars for your product. Period. I don’t even use free software that requires X11. Lemme know when you figure out that coders like Macs and I’ll try Wingware.

Komodo Edit

Well, I wanted to try the IDE but I downloaded it, launched it once for 5 minutes (maybe less), forgot about it, and now my trial is over. I’ll email sales about this tomorrow. In the meantime, I use Komodo Edit.

Komodo Edit is pretty nice. One thing I like about it is that it doesn’t really go overboard forcing its world view down my throat. If I’m working on bunny, which is a one-file Python project I keep in a git repository, I don’t have to figure out their system for managing projects. I can just “Open File” and use it as a text editor.

It has “ok” support for Vi key bindings, and it’s not a plugin: it’s built in. The support has some annoying limitations, but for about 85% of what I need it to do it’s fine. One big annoyance is that I can’t write out a file and assign it a name (e.g. ‘:w /some/filename.txt’). It’s not supported.

Komodo Edit, unless I missed it, doesn’t integrate with Git, and doesn’t offer a Python console. Its capabilities in the area of collaboration in general are weak. I don’t absolutely have to have them, but things like that are nice for keeping focused and not having to switch away from the window to do anything else, so ideally I could get an IDE that has this. I believe Komodo IDE has these things, so I’m looking forward to trying it out.

Komodo is pretty quick compared to most IDEs, and has always been rock solid stable for me on both Mac and Linux, so if I’m not in the mood to use Vim, or I need to work on lots of files at once, Komodo Edit is currently my ‘go-to’ IDE.

PyCharm

PyCharm doesn’t have an officially supported release. I’ve been using Early Adopter Previews since the first one, though. When it’s finally stable I’m definitely going to revisit it, because to be honest… it’s kinda dreamy.

Git integration is very good. I used it with GitHub without incident for some time, but these are early adopter releases, and things happen: two separate EAP releases of PyCharm made my project files completely disappear without warning, error, or any indication that anything was wrong at all. Of course, this is git, so running ‘git checkout -f’ brought things back just fine, but it’s unsettling, so now I’m just waiting for the EAP to be over with and I’ll check it out when it’s done.

I think for the most part, PyCharm nails it. This is the IDE I want to be using assuming the stability issues are worked out (and I don’t have reason to believe they won’t be). It gives me a Python console, VCS integration, a good class and project browser, some nice code analytics, and more complex syntax checking that “just works” than I’ve seen elsewhere. It’s a pretty handsome, very intuitive IDE, and it leverages an underlying platform whose plugins are available to PyCharm users as well, so my Vim keys are there (and, by the way, the IDEAVim plugin is the most advanced Vim support I’ve seen in any IDE, hands down).

Eclipse with PyDev

One thing I learned from using PyCharm and Eclipse is that where tools like this are concerned, I really prefer a specialized tool to a generic one with plugins layered on to provide the necessary functionality. Eclipse with PyDev really feels to me like a Java IDE that you have to spend time laboriously chiseling, drilling, and hammering to get it to do what you need if you’re not a Java developer. The configuration is extremely unintuitive, with a profuse array of dialogs, menus, options, options about options and menus, menus about menus and options… it never seems to end.

All told, I’ve probably spent the equivalent of 2 working days mucking with Eclipse configuration, and I’ve only been able to get it “pretty close” to where I want it. The Java-loving underpinnings of the Eclipse platform simply cannot be suppressed, while things I had to layer on with plugins don’t show up in the expected places.

Add to this Eclipse’s world-view, which reads something like “there is no filesystem tree: only projects”, and you have a really damned annoying IDE. I’ve tried on and off for over a year to make friends with Eclipse because of the good things I hear about PyDev, but it just feels like a big hacky, duct-taped mess to me, and if PyCharm has proven anything to me, it’s that building a language specific IDE on an underlying platform devoted to Java doesn’t have to be like this. When I finally got it to some kind of usable point, and after going through the “fonts and colors” maze, it turns out the syntax highlighting isn’t really all that great!

A quick word about Vi key bindings in Eclipse: it’s not a pretty picture, but the best I’ve been able to find is a free tool called Vrapper. It’s not bad. I could get by with Vrapper, but I don’t believe it’s as mature and evolved as the IDEAVim plugin in PyCharm.

So, I’ll probably turn back to Eclipse for Java development (I’m planning on taking on a personal Android project), but I think I’ve given up on it for anything not Java-related.

Vim

Vim is technically ‘just an editor’, but it has some nice benefits, and with the right plugins it can do all of the things a fancy IDE can. I use the taglist plugin to provide the project and class browser functionality, and the kicker here is that you can actually switch to the browser pane, type ‘/’ and the name of the object or member you’re looking for, and jump to it in a flash. It’s also the most complete Vim key binding implementation available ;-)

The big win for me in using Vim though is remote work. Though I’d rather do all of my coding locally, there are times when I really have to write code on remote machines, and I don’t want to go through the rigmarole of coding, pushing my changes, going to my terminal, pulling down the changes, testing, failing, fixing the code on my machine, pushing my changes, pulling my changes… ugh.

So why not just use Vim? I could do it. I’ve been using Vim for many years and am pretty good with it, but I just feel like separating my coding from my terminal whenever I can is a good thing. I don’t want my code to look like my terminal, nor do I want my terminal to look like my IDE theme. I’m SUPER picky about fonts and colors in my IDE, and I’m not that picky about them in my terminal. I also want the option of using my mouse while I’m coding, mostly to scroll, and getting that to work on a Mac in Terminal.app isn’t as simple as you might expect (and I’m not a fan of iTerm… and its ability to do this comes at a cost as well).

MacVim is nice, solves the separation of Terminal and IDE, and I might give it a more serious try, but let’s face it, it’s just not an IDE. Code completion is still going to be mediocre, the interface is still going to be terminal-ish… I just don’t know. One thing I really love though is the taglist plugin. I think if I could just find a way to embed a Python console along the bottom of MacVim I might be sold.

One thing I absolutely love about Vim, the thing that Vim gets right that none of the IDEs get is colorschemes: MacVim comes with like 20 or 30 colorschemes! And you can download more on the ‘net! The other IDEs must lump colorscheme information into the general preferences or something, because you can’t just download a colorscheme as far as I’ve seen. The IDE with the worst color/font configuration? Eclipse – the one all my Python brethren seem to rave about. That is so frustrating. Some day I’ll make it to PyCon and someone will show me the kool-aid I guess.

The Frustrating Conclusion

PyCharm isn’t soup yet, Wingware is all but ignoring the Mac platform, Eclipse is completely wrong for my brain and I don’t know how anyone uses it for Python development, Komodo Edit is rock solid but lacking features, and Komodo IDE is fairly pricey and a 30-day trial is always just really annoying (and I kinda doubt it beats PyCharm for Python-specific development). MacVim is a stand-in for a real IDE and it does the job, but I really want more… integration! I also don’t like maintaining the plugins and colorschemes and *rc files and ctags, and having to understand its language and all that.

I don’t cover them here, but I’ve tried a bunch of the Linux-specific Python IDEs as well, and I didn’t like a single one of them at all. At some point I’ll spend more time with those tools to see if I missed something crucial that, once learned, might make it hug my brain like a warm blanket (and make me consider running Linux on my desktop again, something I haven’t done on a regular ongoing basis in about 4 years).

So… I don’t really have an IDE yet. I *did* however just realize that the laptop I’m typing on right now has never had a Komodo IDE install, so I’m off to test it now. Wish me luck!

Per-machine Bash History

I do work on a lot of machines no matter what environment I’m working in, and a lot of the time each machine has a specific purpose. One thing that really annoys me when I work in an environment with NFS-mounted home directories is that if I log into a machine I haven’t used in some time, none of the history specific to that machine is around anymore.

If I had a separate ~/.bash_history file on each machine, this would likely solve the problem. It’s pretty simple to do as it turns out. Just add the following lines to ~/.bashrc:

srvr=$(hostname)
export HISTFILE="${HOME}/.bash_history_${srvr}"

Don’t be alarmed when you source ~/.bashrc and you don’t see the file appear in your home directory. Unless you’ve configured things otherwise, history is only written at the end of a bash session. So go ahead and source bashrc, run a few commands, end your session, log back in, and the file should be there.
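
If you’d rather not wait for the session to end, bash can also be told to append to the history file as you go. Adding something like this to ~/.bashrc should do it (standard bash options, but test in your own environment):

shopt -s histappend
export PROMPT_COMMAND='history -a'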

I’m not actually sure if this is going to be a great idea for everyone. If you work in an environment where you run the same commands from machine to machine, it might be better to just leave things alone. For me, I’m running different psql/mysql connection commands and stuff like that which differ depending on the machine I’m on and the connection perms it has.

PyTPMOTW: py-amqplib

What’s This Module For?

To interact with a queue broker implementing version 0.8 of the Advanced Message Queueing Protocol (AMQP) standard. Copies of various versions of the specification can be found here. At time of writing, 0.10 is the latest version of the spec, but it seems that many popular implementations used in production environments today are still using 0.8, presumably awaiting a finalization of v.1.0 of the spec, which is a work in progress.

What is AMQP?

AMQP is a queuing/messaging protocol that is implemented by server daemons (called ‘brokers’) like RabbitMQ, ActiveMQ, Apache Qpid, Red Hat Enterprise MRG, and OpenAMQ. Though messaging protocols used in the enterprise have historically been proprietary, the AMQP working group has taken a bold and vocal stance that AMQP will be:

  • Broadly applicable for enterprise use
  • Totally open
  • Platform agnostic
  • Interoperable

The working group consists of several huge enterprises who have a vested interest in a protocol that meets these requirements. Most are either huge enterprises who are (or were) a victim of the proprietary lock-in that came with what will now likely become ‘legacy’ protocols, or implementers of the protocols, who will sell products and services around their implementation. Here’s a brief list of those involved in the AMQP working group:

  • JPMorgan Chase (the initial developers of the protocol, along with iMatix)
  • Goldman Sachs
  • Red Hat Software
  • Cisco Systems
  • Novell

Message brokers can afford an architecture an awful lot of flexibility. They can be used to integrate applications across platforms and languages, enable asynchronous operations for web front ends, and modularize and more easily distribute complex processing operations.

Basic Publishing

The first thing to know is that when you code against an AMQP broker, you’re dealing with a hierarchy: a ‘vhost’ contains one or more ‘exchanges’ which themselves can be bound to one or more ‘queues’. Here’s how you can programmatically create an exchange and queue, bind them together, and publish a message:

from amqplib import client_0_8 as amqp

conn = amqp.Connection(userid='guest', password='guest', host='localhost', virtual_host='/', ssl=False)

# Create a channel object, queue, exchange, and binding.
chan = conn.channel()
chan.queue_declare('myqueue', durable=True)
chan.exchange_declare('myexchange', type='direct', durable=True)
chan.queue_bind('myqueue', 'myexchange', routing_key='myq.myx')

# Create an AMQP message object

msg = amqp.Message('This is a test message')
chan.basic_publish(msg, 'myexchange', 'myq.myx')

As far as we know, we have one exchange and one queue on our server right now, and if that’s the case, then technically the routing key I’ve used isn’t required. However, I strongly suggest that you always use a routing key to avoid really odd (and implementation-specific) behavior like getting multiple copies of a message on the consumer side of the equation, or getting odd exceptions from the server. The routing key can be arbitrary text like I’ve used above, or you can use a common formula like ‘queuename.exchangename’ as your routing key. Just remember that without the routing key, the minute more than one queue is bound to an exchange, the exchange has no way of knowing which queue to route a message to. Remember: you don’t publish to a queue, you publish to an exchange and tell it which queue the message goes in via the routing key.

Basic Consumption

Now that we’ve published a message, how do we get our hands on it? There are two methods: basic_get, which will ‘get’ a single message from the queue, or ‘basic_consume’, which technically doesn’t get *any* messages: it registers a handler with the server and tells it to send messages along as they arrive, which is great for high-volume messaging operations.

Here’s the ‘basic_get’ version of a client to grab the message we just published:

msg = chan.basic_get(queue='myqueue', no_ack=False)
chan.basic_ack(msg.delivery_tag)

In the above, I’ve used the same channel I used to publish the message to get it back again using the basic_get operation. I then acknowledged receipt of the message by sending the server a ‘basic_ack’, passing along the delivery_tag the server included as part of the incoming message.

Consuming Mass Quantities

Using basic_consume takes a little more thought than basic_get, because basic_consume does nothing more than register a method with the server to tell it to start sending messages down the pipe. Once that’s done, however, it’s up to you to do a chan.wait() to wait for messages to show up, and find some elegant way of breaking out of this wait() operation. I’ve seen and used different techniques myself, and the right thing will depend on the application.

The basic_consume method also requires a callback method which is called for each incoming message, and is passed the amqp.Message object when it arrives.

Here’s a bit of code that defines a callback method, calls basic_consume, and does a chan.wait():

consumer_tag = 'foo'

def process(msg):
    txt = msg.body
    if '-1' in txt:
        # Sentinel message: cancel the consumer and close the channel.
        print 'Got -1'
        chan.basic_cancel(consumer_tag)
        chan.close()
    else:
        print 'Got message!'

chan.basic_consume('myqueue', callback=process, consumer_tag=consumer_tag)
while True:
    print 'Message processed. Next?'
    try:
        chan.wait()
    except IOError as out:
        print "Got an IOError: %s" % out
        break
    if not chan.is_open:
        print "Done processing. Later"
        break

So, basic_consume tells the server ‘Start sending any and all messages!’. The server registers a consumer under the name given by the consumer_tag argument (if you don’t supply one, the server assigns one, and it becomes the return value of basic_consume()). I define one here because I don’t want to run into race conditions where I call basic_cancel() with a consumer_tag variable that doesn’t exist yet, or is out of scope, or whatever. In the callback, I look for a sentinel message whose body contains ‘-1’, and at that point I call basic_cancel (passing in the consumer_tag so the server knows which consumer to stop sending messages to), and I close the channel. In the ‘while True’ loop, we check the channel’s status and break out if it’s no longer open.

The above example starts to uncover some issues with py-amqplib. It’s not clear how errors coming back from the server are handled, as opposed to errors caused by the processing code, for example. It’s also a little clumsy trying to determine the logic for breaking out of the loop. In this case there’s a sentinel message sent to the queue representing the final message on the stack, at which point our ‘process()’ callback closes the channel, but then the channel has to check its own status to move forward. Just returning False from process() doesn’t break out of the while loop, because it’s not looking for a return value from that function. We could have our process() function raise an error of its own as well, which might be a bit more elegant, if also a bit more work.
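
For example, the ‘raise an error of its own’ approach might look something like this sketch (StopConsuming is a made-up exception class, not something provided by py-amqplib):

class StopConsuming(Exception):
    pass

def process(msg):
    if '-1' in msg.body:
        chan.basic_cancel(consumer_tag)
        chan.close()
        raise StopConsuming()
    print 'Got message!'

chan.basic_consume('myqueue', callback=process, consumer_tag=consumer_tag)
while True:
    try:
        chan.wait()    # an exception raised in the callback propagates here
    except StopConsuming:
        break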

Moving Ahead

What I’ve covered here addresses perhaps 90% of the common cases for amqplib, but there’s plenty more you can do with it. There are various exchange types, including fanout exchanges and topic exchanges, which can facilitate more interesting messaging and pub/sub models. To learn more about them, here are a couple of places to go for information:

  • Broadcasting your logs with RabbitMQ and Python
  • Rabbits and Warrens
  • RabbitMQ FAQ section “Messaging Concepts: Exchanges”