Category Archives: Technology

Slides, an App, a Meetup, and More On the Way

I’ve been busy. Seriously. Here’s a short dump of what I’ve been up to with links and stuff. Hopefully it’ll do until I can get back to my regular blogging routine.

PICC ’11 Slides Posted

I gave a Python talk at PICC ’11. If you were there, then you have a suboptimal version of the slides: I’ve caught a few bugs since then, and the handout was a flattened, lifeless PDF file, which mangles anything even slightly fancy. I’m not sure how much value you’ll get out of these, because my presentation slides tend to present code that I then explain, and you won’t have the explanation. But people are asking, so here they are in all their glory. Enjoy!

I Made a Webapp Designed To Fail

No really, I did. WebStatusCodes is the product of necessity. I’m writing a Python module that provides an easy way for people to talk to a web API. I test my code, and for some of the tests I want to make sure my code reacts properly to certain HTTP errors (or in some cases, to *any* HTTP status code that’s not 200). In unit tests this isn’t hard, but when you’re starting to test the network layers and beyond, you need something on the network to provide the errors. That’s what WebStatusCodes does. It’s also a simple-but-handy reference for HTTP status codes, though it is incomplete (418 I’m a teapot is not supported). Still, worth checking out.
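
For flavor, here’s roughly how a network-level test might use a service like this. It’s just a sketch: the hostname and the pattern of putting the desired status code in the URL path are my assumptions here, not the app’s documented interface.

import urllib.error
import urllib.request

def fetch_status(url):
    # Return the HTTP status code that a GET of this URL produces.
    try:
        return urllib.request.urlopen(url).getcode()
    except urllib.error.HTTPError as e:
        return e.code  # urllib raises on non-2xx; the code rides along

# Hypothetical endpoint: the path names the status code you want back.
assert fetch_status('http://webstatuscodes.example.com/500') == 500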

Interesting to note, this is my first AppEngine application, and I believe it took me 20 minutes to download the SDK, get something working, and get it deployed. It was like one of those ‘build a blog in 15 minutes’ moments. The speed at which you can create things on AppEngine is empowering, though I’d be slow to consider it for anything much more complex.

Systems and Devops People, Hack With Me!

I like systems-land, and a while back I was stuck writing some reporting code, which I really don’t like, so I started a side project to see just how much cool stuff I could do using the /proc filesystem and nothing but pure Python. I didn’t get too far because the reporting project ended and I jumped back into all kinds of other goodness, but there’s a github project called pyproc that’s just a single file with a few functions in it right now, and I’d like to see it grow, so fork it and send me pull requests. If you know Linux systems pretty well but are relatively new to Python, I’ll lend you a hand where I can, though time will be a little limited until the book is done (see further down).
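
To give you the flavor of the idea, here’s a from-scratch sketch (not code from the pyproc repo) of the kind of thing I mean:

def loadavg():
    # Return the 1-, 5-, and 15-minute load averages from /proc/loadavg.
    with open('/proc/loadavg') as f:
        one, five, fifteen = f.read().split()[:3]
    return float(one), float(five), float(fifteen)

def meminfo():
    # Return /proc/meminfo as a dict, e.g. {'MemTotal': 8056996, ...}.
    # Most values are reported in kB; we keep just the number.
    info = {}
    with open('/proc/meminfo') as f:
        for line in f:
            key, _, rest = line.partition(':')
            info[key.strip()] = int(rest.split()[0])
    return info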

The other projects I’m working on are sort of in pursuit of larger fish in the Devops waters, too, so be sure to check out the other projects I mention later in this post, and follow me on github.

Python Meetup Group in Princeton NJ

I started a Meetup group for Pythonistas who probably work in NYC or PA, but live in NJ. I work in PA, and before this group existed, the closest group was in Philly, an hour from home. I put my feelers out on Twitter, found some interest, put up a quick Meetup site, and we had 13 people at the first meetup (more than had RSVP’d). It’s a great group of folks, but more is always better, so check it out if you’re in the area. We hold meetings at the beautiful Princeton Public Library (which found us on Twitter and now sponsors the group!), which is just a block or so from Triumph, the local microbrewery. I’m hoping to have a post-meeting impromptu happy hour there at some point.

Python Cookbook Progress

The Python Cookbook continues its march toward production. Lots of work has been done, lots of lessons have been learned, lots of teeth have been gnashed. The book is gonna rock, though. I had the great pleasure of porting all of the existing recipes that are likely to be kept over to Python 3. Great fun. It’s really amazing to see how often a 20-line recipe is completely obviated by the addition of a single, simple language feature. It’s happened in almost every chapter I’ve looked at so far.
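
A generic illustration of the phenomenon (not a recipe from the book): think of every hand-rolled tally loop ever written, versus collections.Counter, which arrived in Python 3.1 (and 2.7):

# Before: the sort of helper that shows up in countless recipes.
def count_items(seq):
    counts = {}
    for item in seq:
        counts[item] = counts.get(item, 0) + 1
    return counts

# After: obviated by a single stdlib addition.
from collections import Counter

counts = Counter(['a', 'b', 'a', 'c', 'a'])
print(counts.most_common(1))  # [('a', 3)]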

If you have a recipe, or stumble upon a good example of some language feature, module, or other useful tidbit, whether it runs in Python 3 or not, let me know (see ‘Contact Me’). The book is 100% Python 3, but I’ve gotten fairly adept at porting things over by now :) Send me your links, your code, or whatever. If we use the recipe, the author will be credited in the book, of course.

PyRabbit is Coming

In the next few days I’ll be releasing a Python module on github that will let you easily work with RabbitMQ servers using that product’s HTTP management API. It’s not nearly complete, which is why I’m releasing it. It does some cool stuff already, but I need another helper or two to add new features and help do some research into how RabbitMQ broker configuration affects JSON responses from the API. Follow me on github if you want to be the first to know when I get it released. You probably also want to follow myYearbook on github since that’s where I work, and I might release it through the myYearbook github organization (where we also release lots of other cool open source stuff).
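
To show the sort of thing the module wraps, here’s a raw call to RabbitMQ’s HTTP management API with nothing but the standard library. This is not PyRabbit’s API, just a sketch; the host, port, and guest/guest credentials are common defaults and may differ on your broker.

import json
import urllib.request

def api_get(path, host='localhost', port=55672, user='guest', password='guest'):
    # Issue an authenticated GET against the management API, decode the JSON.
    url = 'http://%s:%d/api/%s' % (host, port, path)
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))
    return json.loads(opener.open(url).read().decode('utf-8'))

overview = api_get('overview')  # api_get('queues') lists queue data, etc.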

Python Asynchronous AMQP Consumer Module

I’m also about 1/3 of the way through a project that lets you write AMQP consumers using the same basic model as you’d write a Tornado application: write your handler, import the server, link the two (like, one line of code), and call consume(). In fact, it uses the Tornado IOLoop, as well as Pika, which is an asynchronous AMQP module in Python (maintained by none other than my boss and myYearbook CTO,  @crad), which also happens to support the Tornado IOLoop directly.
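
In other words, something shaped roughly like this. To be clear, this is a sketch of the model, not the module’s actual API; the Consumer name and its arguments are placeholders.

def handle_message(channel, method, header, body):
    # Your handler: called once per delivered message.
    print('got: %r' % body)
    channel.basic_ack(delivery_tag=method.delivery_tag)

# Hypothetical glue, by analogy with writing a Tornado app:
# consumer = Consumer(queue='work', handler=handle_message)
# consumer.consume()  # starts the Tornado IOLoop under the hood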

Book Review: Python Standard Library by Example

Quick Facts:

  • Author: Doug Hellmann
  • Pages: 1344
  • Publisher: Addison-Wesley (Developer’s Library)
  • ETA: June 5, 2011
  • Amazon link: http://www.amazon.com/Python-Standard-Library-Example-Developers/dp/0321767349/ref=sr_1_1?ie=UTF8&qid=1307109464&sr=1-1-spell

What this book says it does:

From the book’s description:

This book is a collection of essays and example programs demonstrating how to use more than 100 modules from the Python standard library. It goes beyond the documentation available on python.org to show real programs using the modules and demonstrating how you can use them in your daily programming tasks.

What this book actually does:

This book actually kinda rocks, in part because of its unique take on documentation of the Python standard library. The Python standard library documentation is actually a pretty good high-level reference, and this book doesn’t seek to duplicate what’s there. Instead, it specifically seeks out places in the existing documentation that are underdocumented, undocumented, don’t have clear enough examples, or just don’t provide the value to the end user that they should for whatever reason. As good as the standard library documentation is, Doug easily cranked out 1000+ pages of invaluable information that has given me a much greater insight into the standard library modules that I use on a regular basis (and plenty that I don’t use on a regular basis).

How it works

The book is simply laid out by module. Using the multiprocessing module? It’s right there in the Table of Contents. It’s as easy to use as the standard library docs from a navigational perspective, and the index, it could be argued, is an improvement over docs.python.org’s search behavior.

When you get to the module you’re looking for, you’ll primarily see code. There is enough English text to explain what the code actually does, but the main illustrative tool in this book is the code. This is not an easy thing to accomplish, but Doug provides a very nice and balanced presentation of the real meaty parts of your favorite standard library modules.

What’s Great About it

First, it provides both depth and breadth: it’s easy to find whatever you’re looking for, and if something’s not needed (usually because it’s well-covered in the standard library docs), it’s not there.

Second, the book is written by an authoritative, knowledgeable, experienced, and prolific Python developer. While he’s a creative thinker, his work is balanced by a healthy dose of pragmatism and grounded in best practices. Contrived as the examples might get at times, you won’t typically find code written by Doug that would garner sideways glances from experienced Python developers.

Third, it’s not a rehashing of the docs. In fact it skips coverage of things that are well-documented in the docs. Yes, the book does contain simple introductory material for each module to give the uninitiated some context, but that’s different from a book that takes existing docs and just moves the letters around. Doug does a great job of getting you into the good stuff without much fluff.

Fourth, there’s almost zero fluff. I’d love to see a statistical breakdown of the number of lines of code vs. text in this book. And the good part really isn’t that he’s put so much code in the book; it’s that he presents code alongside the text in a way that ensures readers don’t get lost.

Fifth, this wasn’t a rush job. Doug has been writing this content in the form of his Python Module of the Week blog series for a few years now. Most of the work was editing, finessing, updating, and testing (and retesting) the code, not developing the content from scratch. So what’s in the book is not just a braindump: it’s had the benefit of peer review and feedback from the blog, email, etc., and that adds a ton of value to the final product in my eyes.

What’s Not Great About it

I insist on including bad things about everything I review, because nothing is perfect, and the more people talk about things they don’t like, the more makers start to listen and make things better.

To be honest, the only thing I found lacking in this book is the index. This should not shock anyone who is a tech bibliophile. Most indexes, at least on tech books, are pretty bad (ironic since tech books often serve as references, which makes the index pretty crucial). Also, consider that this review is based on a review copy of the book, so it’s possible that the final version will have a totally awesome index.

Ah, one other thing (which also might be due to this being a review copy): there are no tab markers. Since Python defines scope using whitespace, not having indentation markers in any medium containing page breaks can lead to confusion if a code sample crosses page boundaries. The alternative to having the markers is to ensure that the code samples don’t cross page boundaries. It’s not possible for me to know if they’ll do one or both of these before the final printing.

The Final Word

My petty complaints about the index and indentation markers are not only trivial, but they both may be fixed in the final printing. I saw nothing so bad that I wouldn’t highly recommend this book, and I’ve seen tons and tons of stuff that would make me highly recommend this book. I’m using this book myself on a fairly regular basis, and it’s an effective, easy-to-use tool that makes a great companion reference to the standard library.

Buy it.

‘Grokking Python’ Going to PICC Conference!

In conjunction with my involvement as co-author of the upcoming Python Cookbook, 3rd Ed. (not yet released), a tutorial at this year’s PyCon in Atlanta, an internal (and ongoing) lunchtime seminar series entitled ‘Snakes On a Plate’, and other recent Python-related projects, I’ve also been refining and revising what I can now call a completely awesome 3-hour introduction to the Python programming language.

If you’re a sysadmin, operations engineer, devops engineer, or just want to get your hands dirty with Python, I can’t think of a better, more cost-effective way to do it than to attend the ‘Grokking Python’ tutorial at this year’s PICC conference, which is being held in New Brunswick, NJ, April 29-30.

While I do plan for the tutorial to run through the basics, I also assume attendees have programmed in some other language before. In addition, I firmly believe that, properly presented, most attendees will find that Python is a very simple language to get to know and understand. That being the case, the most basic elements of the language (control statements, loops, etc.) will be covered in the first hour (and the materials will be available for later reference).

Once we’re through that, it’s head first into what admin/ops engineers do for a living. Python was developed by a systems programmer for systems programming. As such, support for a huge swath of admin tasks (and far, far beyond) is baked into the language, and enormous tomes have been written covering third party tools and modules to do anything else you can possibly imagine.

We’re going to look at some of the more ho-hum parts of scripting, like accepting input from users, command line options and arguments, and file handling, but before it’s over we’re going to have a look at the basics of email, networking, multiprocessing, threading, coroutines, SSH, and more.
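
As a taste of those ho-hum parts, here’s options, arguments, and file handling compressed into a few lines. This is a generic example, not the tutorial materials themselves; argparse ships with Python 2.7 and 3.2.

import argparse

parser = argparse.ArgumentParser(description='Count lines in a file.')
parser.add_argument('path', help='file to inspect')
parser.add_argument('-v', '--verbose', action='store_true')
args = parser.parse_args()

with open(args.path) as f:
    count = sum(1 for _ in f)

print('%s: %d lines' % (args.path, count) if args.verbose else count)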

We’re also going to cover use of the Python interactive shell, which will not only help speed your mastery of the language and its standard library, but also holds promise as a sysadmin tool in its own right.

The blowing of minds is a goal of the tutorial. Bring a laptop, and bring some bandages ;-)

Lessons Learned Porting Dateutil to Python 3

The dateutil module is a very popular third-party (pure) Python module that makes it easier (and in some cases, possible) to perform more advanced manipulations on dates and date ranges than simply using some combination of Python’s ‘included batteries’ like the datetime, time and calendar modules.

Dateutil does fuzzy date matching, Easter calculations in the past and future, relative time delta calculations, time zone manipulation, and lots more, all in one nicely bundled package.

I decided to port dateutil to Python 3.

Why?

For those who haven’t been following along at home, David Beazley and I are working on the upcoming Python Cookbook 3rd Edition, which will contain only Python 3 recipes. Python 2 will probably only get any real treatment when we talk about porting code.

When I went back to the 2nd edition of the book to figure out what modules are used heavily that might not be compatible with Python 3, dateutil stuck out. It’s probably in half or more of the recipes in the ‘Time and Money’ chapter in the 2nd Edition. I decided to give it a look.

How Long Did it Take?

Less than one work day. Seriously. It was probably 4-5 hours in total, including looking at documentation and getting to know dateutil. I downloaded it, I ran 2to3 on it without letting 2to3 do the in-place edits, scanned the output for anything that looked ominous (there were a couple of things that looked a lot worse than they turned out to be), and once satisfied that it wasn’t going to do things that were dumb, I let ‘er rip: I ran 2to3 and told it to go ahead and change the files (2to3 makes backup copies of all edited files by default, by the way).

What Was the Hardest Part?

Well, there were a few unit tests that used the base64 module to decode some time zone file data into a StringIO object before passing the file-like object to the code under test (I believe the code under test was the relativedelta module). Inside there, the file-like StringIO object is subjected to a bunch of struct.unpack() calls, and there are a couple of plain strings that get routed elsewhere.

The issue with this is that there are NO methods inside the base64 module that return strings anymore, which makes creating the StringIO object more challenging. All base64 methods return Python bytes objects. So, I replaced the StringIO object with a BytesIO object, all of the struct.unpack() calls “just worked”, and the strings that were actually needed as strings in the code had a ‘.decode()’ appended to them to convert the bytes back to strings. All was well with the world.
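
Here’s the shape of the fix, with stand-in data rather than dateutil’s actual time zone bytes:

import base64
import struct
from io import BytesIO

encoded = base64.b64encode(struct.pack('>ii', 1, 2))  # stand-in test data

raw = base64.b64decode(encoded)  # bytes in Python 3, never str
fileobj = BytesIO(raw)           # so: BytesIO, not StringIO
a, b = struct.unpack('>ii', fileobj.read(8))  # the unpack calls "just work"

label = b'UTC'.decode()          # bytes needed as strings get a .decode()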

What Made it Easier?

Two things, smaller one first:

First, Python’s built-in modules for date handling haven’t been flipped around much, and dateutil doesn’t have any dependencies outside the standard library (hm, maybe that’s 2 things right there). The namespaces for date manipulation modules are identical to Python 2, and I believe for the most part all of the methods act the same way. There might be some under-the-hood changes where things return memoryview objects or iterators instead of lists or something, but in this and other porting projects involving dates, that stuff has been pretty much a non-event most of the time.

But the A #1 biggest thing that made this whole thing take less than a day instead of more than a week? Tests.

Dateutil landed on my hard drive with 478 tests (the main module has about 3600 lines of actual code, and the tests by themselves are roughly 4000 lines of code). As a result, I didn’t have to manually check all kinds of functionality or write my own tests. I was able to port the tests fairly easily with just a couple of glitches (like the aforementioned base64 issue). From there I felt confident that the tests were testing the code properly.

In the past couple of days since I completed the ‘project’, I ported some of the dateutil recipes from the 2nd edition of the book to Python 3, just for some extra assurance. I ported 5 recipes in under an hour. They all worked.

Had You Ported Stuff Before?

Well, to be honest most of my Python 3 experience (pre-book, that is) is with writing new code. To gain a broader exposure to Python 3, I’ve also done lots of little code golf-type labs, impromptu REPL-based testing at work for things I’m doing there, etc. I have ported a couple of other small projects, and I have had to solve a couple of issues, but it’s not like I’ve ever ported something the size of Django or ReportLab or something.

The Best Part?

I had never seen dateutil in my life.

I had read about it (I’ve owned the Python Cookbook 2nd Edition since its initial release, after all), but I’d never been a user of the project.

The Lessons?

  1. This is totally doable. Stop listening to the fear-inducing rantings of naysayers. Don’t let them hold you back. The pink ponies are in front of you, not behind you.
  2. There are, in fact, parts of Python that remain almost unchanged in Python 3. I would imagine that even Django may find that there are swaths of code that “just work” in Python 3. I’ll be interested to see metrics about that (dear Django: keep metrics on your porting project!).
  3. Making a separation between text and data in the language is actually a good thing, and in the places where it bytes you (couldn’t resist, sorry), it will likely make sense if you have a fundamental understanding of why text and data aren’t the same thing. I predict that, in 2012, most will view complainers about this change the same way we view whitespace haters today.

“I Can’t Port Because…”

If you’re still skeptical, or you have questions, or you’re trying and having real problems, Dave and I would both love for *you* to come to our tutorial at PyCon. Or just come to PyCon so we can hack in the hallway on it. I’ve ported, or am in the process of porting, 3 modules to Python 3. Dave has single-handedly ported something like 3-5 modules to Python 3 in the past 6 weeks or so. He’s diabolical.

I’d love to help you out, and if it turns out I can’t, I’d love to learn more about the issue so we can shine a light on it for the rest of the community. Is it a simple matter of documentation? Is it a bug? Is it something more serious? Let’s figure it out and leverage the amazing pool of talent at PyCon to both learn about the issue and hopefully get to a solution.

Python 3: Informal String Formatting Performance Comparison

If you haven’t heard the news, Dave Beazley and I have officially begun work on the next edition of the Python Cookbook, which will be completely overhauled using absolutely nothing but Python 3. Yay!

Right now, I’m going through some string formatting recipes from the 2nd edition to see if they still work, and if Python 3 offers any preferred alternatives to the solutions provided. As usual, it turns out that the answer to that is often ‘it depends’. For example, you might decide on a slower solution that’s more readable. Conversely, you might need to run an operation in a loop a million times and really need the speed.

New string formatting operations like the built-in format() function (separate from the str.format method) and the format mini-language are available in 2.6, and made nicer in 2.7. All of it is backported from the 3.x tree to my knowledge, and I’ll be using a Python 3.2b2 interpreter session for my examples.

I want to focus specifically on string alignment here, because there are very obviously multiple ways to solve alignment needs. Here’s an example solution from the 2nd edition:

>>> print '|' , 'hej'.ljust(20) , '|' , 'hej'.rjust(20) , '|' , 'hej'.center(20) , '|'
| hej                  |                  hej |         hej          |

Note that this is of course in Python 2.x syntax, but this works in Python 3.2 if you just make it a function call instead of a statement (so, just add parens and it works). The string methods used here are still in Python 3.2, with no notices of deprecation or preference for newer methods available now. That said, this looks messy to me, and so I wondered if I could make it more readable without losing performance, or at least without losing so much performance that it’s not worth any gains in the area of readability.

Single String Formatting

Here are three ways to get the same string alignment behavior in Python 3.2b2:

>>> '{:+<20s}'.format('hej')
'hej+++++++++++++++++'
>>> format('hej', '+<20s')
'hej+++++++++++++++++'
>>> 'hej'.ljust(20, '+')
'hej+++++++++++++++++'

Ok, so they all work the same. Now I’m going to wrap each one in a function and use the timeit module to help me get an idea what the difference is in terms of performance.

>>> from timeit import timeit
>>> def runit():
...     format('hej', '+<20s')
...
>>> def runit2():
...     'hej'.ljust(20, '+')
...
>>> def runit3():
...     '{:+<20s}'.format('hej')
...
>>> timeit(stmt=runit3, number=1000000)
0.6168370246887207
>>> timeit(stmt=runit3, number=1000000)
0.6109819412231445
>>> timeit(stmt=runit3, number=1000000)
0.6166291236877441
>>> timeit(stmt=runit2, number=1000000)
0.49651098251342773
>>> timeit(stmt=runit2, number=1000000)
0.4870288372039795
>>> timeit(stmt=runit2, number=1000000)
0.49135899543762207
>>> timeit(stmt=runit, number=1000000)
0.7751290798187256
>>> timeit(stmt=runit, number=1000000)
0.7771239280700684
>>> timeit(stmt=runit, number=1000000)
0.7805869579315186

Turns out the old, tried-and-true str.* methods are fastest in this case, though I think in a more complex case like the recipe from the 2nd edition I’d opt for something more readable if I had the chance.

One String, Three Ways

Let’s look at a more complex case. Let’s take each of the methodologies used in runit, runit2, and runit3, and see how things pan out when we want to do something like the 2nd edition recipe. I’ll start with the bare interpreter operation to compare the output:


>>> '|' + format('hej', '+<20s') + '|' + format('hej', '+^20s') + '|' + format('hej', '+>20s') + '|'
'|hej+++++++++++++++++|++++++++hej+++++++++|+++++++++++++++++hej|'
>>> '|' + 'hej'.ljust(20, '+') + '|' + 'hej'.center(20, '+') + '|' + 'hej'.rjust(20, '+') + '|'
'|hej+++++++++++++++++|++++++++hej+++++++++|+++++++++++++++++hej|'
>>> '|{0:+<20s}|{0:+^20s}|{0:+>20s}|'.format('hej')
'|hej+++++++++++++++++|++++++++hej+++++++++|+++++++++++++++++hej|'

Unless you go through the rigamarole of creating a sequence and using ‘|’.join(myseq), the last method seems the most readable to me. I’d really just like to use the built-in print function with a sep='|' argument, but that won’t cover the pipes at the beginning and end of the string unless I’ve missed something.

Here are the functions and timings:


>>> def threeways():
...     '|' + format('hej', '+<20s') + '|' + format('hej', '+^20s') + '|' + format('hej', '+>20s') + '|'
...

>>> def threeways2():
...     '|' + 'hej'.ljust(20, '+') + '|' + 'hej'.center(20, '+') + '|' + 'hej'.rjust(20, '+') + '|'
...

>>> def threeways3():
...     '|{0:+<20s}|{0:+^20s}|{0:+>20s}|'.format('hej')
...

>>> timeit(stmt=threeways, number=1000000)
2.4910600185394287
>>> timeit(stmt=threeways, number=1000000)
2.50291109085083
>>> timeit(stmt=threeways, number=1000000)
2.4913830757141113
>>> timeit(stmt=threeways2, number=1000000)
1.9027390480041504
>>> timeit(stmt=threeways2, number=1000000)
1.8975908756256104
>>> timeit(stmt=threeways2, number=1000000)
1.8957319259643555
>>> timeit(stmt=threeways3, number=1000000)
1.311446189880371
>>> timeit(stmt=threeways3, number=1000000)
1.3099820613861084
>>> timeit(stmt=threeways3, number=1000000)
1.3031558990478516

The threeways3 function has a bit of an advantage in not having to muck with concatenation at all, and this probably explains the difference. Changing threeways() to use a list and '|'.join() brought it from about 2.50 to about 2.30. Better. Changing threeways2() in the same way was also a small improvement from ~1.90 to ~1.77. No big wins there, and they’re not particularly readable in either case. For this one arguably trivial corner case, the new formatting mini-language wins in both performance and (IMO) readability.
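
For the record, here are the join variants I mean; the empty strings at both ends of the sequence are what get you the leading and trailing pipes:

>>> def threeways_joined():
...     '|'.join(['', format('hej', '+<20s'), format('hej', '+^20s'),
...               format('hej', '+>20s'), ''])
...
>>> def threeways2_joined():
...     '|'.join(['', 'hej'.ljust(20, '+'), 'hej'.center(20, '+'),
...               'hej'.rjust(20, '+'), ''])
...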

Big Assumptions

This of course assumes I didn’t overlook something in creating the comparison functions, that there’s not yet a different way to do this that blows all of my work out of the water. If you see a completely different way to do this that’s both readable and performant, or I did something bone-headed, please let me know in the comments. :)

The Makings of a Great Python Cookbook Recipe

I’ve seen some comments on Twitter, Buzz, Reddit, and elsewhere, and we’ve gotten some suggestions for recipes already via email (thanks!), and both Dave and I thought it’d be good to present a simple-to-follow ‘meta-recipe’: a recipe for making a great recipe that has a good shot at making it into the cookbook.

So let’s get down to what makes a good recipe. These are in no particular order:

Concise

When you read a recipe for apple pie, it doesn’t include a diatribe about how to grow and pick apples. This is in part because of space constraints (the book would be huge, or the coverage would be incomplete, take your pick). It’s also partly because you can assume that the reader has somehow procured apples, knows they’ll need them for this recipe, and all of that.

In a recipe for, say, an ETL script that hopes to illustrate Python’s useful string manipulation features, you probably don’t need to spend much time explaining what a database is, or take a lot of lines of code to deal with the database. It makes the code longer, and just distracts from the string manipulation goodness you’re trying to relay to the reader.

Short and sweet. “Everything you need, nothing you don’t”.

Illustrative

Recipes for the cookbook should be relatively narrowly focused on illustrating a particular technique, module (built-in or otherwise), or language feature, and bringing it to life. It need not be illustrative of a business process, remotely associated technology, etc.

So, if you want to write a recipe about os.walk, you don’t need to get into the semantics of the ext3 filesystem implementation, because that’s not what you’re trying to illustrate.

Practical

Above I noted that the recipe should be relatively narrowly focused on a technique or language feature. It should NOT be narrowly focused in terms of its applicability.

For example, if you wanted to illustrate the usefulness of the Python csv module, awesome! And if you want to mention that csv will attempt to make its output usable in Excel, awesome! But if you wanted to write a recipe called “Supporting Windows 95 Excel Clients With Python” dealing only with Excel, specifically on Windows 95, well… that’s really specific, and really a ‘niche’ recipe. It’d be better left for some future ‘Python Hacks’ book or something.

When you read a cookbook, you probably don’t seek out “How to make mulligatawny soup in a Le Creuset™ Dutch Oven Using an Induction Stove at 30,000 Feet”. Likewise, in order to be useful to a wider audience, your recipe should ideally not force so many assumptions onto readers who just want to make a good meal (so to speak).

Our devotion to the practical also means we don’t plan to include any recipes dealing with Fibonacci numbers, factorials, or the like. Leave those for some future “Python Homework Problems” book.

Well-Written

By ‘well-written’, I’m partially just lumping everything I just said all together under one title. However, in addition, I would ask that recipe authors resist the temptation to utilize unnecessary ‘cleverness’ that might make the code harder to read and understand, or be a distraction from what you’re trying to relay to the reader.

Just because you can get the job done with a nested list comprehension doesn’t mean you should. Open up the code listing to allow easy comprehension by readers at all levels. If you must use nested list comprehensions, perhaps it warrants a separate recipe?

Nested list comprehensions are just an example, of course. I’m sure you can think of others. When you’re looking at your recipe, just ask yourself if there’s a simpler construct, technique, or idiom that can be used to achieve the same goal.
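
For instance, here’s the same matrix flattening done both ways (a generic example, not a recipe):

matrix = [[1, 2, 3], [4, 5, 6]]

# Clever: a nested list comprehension...
flat = [x for row in matrix for x in row]

# ...and opened up for readers at all levels:
flat = []
for row in matrix:
    for x in row:
        flat.append(x)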

Pythonic

In general, follow the ‘import this’ rules like you would with any other Python code you write. “Sparse is better than dense”, “Readability counts”, etc. In addition, bonus points are given for following PEP 8.

But I’m not just talking about the code. Long-time Python coders (or maybe even not-so-long-time ones) come to realize that the Zen of Python applies not just to code, but to the way you think about your application. When Dave and I are reading recipe descriptions, we’re thinking in that mode. “Would we do this? Why would we do this? When would we do this? Is there a more Pythonic solution to this problem? Can this solution be made more Pythonic?”

When in doubt…

If you’re not sure about any of that, your default action should be to post your recipe on the ActiveState site. The reality is that posting a recipe there will help both the community and yourself. The feedback you get on your recipe will help you become a better coder, and it’ll help people evaluating your recipe to make sound decisions and perhaps consider things they hadn’t. It’s good all over. Getting into the book is just a nice cherry on the sundae.

Also, while unit tests are fantastic, they’re not expected to show up along with the recipe. ActiveState to my knowledge doesn’t have a mechanism for easily including test code alongside (but not embedded in) the recipe code. If you want to use doctest, great. If you want to point us at a recipe you’ve posted, you can include tests in that email, or not. It’s totally unnecessary to include them, although they are appreciated.

Questions?

If you have questions, email them to Dave and me at PythonCookbook at oreilly dot com. You can also post questions here in the comments of this post.

Good Things Come in Threes: Python Cookbook, Third Edition

It became official earlier today that David Beazley and I will be co-editing/co-curating the next edition (the Third Edition) of the Python Cookbook. That’s really exciting. Here’s why:

It’s Python 3, Cover to Cover

Go big or go home. The third edition will be a Python 3 Cookbook. This by itself makes this a rather large undertaking, since it means modules used in earlier editions that don’t work in Python 3 can’t be used, and so those old recipes will need to be scrapped or rewritten.

You heard it right: if a module used in the last edition of Python Cookbook doesn’t work in Python 3, it won’t be in this edition. This includes some modules for which several recipes already exist, like dateutil and wxPython, and other modules I would’ve liked to use for illustrative purposes, like Tornado and psycopg2. I guess there’s still some time for smaller modules to port to Python 3 and be left alone in this edition, but I don’t know how realistic that is for something like wxPython.

It’s going to be ok.

I can’t find any modules (yet) in the second edition for which Python 3-compatible substitutes don’t exist, and in some cases I found myself wondering why a separate module was used at all when what needs doing isn’t a whole lot of work in pure Python anyway. I guess if the module is there and stable, why not use it, eh? Fair enough.

Three Authors?

Actually, yes. David, myself, and YOU, where YOU is anyone who posts good Python 3 recipes to the ActiveState ‘Python Cookbook’ site. This is actually not new. If you look at the last edition, you’ll see separate credits for each recipe. The Python Cookbook has always been a community effort, featuring recipes by some very familiar names in the Python community.

Anyway, just in case that wasn’t direct enough:

Go to the ActiveState Python Cookbook site and post a Python 3 recipe, and if it’s solid, your name and recipe may well be in the book.

Basically, ActiveState gives us free rein over their Python Cookbook content, so it’s a convenient way to let the community contribute to the work. If it’s in there, and it’s good, we can use it. They’re cool like that.

Three Questions

Answer these in the comments, or send email to Dave and me at PythonCookbook at oreilly.

  1. What are your three favorite recipes from either the 1st or 2nd edition?
  2. What are your three least favorite recipes from either the 1st or 2nd edition?
  3. What are three things (techniques, modules, basic tasks) you’d like to see covered that weren’t covered in earlier editions?

Three Cool Things About the Third Edition

  1. Tests. When the book comes out, we’ll make the unit tests for the recipes available in some form that is yet to be determined (but they’ll be available).
  2. Porting help. We’re not going to leave module authors out in the cold. We’re going to provide some help/advice. We’ve both written code in Python 3, and dealt with the issues that arise. I’m still porting one of my projects and have another in line after that, so I’ll be dealing with it even more.
  3. Dave and I are both overwhelmed with excitement about this book, about Python 3, and about working with you on it. Come help us out by posting Python 3 recipes (tests are also nice, but not required) on ActiveState, and shoot us an email at PythonCookbook at oreilly dotcom.

There will be other cool things, too. We’ll let you know, so stay tuned to this blog, Dave’s blog, and you should definitely follow @bkjones and @dabeaz on Twitter, since we’ll be asking for opinions/resources/thoughts on things as we go.

Nose and Coverage.py Reporting in Hudson

I like Hudson. Sure, it’s written in Java, but let’s be honest, it kinda rocks. If you’re a Java developer, it’s admittedly worlds better because it integrates with seemingly every Java development tool out there, but we can do some cool things in Python too, and I thought I’d share a really simple setup to get coverage.py’s HTML reports and nose’s xUnit-style reports into your Hudson interface.

I’m going to assume that you know what these tools are and have them installed. I’m working with a local install of Hudson for this demo, but it’s worth noting that I’ve come to find a local install of Hudson pretty useful, and it doesn’t really eat up too much CPU (so far). More on that in another post. Let’s get moving.

Process Overview

As mentioned, this process is really pretty easy. I’m only documenting it because I haven’t seen it documented before, and someone else might find it handy. So here it is in a nutshell:

  • Install the HTML Publisher plugin
  • Create or alter a configuration for a “free-style software project”
  • Add a Build Step using the ‘Execute Shell’ option, and enter a ‘nosetests’ command, using its built-in support for xUnit-style test reports and coverage.py
  • Check the ‘Publish HTML Report’ box, and enter the information required to make Hudson find the coverage.py HTML report.
  • Build, and enjoy.

Install the HTML Publisher Plugin

From the dashboard, click ‘Manage Hudson’, and then on ‘Manage Plugins’. Click on the ‘Available’ tab to see the plugins available for installation. It’s a huge list, so I generally just hit ‘/’ in Firefox or cmd-F in Chrome and search for ‘HTML Publisher Plugin’. Check the box, go to the bottom, and click ‘Install’. Hudson will let you know when it’s done installing, at which time you need to restart Hudson.

[Screenshot: the ‘Available’ plugins tab. HTML Publisher Plugin: Check!]

Configure a ‘free-style software project’

If you have an existing project already, click on it and then click the ‘Configure’ link in the left column. Otherwise, click on ‘New Job’, and choose ‘Build a free-style software project’ from the list of options. Give the job a name, and click ‘OK’.

[Screenshot: choosing ‘Build a free-style software project’. You have to give the job a name to enable the ‘OK’ button :)]

Add a Build Step

In the configuration screen for the job, which you should now be looking at, scroll down and click the button that says ‘Add build step’, and choose ‘Execute shell’ from the resulting menu.

[Screenshot: the ‘Add build step’ menu. Execute shell. Mmmmm... shells.]

This results in a ‘Command’ textarea appearing, which is where you type the shell command to run. In that box, type this:

/usr/local/bin/nosetests --with-xunit --with-coverage --cover-package demo --cover-html -w tests

Of course, replace ‘demo’ with the name of the package you want covered in your coverage tests to avoid the mess of having coverage.py try to seek out every module used in your entire application.

We’re telling Nose to generate an xUnit-style report, which by default will be put in the current directory in a file called ‘nosetests.xml’. We’re also asking for coverage analysis using coverage.py, and requesting an HTML report of the analysis. By default, this is placed in the current directory in ‘cover/index.html’.
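
If you want to follow along at home, the job assumes a layout roughly like the one below. The ‘demo’ package and its answer() function are placeholders that just happen to match the command above.

# tests/test_demo.py
import unittest

import demo

class TestDemo(unittest.TestCase):
    def test_answer(self):
        self.assertEqual(demo.answer(), 42)

if __name__ == '__main__':
    unittest.main()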

[Screenshot: the ‘Execute shell’ command area]

Now we need to set up our reports by telling Hudson we want them, and where to find them.

Enable JUnit Reports

In the ‘Post-Build Actions’ area at the bottom of the page, check ‘Publish JUnit test result report’, and make it look like this:

The ‘**’ is part of the Ant glob syntax; it matches any number of directories, so Hudson will find the report wherever it lands under the workspace. Remember that we said earlier nose will write its report, by default, to a file called ‘nosetests.xml’ in the current working directory.

The current working directory is going to be the Hudson ‘workspace’ for that job (it’s the ‘workspace root’ link on the job’s page). It should mostly be a checkout of your source code. Most everything happens relative to the workspace, which is why in my nosetests command you’ll notice I pass ‘-w tests’ to tell nose to look in the ‘tests’ subdirectory of the current working directory.

You could stop right here if you don’t track coverage; just note that these reports don’t get particularly exciting until you’ve run a number of builds.

Enable Coverage Reports

Just under the JUnit reporting checkbox should be the Publish HTML Reports checkbox. The ordering of things can differ depending on the plugins you have installed, but it should at least still be in the Post-build Actions section of the page.

Check the box, and a form will appear. Make it look like this:

By default, coverage.py will create a directory called ‘cover’ and put its files in there (one for each covered package, and an index). It puts them in the directory you pass to nose with the ‘-w’ flag. If you don’t use a ‘-w’ flag… I dunno — I’d guess it puts it in the directory from where you run nose, in which case the above would become ‘**/cover’ or just ‘cover’ if this option doesn’t use Ant Glob Syntax.

Go Check It Out!

Now that you have everything put together, click on ‘Save’, and run some builds!

On the main page for your job, after you’ve run a build, you should see a ‘Coverage.py Report’ link and a ‘Latest Test Result’ link. After multiple builds, you should see a test result ‘Trend’ chart on the job’s main page as well.

[Screenshot: the job page, showing the ‘Coverage.py Report’ and ‘Latest Test Result’ links]

Almost everything on the page is clickable. The trend graph isn’t too enlightening until multiple builds have run, but I find the coverage.py reports a nice way to see at-a-glance what chunks of code need work. It’s way nicer than reading the line numbers output on the command line (though I sometimes use those too).

How ’bout you?

If you’ve found other nice tricks in working with Hudson, share! I’ve been using Hudson for a while now, but that doesn’t mean I’m doing anything super cool with it — it just means I know enough to suspect I could be doing way cooler stuff with it that I haven’t gotten around to playing with. :)

Python Packaging, Distribution, and Deployment: Volume 1

This is just Volume 1. I’ll cover as much as I can and just stop when it gets so long most people will stop reading :)

I’ve been getting to know the Python packaging and distribution landscape way better than I ever wanted to over the last couple of weeks. After 2 or 3 weeks now, I’m saddened to report that I still find it quite painful, and not a little bit murky. At least it gets clearer and not murkier as I go on.

I’m a Senior Operations Developer at myYearbook.com, which produces a good bit of Python code (and that ‘good bit’ is getting bigger by the day). We also open source lots of stuff (also growing, and not all Python). I’m researching all of this so that when we develop new internal modules, they’re easy to test and deploy on hundreds of machines, and when we decide to open source that module, it’s not painful for us to do that, and we can distribute the packages in a way that is intuitive for other users without jumping through hoops because “we do it different”. One process and pipeline to rule them all, as it were.

There are various components involved in building an internal packaging and distribution standard for Python code. Continuous integration, automated testing, and automated deployment (maybe someday “continuous deployment”) are additional considerations. This is a more difficult problem than I feel it should be, but it’s a pretty interesting one, and I’ll chronicle the adventure here as I have time. Again, this is just Volume 1.

Part 1: Packaging, Distribution, and Deployment

Let’s define the terms. By ‘packaging’ I mean assembling some kind of singular file object containing a Python project, including its source code, data files, and anything it needs in order to be installed. To be clear, this would include something like a setup.py file perhaps, and it would not include external dependencies like a NoSQL server or something.

By ‘distribution’, I mean ‘how the heck do you get this beast pushed out to a couple hundred machines or more?’

Those two components are necessary but not sufficient to constitute a ‘deployment’, which I think encompasses the first two terms, but also accounts for things like upgrades, rollbacks, performing start/stop/restarts, running unit tests after it gets to its final destination but before it kills off the version that is currently running happily, and other things that I think make for a reliable, robust application environment.

With definitions out of the way, let’s dive into the fun stuff.

Part 2: Interdependencies

Some knowledge of packaging in Python is helpful when you go to discuss distribution and deployment. The same goes for the other two components. When you start out looking into the various technologies involved, at some point you’re going to look down and notice that pieces of your brain appear to have dropped right out of your ears. Then you’ll reach to pull out your hair only to realize that your brain hasn’t fallen out your ears: it has, in fact, exploded.

If you’re not careful, you’ll find yourself thinking things like ‘if I don’t have a packaging format, how can I know my distribution/deployment method? If I don’t have a deployment method, how do I know what package format to use?’ It’s true that you can run into trouble if you don’t consider the interplay between these components, but it’s also true that the Python landscape isn’t really all that treacherous compared to other jungles I’ve had to survive in.

I believe the key is to just take baby steps. Start simple. Keep the big picture in mind, but decide early to not let the perfect be the enemy of the good. When I started looking into this, I wanted an all-singing all-dancing, fully-automated end-to-end, “developer desktop to production” deployment program that worked at least as well as those hydraulic vacuum thingies at the local bank drive-up windows. I’ll get there, too, but taking baby steps probably means winding up with a better system in the end, in part because it takes some time and experience to even identify the moving parts that need greasing.

So, be aware that it’s possible to get yourself in trouble by racing ahead with, say, eggs as a package format if you plan to use pip to do installation, or if you want to use Fabric for distribution of tarballs but are in a Windows shop with no SSH servers or tarball extractors.

Part 3: Package Formats

  • tar.gz (tarballs)
  • zip
  • egg
  • rpm/deb/ebuild/<os-specific format here>
  • None

If you choose not to decide, you still have made a choice…

Don’t forget that picking a package format also includes the option to not use a package format. I’ve worked on projects of some size that treated a centralized version control system as a central distribution point as well. They’d manually log into a server, do an svn checkout (back when svn was cool and stuff), test it, and if all was well, they’d flip a symlink to point at the newly checked out code and restart. Deployment was not particularly automated (though it could’ve been), but some aspects of the process were surprisingly good, namely:

  • They ran a surprisingly complete set of tests on every package, on the system it was to be deployed on, without interrupting the currently running service. As a result, they had a high level of confidence that all dependencies were met, the code acted predictably, and the code was ‘fit for purpose’ to the extent that you can know these things from running the available tests.
  • Backing out was Mind-Numbingly Easy™ – since moving from one version to the next consisted of changing where a symlink pointed to and restarting the service, backing out meant doing the exact same thing in reverse: point the symlink back at the old directory, and restart (see the sketch just after this list).
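
Here’s the essence of that flip-a-symlink mechanism in a few lines of Python. It’s a sketch: the paths and restart command are placeholders. On POSIX, os.rename() over an existing name is atomic, which is what makes this trick safe.

import os
import subprocess

new_release = '/srv/app/releases/20110215'  # freshly checked out and tested

os.symlink(new_release, '/srv/app/current.tmp')
os.rename('/srv/app/current.tmp', '/srv/app/current')  # atomic flip
subprocess.check_call(['/etc/init.d/myservice', 'restart'])

# Backing out: point the symlink at the previous release the same way.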

I would not call that setup “bad”, given the solutions I’ve seen. It just wasn’t automated at all to speak of. It beats to hell what I call the “Pull & Pray” deployment scenario, in which your running service is a VCS checkout, and deployment means manually logging in, cd-ing to that directory, doing a ‘pull’ or ‘update’ or whatever command does an in-place update of the code from your VCS, and then just restarting the service. That method is used in an astonishingly large number of projects I’ve worked on in the past. Zero automation, zero testing, and any confidence you might find in a solution like that is, dare I say, hubris.

Python Eggs

I don’t really want to entertain using eggs and the reasoning involves understanding some recent history in the Python packaging and distribution landscape. I’ll try to be brief. If you decide to look deeper, here’s a great post to use as a starting point in your travels.

distutils is built into Python. You create a setup.py file, you run ‘python setup.py install’, and distutils takes over and does what setup.py tells it to. That is all.
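
For the uninitiated, the whole contract looks about like this (a minimal sketch; every value is a placeholder):

# setup.py
from distutils.core import setup

setup(
    name='myproject',
    version='0.1.0',
    description='An example project',
    packages=['myproject'],
)

# 'python setup.py install' installs it; 'python setup.py sdist'
# builds a source tarball (more on tarballs below).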

Setuptools was a response to features a lot of people wanted in distutils. It’s built atop distutils as a collection of extensions that add these features. Included with setuptools is the easy_install command, which will automatically install egg files.

Setuptools hasn’t been regularly and actively maintained in a year or more, and that got old really fast with developers and other downstream parties, so some folks forked it and created ‘distribute’, which is setuptools with all of the lingering patches applied that the setuptools maintainer never merged. They also have big plans for distribute going forward. One is to do away with easy_install in favor of pip.

pip, at time of writing, cannot install egg files.

So, in a nutshell, I’m not going to lock myself to setuptools by using eggs, and I’m not going down the road of manually dealing with eggs, and pip doesn’t yet ‘do’ eggs, so in my mind, the whole idea of eggs being a widely-used and widely-supported format is in question, and I’m just not going there.

If I’m way out in left field on any of that, please do let me know.

Tarballs

The Old Faithful of package formats is the venerable tarball. A 30-year-old file format compressed using a 20-year-old compression tool. It still works.

Tarball distribution is dead simple: you create your project, put a setup.py file inside, create a tarball of the project, and put it up on a web server somewhere. You can point easy_install or pip at the URL to the tarball, and either tool will happily grab it, unpack it, and run ‘python setup.py install’ on it. In addition, users can also easily inspect the contents of the file without pip or easy_install using standard tools, and wide availability of those tools also makes it easy to unpack and install manually if pip or easy_install aren’t available.

Python has a tar module as well, so if you wanted to bypass every other available tool you could easily use Python itself to script a solution that packages your project and uploads it with no external Python module dependencies.
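
For example, the packaging step can be this small. A sketch: the directory name is a placeholder, assumed to contain your setup.py.

import tarfile

# Roll the project directory into a gzip-compressed tarball.
with tarfile.open('myproject-0.1.0.tar.gz', 'w:gz') as tar:
    tar.add('myproject-0.1.0')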

Zip Files

I won’t go into great detail here because I haven’t used zip files on any regular basis in years, but I believe that just about everything I said about tarballs is true for zip files. Python has a zipfile module, zip utilities are widely available, etc. It works.

Distro/OS-specific Package Formats

I’ve put this on the ‘maybe someday’ list. It’d be great if, in addition to specifying Python dependencies, your package installer could also be aware of system-level dependencies that have nothing to do with Python, except that your app requires them. :)

So, if your application is a web app but requires a local memcache instance, RPM can manage that dependency.

I’m not a fan of actually building these things though, and I don’t know many people who are. Sysadmins spend their entire early careerhood praying they’ll never have to debug an RPM spec file, and if they don’t, they should.

That said, the integration of the operations team into the deployment process is, I think, a good thing, and leveraging the tools they already use to manage packages to also manage your application is a win for everyone involved. Sysadmins feel more comfortable with the application because they’re far more comfortable with the tools involved in installing it, and developers are happy because instead of opening five tickets to get all of the system dependencies right, they can hand over the RPM or deb and have the tool deal with those dependencies, or at least have the tool tell the sysadmin “can’t install, you need x, y, and z”, etc.

Even if I went this route someday, I can easily foresee keeping tarballs around, at least to keep developers from having to deal with package managers if they don’t want/need to, or can’t. In the meantime, missing system-level dependencies can be caught when the tests are run on the machine being deployed to.

Let Me Know Your Thoughts

So that’s it for Volume 1. Let me know your thoughts and experiences with different packaging formats, distribution, deployment, whatever. I expect that, as usual in the Python community, the comments on the blog post will be as good as or better than the post itself. :)

PyCharm is My New Python IDE

Regular readers know that I’ve used a large number of IDEs over the past several years. They also know that I have, in every single case, returned to Vim, and I’ve spent a lot of time and effort making Vim be a more productive tool for me.

No more. I’m using PyCharm. It’s my primary code editor.

I’ve been using it since the very early EAP releases — maybe the first EAP release. I have rarely been disappointed, and when I was, it was fixed fairly rapidly. Here’s a quick overview of the good and bad.

Vim Keybindings!

I’ve been using Vi and Vim for an extremely long time. Long enough that whenever I’m editing text, I instinctively execute Vi keystrokes to navigate. Apparently my brain just works that way and isn’t going to stop. When I use other editors, and talk to users of other editors, one of the first things that comes up is how to do things efficiently by exploiting the keyboard shortcuts provided by the editor, so why reinvent the wheel? Why make me learn yet another collection of key strokes to get things done?

Sure, Vi keybindings are a pretty much completely arbitrary set of shortcuts, but so are whatever shortcuts anyone else is going to come up with. I’m glad that PyCharm decided to let the Vi-using community easily embrace its IDE.

And, by the way, PyCharm has by far the best and most complete Vi emulation mode I’ve ever seen in any IDE.

Git Integration

Well, not just git, but I use git. The git integration isn’t 100% flawless, but it’s perfect for most day-to-day needs. I use it with local repositories, as well as a centralized one at work, in addition to GitHub. Updating the project works really well, and lets me easily see what changes were applied. Likewise, when I’m ready to push and a file shows up in the list I didn’t recall changing, a quick double-click lets me see what’s going on in a very nice diff viewer.

The most recent EAP release of PyCharm adds GitHub support specifically, in addition to Git. I’m not sure what that’s about just yet, because I’ve been perfectly happy using PyCharm with GitHub for quite some time. I’ll have to report further on that later.

It Gets Python Right

This is pretty huge. PyCharm makes me a more productive coder because it points out when I’ve done something goofy. If I mistype a variable, forget a colon, or whatever, I expect any editor to let me know, but PyCharm goes further than that. It’ll let me know if I’ve inadvertently changed the signature in a method I’m overriding in a subclass. With a single click, it’ll also open up the file the overridden method lives in and show me the method. That rocks.

It’ll also suggest imports if it can’t resolve a reference I’ve made in my code. Pressing alt+Enter adds the import to the file (at the top, not inline) without moving the cursor.

Code completion is good, but it’s not actually the reason I use an IDE, so I’m not one of those people who opens an IDE, types two lines into it, judges the outcome, and never comes back if it’s not precisely what I expect. I want to see how the overall environment makes me more productive. Code completion has always been more ‘gravy’ than ‘meat’ to me. I do happen to like that when I make a method call, PyCharm very unobtrusively and without taking over shows me the method signature. I honestly don’t require more than that.

Indentation, I must say, is very good. I don’t think perfection exists, but PyCharm comes closer than most. One detail I rather like is that PyCharm will automatically dedent after a return statement. This attention to detail pervades PyCharm, and pretty much guarantees that this review misses a lot of cool features I use all the time without thinking about them. My bad.

Run Configurations

It’s pretty easy to do just about whatever you want to do to run your code. For example, I use nosetests and coverage.py within PyCharm. Another PyCharm user I know uses pylint in PyCharm. I recently added a pep8 run configuration to PyCharm, and I have a pylint configuration too (though I’ve found that pylint doesn’t actually tell me much about my code I don’t already know, so it’s kind of a nag: “Yeah yeah, I’ll fix it” I find myself saying.)

Point is, if there’s a tool you use in your development process, it’s probably doable from within PyCharm, so you don’t have to break your focus.

Nits

I really only have a couple of minor nits about PyCharm. I’ll put them here just so you don’t think I’m getting paid to do this or something (‘cos believe me, nobody would pay me to say these things about their product):

  1. It’s kind of a beast in terms of performance. There’s a certain threshold after which shiny splash screens and pretty icons fail to hide the fact that your app is just a bear. I have a desktop with 4GB RAM and it runs “ok”. It’s reasonably fast. One thing I’ll note about it is that the performance, for whatever reason, doesn’t seem to get exponentially worse as more windows are added.
  2. Window crippling: I hate applications that refuse to let you activate a window when a dialog it considers more important is open. If I have the git commit dialog open, I should be able to start writing a commit message, refer back to the code by clicking the editor window, then click back and finish my message. It’s one example. Others abound. It’s damned annoying at times.
  3. It seems to still fail parsing docstrings. I’m surprised this is still broken. Basically, if I put a docstring in triple quotes at the module level, it marks the docstring as problematic, saying ‘line appears to do nothing’. Well, duh — it’s a docstring.
  4. I find its support for git branches to be really, really clumsy to use. You can create new branches, set up tracking branches, etc., but it’s very confusing and unintuitive, and the documentation for these features doesn’t seem to use terminology that I, as a git user, am familiar with. I actually don’t do my branching operations in PyCharm as of right now because I don’t want to screw anything up.

Overall Opinion: It’s a Win

At least 3 people who know me well are falling out of their chairs, or at least wondering who kidnapped me and took over my blog and started writing nice things about IDEs. It’s not something I really… do.

Look, PyCharm is a win. I don’t like every single thing about it, but here’s the deal:

  1. It does a great many things at least reasonably well (many much better than that).
  2. It does a bunch of other things in a way that is at least not broken (I have issues with some of their UI decisions for some obscure functions, but the functions do work).
  3. It doesn’t actually massively screw up anything that I’ve been able to find.
  4. It tries.

By “It tries”, what I mean is that the PyCharm team seems to go out of its way to make sure that the existing features work, that new features aren’t broken upon release for the sake of saying “HEY, NEW FEATURE!”, and that the finer details of Python don’t go ignored.

I once (ok, maybe twice) tweeted about PyCharm’s inability to just… open a file. Sometimes I want to do that. You know… open a file. Sometimes more than one. Sometimes I’m working on a project, and in the spirit of code reuse I want to open two or three other files from other projects as a handy reference, or to double check my work if tests fail (heck, I might just want the test code!). PyCharm used to make that impossible, but within about a month of my mentioning it on Twitter, it was implemented in the most recent EAP. Others have had similar experiences. They’re responsive. They try. It’s appreciated.