PyTPMOTW: First Post!

Doug Hellmann’s excellent PyMOTW series started 3 years ago yesterday, and I’ve loved every minute of it. His sheer commitment and dedication to his craft is to be commended; it’s not easy to find the discipline to sit down every single week and commit the time and research to carve bits in such a way that people who come across them leave with a greater understanding of, and a higher level of confidence with the topic at hand. Thanks, Doug!

Doug’s work has inspired me to take on a sort of sister series called Python Third Party Module of the Week. While there’s probably no way I could possibly cover every single third party module, I’ll try to cover them in enough variety and depth to make things interesting. Here’s a sampling of the modules on the “short list” for the series, but feel free to make suggestions of your own in the comments!

So, there are 10 third party modules to start with. I might change it up, but these are off the top of my head. I’ve used all of them, most of them for production code. I do most of my work in the system/network space, and entirely in a *nix environment. I’m happy to take suggestions for whatever everyone is interested in, though. I can and do code for the web pretty regularly too; I’ve written a module myself for Django, and I’m helping to create an app template (later, hopefully, a framework) to aid rapid development with Tornado, so don’t be fooled by the rugged “systems”-related modules – I want your suggestions about anything.

There are also things I’d love to do more with, like PyGame, Pyro, and wxPython, but these strike me as frameworks more than modules. I’m looking to cover things you import into your app and use, not things that dictate the entire construction of your application like a framework tends to do.

What Do You Want to See?

There’s a vast ocean of third-party modules. If you’re a module author or a fan of a particular module, feel free to suggest it in the comments here or send me mail or a tweet. My Twitter handle is @bkjones, and that’s also my gmail user name ;-)

Also, if there’s a particular thing you need to do with Python and aren’t sure which module to use, let me know! I’ll try to figure out the best-of-breed module for the task and write about it here. Win win!

Why Don’t You Write?

No reason to stop here! I had considered doing “Python Technique of the Week”, but chose this instead. It’d be great to see someone cover things like delegation, mixins, generators, context managers, event-driven programming, the observer pattern… whatever, maybe pulling exemplary code from open source projects, or at least providing decent examples — You do it!

It’s also pretty clear from my perusal of PyPI that a “Python Framework of the Week” would actually run quite a long time without running out of things to write about, and would probably be pretty popular. A post I wrote some time ago about ReportLab has been in the top 5 posts on this site since it was posted.

If you’re already writing about Python on a regular basis, post a link, and if you’re not already there, submit your blog to the kind folks at Planet Python (Python topic/category feed URLs only please) so we can all keep up with everything. There’s some brilliant content out there that I’ve come across, and it doesn’t all make it to Reddit, Hacker News, or the Planet.

Python Testing Beginner’s Guide: The Review

I try not to make a habit of reviewing technical books. I own more than my fair share of technical books, and I’ve been involved in publishing and even wrote a book myself. Most technical books, on the whole, are pretty bad.

So, there’s Disclaimer 1: you know the mindset I’m starting with. I’m a bit critical of tech books, publishers, and occasionally even authors (but usually publishers).

Most book reviews are also biased for one reason or another, and I hate to be lumped into a pool of “book reviewers” that I have nothing to do with. A great many of them are astroturfers trading favors with the publishing company or just cheerleading in general when in fact they’ve never even seen the book.

So, Disclaimer 2:

I requested a free “review copy” of Python Testing: Beginner’s Guide, specifically because I wanted to pick up some of the techniques I expected I’d find in the book on a project I was working on in the moment. The timing really could not have been better. They requested that I post a review, and I said that I would — “good or bad”.

So let’s talk about the book!

Things That Impressed Me

It’s an eBook

I got an eBook from Packt Publishing and dove into the book almost immediately, without having to wait behind some annoying old lady griping incessantly to an equally annoying emo-hipster wannabe behind the counter at a bookstore who inevitably can’t do whatever it is this old lady wants. We’ve all been there. eBooks rock. I think some publishers do better than others where eBooks are concerned, but that’s a tale for another day. I was happy I could at least get an eBook (PDF).

Code Listings

This book’s author and/or editor appears to have taken great pains to ensure that the Python code listings do not break across pages in strange places. Python, as a language, isn’t exactly custom made for book layouts: since there are no brackets and only indentation is used to denote the scope of a given statement, if you have trouble telling how far a line on one page is indented relative to the line on the next page, it can be tough, especially for newbies, but in some cases just as much for experienced coders. Anyway, I could not find a single instance of this issue in this book, and I was really impressed by that.

Example Scenarios

Ok, pretty much all example scenarios given by an author to provide some context to illustrate a concept are contrived. That doesn’t bother me. What does bother me is that most authors just shrug and say “this is how it is, and how it will always be, forever, so it doesn’t matter that my example scenario completely fails to engage the reader”. This author didn’t do that. He came up with example scenarios that were pretty much contrived, but he lent to them a “slice of life” that made them a billion times more interesting to actually read through, and I did!

I often just skim the explanation of the example scenario because most of them are so much alike, and are such null operations, that the pages spent explaining “Stupidest Program You’ll Ever Write: Take 1,112,039” are a complete waste of space. But I read these, and really enjoyed them. I mean really, who sits down to explain Python Testing and says to themselves “I know! We’ll write a PID controller!”. The answer is Daniel Arbuckle, this book’s author.

Python Intro

There isn’t one in this book, and I’m happy about that. If you don’t know the language, you shouldn’t be reading a book devoted to testing code you don’t know how to write. That said, beginner books on languages could (and perhaps should, at least to an extent) teach by starting with testing, or at least integrate testing into the learning. Anyway, this book skips those really mind-numbing first three chapters about who Guido is and how ‘Python’ is not a reptilian reference. The author dives right into the meat of the matter. Very good.

The Author

I don’t know Daniel Arbuckle. Never met him. Never heard of him. Don’t think he wrote any other books that I know of. But the guy can really write. There’s only so much an editor or publisher can do for you in terms of your voice as an author. At some point, it’s just you, the author, telling a story to the reader, and efforts by the publisher to make it “more this” or “more that” just become really obvious and distracting.

Daniel Arbuckle writes like he really is excited to be talking to you about Python testing. He has interesting things to say, he writes in a good-natured, friendly, engaging tone, and he is thoughtful about his audience of beginners throughout. If you know Python, but are a little bewildered by testing, you’re going to understand everything Arbuckle says, and you’re going to have lots of light bulb moments. The shocking bit to me was that this is a guy who has a PhD in Computer Science, and a lot of PhDs (in any topic area) write everything as if they’re writing their thesis. A thesis is not typically written to engage newcomers and non-academics.

Impact

At the start of this book, my experience with testing consisted of historically writing some really ugly code to do what amounts to functional testing, and I had, over the past year, taken to writing proper unit tests for some of my less-complicated projects. More complicated projects involving threading/multiprocessing with networked queues, databases and remote APIs? Well, I was trying to organize my code to make it more testable, but it impacted my development timeline to such an extent that testing took a back seat to deadlines, sadly (I’m the only person I know who has an actual, genuine *interest* in code quality).

At the end of this book, I still have some of the same problems, of course, but I am extremely excited and far more confident that I *will* eventually be able to work this into my breakneck development pace and have it be effective. I have a pretty firm grasp of both unittest and doctest, in addition to Nose, and one thing I *didn’t* have going in was an understanding that doctest is actually perfectly suited for testing in some areas where unittest might be a little overkill, harder to do, etc.
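To make the doctest point concrete, here’s a minimal sketch of the kind of small, pure function where a doctest feels lighter than a full unittest class. The clamp function is my own hypothetical example, not one taken from the book:

```python
import doctest


def clamp(value, low, high):
    """Limit value to the closed range [low, high].

    The examples below double as documentation and as tests.

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))


if __name__ == "__main__":
    # Runs every example found in this module's docstrings and
    # reports how many were attempted and how many failed.
    print(doctest.testmod())
```

The tests live right next to the code they describe, with no test class or setup to write, which is about all a function this small deserves.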

I also was able to put my own experiences with testing into perspective, and I was able to apply some things I learned pretty much right away.

Things That Did Not Impress Me

The Index

Here’s the thing: 99% of all tech book indexes are terrible, because 99% of all books have a computer-generated index that the author never sees until the published book reaches his doorstep. The books with really good indexes utilize input from the author, and I use the quality of an index as a sign of the quality of the publisher, the editor, their ability to manage the hectic process that is the creation of a book, and their commitment to quality.

This book’s index is not nearly the worst I’ve seen, but it’s pretty bad. If you’ve never noticed that the indexes of several books you own are terrible, you can probably disregard this.

The Editing

There’s the author, and then there’s the editor. When you’re looking for authors, you’re less concerned with whether they can write in accordance with the Chicago Style Manual and more interested in whether they can convey ideas to an audience using words. It’s the job of the publisher’s staff to deal with grammatical problems and issues with punctuation. So when you see bad grammar and punctuation (or even spelling) in a technical book, look to the publisher, not the author. The publisher’s job is to make the author look like a rock star. If you’re not thinking that, it’s not the author’s fault.

As it turns out, one of two things seems to have happened in this book:

  1. Daniel Arbuckle didn’t major in English, so his use of commas is off, and the editors failed to pick up on that (like, a lot), or
  2. The publisher hired a summer intern from the local high school to do Mr. Arbuckle the favor of inserting commas wherever she thought they looked pretty.

I don’t know which one of these things happened, but the misuse of punctuation in this book really bugged me.

Mock Coverage

Before I read this book, I perceived mocking objects as being the hardest part of testing my more complicated applications. After reading this book, I still feel that way.

Unlike Arbuckle’s coverage of both unittest and doctest (and nose) for performing basic unit tests, he only covers one mocking library/framework: Python Mocker. I was glad to go through the exercise, because it forced me to sit down and do things with a record/playback style framework, but I still feel like this style of framework just does not fit my brain. I really wish the book had one more chapter about object mocking using a library that did not use that model. There are a few out there. One I’ve found that looks good (if I could ever get time to dig into it) is Michael Foord’s Mock, which uses an ‘action/assertion’ model, much like the rest of the testing tools outside of mocking frameworks.
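For a feel of what ‘action/assertion’ means in practice, here’s a rough sketch using unittest.mock, which is the form in which Foord’s Mock later joined the standard library. The notify_admin function and the mailer object are hypothetical, made up purely for illustration:

```python
from unittest import mock


def notify_admin(mailer, message):
    """Hypothetical code under test: sends one alert through a mailer."""
    mailer.send("admin@example.com", message)
    return True


# Action: run the code against a stand-in object first...
fake_mailer = mock.Mock()
result = notify_admin(fake_mailer, "disk full")

# Assertion: ...then check afterwards how the stand-in was used.
fake_mailer.send.assert_called_once_with("admin@example.com", "disk full")
assert result is True
```

There’s no recording phase to set up beforehand: you act, then you assert, which is the same shape as an ordinary unit test.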

In All…

If you know Python, and you’re not testing yet, I strongly recommend this book to get you started. It covers unittest, doctest, nose, twill; the differences between unit testing, functional testing and integration testing; testing web sites; and lots of other stuff. It also has a ‘catch-all’ chapter that talks about coverage.py, installing nose as a post-commit hook to every VCS under the sun, etc.

I’m confident that you’ll find this book a useful introduction to the world of Python testing, and that it will equip you with the knowledge of both tools and concepts you’ll need to go off on your own and solve as yet unforeseen testing issues you might come across.

Quick Loghetti Update

For the familiar and impatient: Loghetti has moved to github and has been updated. An official release hasn’t been made yet, but cloning the repository and installing argparse will result in perfectly usable code. More on the way.

For the uninitiated, Loghetti is a command line log sifting/reporting tool written in Python to parse Apache Combined Format log files. It was initially released in late 2008 on Google Code. I used loghetti for my own work, which involved sifting log files with tens of millions of lines. Needless to say, it needed to be reasonably fast, and give me a decent amount of control over the data returned. It also had to be easy to use; just because it’s fast doesn’t mean I want to retype my command because of confusing options or the like.

So, loghetti is reasonably fast, and reasonably easy, and gives a reasonable amount of control to the end user. It’s certainly a heckuva lot easier than writing regular expressions into ‘grep’ and doing the ol’ ‘press & pray’.
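To show what parsing a Combined Format line involves, here’s a minimal sketch. To be clear, this is not loghetti’s actual code; the pattern, the field names, and the parse_line helper are my own illustration, and it does not handle escaped quotes inside fields:

```python
import re

# Apache Combined Log Format, one request per line:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
fields = parse_line(sample)
print(fields["status"], fields["request"])
```

Once each line is a dict, filtering on status code or request path is a plain Python comparison instead of a regular expression squeezed into a grep command.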

Loghetti suffered a bit over the last several months because one of its dependencies broke backward compatibility with earlier releases. Such is the nature of development. Last night I finally got to crack open the code for loghetti again, and was able to put a solution together in an hour or so, which surprised me.

I was able to completely replace Doug Hellmann’s CommandLineApp with argparse very, very quickly. Of course, CommandLineApp was taking on responsibility for actually running the app itself (the main loghetti class was a subclass of CommandLineApp), and was dealing with the options, error handling, and all that jazz. It’s also wonderfully generic, and is written so that pretty much any app, regardless of the type of options it takes, could run as a CommandLineApp.

argparse was not a fast friend of mine. I stumbled a little over whether I should just update the namespace of my main class via argparse, or if I should pass in the Namespace object, or… something else. Eventually, I got what I needed, and not much more.
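The two approaches I was torn between look roughly like this. It’s a sketch only: the App class and the options shown are hypothetical stand-ins, not loghetti’s real code:

```python
import argparse


class App:
    """Hypothetical stand-in for a main application class."""

    def __init__(self):
        self.verbose = False
        self.logfile = None


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a log-sifting CLI")
    parser.add_argument("logfile", help="path to the log file")
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser


argv = ["--verbose", "access.log"]  # stands in for sys.argv[1:]

# Option 1: get a fresh Namespace back and pass it around.
options = build_parser().parse_args(argv)

# Option 2: have argparse set the attributes on an existing object.
app = App()
build_parser().parse_args(argv, namespace=app)

print(options.verbose, options.logfile)
print(app.verbose, app.logfile)
```

Option 1 keeps the parsed options separate from the application object; option 2 is closer to how a CommandLineApp subclass feels, since the options end up as attributes on the app itself.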

So loghetti now requires argparse, which is not part of the standard library, so why replace what I knew with some other (foreign) library? Because argparse is, as I understand it, slated for inclusion in Python 3, at which point optparse will be deprecated.

So, head on over to the GitHub repo, give it a spin, and send your pull requests and patches. Let the games begin!

Programmers that… can’t program.

So, I happened across this post about hiring programmers, which references two other posts about hiring programmers. There seems to be a demand for blog posts about hiring programmers, but that’s not why I’m writing this. I’m writing because there was this sort of nagging irony that I couldn’t help but stumble upon.

In a blog post, Joel Spolsky talks about the mathematical inaccuracies associated with claims of “only hiring the top 1%”. It seemed pretty obvious to me that whether or not you’re hiring the top 1% of all programmers is pretty much unknowable, and when managers say they hire “the top 1%”, I assume they’re talking about the top 1% of their applicants. Note too that I always thought it was idiotic to point this out, because, well, isn’t that what you’re SUPPOSED to do? You’re not very well going to aim for the middle & hope for the best, are you?

Apparently I’ve been giving too much credit to management. There I go giving people with ties on the benefit of the doubt again.

Then, in another blog post, Jeff Atwood talks about how it’s very difficult to even get interviews with programmers who can actually program. The problem is real.

The original blog post that pointed me at the two others is one by Roberto Alsina where he talks about his own methods for weeding out the non-programmers. He’s clearly seen the issue as well.

But if you open all three of these posts in separate tabs and read them, you’re likely to come away with the same basic problem I did:

  • Who the hell are these managers who can’t figure out a dead simple statistics problem?
  • How can a person fairly inept at simple math be qualified to make a hiring decision for anything but a summer intern?

That sorta blew my mind a little. But it blew my mind a lot when Atwood started describing the problems that interviewees *couldn’t* perform in an interview! One task described by Imran was called a ‘FizzBuzz’ question. Here’s one such question:

Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

Here’s the part that blew my mind: He says, and I quote:

Most good programmers should be able to write out on paper a program which does this in a under a couple of minutes.

Want to know something scary ? – the majority of comp sci graduates can’t. I’ve also seen self-proclaimed senior programmers take more than 10-15 minutes to write a solution.

That’s amazing to me. I decided to quickly pop open a Python prompt and see if I could do it:

>>> for i in range(1,101):
...     if (i % 3 == 0) and (i % 5 == 0):
...             print i,'FizzBuzz'
...     elif i % 3 == 0:
...             print i, 'Fizz'
...     elif i % 5 == 0:
...             print i, 'Buzz'
...     else:
...             print i
...

Note that I’ve taken the liberty of printing out the numbers in addition to the required words. I’m playing the role of interviewer and interviewee here, and wanted to be able to easily verify that things were correct, since there was no time for unit testing :)

Turns out it worked on the first try! That was pasted directly from my terminal screen. I didn’t time myself, but it took far less than 5 minutes. This leads to my other question, of course, which is “if you’re going to complain about CS degree holders not writing good code, maybe it’s time to open the doors to non-CS degree holders?”
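For anyone on Python 3, where print is a function, here’s a sketch of the same logic written to follow the problem statement literally (words instead of the numbers, not alongside them). Wrapping it in a function, which I’ve named fizzbuzz here, also makes it easy to check with a few asserts:

```python
def fizzbuzz(n):
    """Return the word or number the problem asks for at position n."""
    if n % 15 == 0:          # multiple of both three and five
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# Quick checks standing in for the unit tests there was no time for.
assert fizzbuzz(1) == "1"
assert fizzbuzz(3) == "Fizz"
assert fizzbuzz(5) == "Buzz"
assert fizzbuzz(15) == "FizzBuzz"

for i in range(1, 101):
    print(fizzbuzz(i))
```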