Category Archives: Big Ideas

On Keeping A Journal and Journaling

I've kept a journal since I was 11. That was in the mid 80's. I hadn't heard the word “journaling” until about a year or so ago, and lately it appears that giving advice and howto information about journaling is a bit of a cottage industry. I had no idea so many people were in demand of this information, but in a world where you can subscribe to multiple very successful monthly magazines about running, I shouldn't be so shocked.

I should also not be shocked to learn that people have heard some of the benefits people tout about journaling and want those benefits for themselves. But maybe some are skeptical and wonder “is journaling really all it's cracked up to be?” For those folks, let me say this: I've read a few of the blogs myself, and while I'd say the authors are mostly 1 or 2-year “veterans” who don't really get it yet, the benefits they say you can get from journaling are not only possible, they're just the tip of the iceberg.

The key to getting started, though, is probably to completely ignore everything you've read, go get a Pilot v5 pen, and some kind of blank book, and just start doing as you please. Beyond encouraging words and listing the benefits, what the books and blogs are telling you to do is, quite honestly, bullshit. They're either trying to put arbitrary rules around it, or telling you about the process they themselves find useful… for them. Some things I read seemed utterly destructive to some of the benefits. Tread carefully around advice, from me or anyone else.

Journaling is ultimately a personal thing. All of it. Not just the process, but the goals or what you want to get out of it. Furthermore, most of the goals I've achieved over multiple decades of journaling (and they are both countless and priceless) happened not because I willed it to be so and structured my process or writing to meet them, but by happy accident. I was benefitting from journaling long before I even realized what journaling had brought to my life.

So the idea that you can learn to journal strikes me as pretty odd. But I do encourage you to do it. Just find some quiet time, a blank page, and a pen, and spill it. Even if it looks boring. In the meantime, if you have questions best answered by someone who has been doing this for more than a year or two, let me know and I'll be happy to help if I can.

What Geeks Could Learn From Working In Restaurants

I spent a long time in my teens and early 20′s working in restaurants. To be honest, I loved just about every minute of it. It’s exciting, you’re never idle, and it’s usually not boring: you either get immediate gratification from happy customers leaving a nice tip, or you’re on the defensive, thinking on your feet to save an experience after something has gone wrong. 

In contrast, technology, I’ve found, requires more of an investment to get to the gratification of a job well done. I’ve also always felt like certain aspects of technology, like coding, are somewhat therapeutic for me, whereas restaurant work mostly wasn’t. And, occasionally, technology work can be monotonous and boring.

In spite of the quite dramatic differences between the two working environments, I sometimes find myself thinking about restaurants when I’m working on software or systems. It turns out that some of the tenets of restauranteurism, or just waitering or bartending, can be applied to work in the tech sector. Allow me a bit of license to riff on some of them here:

  • Treat your station as a single table. If one table needs a refill on a drink, chances are so do one of the other 4 tables that are all within 10 feet of that one. Save yourself some running by checking with all of the tables when one asks for something. In geek terms, this maps to looking for wins across multiple projects in your environment. Does your app need user authentication or a mail transport service? Chances are other projects do too. Looking for these kinds of wins can also help distribute the load of creating the requirements, and foster a sense of shared ownership in the result. 
  • Anticipate the need. Your customers shouldn’t have to ask for everything they need. Ideally they wouldn’t need to ask for anything they need, because you’d keep an eye on their drinks, you’d notice that someone wasn’t eating their salad and suspect something is wrong, you’d know that having kids means the first thing to the table is a huge stack of napkins, etc. The point is, there’s what’s happening now, and what is, in your professional opinion, likely to screw that all up. Your experience will likely guide you to solutions that are likely to keep that from happening. If not, you’re about to get more experience. Just like in technology ;-)  
  • Presentation counts. All restaurants fiddle with the light levels, the music rotation, how food is placed on the plate, the garnishes added to drinks, and any restaurant you’d want to eat in is fastidious about tiny, tiny details from sweeping the entry foyer, to replacing tablecloths, to tossing old-looking napkins, to replacing chairs with tiny pinhole tears in the leather. Why? Because a restaurant is, in the process of serving you, also constantly selling you through your experience there. Of course, your software should do this, but you can get deeper than that: all of those ideas you have that nobody listens to, or you can’t get approval for? You’re missing some details. You don’t have any metrics, or you’re giving a geeky experience to a business-y audience, or you’re not presenting your idea in the form of an experience, or you’re creating a vision of a present instead of a product in the mind of the person with the checkbook.
  • Presentation can’t save poor quality. You can put all kinds of shiny AJAX-y Bootstrap-y stuff all over your site. It could be the nicest UI in existence. If the service doesn’t work, you’re toast. And those ideas you’re trying to sell internally? You’d better be confident they’ll work and have a solid execution plan, because a couple of bombs and no amount of presentation goodness is going to get you to “yes” again.
  • Consistency! Food quality, presentation, service… the entire dining experience, from the moment you set foot on the property to the moment you leave, should ideally be completely 100% consistent. Completely predictable. Zero surprises. It makes the restaurant easy to get to know, because if you’ve seen one, you’ve basically seen them all. Now think about the naming of your variables. Do you abbreviate here and write it out there? Think about how you package your software. Do you sometimes provide an XML config file and sometimes INI format? Think about your code layout. Do you sometimes use inheritance and sometimes composition, when either will do? Do you sometimes test? Do you sometimes… anything? Small inconsistencies add up. I believe we’re pretty much all guilty of them sometimes, but we should all strive to minimize them. 
  • Minimize Complexity. Very little inside of a restaurant is complicated or complex. Inevitably, the friction points where small complications live are where problems arise. A huge number of issues within a restaurant are human error: mistaken orders, mismatching food to table when the tray is assembled, food under- or overcooked, drinks made wrong or sent to the wrong table, miscalculated bills, forgotten requests for drink refills or side orders, the list goes on for an eternity. Ironically, computers are employed to help with some of it, and the food service world is better for it on the whole. In technology, and I’m sure in most jobs, most of the issues are more with the people than the technology. From forgotten feature details to overlooked metrics, from miscommunicated load projections to undocumented security policies, the clumsy inner workings, and inter-workings of human beings gets us every time. 
  • Mise en place! This is a French term, and I don’t speak much French, but in a kitchen it means “have your shit together and easy to reach, kid”. Chefs on a line will have a shelf or some other space that’s within arm’s reach where they put bowls and bottles of things they need all the time. The pre-meal time for a chef is spent double-checking that this small area is picture perfect, double-checking that refills and extras will be easy to grab during the shift, and making sure there’s no excuse for failure. I see teams fail at this quite a lot, actually. The mise en place for a technology worker are all of those things that need to be in place in order for a service to go to production. Metrics and monitoring. Hardware allocations. Infrastructure service dependencies. Security concerns. Power! I cannot count how many times a service has gotten to the week of deployment before a request to allocate storage in production was created, or the number of times hardware arrived before anyone thought about where it would be plugged in, or what that would do to the uptime provided by the UPS during an outage. 
Overall, I’d say it’s been for my progress in the technology industry to have grown and learned lessons from experiences in other industries, even if those experiences sometimes came from rather menial roles. Also, as my great grandmother used to say, all of that sweaty, smelly work was also good for me: “It builds character!” 

What I’ve Been Up To

Historically, I post fairly regularly on this blog, but I haven’t been lately. It’s not for lack of anything to write about, but rather a lack of time to devote to blogging. I want to post at greater length about some of the stuff I’ve been doing, and I have several draft posts, but I wanted to list what I’ve been up to for two reasons:

  1. I use my blog as a sort of informal record of what I accomplished over the course of the year, hurdles I ran into, etc. I also sometimes use it to start a dialog about something, or to ‘think out loud’ about ideas. So it’s partially for my own reference.
  2. Someone might actually be interested in something I’m doing and want to pitch in, fork a repo, point me at an existing tool I’m reinventing, give me advice, or actually make use of something I’ve done or something I’ve learned and can share.

PyCon 2013

I’m participating again for the third year in the Program Committee and anywhere else I can help out (time permitting) with the organization of PyCon. It’s historically been a fantastic and really rewarding experience that I highly (and sometimes loudly) recommend to anyone who will listen. Some day I want to post at greater length about some actual instances where being really engaged in the community has been a win in real, practical ways. For now, you’ll have to take my word for it. It’s awesome.

I also hope to submit a talk for PyCon 2013. Anyone considering doing this should know a couple of things about it:

  1. Even though I participate in the Program Committee, which is a completely volunteer committee that takes on the somewhat grueling process of selecting the talks, tutorials, and poster sessions, it’s pretty much unrelated to my chances of having my talk accepted. In other words, submitting a talk is as daunting for me as it is for anyone. Maybe more so.
  2. Giving a talk was a really rewarding experience, and I recommend to anyone to give it a shot.

I just published a really long post about submitting a talk to PyCon. It’s full of completely unsolicited advice and subjective opinions about the do’s and don’ts of talk submission, based on my experiences as both a submitter of proposals and a member of the Program Committee, which is the committee that selects the talks.

Python Cookbook

Dave Beazley and I are really rolling with the next edition of the Python Cookbook, which will cover Python 3 *only*. We had some initial drama with it, but the good news is that I feel that Dave and I have shared a common vision for the book since just about day one, and that shared vision hasn’t changed, and O’Reilly hasn’t forced our hand to change it, which means the book should be a really good reflection of that vision when it’s actually released. I should note, however, that the next edition will represent a pretty dramatic departure from the form and function of previous versions. I’m excited for everyone to see it, but that’s going to have to wait for a bit. It’s still early to talk about an exact release date – I won’t know that for sure until the fall, but I would expect it to be at PyCon 2013.

PyRabbit

I’ve blogged a bit about pyrabbit before: it’s a Python client for talking to RabbitMQ’s RESTful HTTP Management API. So, it’s not for building applications that do message passing with AMQP — it’d be more for monitoring, polling queue depths, etc., or if you wanted to build your own version of the browser-based management interface to RabbitMQ.

Pyrabbit is actually being used. Last I looked, kombu was actually using it, and if memory serves, kombu is used in Celery, so pyrabbit is probably installed on more machines than I’m aware of at this point. I also created a little command shell program called bunnyq that will let you poke at RabbitMQ remotely without having to write any code. You can create & destroy most resources, fire off a message, retrieve messages, etc. It’s rudimentary, but it’s fine for quick, simple tests or to validate your understanding of the system given certain binding types, etc.

I have a branch wherein I port the unit tests for Pyrabbit to use a bit of a different approach, but I also need to flesh out more parts of the API and test it on more versions of RabbitMQ. If you use Pyrabbit, you should know that I also accept pull requests if they come with tests.

Stealth Mode

Well, ‘stealth’ is a strong word. I actually don’t believe much in stealth mode, so if you want to know just ask me in person. Anyway, between 2008 and 2012 I’ve been involved in startups (both bought out, by the way! East Coast FTW!) that were very product driven and very focused on execution. I was lucky enough to answer directly to the CEO of one of those companies (AddThis) and directly to the CTO of the other (myYearbook, now meetme.com), which gave me a lot of access and insight into the mechanics, process, and thinking behind how a product actually comes to be. It turns out I really love certain aspects of it that aren’t even necessarily technical. I also really find the execution phase really exciting, and the rollout phase really almost overwhelmingly exciting.

I’ve found myself now with an idea that is really small and simple, but just won’t go away. It’s kind of gnawing at me, and the more I think about it, the more I think that, given what I’ve learned about product development, business-side product metrics, transforming some stories into an execution plan, etc., on top of my experience with software development, architecting for scalability, cloud services, tools/technologies for building distributed systems, etc., I could actually do this. It’s small and simple enough for me to get a prototype working on my own, and awesome enough to be an actual, viable product. So I’m doing it. I’m doing it too slowly, but I’m doing it.

By the way, the one thing I completely suck at is front end design/development. I can do it, but if I could bring on a technical co-founder of my own choosing, that person would be a front end developer who has pretty solid design chops. If you know someone, or are someone, get in touch – I’m @bkjones on Twitter, and bkjones at gmail. I’m jonesy on freenode. I’m not hard to find :)

In and Out of Love w/ NoSQL

I’ve recently added Riak to my toolbelt, next to CouchDB, MongoDB, and Redis (primarily). I was originally thinking that Riak would be a good fit for a project I’m working on, but have grown more uncomfortable with that notion as time has passed. The fact of the matter is that my data has relationships, and it turns out that relational databases are actually a really good fit in terms of the built-in feature set. The only place they really stink is on the operations side, but it also turns out that I have, like, several years of experience in doing that! Where I finally lost patience with NoSQL for this project was this huge contradiction that I never hear anyone ever talk about. You know the one — the one where the NoSQL crowd screams about how flexible everything is and how it really fits in with the “agile” mindset, and then in another doc in the same wiki strongly drives home the message that if you aren’t 100% sure what the needs of your app are, you should really make sure you have a grasp on that up front.

Uhh, excuse me, but if I’m iterating quickly on an app, testing in production, iterating on what works, failing fast, and designing in the direction of, and in direct response to, my customers, HOW THE HELL DO I KNOW WHAT MY APP’S NEEDS ARE?

So, what I’m starting with is what I know for sure: my data has relationships. I’m experienced enough with NoSQL solutions to understand my options for modeling them & enforcing the relationships, but when the relational aspect of the data is pretty much always staring you in the face and isn’t limited to a small subset of operations that rely on the relationships, it seems like a no-brainer to just use the relational database and spend my time writing code to implement actual features. If I find some aspect of the data that can benefit from a NoSQL solution later, well, then I’ll use it later!

Unit Testing Patterns

Most who know me know I’m kind of “into” unit testing. It’s almost like a craft unto itself, and one that I rather enjoy. I recently started a new job at AWeber Communications, where I’m working on a next-generation awesome platform to the stars 2.0 ++, and it’s all agile, TDD, kanban, and all that. It’s pretty cool. What I found in the unit tests they had when I got there were two main things:

First, the project used bare asserts, and used dingus in “mock it all!” mode. Combined, this led to tests that were mostly effective, but not very communicative in the event of failure, and they were somewhat difficult to reason about when you read them.

Second, they had a pretty cool pattern for structuring and naming that gave *running* the tests and viewing the output a more behavioral feel to them that I thought was pretty cool, and looked vaguely familiar. Later I realized it was familiar because it was similar to the very early “Introducing Behavioral Driven Development” post I saw a long time ago but never did anything with. If memory serves, that early introduction did not introduce a BDD framework like the ones popping up all over github over the past few years. It mostly relied on naming to relay the meaning, and used standard tools “behind the curtain”, and it was pretty effective.

So long story short, those tests have mostly been ported to use the mock module, and inherit from unittest2.TestCase (so, no more bare asserts). The failure output is much more useful, and I think the pattern that’s evolving around the tests now is unfinished but starting to look pretty cool! In the process, I also created a repository for unittest helpers that currently only contains a context manager that you feed a list of things to patch, and it’ll automatically patch and then unpatch things after the code under test is run. It has helped me start to think about building tests in a more consistent fashion, which means reading them is more predictable too, and hopefully we spend less time debugging them and less time introducing new developers to how they work.

PyCon Talk Proposals: All You Need to Know And More

Writing a talk proposal needn’t be a stressful undertaking. There are two huge factors that seem to stress people out the most about submitting a proposal, and we’re going to obliterate those right now, so here they are:

  1. It’s not always obvious how a particular section of a proposal is evaluated, so it’s not always clear how much/little should be in a given section, how detailed/not detailed it should be, etc.
  2. The evaluation and selection process is a mystery.

What Do I Put Here?

Don’t fret. Here, in detail, are all of the parts of the proposal submission form, and some insight into what’s expected to be there, and how it’s used.

Title

Pick your title with care. While the title may not be cause for the Program Committee to throw out your proposal, you should consider the marketing aspect of speaking at a conference. For example, there are plenty of conference-goers who, in a mad dash to figure out the talk they’ll attend next, will simply skip a title that requires manual parsing on their part.

So, a couple of DOs and DON’Ts:

  • DOs

    • DO insure that your title targets the appropriate audience for your talk
    • DO keep the title as short and simple as possible
    • DO insure that the title accurately reflects what the talk is about
  • DON’Ts

    • DON’T have a vague title
    • DON’T omit key words that are crucial to understanding who the talk is for
    • DON’T create a title that’s too long or wordy

So, in short, keep the title short and clear. There’s no hard and fast rule regarding length, of course, but consider how many best-selling book titles have ever had more than, say, 7 words? I’m sure it’s happened, but it’s probably more the exception than the rule. If you feel you need a very long title in order to meet the goals of the title, describe your proposal by some friends or coworkers, and when they say “So… how to do ‘x’ with ‘y’, basically, right?”, that’s actually your title. Be gracious and thank them.

Within your concise title, you should absolutely make certain to target your audience, if your audience is a very specific subset of the overall attendee list. For example, if you’re writing a proposal about testing Django applications, and the whole talk revolves around Django’s built-in testing facilities, your title pretty much has to say “Django” in it, for two reasons:

  1. If I’m a Django user, I really want to make sure I’ve discovered all of the Django talks before I decide what I’m doing, and if your title doesn’t say “Django” in it, lots of other ones do. A reasonable person will expect to see a key word like that in the title.
  2. If I’m *not* a Django user and show up to your “Unit Testing Web Applications” talk, only to discover 10 minutes into a 25-minute talk that I’ll get nothing out of it, I’m going to really be peeved.

Finally, unless you totally and completely know what you’re doing, DO NOT use a clever title using a play on words and not a single technology-related word in the whole thing. There are several issues with doing this, but probably the most obvious ones are:

  1. You’re not your audience: just because you get the reference and think it’s a total no-brainer, it’s almost guaranteed that 95% of the attendees will not get it when quickly glancing over a list of 100 talk titles.
  2. You’re basically forcing the reader to read the abstract to see if they’ve even heard of the technology your talk is about. Don’t waste their time!

Category

The ‘Category’ form element is a drop-down list of high-level categories like ‘Concurrency’, ‘Mobile’, and ‘Gaming’. There are lots of categories. You may pick one. So if you have an awesome talk about how you use a standard WSGI app in a distributed deployment to get better numbers from your HPC system without the fancy interconnects, you might wonder whether to put your talk into the ‘HPC’ category or the ‘Web Frameworks’ category.

In cases such as this, it can be helpful to focus on the audience to help guide your decision. What do you think your audience would look like for this talk? Well, of course there are web framework authors and users who will absolutely be interested in the talk, but there isn’t a lot of gain for them in a talk like this, is there? I mean, what are the chances that someone in the audience has always dreamed of writing web frameworks for the HPC market? On the other hand, what are the chances that an HPC administrator, developer, or site manager would love to cut costs, ease deployment, reduce maintenance overhead, etc., of their HPC system by using a totally standard non-commercial web framework? There are probably valid arguments to be made for putting it in the ‘Web Frameworks’ category, but I can’t think of any. I’d put it in the ‘HPC’ category.

One more thing to consider is the other talks at the conference, or talks that could be at the conference, or talks from past conferences. Look at last year’s program. Where does your talk fit in that mix of talks? What talks would your talk have competed with? Is there a talk from last year that is similar in scope to your proposal? What category was it listed in?
Audience Level

There’s a ton of gray area in selecting your target audience level. I’ve never liked the sort of arbitrary “Novice means less than X years experience” formulas, so I’ll do my best to lay out some rules of thumb, but ultimately, what you consider ‘Novice’, and how advanced you think your material is, is up to you. Your choices are:

  • Novice:
    • Has used but not created a decorator and context manager.
    • Has possibly never used anything in the itertools, collections, operator, and/or functools modules
    • Has used but never had any issues with the datetime, email, or urllib modules.
    • Has seen list comprehensions, but takes some time to properly parse them
  • Intermediate:
    • Has created at least a decorator, and possibly a context manager.
    • Has recently had use for the operator module (and used it) and has accepted itertools as their savior.
    • Has had a big problem with at least one of: datetime, email, or urllib.
    • There’s only a slight chance they’ve ever created or knowingly made use of metaclasses.
    • Has potentially never had to use the socket module directly for anything other than hostname lookups.
    • Can write a (non-nested) list comprehension, and does so somewhat regularly.
  • Advanced:
    • Has created both a decorator and context manager using both functions and classes.
    • Has written their own daemonization module using Stevens as their reference.
    • Has been required to understand the implications of metaclasses and/or abstract base classes in a system.
    • May be philosophically opposed to list comprehensions, metaclasses, and abstract base classes
    • Has subclassed at least one of the built-in container types

Still not sure where your talk belongs? Well, hopefully you’re torn between only two of the user categories, in which case, I say “aim high”, for a few reasons:

  1. It’s generally easier to trim a talk to target a less experienced audience than you were expecting than to grow to accommodate a more experienced audience than you were expecting.
  2. Speaking purely anecdotally and with zero statistics, and from memory, there are lots more complaints about talks being more advanced than their chosen audience level than the reverse.

The Program Committee uses the category to insure that, within any given topic space, there’s a good selection of talks for all levels of attendees. In cases where a talk might otherwise be tossed for being too similar to (and not better than) another, targeting a different audience level could potentially save the day.

Extreme?

Talk slots are relatively short. Your idea for a talk is awesome, but way too long. What if you could give that talk to an audience that doesn’t need the whole 15-minute introductory part of the talk? What if, when your time started ticking down, you immediately jumped into the meat of the topic? That’s what Extreme talks are for.

I’d recommend checking the ‘Extreme’ box on the submission form only if your talk *could potentially* be an Extreme talk. Why? Two reasons:

  1. The number of Extreme slots is limited, and
  2. If your talk is not accepted into an ‘Extreme’ slot, it may still be accepted as a regular talk.

Duration

There are 30-minute or 45-minute slots, or you can choose ‘No Preference’. I recommend modeling your proposal around the notion that it could be in either time slot: your ability to be flexible helps the Program Committee to be flexible as well. If your talk competes with another in the process and the only difference of any use that anyone can find is that your talk has a hard, 45-minute slot requirement, you probably have a good chance of losing that battle.

If you’d like to have a 45-minute slot, then it might help you out to build your outline for a 30-minute talk first, and then go back and add bullet points to it that are clearly marked “(If 45min slot)” or something. Alternatively, you can create the outline based on a 45-minute slot, and just use the ‘Additional Notes’ section of the form to explain how you’d alter the talk if the committee requested you do the talk in 30 minutes.

Description

This is the description that, if your talk is accepted, people will be reading in the conference program. It needs to:

  1. Be compelling
  2. Make a promise
  3. Be 400 characters or less

Being compelling can seem very difficult, depending on your topic space. It might help to consider that you only need to be compelling to your target audience. So, while a talk called “Writing Unit Tests” is probably not compelling to the already-testing contingent at the conference, it might be totally compelling for those who aren’t but want to. Meanwhile, a talk called “Setting Up a local Automated TDD Environment in Python 3 With Zero External Dependencies” is probably pretty compelling to the already-testing crowd and not so compelling to those who aren’t yet writing tests.

Making a promise to the reader means that you’re setting an expectation in their mind that you’ll make good on some deliverable at some point in the talk. Some key phrases to use in your description to call out this promise might be “By the end of this talk, you’ll have…”, or “If you want to have a totally solid grasp of…”. The key that both of those phrases have in common is that they both imply that you’re about to tell them what they can expect to get out of the talk. It answers a question in every conference-goer’s mind when reading talk descriptions, which is “What’s in it for me?”. If you don’t answer that question in the description, it may be harder for people to guess what’s in it for them, and frankly they won’t spend a lot of time trying!

Abstract

The form expects a detailed description of the talk, along with an outline describing the flow of the talk. That said, it’s not expected that the talk you outline in August is precisely the same talk you deliver the following March. However, if your talk is accepted, the outline will be made public online (it will not be printed in the conference program), so you’d like to hit the outline as close as possible.

The abstract section will be used by the program committee to answer various questions about the talk, possibly including (but certainly not limited to):

  • Whether the talk’s title and description actually describe the talk as detailed in the abstract. Will attendees get pretty much what they expect if they only read the title and description of the talk?
  • Whether the talk appears to target the correct audience level. If you’re targeting a novice audience, your abstract should not go into topics that are beyond that audience level.
  • Whether the scope of the talk is realistic given the time constraints. If you asked for a 30-minute slot, your abstract should not make the committee think that it would be impossible to cover all of the material even given a whole hour. It’s not uncommon to be a little off in this regard, but being really far off could be an indicator that the proposer may not have thought this through very well.
  • Whether the talk is organized and has a logical flow that incorporates any known essential, obvious topics that should be touched on.

Additional Notes

This is a free-form text field where you can pretty much talk directly to the Program Committee to let them know in your own way why you think your talk is awesome, how you envision it coming off, and how you see the audience benefitting from it and finding value in attending the talk.

Although there are no hard requirements for this section of the submission form, you should absolutely, positively include any of the following that you can:

  • If your talk is about a new software project, a link to the project’s homepage, repository, and anything other relevant articles, interviews, testimonials, etc., about the project.
  • Links to any online slides or videos from any presentations you’ve given previously.
  • Comments discussing how you’d handle moving from a 45-minute to 30-minute slot, or from an Extreme slot to a regular slot, etc. In general, it helps the committee to know you’ve thought about contingencies in your proposal.

Great, so… How do I do this?

If you’ve never written a proposal before, and you’re not sure what you want to talk about, don’t have a crystal clear vision for a talk, have trouble narrowing the scope of your idea, and don’t know exactly where to start, I have a few ideas that might help you get the proposal creation juices flowing:

  • Write down some bullet points in a plain text file that are titles or one-line summaries of talks you’d like to see. Forget about whether you’re even willing or able to actually produce these talks – the idea is to start moving things from your brain onto a page. When you’ve got 5-10 of these points, reflect:
    • Could you deliver any of these yourself?
    • Could you apply an idea contained in a point to a topic you’re more familiar with?
    • Is there a topic related to any of the points that touch on things you know well?
    • Do any of these points jog your memory and make you think of projects you’ve worked on in the past that might be a source for a talk idea?
  • Do an informal audit of what you’ve done over the past year.
    • Were there problems you faced that there’s no good solution for?
    • Did you grow in some way that was really important, and could you help others to learn those lessons, and learn why those lessons are important?
    • Did you make use of a new technology?
    • Did you change how you do your job? Your development workflow? Your project lifecycle? Automation? Task management?
  • Go through the talks on pyvideo.org. It’s such an enormous, and enormously valuable trove. You could just scan the titles and see if something comes up. If that doesn’t work, click on a few, but don’t watch the talk: skip to the Q&A at the end. Buried in the Q&A are always these gems that are only tangentially related, and it is not uncommon to hear a speaker respond with “…but that’s another whole talk…”. They’re sometimes right.

Make it happen!

The Python User Group in Princeton (PUG-IP): 6 months in

In May, 2011, I started putting out feelers on Twitter and elsewhere to see if there might be some interest in having a Python user group that was not in Philadelphia or New York City. A single tweet resulted in 5 positive responses, which I took as a success, given the time-sensitivity of Twitter, my “reach” on Twitter (which I assume is far smaller than what might be the entire target audience for that tweet), etc.

Happy with the responses I received, I still wanted to take a baby step in getting the group started. Rather than set up a web site that I’d then have to maintain, a mailing list server, etc., I went to the cloud. I started a group on meetup.com, and started looking for places to hold our first meeting.

Meetup.com

Meetup.com, I’m convinced, gives you an enormous value if you’re looking to start a user group Right Now, Today™. For $12/mo., you get a place where you can announce future meetups, hold discussions, collect RSVPs so you have a head count for food or space or whatever, and vendors can also easily jump in to provide sponsorship or ‘perks’ in the form of discounts on services to user group members and the like. It’s a lot for a little, and it’s worked well enough. If we had to stick with it for another year, I’d have no real issue with that.

Google Groups

I set up a mailing list using Google Groups about 2-3 months ago now. I only waited so long because I thought meetup.com’s discussion forum might work for a while. After a few meetings, though, I noticed that there were always about five more people in attendance than had RSVP’d on meetup.com. Some people just aren’t going to be bothered with having yet another account on yet another web site I guess. If that’s the case, then I have two choices (maybe more, but these jumped to mind): force the issue by constantly trumpeting meetup.com’s service, or go where everyone already was. Most people have a Google account, and understand its services. Also, since the group is made up of technical people, they mostly like the passive nature of a mailing list as opposed to web forums.

If you’re setting up a group, I’d say that setting up a group on meetup.com and simultaneously setting up a Google group mailing list is the way to go if you want to get a fairly complete set of services for very little money and about an hour’s worth of time.

Meeting Space

Meeting space can come from a lot of different places, but I had a bit of trouble settling on a place at first. Princeton University is an awesome place and has a ton of fantastic places to meet with people, but if you’re not living on campus (almost no students are group members, btw), parking can be a bit troublesome, and Princeton University is famous for having little or no signage, and that includes building names, so finding where to go even if you did find parking can be problematic. So, so far, the University is out.

The only sponsor I had that was willing to provide space was my employer, but we’re nowhere near Princeton, and don’t really have the space. Getting a sponsor for space can be a bit difficult when your group doesn’t exist yet, in part because none of them have engaged with you or your group until the first meeting, when the attendees, who all work for potential sponsors, show up.

I started looking at the web site for the Princeton Public Library. I’ve been involved in the local Linux user group for several years, and they use free meeting space made available by the public library in Lawrenceville, which borders Princeton. I wondered if the Princeton Public Library did this as well, but they don’t, actually. In fact, meeting space at that location can get pretty expensive, since they charge for the space and A/V equipment like projectors and stuff separately (or they did when I started the group – I believe it’s still the case).

I believe I tweeted my disappointment about the cost of meeting at the Princeton Public Library, and did a callout on Twitter for space sponsors and other ideas about meeting space in or near Princeton. The Princeton Public Library got in touch through their @PrincetonPL Twitter account, and we were able to work out a really awesome deal where they became a sponsor, and agreed to host our group for 6 months, free of charge. Awesome!

Now, six months in, we either had to come to some other agreement with the library, or move on to a new space. After six months, it’s way easier to find space, or sponsors who might provide space, but I felt if we could find some way to continue the relationship with the library, it’d be best not to relocate the group. We wound up finding a deal that does good things for the group, the library, the local Python user community, and the evangelism of the Python language….

Knowledge for Space

Our group got a few volunteers together to commit to providing a 5-week training course to the public, held at the Princeton Public Library. Adding public offerings like this adds value to the library, attracts potential new members (they’re a member-supported library, not a state/municipality-funded one), etc. In exchange for providing this service to the library, the library provides us with free meeting space, including the A/V equipment.

If you don’t happen to have a public library that offers courses, seminars, etc., to the general public, you might be able to cut a similar deal with a local community college, or even high school. If you know of a corporation locally that uses Python or some other technology the group can speak or train people in, you might be able to trade training for meeting space in their offices. Training is a valued perk to the employees of most corporations.

How To Get Talks (or “How we stopped caring about getting talks”)

Whether you’re running a publishing outfit, a training event, or user group, getting people to deliver content is a challenge. Some people don’t think they have any business talking to what they perceive as a roomful of geniuses about anything. Some just aren’t comfortable talking in front of audiences, but are otherwise convinced of their own genius. Our group is trying to attack this issue in various ways, and so far it seems to be working well enough, though more ideas are welcome!

Basically, the group isn’t necessarily locked into traditions like “Thou shalt provide a speaker, who shalt bequeath upon our many wisdom of the ages”. Once you’ve decided as a group that having cookie-cutter meetings isn’t necessary, you start to think of all sorts of things you could all be doing together.

Below are some ideas, some in the works, some in planning, that I hope help other would-be group starters to get the ball rolling, and keep it in motion!

Projects For the Group, By the Group

Some members of PUG-IP are working together on building the pugip.org website, which is housed in a GitHub repository under the ‘pugip’ GitHub organization. This one project will inevitably result in all kinds of home-grown presentations & events within the group. As new ideas come up and new features are implemented, people will give lightning talks about their implementation, or we’ll do a group peer review of the code, or we’ll have speakers give talks about third-party technologies we might use (so, we might have two speakers each give a 30-minute talk about two different NoSQL solutions, for example. We’ve already had a great overview of about 10 different Python micro-frameworks), etc.

We may also decide to break up into pairs, and then sprint together on a set of features, or a particularly large feature, or something like that.

As of now, we’ve made enough decisions as a group to get the ball rolling. If there’s any interest I can blog about the setup that allows the group to easily share, review, and test code, provide live demos of their work, etc. The tl;dr version is we use GitHub and free heroku accounts, but new ideas come into play all the time. Just today I was wondering if we could, as a group, make use of the cloud9 IDE (http://cloud9ide.com).

The website is a great idea, but other group projects are likely to come up.

Community Outreach

PUG-IPs first official community outreach project will be the training we provide through the Princeton Public library. A few of us will collaborate on delivering the training, but the rest of the group will be involved in providing feedback on various aspects of the material, etc., so it’s a ‘whole group’ project, really. On top of increasing interactivity among the group members, outreach is also a great way to grow and diversify the group, and perhaps gain sponsorships as well!

There’s another area group called LUG-IP (a Linux user group) that also does some community outreach through a hardware SIG (special interest group), certification training sessions, and participating in local computing events and conferences. I’d like to see PUG-IP do this, too, maybe in collaboration with the LUG (they’re a good and passionate group of technologists).

Community outreach can also mean teaming up with various other technology groups, and one event I’m really looking forward to is a RedSnake meeting to be held next February. A RedSnake meeting is a combined meeting between PhillyPUG (the Philadelphia Python User Group) and Philly.rb (the Philadelphia Ruby Group). As a member of PhillyPUG I participated in last year’s RedSnake meeting, and it was a fantastic success. Probably 70+ people in attendance (here’s a pic at the end – some had already left by the time someone snapped this), and perhaps 10 or so lightning talks given by members of both organizations. We tried to do a ‘matching’ talk agenda at the meeting, so if someone on the Ruby side did a testing talk, we followed that with a Python testing talk, etc. It was a ton of fun, and the audience was amazing.

Socials

Socials don’t have to be dedicated events, per se. For example, PUG-IP has a sort of mini-social after every single meetup. We’re lucky to have our meetings located about a block away from a brewpub, so after each meeting, perhaps half of us make it over for a couple of beers and some great conversations. After a few of these socials, I started noticing that more talk proposals started to spring up.

Of course, socials can also be dedicated events. Maybe some day PUG-IP will…. I dunno… go bowling? Or maybe we’ll go as a group to see the next big geeky movie that comes out. Maybe we’ll have some kind of all-inclusive, bring-the-kids BBQ next summer. Who knows?

As a sort of sideshow event to the main LUG meetings, LUG-IP has a regularly-scheduled ‘coffee klatch’. Some of the members meet up one Sunday per month at (if memory serves) 8-11AM at a local Panera for coffee, pastries, and geekery. It’s completely informal, but it’s a good time.

Why Not Having Talks Will Help You Get Talks

I have a theory that is perhaps half-proven through my experiences with technology user groups: increasing engagement among and between the members of the group in a way that doesn’t shine a huge floodlight on a single individual (like a talk would) eventually breaks down whatever fears or resistance there is to proposing and giving a talk. Sometimes it’s just a comfort level thing, and working on projects, or having a beer, or sprinting on code, etc. — together — turns a “talking in front of strangers” experience into more of a “sharing with my buddies” one.

I hope that’s true, anyway. It seems to be. :)

Thanks For Reading

I hope someone finds this useful. It’s early on in the life of PUG-IP, but I thought it would be valuable to get these ideas out into the ether early and often before they slip from my brain. Good luck with your groups!

The Happy Idiot

Today is November 1st, and there’s an event that takes place every November called National Novel Writing Month (NaNoWriMo). I don’t believe I have the ability to really write a novel, and have no reason to think anyone would read it if I did. But I would like to make an attempt to write a blog post every day this month, and this month’s post is about The Happy Idiot. Hope you enjoy it and leave comments.

Who is the happy idiot? It’s the person in class who shrugs off fears of looking dopy and raises their hand. It’s the person who, in an architecture meeting, isn’t afraid to be wrong in asserting that a new bottleneck is quickly emerging in the design. It’s the person who gives presentations on topics they’re really only 75% comfortable with, and announces as much to the audience, inviting corrections and more input. It’s the person who invites feedback and asks questions that seem trivial, even if it exposes their ignorance.

We need more happy idiots.

I wholeheartedly accept this role, even though there are circumstances where it might be easier to keep my mouth shut and keep up appearances or it might seem beneficial to not put a dent in some perceived reputation or something like that. The problem I have with doing that is that appearances are, in my experience, largely bullshit. Reputation, in my experience, comes from doing, not from merely being perceived as smart, or good, or whatever. Execute. The rest comes from that.

Furthermore, once you enter the realm of keeping up appearances, you wind up in this horrible vicious cycle where eventually you just always have to clam up to seem smart about everything. Purposefully keeping quiet when you have no idea what’s going on – indeed, *because* you have no idea what’s going on is a close relative to lying, and has the same consequences. Eventually you’ll be cornered to execute and you’ll have no idea what to do. The fear that this will happen will eventually take over your waking hours, causing stress, and it’s all downhill from there.

On the other hand, being the happy idiot means filling in the cracks in your knowledge. It means you’re conscious of your own ignorance. It means you’ll be able to execute more effectively. This starts a positive cycle: you learn more, you execute more effectively, you begin to be perceived as smart, good, whatever, and it’s not completely unwarranted, because you’ve actually asked questions that took guts to ask and as a result you executed in smart ways. Eventually, your dumb questions aren’t perceived as being dumb anymore. Eventually, when you ask a seemingly trivial question, people stop reflexively thinking ‘how does he not know that’ and start thinking about what your brain is about to do with that little tidbit of data.

Further, it means people will trust you more. Think about it. Would you rather give a critical project to someone who absolutely never asks questions and “seems smart”, or the person who asks intelligent questions and executes?

So, I say be the happy idiot. Put yourself out there. If you’re perceived as being dumb for taking steps to be less dumb, then the problem isn’t yours, and you shouldn’t make it yours.

 

Thoughts on Python and Python Cookbook Recipes to Whet Your Appetite

Dave Beazley and myself are, at this point, waist deep into producing Python Cookbook 3rd Edition. We haven’t really taken the approach of going chapter by chapter, in order. Rather, we’ve hopped around to tackle chapters one or the other finds interesting or in-line with what either of us happens to be working with a lot currently.

For me, it’s testing (chapter 8, for those following along with the 2nd edition), and for Dave, well, I secretly think Dave touches every aspect of Python at least every two weeks whether he needs to or not. He’s just diabolical that way. He’s working on processes and threads at the moment, though (chapter 9 as luck would have it).

In both chapters (also a complete coincidence), we’ve decided to toss every scrap of content and start from scratch.

Why on Earth Would You Do That?

Consider this: when the last edition (2nd ed) of the Python Cookbook was released, it went up to Python 2.4. Here’s a woefully incomplete list of the superamazing awesomeness that didn’t even exist when the 2nd Edition was released:

  • Modules:
    • ElementTree
    • ctypes
    • sqlite3
    • functools
    • cProfile
    • spwd
    • uuid
    • hashlib
    • wsgiref
    • json
    • multiprocessing
    • fractions
    • plistlib
    • argparse
    • importlib
    • sysconfig
  • Other Stuff
    • The ‘with’ statement and context managers*
    • The ‘any’ and ‘all’ built-in functions
    • collections.defaultdict
    • advanced string formatting (the ‘format()’ method)
    • class decorators
    • collections.OrderedDict
    • collections.Counter
    • collections.namedtuple()
    • the ability to send data *into* a generator (yield as an expression)
    • heapq.merge()
    • itertools.combinations
    • itertools.permutations
    • operator.methodcaller()

* If memory serves, the ‘with’ statement was available in 2.4 via future import.

Again, woefully incomplete, and that’s only the stuff that’s in the 2.x version! I don’t even mention 3.x-only things like concurrent.futures. From this list alone, though, you can probably discern that the way we think about solving problems in Python, and what our code looks like these days, is fundamentally altered forever in comparison to the 2.4 days.

To give a little more perspective: Python core development moved from CVS to Subversion well after the 2nd edition of the book hit the shelves. They’re now on Mercurial. We skipped the entire Subversion era of Python development.

The addition of any() and all() to the language by themselves made at least 3-4 recipes in chapter 1 (strings) one-liners. I had to throw at least one recipe away because people just don’t need three recipes on how to use any() and all(). The idea that you have a chapter covering processes and threads without a multiprocessing module is just weird to think about these days. The with statement, context managers, class decorators, and enhanced generators have fundamentally changed how we think about certain operations.

Also something to consider: I haven’t mentioned a single third-party module! Mock, tox, and nosetests all support Python 3. At least Mock and tox didn’t exist in the old days (I don’t know about nose off-hand). Virtualenv and pip didn’t exist (both also support Python 3). So, not only has our code changed, but how we code, test, deploy, and generally do our jobs with Python has also changed.

Event-based frameworks aside from Twisted are not covered in the 2nd edition if they existed at all, and Twisted does not support Python 3.

WSGI, and all it brought with it, did not exist to my knowledge in the 2.4 days.

We need a Mindset List for Python programmers!

So, What’s Your Point

My point is that I suspect some people have been put off of submitting Python 3 recipes, because they don’t program in Python 3, and if you’re one of them, you need to know that there’s a lot of ground to cover between the 2nd and 3rd editions of the book. If you have a recipe that happens to be written in Python 2.6 using features of the language that didn’t exist in Python 2.4, submit it. You don’t even have to port it to Python 3 if you don’t want to or don’t have the time or aren’t interested or whatever.

Are You Desperate for Recipes or Something?

Well, not really. I mean, if you all want to wait around while Dave and I crank out recipe after recipe, the book will still kick ass, but it’ll take longer, and the book’s world view will be pretty much limited to how Dave and I see things. I think everyone loses if that’s the case. Having been an editor of a couple of different technical publications, I can say that my two favorite things about tech magazines are A) The timeliness of the articles (if Python Magazine were still around, we would’ve covered tox by now), and B) The broad perspective it offers by harvesting the wisdom and experiences of a vast sea of craftspeople.

What Other Areas Are In Need?

Network programming and system administration. For whatever reason, the 2nd edition’s view of system administration is stuff like checking your Windows sound system and spawning an editor from a script. I guess you can argue that these are tasks for a sysadmin, but it’s just not the meat of what sysadmins do for a living. I’ll admit to being frustrated by this because I spent some time searching around for Python 3-compatible modules for SNMP and LDAP and came up dry, but there’s still all of that sar data sitting around that nobody ever seems to use and is amazing, and is easy to parse with Python. There are also terminal logging scripts that would be good.

Web programming and fat client GUIs also need some love. The GUI recipes that don’t use tkinter mostly use wxPython, which isn’t Python 3-compatible. Web programming is CGI in the 2nd edition, along with RSS feed aggregation, Nevow, etc. I’d love to see someone write a stdlib-only recipe for posting an image to a web server, and then maybe more than one recipe on how to easily implement a server that accepts them.

Obviously, any recipes that solve a problem that others are likely to have that use any of the aforementioned modules & stuff that didn’t exist in the last edition would really rock.

How Do I Submit?

  1. Post the code and an explanation of the problem it solves somewhere on the internet, or send it (or a link to it) via email to PythonCookbook@oreilly.com or to @bkjones on Twitter.
  2. That’s it.

We’ll take care of the rest. “The rest” is basically us pinging O’Reilly, who will contact you to sign something that says it’s cool if we use your code in the book. You’ll be listed in the credits for that recipe, following the same pattern as previous editions. If it goes in relatively untouched, you’ll be the only name in the credits (also following the pattern of previous editions).

What Makes a Good Recipe?

A perfect recipe that is almost sure to make it into the cookbook would ideally meet most of the criteria set out in my earlier blog post on that very topic. Keep in mind that the ability to illustrate a language feature in code takes precedence over the eloquence of any surrounding prose.

What If…

I sort of doubt this will come up, but if we’ve already covered whatever is in your recipe, we’ll weigh that out based on the merits of the recipes. I want to say we’ll give new authors an edge in the decision, but for an authoritative work, a meritocracy seems the only valid methodology.

If you think you’re not a good writer, then write the code, and a 2-line description of the problem it solves, and a 2-line description of how it works. We’ll flesh out the text if need be.

If you just can’t think of a good recipe, grep your code tree(s) for just the import statements, and look for ideas by answering questions on Stackoverflow or the various mailing lists.

If you think whatever you’re doing with the language isn’t very cool, then stop thinking that a cookbook is about being cool. It’s about being practical, and showing programmers possibly less senior than yourself an approach to a problem that isn’t completely insane or covered in warts, even if the problem is relatively simple.

Slides, an App, a Meetup, and More On the Way

I’ve been busy. Seriously. Here’s a short dump of what I’ve been up to with links and stuff. Hopefully it’ll do until I can get back to my regular blogging routine.

PICC ’11 Slides Posted

I gave a Python talk at PICC ’11. If you were there, then you have a suboptimal version of the slides, both because I caught a few bugs, and also because they’re in a flattened, lifeless PDF file, which sort of mangles anything even slightly fancy. I’m not sure how much value you’ll get out of these because my presentation slides tend to present code that I then explain, and you won’t have the explanation, but people are asking, so here they are in all their glory. Enjoy!

I Made a Webapp Designed To Fail

No really, I did. WebStatusCodes is the product of necessity. I’m writing a Python module that provides an easy way for people to talk to a web API. I test my code, and for some of the tests I want to make sure my code reacts properly to certain HTTP errors (or in some cases, to *any* HTTP status code that’s not 200). In unit tests this isn’t hard, but when you’re starting to test the network layers and beyond, you need something on the network to provide the errors. That’s what WebStatusCodes does. It’s also a simple-but-handy reference for HTTP status codes, though it is incomplete (418 I’m a teapot is not supported). Still, worth checking out.

Interesting to note, this is my first AppEngine application, and I believe it took me 20 minutes to download the SDK, get something working, and get it deployed. It was like one of those ‘build a blog in -15 minutes’ moments. Empowering the speed at which you can create things on AppEngine, though I’d be slow to consider it for anything much more complex.

Systems and Devops People, Hack With Me!

I like systems-land, and a while back I was stuck writing some reporting code, which I really don’t like, so I started a side project to see just how much cool stuff I could do using the /proc filesystem and nothing but pure Python. I didn’t get too far because the reporting project ended and I jumped back into all kinds of other goodness, but there’s a github project called pyproc that’s just a single file with a few functions in it right now, and I’d like to see it grow, so fork it and send me pull requests. If you know Linux systems pretty well but are relatively new to Python, I’ll lend you a hand where I can, though time will be a little limited until the book is done (see further down).

The other projects I’m working on are sort of in pursuit of larger fish in the Devops waters, too, so be sure to check out the other projects I mention later in this post, and follow me on github.

Python Meetup Group in Princeton NJ

I started a Meetup group for Pythonistas that probably work in NYC or PA, but live in NJ. I work in PA, and before this group existed, the closest group was in Philly, an hour from home. I put my feelers out on Twitter, found some interest, put up a quick Meetup site, and we had 13 people at the first meetup (more than had RSVP’d). It’s a great group of folks, but more is always better, so check it out if you’re in the area. We hold meetings at the beautiful Princeton Public Library (who found us on twitter and now sponsors the group!), which is just a block or so from Triumph, the local microbrewery. I’m hoping to have a post-meeting impromptu happy hour there at some point.

Python Cookbook Progress

The Python Cookbook continues its march toward production. Lots of work has been done, lots of lessons have been learned, lots of teeth have been gnashed. The book is gonna rock, though. I had the great pleasure of porting all of the existing recipes that are likely to be kept over to Python 3. Great fun. It’s really amazing to see just how it happens that a 20-line recipe is completely obviated by the addition of a single, simple language feature. It’s happened in almost every chapter I’ve looked at so far.

If you have a recipe, or stumble upon a good example of some language feature, module, or other useful tidbit, whether it runs in Python 3 or not, let me know (see ‘Contact Me’). The book is 100% Python 3, but I’ve gotten fairly adept at porting things over by now :) Send me your links, your code, or whatever. If we use the recipe, the author will be credited in the book, of course.

PyRabbit is Coming

In the next few days I’ll be releasing a Python module on github that will let you easily work with RabbitMQ servers using that product’s HTTP management API. It’s not nearly complete, which is why I’m releasing it. It does some cool stuff already, but I need another helper or two to add new features and help do some research into how RabbitMQ broker configuration affects JSON responses from the API. Follow me on github if you want to be the first to know when I get it released. You probably also want to follow myYearbook on github since that’s where I work, and I might release it through the myYearbook github organization (where we also release lots of other cool open source stuff).

Python Asynchronous AMQP Consumer Module

I’m also about 1/3 of the way through a project that lets you write AMQP consumers using the same basic model as you’d write a Tornado application: write your handler, import the server, link the two (like, one line of code), and call consume(). In fact, it uses the Tornado IOLoop, as well as Pika, which is an asynchronous AMQP module in Python (maintained by none other than my boss and myYearbook CTO,  @crad), which also happens to support the Tornado IOLoop directly.

Nose and Coverage.py Reporting in Hudson

I like Hudson. Sure, it’s written in Java, but let’s be honest, it kinda rocks. If you’re a Java developer, it’s admittedly worlds better because it integrates with seemingly every Java development tool out there, but we can do some cool things in Python too, and I thought I’d share a really simple setup to get coverage.py’s HTML reports and nose’s xUnit-style reports into your Hudson interface.

I’m going to assume that you know what these tools are and have them installed. I’m working with a local install of Hudson for this demo, but it’s worth noting that I’ve come to find a local install of Hudson pretty useful, and it doesn’t really eat up too much CPU (so far). More on that in another post. Let’s get moving.

Process Overview

As mentioned, this process is really pretty easy. I’m only documenting it because I haven’t seen it documented before, and someone else might find it handy. So here it is in a nutshell:

  • Install the HTML Publisher plugin
  • Create or alter a configuration for a “free-style software project”
  • Add a Build Step using the ‘Execute Shell’ option, and enter a ‘nosetests’ command, using its built-in support for xUnit-style test reports and coverage.py
  • Check the ‘Publish HTML Report’, and enter the information required to make Hudson find the coverage.py HTML report.
  • Build, and enjoy.

Install The HTMLReport Plugin

From the dashboard, click ‘Manage Hudson’, and then on ‘Manage Plugins’. Click on the ‘Available’ tab to see the plugins available for installation. It’s a huge list, so I generally just hit ‘/’ in Firefox or cmd-F in Chrome and search for ‘HTML Publisher Plugin’. Check the box, go to the bottom, and click ‘Install’. Hudson will let you know when it’s done installing, at which time you need to restart Hudson.

Install tab

HTML Publisher Plugin: Check!

Configure a ‘free-style software project’

If you have an existing project already, click on it and then click the ‘Configure’ link in the left column. Otherwise, click on ‘New Job’, and choose ‘Build a free-style software project’ from the list of options. Give the job a name, and click ‘OK’.

Build a free-style software project.

You have to give the job a name to enable the 'ok' button :)

Add a Build Step

In the configuration screen for the job, which you should now be looking at, scroll down and click the button that says ‘Add build step’, and choose ‘Execute shell’ from the resulting menu.

Add Build Step

Execute shell. Mmmmm... shells.

This results in a ‘Command’ textarea appearing, which is where you type the shell command to run. In that box, type this:

/usr/local/bin/nosetests --with-xunit --with-coverage --cover-package demo --cover-html -w tests

Of course, replace ‘demo’ with the name of the package you want covered in your coverage tests to avoid the mess of having coverage.py try to seek out every module used in your entire application.

We’re telling Nose to generate an xUnit-style report, which by default will be put in the current directory in a file called ‘nosetests.xml’. We’re also asking for coverage analysis using coverage.py, and requesting an HTML report of the analysis. By default, this is placed in the current directory in ‘cover/index.html’.

execute shell area

Now we need to set up our reports by telling Hudson we want them, and where to find them.

Enable JUnit Reports

In the ‘Post-Build Actions’ area at the bottom of the page, check ‘Publish JUnit test result report’, and make it look like this:

The ‘**’ is part of the Ant Glob Syntax, and stands for the current working directory. Remember that we said earlier nose will publish, by default, to a file called ‘nosetests.xml’ in the current working directory.

The current working directory is going to be the Hudson ‘workspace’ for that job, linked to in the ‘workspace root’ link you see in the above image. It should mostly be a checkout of your source code. Most everything happens relative to the workspace, which is why in my nosetest command you’ll notice I pass ‘-w tests’ to tell nose to look in the ‘tests’ subdirectory of the current working directory.

You could stop right here if you don’t track coverage, just note that these reports don’t get particularly exciting until you’ve run a number of builds.

Enable Coverage Reports

Just under the JUnit reporting checkbox should be the Publish HTML Reports checkbox. The ordering of things can differ depending on the plugins you have installed, but it should at least still be in the Post-build Actions section of the page.

Check the box, and a form will appear. Make it look like this:

By default, coverage.py will create a directory called ‘cover’ and put its files in there (one for each covered package, and an index). It puts them in the directory you pass to nose with the ‘-w’ flag. If you don’t use a ‘-w’ flag… I dunno — I’d guess it puts it in the directory from where you run nose, in which case the above would become ‘**/cover’ or just ‘cover’ if this option doesn’t use Ant Glob Syntax.

Go Check It Out!

Now that you have everything put together, click on ‘Save’, and run some builds!

On the main page for your job, after you’ve run a build, you should see a ‘Coverage.py Report’ link and a ‘Latest Test Result’ link. After multiple builds, you should see a test result ‘Trend’ chart on the job’s main page as well.

job page

Almost everything on the page is clickable. The trend graph isn’t too enlightening until multiple builds have run, but I find the coverage.py reports a nice way to see at-a-glance what chunks of code need work. It’s way nicer than reading the line numbers output on the command line (though I sometimes use those too).

How ’bout you?

If you’ve found other nice tricks in working with Hudson, share! I’ve been using Hudson for a while now, but that doesn’t mean I’m doing anything super cool with it — it just means I know enough to suspect I could be doing way cooler stuff with it that I haven’t gotten around to playing with. :)

Python Packaging, Distribution, and Deployment: Volume 1

This is just Volume 1. I’ll cover as much as I can and just stop when it gets so long most people will stop reading :)

I’ve been getting to know the Python packaging and distribution landscape way better than I ever wanted to over the last couple of weeks. After 2 or 3 weeks now, I’m saddened to report that I still find it quite painful, and not a little bit murky. At least it gets clearer and not murkier as I go on.

I’m a Senior Operations Developer at myYearbook.com, which produces a good bit of Python code (and that ‘good bit’ is getting bigger by the day). We also open source lots of stuff (also growing, and not all Python). I’m researching all of this so that when we develop new internal modules, they’re easy to test and deploy on hundreds of machines, and when we decide to open source that module, it’s not painful for us to do that, and we can distribute the packages in a way that is intuitive for other users without jumping through hoops because “we do it different”. One process and pipeline to rule them all, as it were.

There are various components involved in building an internal packaging and distribution standard for Python code. Continuous integration, automated testing, and automated deployment (maybe someday “continuous deployment”) are additional considerations. This is a more difficult problem than I feel it should be, but it’s a pretty interesting one, and I’ll chronicle the adventure here as I have time. Again, this is just Volume 1.

Part 1: Packaging, Distribution, and Deployment

Let’s define the terms. By ‘packaging’ I mean assembling some kind of singular file object containing a Python project, including its source code, data files, and anything it needs in order to be installed. To be clear, this would include something like a setup.py file perhaps, and it would not include external dependencies like a NoSQL server or something.

By ‘distribution’, I mean ‘how the heck do you get this beast pushed out to a couple hundred machines or more?’

Those two components are necessary but not sufficient to constitute a ‘deployment’, which I think encompasses the last two terms, but also accounts for things like upgrades, rollbacks, performing start/stop/restarts, running unit tests after it gets to its final destination but before it kills off the version that is currently running happily, and other things that I think make for a reliable, robust application environment.

With definitions out of the way, let’s dive into the fun stuff.

Part 2: Interdependencies

Some knowledge of packaging in Python is helpful when you go to discuss distribution and deployment. The same goes for the other two components. When you start out looking into the various technologies involved, at some point you’re going to look down and notice that pieces of your brain appear to have dropped right out of your ears. Then you’ll reach to pull out your hair only to realize that your brain hasn’t fallen out your ears: it has, in fact, exploded.

If you’re not careful, you’ll find yourself thinking things like ‘if I don’t have a packaging format, how can I know my distribution/deployment method? If I don’t have a deployment method, how do I know what package format to use?’ It’s true that you can run into trouble if you don’t consider the interplay between these components, but it’s also true that the Python landscape isn’t really all that treacherous compared to other jungles I’ve had to survive in.

I believe the key is to just take baby steps. Start simple. Keep the big picture in mind, but decide early to not let the perfect be the enemy of the good. When I started looking into this, I wanted an all-singing all-dancing, fully-automated end-to-end, “developer desktop to production” deployment program that worked at least as well as those hydraulic vacuum thingies at the local bank drive-up windows. I’ll get there, too, but taking baby steps probably means winding up with a better system in the end, in part because it takes some time and experience to even identify the moving parts that need greasing.

So, be aware that it’s possible to get yourself in trouble by racing ahead with, say, eggs as a package format if you plan to use pip to do installation, or if you want to use Fabric for distribution of tarballs but are in a Windows shop with no SSH servers or tarball extractors.

Part 3: Package Formats

  • tar.gz (tarballs)
  • zip
  • egg
  • rpm/deb/ebuild/<os-specific format here>
  • None

If you choose not to decide, you still have made a choice…

Don’t forget that picking a package format also includes the option to not use a package format. I’ve worked on projects of some size that treated a centralized version control system as a central distribution point as well. They’d manually log into a server, do an svn checkout (back when svn was cool and stuff), test it, and if all was well, they’d flip a symlink to point at the newly checked out code and restart. Deployment was not particularly automated (though it could’ve been), but some aspects of the process were surprisingly good, namely:

  • They ran a surprisingly complete set of tests on every package, on the system it was to be deployed on, without interrupting the currently running service. As a result, they had a high level of confidence that all dependencies were met, the code acted predictably, and the code was ‘fit for purpose’ to the extent that you can know these things from running the available tests.
  • Backing out was Mind-Numbingly Easy™ – since moving from one version to the next consisted of changing where a symlink pointed to and restarting the service, backing out meant doing the exact same thing in reverse: point the symlink back at the old directory, and restart.

I would not call that setup “bad”, given the solutions I’ve seen. It just wasn’t automated at all to speak of. It beats to hell what I call the “Pull & Pray” deployment scenario, in which your running service is a VCS checkout, and you manually log in, cd to that directory, do a ‘pull’ or ‘update’ or whatever the command does an in-place update of the code in your VCS, and then just restarting the service. That method is used in an astonishingly large number of projects I’ve worked on in the past. Zero automation, zero testing, and any confidence you might find in a solution like that is, dare I say, hubris.

Python Eggs

I don’t really want to entertain using eggs and the reasoning involves understanding some recent history in the Python packaging and distribution landscape. I’ll try to be brief. If you decide to look deeper, here’s a great post to use as a starting point in your travels.

distutils is built into Python. You create a setup.py file, you run ‘python setup.py install’, and distutils takes over and does what setup.py tells it to. That is all.

Setuptools was a response to features a lot of people wanted in distutils. It’s built atop distutils as a collection of extensions that add these features. Included with setuptools is the easy_install command, which will automatically install egg files.

Setuptools hasn’t been regularly and actively maintained in a year or more, and that got old really fast with developers and other downstream parties, so some folks forked it and created ‘distribute‘, which is setuptools with all of the lingering commits applied that the setuptools maintainer never applied. They also have big plans for distribute going forward. One is to do away with easy_install in favor of pip.

pip, at time of writing, cannot install egg files.

So, in a nutshell, I’m not going to lock myself to setuptools by using eggs, and I’m not going down the road of manually dealing with eggs, and pip doesn’t yet ‘do’ eggs, so in my mind, the whole idea of eggs being a widely-used and widely-supported format is in question, and I’m just not going there.

If I’m way out in left field on any of that, please do let me know.

Tarballs

The Old Faithful of package formats is the venerable tarball. A 30-year-old file format compressed using a 20-year-old compression tool. It still works.

Tarball distribution is dead simple: you create your project, put a setup.py file inside, create a tarball of the project, and put it up on a web server somewhere. You can point easy_install or pip at the URL to the tarball, and either tool will happily grab it, unpack it, and run ‘python setup.py install’ on it. In addition, users can also easily inspect the contents of the file without pip or easy_install using standard tools, and wide availability of those tools also makes it easy to unpack and install manually if pip or easy_install aren’t available.

Python has a tar module as well, so if you wanted to bypass every other available tool you could easily use Python itself to script a solution that packages your project and uploads it with no external Python module dependencies.

Zip Files

I won’t go into great detail here because I haven’t used zip files on any regular basis in years, but I believe that just about everything I said about tarballs is true for zip files. Python has a zipfile module, zip utilities are widely available, etc. It works.

Distro/OS-specific Package Formats

I’ve put this on the ‘maybe someday’ list. It’d be great if, in addition to specifying Python dependencies, your package installer could also be aware of system-level dependencies that have nothing to do with Python, except that your app requires them. :)

So, if your application is a web app but requires a local memcache instance, RPM can manage that dependency.

I’m not a fan of actually building these things though, and I don’t know many people who are. Sysadmins spend their entire early careerhood praying they’ll never have to debug an RPM spec file, and if they don’t, they should.

That said, the integration of the operations team into the deployment process is, I think, a good thing, and leveraging the tools they already use to manage packages to also manage your application is a win for everyone involved. Sysadmins feel more comfortable with the application because they’re far more comfortable with the tools involved in installing it, and developers are happy because instead of opening five tickets to get all of the system dependencies right, they can hand over the RPM or deb and have the tool deal with those dependencies, or at least have the tool tell the sysadmin “can’t install, you need x, y, and z”, etc.

Even if I went this route someday, I can easily foresee keeping tarballs around, at least to keep developers from having to deal with package managers if they don’t want/need to, or can’t. In the meantime, missing system-level dependencies can be caught when the tests are run on the machine being deployed to.

Let Me Know Your Thoughts

So that’s it for Volume 1. Let me know your thoughts and experiences with different packaging formats, distribution, deployment, whatever. I expect that, as usual in the Python community, the comments on the blog post will be as good as or better than the post itself. :)