What “Batteries Included” Means

When I first got into Python, I read lots of blog posts that mentioned that Python was “the batteries included language”, but those same posts were short on any explanation of what that really meant. A few years and lots of projects later, I think I’m now qualified to at least give a beginner a basic understanding of what people mean when they say that.

What it means

Python has what’s called the “Standard Library”, a collection of modules, each of which makes a particular set of tasks within some problem domain simpler for you. The standard library modules are all part of the standard Python installation, so you don’t have to install anything extra to use them. Here is a short list of things I’ve used Python for, and the standard library modules I used to get the work done:

  • Wrote a simple filesystem backup routine that works on my Linux and Solaris servers, as well as my Mac laptop. For this I used the os, stat, bz2, gzip, time, datetime, and tarfile modules. Oh – and optparse!
  • The first iteration of the loghetti Apache log file analysis tool used one external module, which I’m replacing now with optparse. The other modules used are urlparse, cgi, datetime, operator, re, sys, and mmap.
  • I’ve written a couple of simple web API clients using nothing more than urllib/urllib2, ElementTree, and various bits of the xml package (xml.dom.minidom comes to mind).
  • I wrote a MySQL backup script using the sys, os, time, shutil, glob, tarfile, and optparse modules.
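
To give a feel for how far those modules get you on their own, here’s a minimal sketch of a timestamped, bzip2-compressed directory backup using only os, time, and tarfile. It isn’t the code from either script above; the `backup_dir` name and the archive naming scheme are assumptions made up for illustration.

```python
import os
import tarfile
import time


def backup_dir(src_dir, dest_dir):
    """Archive src_dir into a timestamped .tar.bz2 file inside dest_dir.

    Hypothetical helper for illustration; returns the archive's path.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = os.path.basename(os.path.normpath(src_dir))
    archive = os.path.join(dest_dir, "%s-%s.tar.bz2" % (base, stamp))
    # Mode "w:bz2" tells tarfile to bzip2-compress the archive as it writes.
    with tarfile.open(archive, "w:bz2") as tar:
        # arcname stores paths relative to the directory name, not absolute.
        tar.add(src_dir, arcname=base)
    return archive
```

Calling something like `backup_dir("/etc", "/var/backups")` would leave a file such as `etc-20090101-120000.tar.bz2` behind; swapping `"w:bz2"` for `"w:gz"` uses gzip instead.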

There are built-in modules for XML/HTML parsing, url parsing, network communications, threading and multiprocessing, image and audio manipulation, and lots of other tasks you’re likely to come across. The story of Python cannot be told from the standard library alone, but you can do an awful lot of work with what’s provided.

Why you care

You care because you want to write the code that solves the problem at hand without worrying about a whole lot of low-level details like socket communications and memory management. I mean, it’s nice to know that Python exposes those low-level details to you should you ever get a wild hair, but if you frequently had occasion to really need that, you’d probably code in C.

Perhaps ironically, you also care about the batteries included because of what it means for those batteries that *aren’t* included. Chances are the Python-driven applications and external modules you wind up using make very heavy use of the standard library modules, making the code for those add-ons simpler to read and understand. It has the potential to make them more reliable and consistent as well.

I’d rather see a threaded application use Python’s threading module than hand-roll its own threading implementation. I’d rather see a Python web server built on some descendant of Python’s socket module than one that codes socket operations by hand. Having modules like socket and threading in the standard library means that tools from different problem domains that happen to need the same common functionality implement it in a standard way.
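
Here’s a small sketch of what leaning on the standard library looks like in practice: a few worker threads pulling jobs off a shared queue, using only the threading and queue modules (the latter is spelled `Queue` in Python 2). The `run_jobs` name and the “squaring” job are placeholders I made up; real work would go where the multiplication is.

```python
import queue
import threading


def run_jobs(jobs, num_workers=4):
    """Square each number in jobs using a small pool of worker threads.

    Squaring is a stand-in for real work such as a download or a backup.
    Returns a dict mapping each job to its result.
    """
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:  # sentinel: no more work for this thread
                break
            with lock:  # guard the shared results dict
                results[item] = item * item

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        tasks.put(job)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker, queued after all real jobs
    for t in threads:
        t.join()  # wait until every worker has reached its sentinel
    return results
```

Because the queue is FIFO, every real job is handed out before any worker sees a sentinel, so `run_jobs([1, 2, 3])` returns `{1: 1, 2: 4, 3: 9}` with no hand-written locking beyond the one dict guard.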

The same holds true for your own code. If you need to write a Jabber messaging server in Python, and six months later you need to write a queue-based networked job dispatching server, you’re going to be using similar modules, and you might even be able to reuse some of your old Jabber server code. Reuse for the win!

Remember LEGO? I was obsessed with LEGO. I remember becoming so intimately familiar with every type of block, window, platform, person, light post, wheel, and car base that envisioning my own custom-made Grand Galaxy Cruiser was easy, and actually building it (from pieces of probably 10 different LEGO sets) was only slightly more difficult. The standard library, in essence, is your LEGO set. Really, instead of “batteries included” Python’s mantra could well be “There’s a block for that” :)

It’s not all beer and skittles

Even in LegoLand there are pieces that don’t fit together the way you’d like. I wanted to build a house once with car windshields in place of windows. Turns out it’s not so simple.

Python 3 improves things a lot in terms of how the standard library is organized, but if your code depends on an external module, that module might not be ready for Python 3, which means that, like me and many others, you’re stuck in Python 2-land for a while. It’s not a big deal really, but I do find that oftentimes I need more than one standard module to do what should really be handled in one.

One example of this is the urllib and urllib2 modules, which have overlapping features and address overlapping problems. As a result, you’ll often see these two modules used together. In my own code I’ve had to use both of these modules in addition to the cgi module, *and* the urlparse module. In Python 3, I’d only need the urllib package (urllib.request and urllib.parse). Yay!
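
For comparison, here’s a rough sketch of the Python 3 side: splitting a URL, reading its query string, and building a new one all come from urllib.parse, which absorbed pieces that used to live in urlparse, urllib, and cgi (parse_qs, for example, was once cgi.parse_qs). The URL is a made-up example.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

# A made-up URL for illustration.
url = "http://example.com/search?q=python&page=2"

parts = urlparse(url)           # scheme, netloc, path, params, query, fragment
params = parse_qs(parts.query)  # {'q': ['python'], 'page': ['2']}

# Bump the page number and rebuild the URL from the modified query.
params["page"] = [str(int(params["page"][0]) + 1)]
new_query = urlencode(params, doseq=True)  # doseq expands the list values
new_url = urlunparse(parts._replace(query=new_query))
print(new_url)  # http://example.com/search?q=python&page=3
```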

XML is another place where I’ve needed multiple modules, and again there has been some consolidation in Python 3.

In the end, this doesn’t get in the way of getting work done very much. It’s just a pattern you start to notice as you work through more of the standard library and use it for your own projects.

Missing Batteries

Python will sometimes surprise you with what’s included. There’s a json module, for example, and the sqlite3 module, both of which are nice to have. But while there’s a whole section of the library devoted to protocols, neither LDAP nor SNMP is represented, and I need both of them :(
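
Those two play nicely together. Here’s a small sketch that stores a JSON-encoded record in an in-memory SQLite database and reads it back, with no third-party installs; the table name and the sample record are invented for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database that lives in RAM
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

record = {"host": "web01", "status": 200, "tags": ["apache", "log"]}
# Store the dict as a JSON string; the "?" placeholder avoids manual quoting.
conn.execute("INSERT INTO events (payload) VALUES (?)", (json.dumps(record),))
conn.commit()

event_id, payload = conn.execute("SELECT id, payload FROM events").fetchone()
decoded = json.loads(payload)
print(event_id, decoded["host"], decoded["tags"])  # 1 web01 ['apache', 'log']
conn.close()
```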

It might also be nice to have more native file format support. YAML, for one, would be useful.

I also wrote a while back about how nice it would be to have a bash-like ‘select’ feature in Python. There’s already a cmd module which is a pretty good tool for creating interactive command line programs, but without a select-like feature, it’s a little limited.
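
For anyone who hasn’t met cmd yet, here’s a minimal sketch: subclass cmd.Cmd, and every `do_<name>` method becomes a command. The `pick` command is a hypothetical, very rough stand-in for a select-style menu (it just indexes into a fixed list); it isn’t something cmd provides, and the server names are made up.

```python
import cmd


class ServerShell(cmd.Cmd):
    """Tiny interactive shell; the commands and server names are made up."""

    prompt = "(servers) "
    choices = ["web01", "db01", "cache01"]

    def do_list(self, arg):
        """list: show the numbered choices."""
        for i, name in enumerate(self.choices, 1):
            self.stdout.write("%d) %s\n" % (i, name))

    def do_pick(self, arg):
        """pick N: choose an entry by number, loosely like bash's select."""
        try:
            index = int(arg)
        except ValueError:
            index = 0  # not a number; falls through to the error below
        if not 1 <= index <= len(self.choices):
            self.stdout.write("invalid choice: %r\n" % arg)
            return
        self.stdout.write("you picked %s\n" % self.choices[index - 1])

    def do_quit(self, arg):
        """quit: leave the shell."""
        return True  # a true return value ends cmdloop()

    def do_EOF(self, arg):
        """Ctrl-D also leaves the shell."""
        self.stdout.write("\n")
        return True
```

Running `ServerShell().cmdloop()` starts the interactive prompt; in tests you can drive it with `onecmd("pick 2")` instead. Note that you still have to type `list` and then `pick`, which is exactly the gap a built-in select would close.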

At the end of the day…

No language provides every single tool that every single developer could possibly need, nor does any language implement the tools it does have in the perfect way for every developer who will ever come along. I’ve written lots of code in Perl and PHP, and enough in Java, C++, and C, to know that Python does a fantastic job of making my life easier as a developer, and I’m really encouraged by what I’m seeing in Python 3.

Intro to Python 101 For Beginners

If you code Python already, go somewhere else. I’m only talking to complete and total newbies to the language right now. I want to show them the stuff that I wished someone had put in one nice, neat blog post for easy consumption when I got started with the language. If that’s what you’re looking for, look no further. Here’s what you need to know.

Lay of the Land

Python seems like a bit of a strange place at first. Most new programming languages do. Part of the reason is that modern interpreted languages like Ruby and Python like to “eat their own dog food”, so Ruby’s web site uses a Ruby-based CMS, and its documentation is generated with Ruby-based tools (I believe it’s all RDoc). Python’s site is written in Python, though it isn’t built on any one well-known framework. The documentation is presented using Sphinx, a documentation engine written in Python.

Python, at time of writing, has a 2.x distribution and a 3.x distribution available for download. There’s a chance I’ll be flamed for saying so, but download 2.x. Most of the blog posts you’re likely to turn to for help over the course of your first year with Python are still going to be 2.x-specific, and if your code involves external modules, a good number of them haven’t been ported to 3.x yet. Hang tight – 3.x is awesome, but 2.x pretty much rocks too.

Documentation for Python is pretty good, and the first bit of documentation you need (after this post) is the Python Tutorial. It’ll get you rolling with the basic syntax of the language, data types, equality operators, conditionals, the works.

With the basics of the language under your belt, you need two other bits of documentation:

You’ll want to know about the modules that are included with the Python distribution before you go out seeking external modules. You can see a complete list of built-in modules at the Module Index. It’ll give you a good idea of what you can do with the language right out of the gate without any additional downloads.

You’ll also want to bookmark the Standard Library documentation. This is the meaty stuff, and it’s where I spend most of my doc-reading hours. If you know which module you want docs on, just append its name to the base URL and you’re there. So if you want to know about the ‘threading’ module, go to http://docs.python.org/library/threading

One tip about the documentation on python.org: don’t use the search box they provide. The search functionality is slow, and after all that time, it’s almost certainly *not* going to give you what you were looking for. Go to Google and search there. If you know that what you’re looking for is on docs.python.org, then append “site:docs.python.org” to your Google search term. Works like a charm.

How to do ‘x’ with Python

The first project I wanted to use Python for involved LDAP, and there’s no LDAP-related module built into Python. Finding the right module to use was my first challenge. There used to be a resource known as the “Cheese Shop”, but it’s now been rolled into the Python Package Index, a.k.a. “PyPI”. The packages there aren’t endorsed by python.org or anything; PyPI is just an interface to help you figure out what the available options are. The search box on this page *is* useful, but the issue then becomes which module to choose: even for a relatively obscure requirement like “LDAP” there are lots of modules.

The obvious choice for LDAP is python-ldap, and it seems to be the canonical choice for those needing this support. Also, a Google search for “python and ldap” returns several articles about, and the home page for, the python-ldap module. So, in less than five minutes you can usually figure it out, but don’t download the module yet!

You can (it’s completely optional but worth it) install “setuptools”, which gives you a tool called “easy_install” that’ll run out to PyPI and get whatever module you want, and install it on your system. I install most modules this way and don’t remember ever having any major issues with it. That said, some of the cool kids lately have taken a liking to an easy_install replacement called pip. I admit it looks very nice, but I guess until easy_install bites me I’m not going to be super motivated to switch. Pip takes some additional steps to make it more reliable and cause fewer issues in the face of spotty wireless connectivity and things like that. It also tries to make output easier to understand, it supports package uninstalls, and several other niceties. It seems to be the future.

Community Support

Python coders are a fairly outspoken, though friendly, lot, in my experience. If you ask a reasonable question, you’ll get a reasonable answer, often in the form of a link to where you need to be documentation-wise, but without the nasty RTFM aftertaste other communities leave behind. For added comfort, attach a “link to docs welcome!” to your request for help. Yummy!

There are lots of Python mailing lists; the most helpful one for beginners (and beyond) is probably the tutor list. I learn something new every day lurking on that list. The answers you get there tend to be authoritative, from folks who are working on the Python language itself, so it’s not just another ’net forum where the blind oftentimes lead the blind.

Two great sources for ideas on how to put your code together, or for help venturing into new territory with Python, are Stack Overflow and the Python Cookbook. Python Cookbook is a great source for ideas, and Stack Overflow is a really good Q&A interface where you can get good advice on hard problems (or really easy ones).

There are IRC channels for Python, and I’m a heavy user of IRC in general, but I don’t venture into the Python IRC channels much. With all of the other resources available, I don’t typically miss it.

Ok, wtf does this traceback mean?

Tracebacks can look daunting if you’re new to the language (unless you come from a Java background). They’re not all that tough. Here’s some output from a Python interpreter interactive session (which you’ll learn about if you read the tutorial linked above):

>>> d
{'China': 'Shanghai', 'USA': 'NY'}
>>> d['USA']
'NY'
>>> d['foo']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'foo'
>>>

This is a short traceback, but 90% of the time you parse long ones in the same basic way. The first thing I look at is the last line of the traceback, which tells me the kind of error that was encountered. In this case it was a KeyError, meaning I asked the dictionary for a key (‘foo’) that it doesn’t contain. I’ve seen that error at least once for every hair left on my head, but if you don’t know what an exception means, check out the handy dandy page that describes all of Python’s built-in exceptions! Exceptions are just Python classes themselves, and you can create your own for use in your own code by subclassing one of the existing exceptions, or the base Exception class.
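
Here’s a small sketch of both halves of that: handling the KeyError from the session above, and defining your own exception by subclassing a built-in one. The `UnknownCountryError` name and the `city_for` function are hypothetical, just for illustration.

```python
class UnknownCountryError(KeyError):
    """Raised when we have no city on file for a country (illustrative)."""


cities = {"China": "Shanghai", "USA": "NY"}  # same data as the session above


def city_for(country):
    try:
        return cities[country]
    except KeyError:
        # Re-raise as our own, more descriptive exception type.
        raise UnknownCountryError(country)


print(city_for("USA"))              # NY
print(cities.get("foo", "unknown"))  # unknown (.get() avoids the exception)
```

Because `UnknownCountryError` subclasses KeyError, any existing `except KeyError:` block still catches it, while new code can catch the more specific name.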

After about a month of coding in Python regularly, your eyes will start to jump to the right places in a Python traceback, and you’ll find that the exception names actually are pretty intuitive, and since you can employ them yourself or extend them, you’ll probably get to know them quickly without using a reference.

Really? Whitespace?

You’ve probably heard about Python’s treatment of whitespace before. Its reputation for making whitespace significant in source code precedes it, and I guess that turns some people off. I actually didn’t care, because I figured as long as I indented my code the way I normally did, the whitespace issue wouldn’t bother me much, and I was right.

If you code in Perl, C, or Java, those languages don’t require indentation, but you almost certainly indent anyway because the whitespace is significant to *you* even if it’s not significant to the language. If you write readable code, nicely indented, you’ll probably forget about Python’s whitespace requirements after your first hour coding in it.

The rules aren’t all that stringent either. This code works fine:

def foo(c):
                     return c**2

if __name__ == "__main__":
         print(foo(2))

There’s no consistency across this code: my function body is indented by twenty-odd spaces, and the body of my if statement at the bottom by a different amount entirely. That’s fine, because all Python wants is for each block to be consistent with itself: if line 1 of your function is indented 4 spaces, indent the rest of its lines 4 spaces. Now you’re probably thinking “Well, who doesn’t do that?” Exactly. It’s not draconian, it’s practical. You could think of Python’s whitespace requirement as filtering out people you’d rather not code beside anyway :)
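
If you want to check the rule for yourself without running anything, here’s a small sketch that hands two snippets to the built-in compile(): the odd-looking code above, where each block is consistent with itself, and a made-up function whose body mixes two indentation levels. The `indentation_ok` helper is hypothetical.

```python
def indentation_ok(source):
    """Return True if source compiles, False if Python rejects its indentation."""
    try:
        compile(source, "<demo>", "exec")  # parses without executing anything
    except IndentationError:
        return False
    return True


# Each block is consistent with itself, even though the two blocks differ.
consistent = (
    "def foo(c):\n"
    "                     return c**2\n"
    "\n"
    "if __name__ == '__main__':\n"
    "         print(foo(2))\n"
)

# The second line of the function body is indented deeper than the first.
inconsistent = (
    "def foo(c):\n"
    "    x = c**2\n"
    "        return x\n"
)

print(indentation_ok(consistent))    # True
print(indentation_ok(inconsistent))  # False
```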

Books

I feel like I own most of the Python books in existence through 2008. I bought the printed library reference in (I think) 2000. The best ones?

The one that really got me going with Python was Dive Into Python, which is available free and readable online. I bought the printed book just to support the author’s work. Mark Pilgrim writes using a style that makes you feel like you’re sitting next to him at a conference in a really lame talk and he’s whispering all kinds of programming goodness at you and showing you programming porn on his laptop screen. He’s excited to be talking to you about Python in that book. Truly inspired.

Perhaps my favorite “huge tome”-style book, covering pretty much everything from beginner to advanced material, is Core Python by Wesley Chun. He covers more topics more deeply than most other books. I half expected broad but shallow coverage; instead, I still pick that book up quite often when I want to understand some feature of the language better. His explanations are, for some reason, clearer and more concise than most. Maybe they just fit my brain better.

After that I went straight into the more specialized Python books, but those are mostly dated now (Python Network Programming, for one, could use an update. I’d buy it again, probably), though I did purchase Programming Python, and Python Cookbook as a sort of quick reference for doing things I’d never done before. I don’t use them much, to be honest.

The only other book I really got excited about was Tarek Ziade’s Expert Python Programming. It’s not a beginner’s book, but it has a lot of really great material in it for those building larger applications with Python. Highly recommended.

Must Read

If you’ve worked in another language, it probably goes without saying, but you’ll learn much more quickly if you can read some source code, and/or work on a project with more senior coders than yourself. Reading code and tweaking it is probably the fastest way to learn about any language.

Move Forward

I hope this points you in the right direction with Python. Bookmark all of those links, follow the tutorial, and before you know it you’ll be master of all you survey.

Enjoy!

If You Don’t Date Your Work, It Sucks.

I probably get more upset than is reasonable when I come across articles with no date on them. I scroll furiously for a few minutes, try to see if the date was put in some stupid place like the fine print written in almost-white-on-white at the bottom of the post surrounded by ads. Then I skim the article looking for references to software versions that might clue me in on how old this material is. Then I check the sidebars to see if there’s some kind of “About this Post” block. Finally, I make a mental note of the domain in a little mental list I use to further filter my Google searches in the future. Then I close the browser window in disgust. If it weren’t completely gross and socially unacceptable to do so, I would spit on the floor every time this happened.

Why would you NOT date your articles? In almost every single theme for every single content management solution written in any language and backed by any database, “Date” is a default element. Why would you remove it? It is almost guaranteed to be more work to remove it. Why would you go through actual work to make your own writing less useful to others?

What happens when you don’t date your articles?

  1. People have no idea whether your article has anything to do with what they’re working on.  If you wrote an article about the Linux kernel in 1996, it’s of no use to me *now*, even if it was pretty hardcore at the time.
  2. Readers are forced to skim your article looking for references to software versions to see if your article is actually meaningful to them or not. Why make it hard for people to know whether your article is useful? The only reason I can think of is that you already know your articles are old, so not dating them ensures that people at least skim enough to see some of the ads on your site. You are irreversibly lame if you do this.
  3. It causes near seizures in people like me who really hate when you don’t date your work, as well as all of your past teachers, who no doubt demanded that you sign and date your work.
  4. Every time you don’t date an article online, a seal pup is clubbed to death in the Arctic, and a polar bear gets stranded on a piece of ice.

At some point, I will make an actual list of web sites that regularly do not date their work. A sort of hall of shame for sites that fail to link their writing to some kind of time-based context. If you have sites you’d like to add, let me know in the comments.

Head first into javascript (and jQuery)

So, I had to take a break from doing the Code Katas just as I was getting to the really cool one about Bloom Filters. The reason for the unexpected break from kata-ing was that I had a project thrown into my lap. I say “project” not because it was some huge thing that needed doing — lots of people reading this could probably have done it in a matter of a few hours — but because it involved two things I’ve never done any real work with: javascript, and jQuery.

My task? Well, first I had to recreate a page from a graphic designer’s mockup. So, take a JPEG image and create the CSS and stuff to make that happen. Already I’m out of my comfort zone, because historically I’m a back-end developer more comfortable with threading than CSS (95% of my code is written in Python and is daemonized/threaded/multiprocess/all-of-the-above or worse), but at least I’ve done enough CSS to get by.

Once the CSS was done, I was informed that I’d now need to take the reporting table I had just created and make it sortable on any of its columns, get the data via AJAX calls to a flat file holding the JSON for the dummy data, add nice drop-down date pickers so the end user could choose date ranges for the report, page the data using a flickr-style pager so only 20 lines would show up on a page, and alternate the row colors in the table (and make sure that doesn’t break when you sort!).

How to learn javascript and/or jQuery REALLY fast

How exactly do you learn enough javascript and jQuery to get this done in a matter of a few days (counting from *after* the CSS part was done)? Here are some links you should keep handy if you have a situation like this arise:

  • If Douglas Crockford says it, listen. I’d advise you start here (part I of a 4-part intro to javascript). That site also has his ‘Advanced Javascript’ series. He also wrote a book, which is small enough to read quickly, and well done.
  • Packt has a lot of decent resources for jQuery. Specifically, this article helped me organize what I was doing in my head. The code itself had some rather glaring issues — you’re not going to cut-n-paste this code and deploy it to production, but coming from scorched earth, I really learned a lot.
  • After the project was already over I found this nice writeup that covers quick code snippets and demos illustrating some niceties like sliding panels and disappearing table rows and how to do them with jQuery.
  • jQuery itself has some pretty decent documentation for those times when your cut-n-pasted code looks a little suspect or you’re just sure there’s a better way. Easy to read and concise.

Why I Wrote My Own Sorting/Paging in jQuery

Inevitably, someone out there is wondering why I didn’t just use tablesorter and tablesorter.pager, or Flexigrid, or something like that. The answer, in a nutshell, is paging. Sorting and paging operations, I learned both by experience and in my reading, *NEED* to know about each other. If they don’t, you’ll get lots of weird results, like sorting on just one page (or, sorting on just one page until you click to another page, which will look as expected, and then click back), or pages with rows on them that are just plain wrong, or… the list goes on. This is precisely the problem that the integrated “all-sorting-all-paging” tools like tablesorter try to solve. The issue is that I could not find a SINGLE ONE that did not have a narrow definition of what a pager was, and what it looked like.
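
The core of that lesson is easy to show outside the browser. Here’s a rough sketch in Python (not the javascript/jQuery code from the project) of sorting and paging that know about each other: sort the *whole* data set first, then slice out the requested page, then stripe just the visible rows so the alternating colors survive a re-sort. The row fields, the page size of 20, and the `odd`/`even` class names are assumptions for illustration.

```python
def get_page(rows, sort_key, page, page_size=20, reverse=False):
    """Return (row, css_class) pairs for one page of rows sorted by sort_key.

    Sorting happens on the full data set *before* slicing, so page 2 always
    continues where page 1 left off. Pages are numbered from 1.
    """
    ordered = sorted(rows, key=lambda r: r[sort_key], reverse=reverse)
    start = (page - 1) * page_size
    visible = ordered[start:start + page_size]
    # Stripe after sorting and slicing; the first visible row is "odd".
    return [(row, "odd" if i % 2 == 0 else "even")
            for i, row in enumerate(visible)]
```

Sorting only the rows already on screen is exactly what produces the “sorted on just one page” weirdness described above; doing the sort first and the slice second avoids it by construction.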

I wanted (well, I was required to mimic the mockup, so “needed”) a flickr-style pager — modified. I needed to have each page of the report represented at the bottom of the report table by a block with the proper number in the block. The block would be clickable, and clicking it would show the corresponding page of data. This is more or less what Flickr does, but I didn’t need the “previous” and “next” buttons, and I didn’t need the “…” they use (rather effectively) to cut down on the number of required pager elements. So… just some blocks with page numbers. That’s it.
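
The pager itself is mostly arithmetic. Here’s a minimal sketch, again in Python rather than the original javascript: one numbered block per page, with no “previous”/“next” links and no “…” elision, as described above. The `page_blocks` name and the `is_current` flag are hypothetical, and giving an empty report a single page is my own design choice.

```python
def page_blocks(total_rows, current_page, page_size=20):
    """Return (page_number, is_current) for every page, one block per page."""
    # Ceiling division without floats; an empty report still gets one page.
    num_pages = max(1, -(-total_rows // page_size))
    return [(n, n == current_page) for n in range(1, num_pages + 1)]
```

With 45 rows and 20 per page, `page_blocks(45, 2)` yields three blocks, with the middle one marked current; each block’s click handler would then just ask for that page number.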

I started out using tablesorter for jQuery, and it worked great: it does the sorting for you, manages the alternating row colors, and is a pretty flexible sorter. Then I got to the paging part, and things went south pretty fast. While tablesorter.pager has a ‘moveToPage’ function, it isn’t exposed in a way that lets you bind it to a CSS selector, the way the ‘moveToPrevious’, ‘moveToLast’, ‘moveToNext’ and other functions are. So, I tried to hack it into the pager code myself. I got weird results (though I feel more confident about approaching that now than I did even three days ago). There wasn’t any obvious way to do anything but give the user *only* first/last/previous/next buttons to control the paging. I moved on. I googled, I asked on jQuery IRC, I even wrote the developer of tablesorter. I got nothing.

I looked at 4 or 5 different tools, and was shocked to find the same thing! I didn’t go digging into the code of all of them, but their documentation all seemed to be in some kind of weird denial about the existence of flickr-style paging altogether!

So, I wrote my own. It wasn’t all that difficult, really. The code that worked was only slightly different from the code I’d fought with early on in the process. It just took some reading to get some of the basic tricks of the trade under my belt, and I got a tip or two from one of the gurus at work as well, and I was off to the races!

Lessons Learned

So, one thing I have to say for my boss is that he knows better than to throw *all* of those things at me at once. Had he come to me and said he wanted an uber-ajaxian reporting interface from outer space from the get-go, I might not have responded even as positively as I did (and I would rate my response as ‘tepid, but attempting a positive outlook’). It’s best to draw me in slowly, a task at a time, so I can get some sense of accomplishment and some feedback along the way instead of feeling like I still have this mountain to climb before it’s over.

I certainly learned that this javascript and jQuery (and AJAX) stuff isn’t really black magic. Once you get your hands dirty with it it’s kinda fun. I still don’t ever want to become a front end developer on a full-time basis (browser testing is where I *really* have zero patience, either for myself or the browsers), but this experience will serve me well in making my own projects a little prettier and slicker, and nicer to use. It’ll also help me understand more about what the front end folks are dealing with, since there’s tons of javascript at myYearbook.

So, I hope this post keeps some back end scalability engineer’s face from turning white when they’re given a front-end AJAX project to do. If you’ve ever had a similar situation happen to you (not necessarily related to javascript, but other technologies you didn’t know until you were thrown into a project), let’s hear the war stories!!