PyCon 2011 Predictions

PyCon 2011 is right around the corner. Are you going? You should. The talks are sick. You can still register – it’s too late to be an early bird, but registration is still open!

Well, I am, and I’m here to get the rumor mill started by sharing some predictions for this year’s PyCon.

Packaging

While last year there was some hubbub around packaging, most notably the poster images declaring that pip and distribute were ‘the new black’, this year that’s old news. The pip and distribute mantra will bear repeating for those who are still behind the times, of course, but this year I think there will be more hand-wringing over packaging, probably focusing more on the data and distribution mechanism than the tools. I, for one, am looking forward to any light Tarek Ziade and others on the distutils2 team can shed on new metadata elements, new build procedures (I hear setup.py is going away someday), and what’s going on with distutils, distribute, pip, distutils2, PyPI, etc.

My prediction? Packaging, while not very exciting, will garner a lot of attention again this year.

Python 3

Dave Beazley and I will be doing a tutorial on Python 3, but besides that, I think there’s going to be a lot of hallway (and lounge) discussion about all things Python 3. This would mirror the clamor around the topic recently online, which I don’t really expect to slow down. Python 3 is, of course, discussed every year, but I think this year’s PyCon will be Python 3’s Woodstock: the time and place where minds merge onto an idea together and harmonize and all that stuff.

My prediction? We’ll see a good number of commitments from projects to port to Python 3, GSOC projects related to Python 3, and more funny looks at projects not getting their porting underway in 2011 as a direct result of PyCon 2011.

Scale

Multiprocessing and threading will be hot topics again for two reasons. First, lots of people just look at the GIL and reflexively get heartburn without doing any actual analysis to see if they even need to care about it in the context of a particular application. Second, the uptake for Python in ‘web scale’ environments continues to grow, so that’s bound to be reflected in the attendee demographics, if you will, and therefore in the perceived interest in the topic.

PyPy is going to loom large at this year’s conference. Some may decide after PyCon 2011 that PyPy is the future for them. Unladen Swallow will be declared dead, in spite of some eager undergrads making empty promises about picking it up. Most who care will opt for the project that appears to be gaining steam, not losing it.

Generator-based coroutines will continue to be a really cool but relatively unused oddity, and Eventlet and the like will continue to have their adherents and go with the flow.

Event-based/async frameworks will garner some attention. Tornado, gUnicorn, and others will do well. Twisted followers will say that all others are either redundant or fatally flawed.

My prediction? Tornado and ZeroMQ will make a big splash, this being the first year it’s being covered (I think). People will still advocate multiprocessing over threading after the conference, and Twisted will start to see declines in usership after PyCon 2011.

Potpourri

  • Python core development contributions will increase thanks to efforts by the PSF and core development communities, though not without some growing pains.
  • Michael Foord’s ‘mock’ library will take off as a de facto standard module in the toolbelts of Python developers everywhere.
  • Fabric’s parallel branch will be merged as the default either before or as a result of activity at PyCon 2011.
  • Flask will announce a merger with another web framework in 2011, in part due to conversations and activity at PyCon.
  • PyCharm will become recognized as the preeminent Python IDE as a result of everyone getting to PyCon only to find that everyone else they were going to advocate it to is already using it.
  • Some outside the scientific community will attend a Python-Science talk anyway and come away with valuable techniques they can apply to distributed computing problems. Some outside the web scalability domain will attend related talks anyway and take something back to the enterprise computing space. Some outside the cloud space will attend a cloud talk and find a cool project to put in the cloud. As a result:
  • At least 30 new Python projects will be created as a direct result of PyCon 2011.

Python 3: Informal String Formatting Performance Comparison

If you haven’t heard the news, Dave Beazley and I have officially begun work on the next edition of the Python Cookbook, which will be completely overhauled using absolutely nothing but Python 3. Yay!

Right now, I’m going through some string formatting recipes from the 2nd edition to see if they still work, and if Python 3 offers any preferred alternatives to the solutions provided. As usual, it turns out that the answer to that is often ‘it depends’. For example, you might decide on a slower solution that’s more readable. Conversely, you might need to run an operation in a loop a million times and really need the speed.

New string formatting operations like the built-in format() function (separate from the str.format method) and the format mini-language are available in 2.6, and made nicer in 2.7. To my knowledge, all of it is backported from the 3.x tree, and I’ll be using a Python 3.2b2 interpreter session for my examples.
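To make the relationship between the two concrete, here is a small sketch (not taken from the Cookbook) showing that the built-in format() and str.format accept the same mini-language spec. The variable names are just for illustration.

```python
# The spec '+<20s' reads left to right:
#   '+'   fill character
#   '<'   alignment (left); '^' centers, '>' right-aligns
#   '20'  minimum field width
#   's'   presentation type (string)
spec = '+<20s'

via_builtin = format('hej', spec)        # built-in: format(value, spec)
via_method = '{:+<20s}'.format('hej')    # method: same spec after the colon

# Both spellings should produce 'hej' padded on the right with '+' to width 20.
assert via_builtin == via_method == 'hej' + '+' * 17
print(via_builtin)
```

The built-in is handy when the spec is held in a variable; the method is handy when several fields share one template.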

I want to focus specifically on string alignment here, because there are very obviously multiple ways to solve alignment needs. Here’s an example solution from the 2nd edition:

>>> print '|' , 'hej'.ljust(20) , '|' , 'hej'.rjust(20) , '|' , 'hej'.center(20) , '|'
| hej                  |                  hej |         hej          |

Note that this is of course in Python 2.x syntax, but this works in Python 3.2 if you just make it a function call instead of a statement (so, just add parens and it works). The string methods used here are still in Python 3.2, with no notices of deprecation or preference for newer methods available now. That said, this looks messy to me, and so I wondered if I could make it more readable without losing performance, or at least without losing so much performance that it’s not worth any gains in the area of readability.
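For reference, the function-call form described above would look roughly like this. The `line` variable and the length check are my own additions, there only to show where the extra spaces in the output come from.

```python
# Python 3 form of the 2nd edition recipe: print is a function, so the
# comma-separated items simply move inside parentheses.
print('|', 'hej'.ljust(20), '|', 'hej'.rjust(20), '|', 'hej'.center(20), '|')

# print() joins its arguments with sep (a single space by default), which is
# why every padded field ends up with an extra space on either side.
items = ['|', 'hej'.ljust(20), '|', 'hej'.rjust(20), '|', 'hej'.center(20), '|']
line = ' '.join(items)

# Four pipes, three 20-character fields, six single-space separators.
assert len(line) == 4 + 3 * 20 + 6
```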

Single String Formatting

Here are three ways to get the same string alignment behavior in Python 3.2b2:

>>> '{:+<20s}'.format('hej')
'hej+++++++++++++++++'
>>> format('hej', '+<20s')
'hej+++++++++++++++++'
>>> 'hej'.ljust(20, '+')
'hej+++++++++++++++++'

Ok, so they all work the same. Now I’m going to wrap each one in a function and use the timeit module to help me get an idea what the difference is in terms of performance.

>>> from timeit import timeit
>>> def runit():
...     format('hej', '+<20s')
...
>>> def runit2():
...     'hej'.ljust(20, '+')
...
>>> def runit3():
...     '{:+<20s}'.format('hej')
...
>>> timeit(stmt=runit3, number=1000000)
0.6168370246887207
>>> timeit(stmt=runit3, number=1000000)
0.6109819412231445
>>> timeit(stmt=runit3, number=1000000)
0.6166291236877441
>>> timeit(stmt=runit2, number=1000000)
0.49651098251342773
>>> timeit(stmt=runit2, number=1000000)
0.4870288372039795
>>> timeit(stmt=runit2, number=1000000)
0.49135899543762207
>>> timeit(stmt=runit, number=1000000)
0.7751290798187256
>>> timeit(stmt=runit, number=1000000)
0.7771239280700684
>>> timeit(stmt=runit, number=1000000)
0.7805869579315186

Turns out that using the old, tried and true str.* methods is fastest in this case, though I think in a more complex case like the recipe from the 2nd edition I’d opt for something more readable if I had the chance.
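If you want to rerun this comparison yourself, a script version might look like the sketch below. The function names and the best_time helper are hypothetical, not part of the session above. It uses timeit.repeat and keeps the minimum of several runs, which tends to be less noisy than eyeballing three separate calls. Absolute numbers will differ by machine and interpreter.

```python
from timeit import repeat

def with_format_builtin():
    format('hej', '+<20s')

def with_ljust():
    'hej'.ljust(20, '+')

def with_str_format():
    '{:+<20s}'.format('hej')

def best_time(func, number=100000, runs=3):
    """Return the fastest of several timing runs, in seconds."""
    return min(repeat(stmt=func, number=number, repeat=runs))

if __name__ == '__main__':
    # Print one aligned row per candidate, using the mini-language itself.
    for func in (with_ljust, with_str_format, with_format_builtin):
        print('{:<22s}{:.4f}s'.format(func.__name__, best_time(func)))
```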

One String, Three Ways

Let’s look at a more complex case. Let’s take each of the methodologies used in runit, runit2, and runit3, and see how things pan out when we want to do something like the 2nd edition recipe. I’ll start with the bare interpreter operation to compare the output:


>>> '|' + format('hej', '+<20s') + '|' + format('hej', '+^20s') + '|' + format('hej', '+>20s') + '|'
'|hej+++++++++++++++++|++++++++hej+++++++++|+++++++++++++++++hej|'
>>> '|' + 'hej'.ljust(20, '+') + '|' + 'hej'.center(20, '+') + '|' + 'hej'.rjust(20, '+') + '|'
'|hej+++++++++++++++++|++++++++hej+++++++++|+++++++++++++++++hej|'
>>> '|{0:+<20s}|{0:+^20s}|{0:+>20s}|'.format('hej')
'|hej+++++++++++++++++|++++++++hej+++++++++|+++++++++++++++++hej|'

Unless you go through the rigamarole of creating a sequence and using '|'.join(myseq), the last method seems the most readable to me. I’d really just like to use the built-in print function with a sep='|' argument, but that won’t cover the pipes at the beginning and end of the string unless I’ve missed something.
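One possible workaround for the missing edge pipes, offered as a sketch rather than a recommendation: pad the sequence with an empty string at each end, so the separator also lands on the outside. The names fields and padded are made up for this example.

```python
fields = ['hej'.ljust(20, '+'), 'hej'.center(20, '+'), 'hej'.rjust(20, '+')]

# An empty string at each end gives the separator something to sit between,
# so a '|' shows up at the start and at the end as well.
padded = [''] + fields + ['']

joined = '|'.join(padded)

# The same padding trick carries over to print() with sep='|'.
print(*padded, sep='|')

# Should match the single str.format template from the comparison above.
assert joined == '|{0:+<20s}|{0:+^20s}|{0:+>20s}|'.format('hej')
```

Whether this reads better than the single format template is debatable; it mostly shows that sep can be made to cover the edges.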

Here are the functions and timings:


>>> def threeways():
...     '|' + format('hej', '+<20s') + '|' + format('hej', '+^20s') + '|' + format('hej', '+>20s') + '|'
...

>>> def threeways2():
...     '|' + 'hej'.ljust(20, '+') + '|' + 'hej'.center(20, '+') + '|' + 'hej'.rjust(20, '+') + '|'
...

>>> def threeways3():
...     '|{0:+<20s}|{0:+^20s}|{0:+>20s}|'.format('hej')
...

>>> timeit(stmt=threeways, number=1000000)
2.4910600185394287
>>> timeit(stmt=threeways, number=1000000)
2.50291109085083
>>> timeit(stmt=threeways, number=1000000)
2.4913830757141113
>>> timeit(stmt=threeways2, number=1000000)
1.9027390480041504
>>> timeit(stmt=threeways2, number=1000000)
1.8975908756256104
>>> timeit(stmt=threeways2, number=1000000)
1.8957319259643555
>>> timeit(stmt=threeways3, number=1000000)
1.311446189880371
>>> timeit(stmt=threeways3, number=1000000)
1.3099820613861084
>>> timeit(stmt=threeways3, number=1000000)
1.3031558990478516

The threeways3 function has a bit of an advantage in not having to muck with concatenation at all, and this probably explains the difference. Changing threeways() to use a list and '|'.join() brought it from about 2.50 to about 2.30. Better. Changing threeways2() in the same way was also a small improvement from ~1.90 to ~1.77. No big wins there, and they’re not particularly readable in either case. For this one arguably trivial corner case, the new formatting mini-language wins in both performance and (IMO) readability.
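The join-based variants aren’t shown above, so here is a guess at what they could look like. This is an assumption about their shape, not the code that was actually timed. The functions return their result, unlike the originals, so the three approaches can be checked against each other before timing.

```python
from timeit import timeit

def threeways_join():
    # format() built-in per field, joined instead of concatenated
    return '|'.join(['', format('hej', '+<20s'), format('hej', '+^20s'),
                     format('hej', '+>20s'), ''])

def threeways2_join():
    # str.ljust/center/rjust per field, joined instead of concatenated
    return '|'.join(['', 'hej'.ljust(20, '+'), 'hej'.center(20, '+'),
                     'hej'.rjust(20, '+'), ''])

def threeways3():
    # single str.format call, as in the original comparison
    return '|{0:+<20s}|{0:+^20s}|{0:+>20s}|'.format('hej')

# All three should build the same string before any timings are compared.
assert threeways_join() == threeways2_join() == threeways3()

for func in (threeways_join, threeways2_join, threeways3):
    print(func.__name__, timeit(stmt=func, number=100000))
```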

Big Assumptions

This of course assumes I didn’t overlook something in creating the comparison functions, and that there isn’t yet another way to do this that blows all of my work out of the water. If you see a completely different way to do this that’s both readable and performant, or if I did something bone-headed, please let me know in the comments. :)