Lessons Learned Porting Dateutil to Python 3

The dateutil module is a very popular third-party (pure) Python module that makes it easier (and in some cases, possible) to perform more advanced manipulations on dates and date ranges than simply using some combination of Python’s ‘included batteries’ like the datetime, time and calendar modules.

Dateutil does fuzzy date matching, Easter calculations in the past and future, relative time delta calculations, time zone manipulation, and lots more, all in one nicely bundled package.

I decided to port dateutil to Python 3.

Why?

For those who haven’t been following along at home, David Beazley and I are working on the upcoming Python Cookbook 3rd Edition, which will contain only Python 3 recipes. Python 2 will probably only get any real treatment when we talk about porting code.

When I went back to the 2nd edition of the book to figure out what modules are used heavily that might not be compatible with Python 3, dateutil stuck out. It’s probably in half or more of the recipes in the ‘Time and Money’ chapter in the 2nd Edition. I decided to give it a look.

How Long Did it Take?

Less than one work day. Seriously. It was probably 4-5 hours in total, including looking at documentation and getting to know dateutil. I downloaded it, I ran 2to3 on it without letting 2to3 do the in-place edits, scanned the output for anything that looked ominous (there were a couple of things that looked a lot worse than they turned out to be), and once satisfied that it wasn’t going to do things that were dumb, I let ‘er rip: I ran 2to3 and told it to go ahead and change the files (2to3 makes backup copies of all edited files by default, by the way).

What Was the Hardest Part?

Well, there were a few unit tests that used the base64 module to decode some time zone file data into a StringIO object before passing the file-like object to the code under test (I believe the code under test was the relativedelta module). Inside there, the file-like StringIO object is subjected to a bunch of struct.unpack() calls, and there are a couple of plain strings that get routed elsewhere.

The issue with this is that there are NO methods inside the base64 module that return strings anymore, which makes creating the StringIO object more challenging. All base64 methods return Python bytes objects. So, I replaced the StringIO object with a BytesIO object, all of the struct.unpack() calls “just worked”, and the strings that were actually needed as strings in the code had a ‘.decode()’ appended to them to convert the bytes back to strings. All was well with the world.

What Made it Easier?

Two things, smaller one first:

First, Python built-in modules for date handling haven’t been flipped around much, and dateutil doesn’t have any dependencies outside the standard library (hm, maybe that’s 2 things right there). The namespaces for date manipulation modules are identical to Python 2, and I believe for the most part all of the methods act the same way. There might be some under-the-hood changes where things return memoryview objects or iterators instead of lists or something, but in this and other porting projects involving dates, that stuff has been pretty much a non-event most of the time

But the A #1 biggest thing that made this whole thing take less than a day instead of more than a week? Tests.

Dateutil landed on my hard drive with 478 tests (the main module has about 3600 lines of actual code, and the tests by themselves are roughly 4000 lines of code). As a result, I didn’t have to manually check all kinds of functionality or write my own tests. I was able to port the tests fairly easily with just a couple of glitches (like the aforementioned base64 issue). From there I felt confident that the tests were testing the code properly.

In the past couple of days since I completed the ‘project’, I ported some of the dateutil recipes from the 2nd edition of the book to Python 3, just for some extra assurance. I ported 5 recipes in under an hour. They all worked.

Had You Ported Stuff Before?

Well, to be honest most of my Python 3 experience (pre-book, that is) is with writing new code. To gain a broader exposure to Python 3, I’ve also done lots of little code golf-type labs, impromptu REPL-based testing at work for things I’m doing there, etc. I have ported a couple of other small projects, and I have had to solve a couple of issues, but it’s not like I’ve ever ported something the size of Django or ReportLab or something.

The Best Part?

I had never seen dateutil in my life.

I had read about it (I owned the Python Cookbook 2nd Edition since its initial release, after all), but I’d never been a user of the project.

The Lessons?

  1. This is totally doable. Stop listening to the fear-inducing rantings of naysayers. Don’t let them hold you back. The pink ponies are in front of you, not behind you.
  2. There are, in fact, parts of Python that remain almost unchanged in Python 3. I would imagine that even Django may find that there are swaths of code that “just works” in Python 3. I’ll be interested to see metrics about that (dear Django: keep metrics on your porting project!)
  3. Making a separation between text and data in the language is actually a good thing, and in the places where it bytes you (couldn’t resist, sorry), it will likely make sense if you have a fundamental understanding of why text and data aren’t the same thing. I predict that, in 2012, most will view complainers about this change the same way we view whitespace haters today.

“I Can’t Port Because…”

If you’re still skeptical, or you have questions, or you’re trying and having real problems, Dave and I would both love for *you* to come to our tutorial at PyCon. Or just come to PyCon so we can hack in the hallway on it. I’ve ported, or am in the process of porting, 3 modules to Python 3. Dave has single-handedly ported something like 3-5 modules to Python 3 in the past 6 weeks or so. He’s diabolical.

I’d love to help you out, and if it turns out I can’t, I’d love to learn more about the issue so we can shine a light on it for the rest of the community. Is it a simple matter of documentation? Is it a bug? Is it something more serious? Let’s figure it out and leverage the amazing pool of talent at PyCon to both learn about the issue and hopefully get to a solution.