Lessons Learned Porting Dateutil to Python 3

The dateutil module is a very popular third-party (pure) Python module that makes it easier (and in some cases, possible) to perform more advanced manipulations on dates and date ranges than simply using some combination of Python’s ‘included batteries’ like the datetime, time and calendar modules.

Dateutil does fuzzy date matching, Easter calculations in the past and future, relative time delta calculations, time zone manipulation, and lots more, all in one nicely bundled package.

I decided to port dateutil to Python 3.

Why?

For those who haven’t been following along at home, David Beazley and I are working on the upcoming Python Cookbook 3rd Edition, which will contain only Python 3 recipes. Python 2 will probably only get any real treatment when we talk about porting code.

When I went back to the 2nd edition of the book to figure out what modules are used heavily that might not be compatible with Python 3, dateutil stuck out. It’s probably in half or more of the recipes in the ‘Time and Money’ chapter in the 2nd Edition. I decided to give it a look.

How Long Did it Take?

Less than one work day. Seriously. It was probably 4-5 hours in total, including reading documentation and getting to know dateutil. I downloaded it and ran 2to3 on it without letting 2to3 edit the files in place, then scanned the output for anything that looked ominous (a couple of things looked a lot worse than they turned out to be). Once satisfied that it wasn’t going to do anything dumb, I let ‘er rip: I ran 2to3 again and told it to go ahead and change the files (2to3 makes backup copies of all edited files by default, by the way).

What Was the Hardest Part?

Well, there were a few unit tests that used the base64 module to decode some time zone file data into a StringIO object before passing the file-like object to the code under test (I believe the code under test was the tz module, which is the part of dateutil that parses time zone files). Inside there, the file-like StringIO object is subjected to a bunch of struct.unpack() calls, and there are a couple of plain strings that get routed elsewhere.

The issue is that, in Python 3, NO functions in the base64 module return strings anymore, which makes creating the StringIO object more challenging. All base64 functions return Python bytes objects. So I replaced the StringIO object with a BytesIO object, all of the struct.unpack() calls “just worked”, and the values that were actually needed as strings in the code had a ‘.decode()’ appended to them to convert the bytes back to strings. All was well with the world.
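To make that fix concrete, here is a minimal sketch of the same pattern. It is not the dateutil test code: the payload layout (a 4-byte magic string followed by a big-endian 32-bit count) and all of the variable names are made up for illustration.

```python
import base64
import struct
from io import BytesIO

# Hypothetical binary payload: a 4-byte magic string plus a 32-bit count.
raw = b"TZif" + struct.pack(">l", 3)

# In Python 3, b64encode() and b64decode() both return bytes, not str.
encoded = base64.b64encode(raw)

# BytesIO takes the place of the Python 2 StringIO as the file-like object.
fileobj = BytesIO(base64.b64decode(encoded))

# struct.unpack() works on the bytes read from the file-like object.
magic = fileobj.read(4)
(count,) = struct.unpack(">l", fileobj.read(4))

# Only values that are really needed as text get decoded.
magic_text = magic.decode("ascii")
```

The point of the split is that binary parsing stays on bytes from start to finish, and decoding happens only at the spots where the code truly wants text.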

What Made it Easier?

Two things, smaller one first:

First, Python’s built-in modules for date handling haven’t been flipped around much, and dateutil doesn’t have any dependencies outside the standard library (hm, maybe that’s 2 things right there). The namespaces for date manipulation modules are identical to Python 2, and I believe for the most part all of the methods act the same way. There might be some under-the-hood changes where things return memoryview objects or iterators instead of lists or something, but in this and other porting projects involving dates, that stuff has been pretty much a non-event most of the time.
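As a small illustration (again, not code from dateutil, and the dates are arbitrary), the snippet below uses only the standard datetime and calendar modules. The one Python 3 difference it touches is that map() returns a lazy iterator rather than a list, which a list() call papers over.

```python
import calendar
from datetime import date, timedelta

# Same module names and same calls as in Python 2.
start = date(2011, 2, 1)
days = [start + timedelta(days=n) for n in range(3)]

# map() returns an iterator in Python 3 (it returned a list in Python 2);
# wrapping it in list() gives a list either way.
labels = list(map(lambda d: d.isoformat(), days))

# monthrange() returns (weekday of the first day, number of days in month),
# with Monday counted as weekday 0.
first_weekday, num_days = calendar.monthrange(2011, 2)
```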

But the A #1 biggest thing that made this whole thing take less than a day instead of more than a week? Tests.

Dateutil landed on my hard drive with 478 tests (the main module has about 3600 lines of actual code, and the tests by themselves are roughly 4000 lines of code). As a result, I didn’t have to manually check all kinds of functionality or write my own tests. I was able to port the tests fairly easily with just a couple of glitches (like the aforementioned base64 issue). From there I felt confident that the tests were testing the code properly.

In the past couple of days since I completed the ‘project’, I ported some of the dateutil recipes from the 2nd edition of the book to Python 3, just for some extra assurance. I ported 5 recipes in under an hour. They all worked.

Had You Ported Stuff Before?

Well, to be honest most of my Python 3 experience (pre-book, that is) is with writing new code. To gain a broader exposure to Python 3, I’ve also done lots of little code golf-type labs, impromptu REPL-based testing at work for things I’m doing there, etc. I have ported a couple of other small projects, and I have had to solve a couple of issues, but it’s not like I’ve ever ported something the size of Django or ReportLab or something.

The Best Part?

I had never seen dateutil in my life.

I had read about it (I owned the Python Cookbook 2nd Edition since its initial release, after all), but I’d never been a user of the project.

The Lessons?

  1. This is totally doable. Stop listening to the fear-inducing rantings of naysayers. Don’t let them hold you back. The pink ponies are in front of you, not behind you.
  2. There are, in fact, parts of Python that remain almost unchanged in Python 3. I would imagine that even Django may find that there are swaths of code that “just work” in Python 3. I’ll be interested to see metrics about that (dear Django: keep metrics on your porting project!).
  3. Making a separation between text and data in the language is actually a good thing, and in the places where it bytes you (couldn’t resist, sorry), it will likely make sense if you have a fundamental understanding of why text and data aren’t the same thing. I predict that, in 2012, most will view complainers about this change the same way we view whitespace haters today.

“I Can’t Port Because…”

If you’re still skeptical, or you have questions, or you’re trying and having real problems, Dave and I would both love for *you* to come to our tutorial at PyCon. Or just come to PyCon so we can hack in the hallway on it. I’ve ported, or am in the process of porting, 3 modules to Python 3. Dave has single-handedly ported something like 3-5 modules to Python 3 in the past 6 weeks or so. He’s diabolical.

I’d love to help you out, and if it turns out I can’t, I’d love to learn more about the issue so we can shine a light on it for the rest of the community. Is it a simple matter of documentation? Is it a bug? Is it something more serious? Let’s figure it out and leverage the amazing pool of talent at PyCon to both learn about the issue and hopefully get to a solution.

  • http://wingware.com/ Stephan Deibel

    Two years ago (Wow! _Two_ years ago already!) I wrote up a little blog post on getting some code to run under both Python 2.x and 3.x:

    http://pythonology.blogspot.com/2009_02_01_archive.html

    Glad to hear it’s still pretty easy! ;-)

  • Robert

    Thank you for posting this. The naysayers are getting tiring.

  • http://gedmin.as Marius Gedminas

    So, you have a codebase that works with Python 3.x. What now? Burn your bridges, abandon your existing Python 2.x users and upload the new 3.x-only version on PyPI? Most maintainers don’t want to do that.

  • m0j0

    @Marius

    Seriously?

    First, there’s no reason you can’t write code that works in both Python 2 and Python 3. I’ve done that as well, and to assert that one must burn bridges is a little overly dramatic, as well as incorrect.

    Second, there’s also no reason (depending on the size/complexity of the project) one can’t maintain separate 2.x/3.x branches.

    Third, have you ever looked for projects using Python 3? They don’t seem to have issues supporting both!

    Fourth, some maintainers may also choose to put their 2.x versions into some ‘legacy’ mode whereby new features are added to the 3.x version, and the 2.x version gets only bugfixes. This is not particularly unusual in software development in general.

    Fifth, what are you putting forth as the alternative? Waiting until 3.6 comes out to port your 2.6 code? Forever be 5 versions behind? I’d much rather port now and take one of the above paths than contemplate what it would take to get from 2.6 all the way to, say, 3.4 or 3.6. I also wouldn’t want to deal with the nightmare of porting while trying to keep the currently active 2.x branch moving forward, nor would I want to see it languish. In short, being early seems easier and less fuss to manage and control than being behind the curve.

    Sixth, what “most maintainers” want will eventually become irrelevant if they want to continue to be maintainers of a project that remains relevant. There are already Python 3 modules available that do useful things. If the old 2.x modules don’t get a move on, they’ll eventually just be replaced by users wanting to move to 3.x in their projects.

  • Laurie Clark-Michalek

    I ported a library to python3 a while back. It wasn’t something I had ever looked at before, but I needed to use it in a project. And so I ran 2to3 on it, and fixed a little error with bytes -> strings from a socket, and it was all fixed. That took me an hour, 2 hours tops. However, there were _zero_ tests. I ended up writing tests for the module, which took quite a while. The simple fact of the matter is that you cannot be sure that you have ported correctly until you have run a comprehensive test suite over the module. You were very lucky m0j0, to find a module with such good tests :)

  • http://www.markus-gattol.name Markus Gattol

    Thanks for porting dateutil over to Python 3, very much appreciated! Here’s an article on using 2to3, http://diveintopython3.org/case-study-porting-chardet-to-python-3.html, which some might also find interesting and maybe helpful as they port code to Python 3.

  • http://225tms.net Tomas Sedovic

    Are you planning to push it forward?

    It would be awesome if this got into the official release. And would you be willing to post the port somewhere? I could use dateutil on Py3k right now.

  • http://gedmin.as Marius Gedminas

    No, I do not advocate waiting until 3.6.

    Writing code that works in both 2.x and 3.x is, as you say, possible, and that would be my preferred solution. Another good choice would be using Distribute and having it run 2to3 on your codebase automatically from setup.py.

    My objection was to the simplified “just run 2to3 once, fix anything that’s broken, and you’re done” model.

  • m0j0

    @Tomas — the work has been sent back to the maintainer, Gustavo Niemeyer, who was very appreciative. He said the work would be made public soon.

  • http://225tms.net Tomas Sedovic

    @m0j0, that’s really good news. Cheers mate!