Regular Expressions with Python’s “re” Module

If you’re moving over from PHP, Perl, Ruby or something similar, don’t be intimidated by all the Python regular expression documentation. It doesn’t really have to be complicated or even all that much different from Perl (though it can be, if you want to go there).

Here’s a search and replace I ripped out of a Perl script for use in the Python script that replaces it. It ensures that any MAC address fed to it has two hex digits in every field. So, for example, this would change “0:c:e:fe:d0:ae” to “00:0c:0e:fe:d0:ae”. This is good if you need to insert the value into a PostgreSQL column of type ‘macaddr’, or you just want to be consistent.

Perl: $macaddr =~ s/\b([0-9a-f])\b/0\1/ig

Python: macaddr = re.sub(r'(?i)\b([0-9a-f])\b', r'0\1', macaddr)
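If you want to try the Python line yourself, here’s a minimal sketch that wraps it in a function. The name “pad_mac” is just something I made up for illustration; the regex is exactly the one above.

```python
import re

def pad_mac(macaddr):
    """Zero-pad every single-hex-digit field of a MAC address (hypothetical helper)."""
    # \b([0-9a-f])\b matches a hex digit that stands alone between word
    # boundaries, i.e. a one-character field; (?i) makes the match ignore case.
    return re.sub(r'(?i)\b([0-9a-f])\b', r'0\1', macaddr)

print(pad_mac("0:c:e:fe:d0:ae"))  # prints 00:0c:0e:fe:d0:ae
```

Fields that already have two digits are left alone, because there is no word boundary between the two characters.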

There are a few differences when moving to Python. First, Python has no counterpart to Perl’s “=~” binding operator, so we’re calling a function and assigning its return value with a plain “=”. That’s fine with me. Fewer cryptic symbols are better.

Second, calling a function also means that the operation is explicit: we’re doing substitution using the “sub” function from the “re” module. There’s no “s/” like there is in Perl.

Third, there’s also no “/ig” in Python like at the end of the Perl example. The “i” means “ignore case”, and in Python, that indication (the “(?i)”) goes at the start of the pattern in question instead of at the end of the expression. The “g” means “replace every match”, and it needs no Python equivalent because “re.sub” already replaces every match by default. That’s easier for my brain to parse. I like to read what I’m doing in my native language (English), and if you think in that context, then reading regexes in Perl is kinda like reading in German, not English.
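If you’d rather keep the flag out of the pattern entirely, “re.sub” also takes a “flags” argument, and a “count” argument that limits the number of replacements. Here’s a rough sketch of both; the sample address and variable names are my own.

```python
import re

macaddr = "0:C:e:fe:d0:ae"  # sample input, chosen for illustration

# Same substitution, with "ignore case" passed as an argument instead of
# the inline (?i) flag.
padded = re.sub(r'\b([0-9a-f])\b', r'0\1', macaddr, flags=re.IGNORECASE)

# count=1 stops after the first match, roughly like Perl's s/// without "g".
first_only = re.sub(r'\b([0-9a-f])\b', r'0\1', macaddr,
                    count=1, flags=re.IGNORECASE)

print(padded)      # prints 00:0C:0e:fe:d0:ae
print(first_only)  # prints 00:C:e:fe:d0:ae
```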

Finally, calling a function also means that the pattern and the thing you want to apply it to are separate arguments to the function instead of things that are delimited by more “/” characters. In fact, in Python, slashes never delimit a regular expression. The pattern is just an ordinary string, so the only backslashes in the example belong to the regular expression syntax itself, and the forward slash shows up elsewhere in the language only as the division operator.
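Because the pattern is just another value, you can also compile it once and reuse it on as many strings as you like. A small sketch, assuming you have a list of addresses to clean up (the names and sample data here are mine, not from the original script):

```python
import re

# Compile the pattern once; re.IGNORECASE replaces the inline (?i) flag.
SINGLE_HEX_DIGIT = re.compile(r'\b([0-9a-f])\b', re.IGNORECASE)

addresses = ["0:c:e:fe:d0:ae", "00:1A:2b:3:4:5"]  # sample data

# The compiled pattern has its own sub() method: replacement first,
# then the string to work on.
padded = [SINGLE_HEX_DIGIT.sub(r'0\1', addr) for addr in addresses]

print(padded)  # prints ['00:0c:0e:fe:d0:ae', '00:1A:2b:03:04:05']
```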

Though there are lots of differences in just this one very simple example, I’ll also note that the actual regex syntax itself (the parts inside quotes in the Python example) is not different at all, except for the addition in the Python example of the inline “ignore case” flag “(?i)”!
