What I learned about Python Today – eval()

I was writing some Python yesterday, and came across an issue that I thought was going to send me back to the drawing board.

I was using a module that, given an Apache access log, returns line objects with the fields of the line as attributes of the line object. It was certainly usable as-is, but I wanted more granular parsing of the fields, and if there were query string arguments (like “?f0o=bar&page=stories”, etc), I wanted those broken up for easy access later as well.

I created a simple ruleset builder so I could pass arguments to my script and have them become rules that would filter the log and return the interesting bits according to the ruleset. So now we have two objects: the line object that has attributes like line.ip, and a rule object that only has three attributes: the attribute of the line you want, and the value of that attribute you want to filter on – and there’s also a comparison operator attribute, but right now it only holds an “=”. It’s a work in progress :-)

So this means that you can do something like this:

getattr(line, rule.attr)

And if you passed “ip=192.168.1.2″ on the command line, then rule.attr will be “ip”, and the above will be parsed as “line.ip”, and you’ll get the expected result.

This fails for any attribute of “line” that isn’t a simple string – anything that has to be parsed as some kind of an expression. Like, say, references to keys and elements of dictionaries. I used cgi.parse_qs to parse my query string so I could access the different keys and values of the query string, and filter my logs using site-specific things like “zip=10016″ or something. Of couse, cgi.parse_qs returns a dictionary, which I called “urldata”. So, if you want to filter on “line.urldata[‘zip’][0]”, you should just find a way to assign that to rule.attr, right?

Wrong. getattr doesn’t work that way. It doesn’t look up the dictionary element, it just tags it onto the end of “line” and hopes for the best. It doesn’t evaluate expressions. If you wanted to get an element of a dictionary that is an attribute of “line” using getattr(), you’d have to do this:

getattr(line, rule.attr)['zip'][0]

Where “rule.attr” is just “urldata”.

Well that stinks for my purposes, because I don’t want a given type of argument passed in by the user to cause a special case in my code. I was thinking of alternative models to do all of this, but as usual, Doug had an answer right off the top of his head. His ability to do that sickens me at times. ;-P

The solution was to replace getattr(line, rule.attr) with eval(rule.attr, line.__dict__)

In this case, rule.attr = “urldata[‘zip’][0]’, but it’s not treated as “just a string”. In the case above, “line.__dict__” is a namespace used to search for and evaluate “urldata[‘zip’][0]”. The beauty of this is that as long as the value of rule.attr is defined in line.__dict__, rule.attr can be any argument of any type, and this one line of code will handle it.

That worked wonderfully.

  • sam

    Unsafe.

    What about a query like:

    ?sys.exit()=foo

    (with appropriate URL-encoding, which I can’t do off the top of my head). Even worse would be an attacker using this to open up sensitive files or information and send them down the pipe, by spinning off a rogue service.

    eval(), in Python, is a very powerful tool, much like a chainsaw.

  • http://www.protocolostomy.com m0j0

    there is no sys.exit in the provided namespace arg to eval(). Chainsaws can be effective and safe if you respect them and read the manual prior to use. 😉

  • inopua

    Then what about

    ?import sys;sys.exit();=whatever

    ?

  • http://www.protocolostomy.com m0j0

    I understand the religion of not using things like eval(), but please at least read the documentation about how that function works, and take the time to test your hypotheses before you dismiss out of hand my use of it.

    In addition, it’s not like I’m extolling the virtues of eval as a general safety net. It works in this instance, and I’m not opposed to entertaining other ideas about what would work. However, I’m not going to rewrite this code just because eval *can* be bad *IF USED INAPPROPRIATELY*. I’ll remind everyone that just because something has the potential to be used unwisely does not make any use of that thing unwise. You can apply that to a great many things, technical or otherwise. Like a chainsaw, for example.

    I can’t profess to have read the code that implements eval(). I can’t profess to know that it can’t be broken as I’ve used it. However, I’m *confident* that it’s safe the way I’ve used it, and furthermore, the context of this application is that it is run by a user against *their own logs*, so we should also consider (in *addition* to the rest) what the cost of failure is in this case.

  • http://farmdev.com/ Kumar McMillan

    90% of the time when I see people using eval(), I see a better way to go instead. Here’s why:

    1. eval is unsafe. no matter how hard you try, you will most likely leave some gaping security hole somewhere. Especially if you are parsing a web log! Try grepping for “woot” in your log. This is the first entry I see in my log:

    GET /path/to/mysite/woots.txt HTTP/1.1″ 404 366 “-” “Java/1.6.0_01

    the Internet is a crazy place.

    2. eval makes for unreadable code. If I were to see this written in someone else’s code: eval(rule.attr, line.__dict__) I would have to stop and figure out what the hell is going on. What are all possible values of rule.attr? even if those are in comments of the docstring I am going to be annoyed that I had to read lengthy documentation where some more readable code would suffice.

    How about this instead?

    >>> import cgi
    >>> import re
    >>> expr = re.compile(r’\s*(=|>|>> input_expr = expr.split(‘ip=192.168.1.2′)
    >>> input_expr
    [‘ip’, ‘=’, ‘192.168.1.2’]
    >>> attr, op, val = input_expr
    >>> line_parsed = cgi.parse_qs(‘ip=192.168.1.2&ip=192.168.1.240′)
    >>> line_parsed
    {‘ip': [‘192.168.1.2′, ‘192.168.1.240’]}
    >>> line_matched = False
    >>> for parsed_val in line_parsed.get(attr, []):
    … if op == ‘=':
    … if parsed_val == val:
    … line_matched = True
    … break
    … elif op == ‘>':
    … if parsed_val > val:
    … line_matched = True
    … break
    … elif op == ‘<‘:
    … if parsed_val >> line_matched
    True
    >>>

    much more readable, IMHO, and tailored better to what you are trying to do (that is, given this description of what you’re trying to do:))

  • http://farmdev.com/ Kumar McMillan

    # comment form swallowed my regex, replace _gt_ and _lt_ with the symbols:
    expr = re.compile(r’\s*(=|_gt_|_lt_)\s*’)

  • nis

    I am interested to know what the speed difference is using your 250MB file between eval and getattr for the simple cases that work with either. Just curious.