Loghetti Beta – An Apache Log Filter

I’m thinking about making this an open source project hosted someplace like Google Code, because there are folks much smarter than myself who can probably do wonders with the code I’ve put together here. Loghetti takes an Apache combined format access log and a few options as arguments, throws your log lines through a strainer, and leaves you with the bits you actually *want* (kinda like spaghetti, but for logs). 😉

It’s written in Python, and its two dependencies are included in the tarball at the bottom. They are an altered version of Kevin Scott’s apachelogs.py (I’ve added more granular log line parsing) and Doug Hellmann’s CommandLineApp, which made creating a CLI application a breeze: it generates the options and help output automatically, without me having to mess with optparse.

So right now, I use it for offline reporting on what’s in my log files, and it’s great for that. I can run, for example:

./loghetti.py --code=500 access.log

And get a listing of the log lines that have an HTTP response code of 500. You can get fancier of course:

./loghetti.py --ip= --urldata=foo:bar --month=1 --day=31 --hour=16 access.log

That returns lines from the client IP passed to --ip that also match the month, day, and hour given by the date-related options. The --urldata option filters log lines on the query string part of the URL. In the above case, it matches if the query string contains something like “&foo=bar”.
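To make the query string matching concrete, here is a rough sketch of the idea using Python 3's standard library. This is not Loghetti's actual code: the function name `matches_urldata` is made up for illustration, and it assumes the filter is a single key:value pair.

```python
from urllib.parse import urlsplit, parse_qs


def matches_urldata(url, urldata):
    """Return True if the URL's query string contains the key=value pair
    described by a 'key:value' filter string (hypothetical helper)."""
    # Split "foo:bar" into key "foo" and value "bar".
    key, _, value = urldata.partition(":")
    # Pull out only the part of the URL after the "?".
    query = urlsplit(url).query
    # parse_qs maps each key to the list of values it appeared with.
    params = parse_qs(query, keep_blank_values=True)
    return value in params.get(key, [])


if __name__ == "__main__":
    print(matches_urldata("/search?x=1&foo=bar", "foo:bar"))
    print(matches_urldata("/search?foo=baz", "foo:bar"))
```

Parsing the query string, rather than doing a plain substring search for "foo=bar", avoids false matches on keys like "xfoo=bar".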

There are tons of features I’d like to support, but before I do, I feel compelled to address its performance on large log files. Once you throw this at a log file greater than about 50MB, it’s not a great real-time troubleshooting tool. I believe I’d be better off ripping some of the parsing out of apachelogs.py and making it conditional (for example, don’t bother parsing out all of that date information if the user hasn’t asked to filter on it).
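As a rough sketch of what skipping unneeded date parsing could look like: store the raw timestamp text, and only parse it the first time a date field is read. `LazyLogLine` is a hypothetical stand-in, not the class in apachelogs.py, and the sketch assumes Python 3.8+ (for `functools.cached_property`) and the usual Apache timestamp layout.

```python
from datetime import datetime
from functools import cached_property


class LazyLogLine:
    """Hypothetical stand-in for a parsed log line; the timestamp is
    parsed on demand instead of in __init__."""

    def __init__(self, time_field):
        # Keep only the raw text, e.g. "31/Jan/2008:16:05:12 -0500".
        self.time_field = time_field

    @cached_property
    def timestamp(self):
        # Runs at most once per line, and only if a date filter reads it.
        return datetime.strptime(self.time_field, "%d/%b/%Y:%H:%M:%S %z")

    # All date fields share the one cached parse, so asking for the
    # month also makes day and hour cheap.
    @property
    def month(self):
        return self.timestamp.month

    @property
    def day(self):
        return self.timestamp.day

    @property
    def hour(self):
        return self.timestamp.hour


if __name__ == "__main__":
    line = LazyLogLine("31/Jan/2008:16:05:12 -0500")
    print(line.month, line.day, line.hour)
```

A run that only filters on response code never touches `timestamp`, so it never pays for the date parsing.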

Anyway, it’s still useful as it is, so let me know your thoughts on this, and if it’s something you have a use for or would like to help out with, I’ll set up a project for it. For now, you can Download Loghetti

  • http://kentsjohnson.com Kent Johnson

    Thanks, this looks like it will be handy. You could make the expensive properties lazy fairly easily, e.g. with this recipe:

    Perhaps you want to group the ‘laziness’ so asking for e.g. year will also populate month, day, etc.

    Other small optimizations in ApacheLogLine.__init__():
    - Only call self.request_line.split(' ') once, e.g.
    self.http_method, self.url, self.http_vers = self.request_line.split(' ')
    - Move self.rex to a class attribute, there is no need to compile and store this for every line!

    The output is IMO more readable if ApacheLogLine.__str__() uses ' '.join() rather than ','.join().
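The small optimizations suggested in the comment above can be sketched roughly as follows. This is an illustration only: the `LogLine` class and its simplified regular expression are assumptions, not the actual contents of apachelogs.py.

```python
import re


class LogLine:
    """Hypothetical, simplified stand-in for ApacheLogLine."""

    # Compiled once when the class is defined and shared by every
    # instance, instead of being recompiled for each log line.
    rex = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')

    def __init__(self, line):
        match = self.rex.match(line)
        if match is None:
            raise ValueError("unparseable log line: %r" % line)
        (self.ip, self.time, self.request_line,
         self.code, self.size) = match.groups()
        # Split the request line once and unpack all three parts.
        # A malformed request without exactly three parts raises ValueError.
        self.http_method, self.url, self.http_vers = \
            self.request_line.split(" ")

    def __str__(self):
        # Space-separated output, as suggested, rather than commas.
        return " ".join(
            [self.ip, self.time, self.http_method, self.url, self.code])


if __name__ == "__main__":
    sample = ('127.0.0.1 - - [31/Jan/2008:16:05:12 -0500] '
              '"GET /index.html?foo=bar HTTP/1.1" 500 1234 "-" "Mozilla/5.0"')
    print(LogLine(sample))
```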