A simple nanny script in Python
I have a support issue with a provider of mine, but I was able to reverse engineer the problem and put in a stop-gap measure to keep it from ruining my weekend. The issue is a misconfigured daemon supplied by the provider, and occasionally this daemon just goes away. I don’t know much about the daemon, but the underlying system is standard CentOS, so what I really needed was a way to detect whether the daemon had failed and restart it if so. The script that does this exists in every shop I’ve ever worked in, and is traditionally called a “nanny script”.
There are some nice-looking projects that deal with this problem (and others), but I didn’t have time to read all the docs (yet), and I wasn’t sure they weren’t overkill. That said, it might be nice to have a daemon doing this instead of a script running from cron.
Anyway, I was shocked that I couldn’t find a simple nanny script out on the web, in *any* language. Maybe my google-fu is out of whack. So I went ahead and wrote one up *very* quickly in Python. If you need a script that runs from cron every minute (or every few minutes) and restarts a misbehaving daemon when it isn’t running, feel free to use my nanny script.
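Here is a minimal sketch of that kind of nanny script. It is an illustration of the idea, not the script referred to above. It assumes that the daemon writes its PID to a pidfile and can be started again with a single command. The path /var/run/mydaemon.pid and the service name mydaemon are hypothetical placeholders. The script reads the PID and uses `os.kill(pid, 0)` to check that the process exists. If it doesn't, the script runs the restart command.

```python
#!/usr/bin/env python
"""Minimal nanny script sketch: run from cron, restart a daemon if it has died.

Assumptions (hypothetical, adjust for your system):
  - the daemon writes its PID to PIDFILE
  - RESTART_CMD starts it again (e.g. a CentOS init script via `service`)
"""
import errno
import os
import subprocess
import sys

PIDFILE = "/var/run/mydaemon.pid"                    # hypothetical path
RESTART_CMD = ["/sbin/service", "mydaemon", "start"]  # hypothetical service name


def is_running(pidfile):
    """Return True if the PID stored in pidfile belongs to a live process."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        # Missing, unreadable, or garbage pidfile: treat the daemon as dead.
        return False
    if pid <= 0:
        return False  # 0 or a negative PID would address a whole process group
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the PID exists
    except OSError as e:
        # EPERM means the process exists but is owned by another user.
        return e.errno == errno.EPERM
    return True


def check_and_restart(pidfile, restart_cmd):
    """Restart the daemon if it is not running; return True if a restart was attempted."""
    if is_running(pidfile):
        return False
    sys.stderr.write("daemon not running, restarting: %s\n" % " ".join(restart_cmd))
    try:
        subprocess.call(restart_cmd)
    except OSError as e:
        # e.g. the restart command itself does not exist
        sys.stderr.write("restart failed: %s\n" % e)
    return True


if __name__ == "__main__":
    check_and_restart(PIDFILE, RESTART_CMD)
```

To run it every minute, you could use a crontab entry such as `* * * * * /usr/bin/python /usr/local/bin/nanny.py`. The script path there is hypothetical. One known weakness of the pidfile check: the daemon may die and its PID may then be reused by an unrelated process. In that case the script will wrongly report the daemon as running.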
If you’ve got root on the box, you should look at /etc/inittab (see man inittab), in particular its respawn action, which restarts a process whenever it terminates.
If you’re not root, you’ll need your own process to monitor another process.
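A sketch of that approach follows. It is an illustration, not code from this thread. The Python below starts the daemon as a child process and restarts it whenever it exits. `supervise()` and its parameters are hypothetical names. `max_restarts` exists mainly so the loop can be bounded, for example when testing it.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=None, delay=5.0):
    """Run cmd as a child process and start it again each time it exits.

    Loops forever if max_restarts is None; otherwise stops after that many
    restarts and returns the list of exit codes seen.
    """
    exit_codes = []
    restarts = 0
    while True:
        child = subprocess.Popen(cmd)
        code = child.wait()  # blocks until the child exits, so no polling is needed
        exit_codes.append(code)
        sys.stderr.write("child exited with status %d\n" % code)
        if max_restarts is not None and restarts >= max_restarts:
            return exit_codes
        restarts += 1
        time.sleep(delay)  # avoid a tight restart loop if the child dies instantly
```

You would call it as, say, `supervise(["/usr/local/bin/mydaemon", "--foreground"])`, where both the path and the flag are hypothetical. This only works if the daemon stays in the foreground. If the daemon forks into the background, the launched process exits immediately, and the loop may keep starting new copies.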
I’ve not heard that called a ‘nanny script’. I’ve heard it called a ‘watchdog process’. For example, I found a perl watchdog script here:
http://snippets.dzone.com/posts/show/1737
I didn’t look too hard for python watchdog scripts, but I’m sure they’re out there.
I was shocked too. Here’s one that was presented at PyCon this year:
http://supervisord.org/
There was a good presentation on supervisor at PyCon this year: http://supervisord.org/
I *might* consider running a small, non-resource-intensive, non-production daemon out of inittab, but generally I try to avoid running daemons out of inittab if I can help it, and it’s usually not hard to avoid. I know there are apps (even large commercial ones) that *recommend* this practice, but I think someone needs to write one of those “…Considered Harmful” articles about it. I don’t profess to know that it’s *always* a bad idea, but I’ve seen it cause as many problems as it solves. If you’ve done this a lot without issue, congratulations, but when you do have issues, you’ll understand precisely what I’m talking about.
Of course, the problem with any solution that restarts failed daemons automatically is that it tends to put off the debugging process indefinitely, which is bad. In my case, though, I’m just looking to keep something running until some support goon gets around to fixing the issue.
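One way to limit that downside is a hypothetical addition, not something from the post. The nanny could refuse to restart the daemon more than a few times within a given window. A daemon that keeps dying then gets escalated to a human instead of being silently revived forever. The function below only makes the decision. A cron-driven nanny exits between runs, so the list of restart timestamps would have to be persisted somewhere, such as a small JSON file. That persistence is left out here.

```python
import time


def should_restart(history, now=None, max_restarts=3, window=3600):
    """Decide whether the nanny may restart the daemon again.

    history is a list of previous restart timestamps (seconds since the epoch).
    Returns (allowed, new_history). Once max_restarts restarts have happened
    within `window` seconds, further restarts are refused so that someone has
    to look at the problem instead of the nanny papering over it.
    """
    if now is None:
        now = time.time()
    # Keep only the restarts that fall inside the sliding window.
    recent = [t for t in history if now - t < window]
    if len(recent) >= max_restarts:
        return False, recent
    return True, recent + [now]
```

When `should_restart` returns False, the nanny would log loudly or send an alert rather than restart the daemon. How that notification happens is left open here.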
Give Monit (http://www.tildeslash.com/monit/) a whirl. I set it up on my server a few days ago; it only took me 5 minutes to work out how to use it, and it works great.
Eh, I’m not sure if I’m doing something wrong, but I don’t see it on the download page… unless both Konq and Firefox are playing up on me??
My bad. Try it again. I apparently used Drupal’s Project module incorrectly when trying to make that available. It’s fixed now.
Ah, so it is.
Cheers!