Using a robots.txt File With Django and Apache (on Webfaction)
I’ve developed in a few different environments, including multi-tier ones with middle tier Java app servers and stuff, but it always seemed pretty straightforward to serve something directly from disk. And in the case of PHP, everything is served from disk. There’s no middleware to speak of, so you can throw a robots.txt file in place and it “just works”. With Django, it’s slightly different because of two things:
- Django shouldn’t be serving static content (and therefore makes it a little inconvenient though not impossible to do so).
- Django works kinda like an application server that expects to receive URLs, and expects there to be some configuration in place telling it how to deal with that URL.
If you have Django serving static content, you’re wasting resources, so I’m not covering that here. My web host is WebFaction, and they give you access to the configuration of your own Apache instance in addition to your Django installation’s configuration (in fact, I’m just running an svn checkout of django-trunk). This gives you a lot of flexibility in how you deal with static files like CSS, images, or a robots.txt file. To handle robots.txt on the “staging” version of my site, I added the following lines to my Apache httpd.conf file:
LoadModule alias_module modules/mod_alias.so
<Location "/robots.txt">
    SetHandler None
</Location>
Alias /robots.txt /home/myusername/webapps/mywsgiapp/htdocs/robots.txt
If you don’t load mod_alias, you’ll get an error saying that the keyword “alias” is a misspelling or is not supported by Apache. I use “<Location>” here instead of “<Files>” or “<Directory>” because I’m applying the rule only to incoming requests for the URL “/robots.txt”, and it isn’t likely that I’ll have more than one way of reaching that file, since I’m not aware of any engines that look for robots.txt some other way. <Directory> applies rules to an entire directory on disk and its subdirectories, and <Files> applies rules to a file on disk (matched by name), so the rules apply even if more than one URL maps to the file.
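For comparison, here is a rough sketch of what the filesystem-based version of the same rule might look like. I haven’t run this on my own setup, so treat it as an illustration rather than a tested config. It assumes the same htdocs path used above and nests <Files> inside <Directory> so that it only matches the robots.txt in that one directory:

```apache
# Sketch only: matches the file on disk rather than the incoming URL,
# so it applies no matter which URL ends up mapped to this file.
<Directory "/home/myusername/webapps/mywsgiapp/htdocs">
    <Files "robots.txt">
        SetHandler None
    </Files>
</Directory>
```

The <Location> version is scoped to one URL; this one is scoped to one file. For a single robots.txt that is only ever requested as “/robots.txt”, the two should behave the same way, which is why I went with the shorter one.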

You may want to look at this application, which allows you to manage your robots.txt rules through the admin and generates the file on the fly for you from within Django:
http://bitbucket.org/jezdez/django-robots/src/
Thanks — I’ll check it out!
You aren’t clear on whether you’re using mod_python, mod_wsgi or something else. In general, using ‘SetHandler None’ inside a ‘Location’ directive for a URL is only needed with mod_python; it is not needed with mod_wsgi or other hosting mechanisms. For more information on overlaying static files such as robots.txt and favicon.ico when using mod_wsgi, see ‘http://code.google.com/p/modwsgi/wiki/ConfigurationGuidelines#Hosting_Of_Static_Files'.
Thanks, Graham,
I am, in fact, using mod_wsgi. I simply missed that bit in the documentation. Thanks for pointing it out.
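For anyone else on mod_wsgi, a minimal sketch of the setup without the <Location> block might look something like the following. The robots.txt path is the one from the post; the .wsgi script path and name are hypothetical placeholders, and the access-control lines use Apache 2.2 syntax (on Apache 2.4 the equivalent would be “Require all granted”). It assumes mod_alias and mod_wsgi are already loaded:

```apache
# Static file mapping: listed before the application mount as good practice.
Alias /robots.txt /home/myusername/webapps/mywsgiapp/htdocs/robots.txt

# Allow Apache to serve files from the static directory (Apache 2.2 syntax).
<Directory "/home/myusername/webapps/mywsgiapp/htdocs">
    Order allow,deny
    Allow from all
</Directory>

# Everything else goes to the Django application.
# NOTE: this script path is a made-up example, not my actual config.
WSGIScriptAlias / /home/myusername/webapps/mywsgiapp/myproject.wsgi
```

No “SetHandler None” is needed here because mod_wsgi only takes over the URLs covered by WSGIScriptAlias, and the more specific Alias for /robots.txt is served straight from disk.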