Using a robots.txt File With Django and Apache (on WebFaction)

I’ve developed in a few different environments, including multi-tier ones with middle-tier Java app servers, and it always seemed pretty straightforward to serve something directly from disk. In the case of PHP, everything is served from disk: there’s no middleware to speak of, so you can throw a robots.txt file in place and it “just works”. With Django it’s slightly different, for two reasons:

  1. Django shouldn’t be serving static content (and therefore makes doing so a little inconvenient, though not impossible).
  2. Django works more like an application server: it expects to receive URLs, and it expects some configuration in place telling it how to deal with each URL (see the sketch just below).
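
To make that second point concrete, here’s a minimal sketch of how Django could be told to answer /robots.txt itself. The view name and the rule text are my own inventions for illustration, and it assumes a modern Django with django.urls.path; the rest of this post argues for letting Apache handle this instead.

# urls.py: a hypothetical sketch, not my actual setup. Django only answers
# URLs the URLconf explicitly routes, so even robots.txt needs a rule.
from django.http import HttpResponse
from django.urls import path  # assumes a modern Django release

def robots_txt(request):
    # Example rules only: block all crawlers, e.g. on a staging site.
    return HttpResponse("User-agent: *\nDisallow: /\n", content_type="text/plain")

urlpatterns = [
    path("robots.txt", robots_txt),
]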

If you have Django serving static content, you’re wasting resources, so I’m not covering that here. My web host is WebFaction, and they give you access to the configuration of your own Apache instance in addition to your Django installation’s configuration (in fact, I’m just running an svn checkout of django-trunk), which gives you a lot of flexibility in how you deal with static files like CSS, images, or a robots.txt file. To handle robots.txt on the “staging” version of my site, I added the following lines to my Apache httpd.conf file:

LoadModule alias_module modules/mod_alias.so

<Location "/robots.txt">
    SetHandler None
</Location>

Alias /robots.txt /home/myusername/webapps/mywsgiapp/htdocs/robots.txt

If you don’t load mod_alias, Apache will refuse to start, complaining that “Alias” is misspelled or not defined by a loaded module. I use <Location> here instead of <Files> or <Directory> because I’m applying the rule only to explicit incoming requests for “/robots.txt”, and it’s unlikely there will ever be more than one URL reaching that file, since I’m not aware of crawlers that look for robots.txt anywhere else. By contrast, <Directory> applies rules to an entire directory on disk and its subdirectories, and <Files> applies rules to a file on disk, so those rules hold even if more than one URL maps to the same file.
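
The same trick extends to other static files. As a sketch (the /static/ prefix and the favicon path are assumptions for illustration, not my actual config), you could overlay favicon.ico and a whole directory of CSS and images the same way:

Alias /favicon.ico /home/myusername/webapps/mywsgiapp/htdocs/favicon.ico
Alias /static/ /home/myusername/webapps/mywsgiapp/htdocs/static/

<Location "/favicon.ico">
    SetHandler None
</Location>

<Directory "/home/myusername/webapps/mywsgiapp/htdocs/static">
    SetHandler None
    Order allow,deny
    Allow from all
</Directory>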

  • http://www.b-list.org/ James Bennett

    You may want to look at this application, which allows you to manage your robots.txt rules through the admin and generates the file on the fly for you from within Django:

    http://bitbucket.org/jezdez/django-robots/src/

  • m0j0

    Thanks — I’ll check it out!

  • http://blog.dscpl.com.au Graham Dumpleton

    You aren’t clear on whether you’re using mod_python, mod_wsgi, or something else. In general, using ‘SetHandler None’ inside a ‘Location’ directive for a URL is only needed for mod_python; it isn’t needed for mod_wsgi or other hosting mechanisms. For more information on handling the overlaying of static files such as robots.txt and favicon.ico when using mod_wsgi, see ‘http://code.google.com/p/modwsgi/wiki/ConfigurationGuidelines#Hosting_Of_Static_Files’.

  • m0j0

    Thanks, Graham,

    I am, in fact, using mod_wsgi. I simply missed that bit in the documentation. Thanks for pointing it out.

  • http://just-digital.net/ Django developer

    Thanks for posting, m0j0. In fact, I’ve used this setup for /favicon.ico and /humans.txt too.