Lessons Learned While Creating a Generic Taxonomy App for Django

So, when I first picked up a guitar, the first song I sat down to learn, by ear, was Stairway to Heaven, not “Twinkle, Twinkle, Little Star”. So goes my experience with Django 🙂

The Background

I was humming along on my recreation of LinuxLaboratory.org. I got a simple blog in place in just a couple of days, a code-sharing app in place a few days later (if that), and a very simple CMS I threw together using flatpages. A good bit of the base code I used came from the 2nd edition of “Practical Django Projects”, but I soon veered off in other directions, and started analyzing the work I’d already done a bit more closely.

One of the things that was glaringly obvious to me was that my method of classifying content was a little schizophrenic. I had three separate apps to represent different types of content, which is great, but each separate app had its own “Category” model. Yuck. On top of that, I was using django-tagging to enable tagging in addition to the categorization each app supported.

The Problem

So… for one type of classification (Categories), it’s built into the specific application, and for the other (tagging), it’s not built in, but it’s pretty tightly coupled. There are a few fundamental drawbacks to this approach:

First, you have to make a pretty big commitment to these things. The easiest way to implement them in your app is to add support for them at the outset, because adding them in later is going to be a bit of a headache. Categories aren’t quite so bad — I implemented them the way the book does, which is with a ManyToMany field. In Django, when you create a ManyToMany field in a model, there’s no corresponding field in that model’s table in the database. Instead, Django creates a lookup table for you, which is nice, because it means you *could* add categories at a later time without *too* much trouble. Tags use the django-tagging app, which implements tags as a multi-valued field in the database table representing the model that will use the tags. So adding this in later is a little bit more of a hassle.

The second issue is that this approach doesn’t treat classification in a consistent manner. One is in the app, the other is a separate app, one is a field, the other is a model, one affects the model’s table, the other doesn’t, etc. One place where this inconsistency becomes obvious is in your templates, where you’re likely to want to give users the ability to browse by category, or browse by tag. Browsing by category across all the different content types is going to be pretty tough if they all have their own implementation. Tags are a little easier, but it’s still a little cumbersome.

The third issue is specific to Categories, and has to do with maintenance: if I come up with some fantastic idea for the Category implementation (like, I dunno, subcategories?), I have to implement it separately in all of the apps that are using categories. No Bueno™.

The Dream

Wouldn’t it be nice if you could just say “give me a list of the taxonomy types used to classify this piece of content, whatever it is, whichever app it comes from, and also a list of the taxonomy terms involved”? Wouldn’t it be nice if you could just add support for categories and tags using a single app that doesn’t add anything to your existing tables? Wouldn’t it be nice to be able to come up with your own taxonomies and, perhaps, hierarchical taxonomies and more complex relationships? Wouldn’t it be nice to be able to say “here’s a category name, show me all of the content associated with it, sorted by content type” and then change your mind and say “no wait, show it to me sorted by title, with a content type indicator over here”, and then change your mind again and say “er, how about showing the taxonomy type, then the term, then all of the content objects under that type:term pairing”?

The Solution

I think it would, so I started creating this beast that I just call “taxonomy”. Right now it’s pretty simplistic, and it’ll likely change slightly based on some things I’ve learned, but surprisingly, I think my first shot at it is really darn close! I’ve stopped being surprised at how quickly I’m able to prototype in Django: getting this together, including the creation of the models, getting it into the admin interface, and getting it linked with any random content type from any app that wants to use it within the admin interface (to add taxonomies and labels to a piece of content in the ‘edit’ interface) took probably 4 hours, including time to read documentation and fall down a few times.

The admin interface for taxonomy lets you create a taxonomy, so if none of your apps currently don’t support the notion of a “Category”, you can go create a taxonomy called “Category”. Once that’s there, you can create a “taxonomy term”, where you’d select the “type” for this term (your new Category), and then a term. So if your term was “Django”, then you would have just created a category called “Django” that could be used by any other app/model in your project. The same, of course, would go for tags, and whatever other classification devices you want.

There’s support for parent-child relationships at the taxonomy term level (so you can have subcategories, or even subtags if you want, etc. I guess you could even categorize tags, and tag categories! They’re coming to take me away!!!). I haven’t given much thought to having hierarchical relationships for the taxonomies themselves. That would be a little overboard, no? I’m interested to hear realistic use cases for that 🙂

Once you’ve created a taxonomy and a term, the next thing to do is figure out how to associate your actual content to it. So, if our taxonomy is “Category” and the category name (the “term”) is “Django”, the way I’ve implemented it is that you’d go into the edit interface for the article, and a form for associating it with your category appears. This was created using a GenericInlineModelAdmin, which was a gem of a find in the documentation. Inlines let you easily create a form to update a piece of content using concepts and attributes from other models, and even other applications. If you don’t know much about Django, this sounds like a big mess, but in reality, it’s fairly elegant.

I’ve done some testing to see that I can pull things out of the database and associate things properly in the presentation layer, but I’d like to work on making it smoother before I go releasing code or anything like that…. which reminds me that I *did* look and ask around about an app that maybe already did this and came up dry. If anyone sees this and says “why not just use x”, let me know, because it’s not really a goal to write code for the sake of coding. I actually thought this was an interesting feature and couldn’t find it.

Lessons Learned #1: URLConf is a Choice, Not a Requirement

First, I learned that it’s completely possible to create an application that doesn’t have a URLConf at all. Currently, taxonomies actually work in testing, and there’s no URLConf. There actually *will* be one when I figure out how I want the data to be used on my own site, and how to enable users to do whatever they want with it as well. One thought, for example, is that it would be really awesome to be able to go to “/categories/django” and have my app somehow “just know” that “categories”, when singularized, is “category”, which is a taxonomy. From there, the taxonomy app takes over, and magic happens. I have faith that I can make this happen without having the word “taxonomy” in the url. We’ll see.

Anyway, the point is that you don’t have to have a URLconf, and that hadn’t really occurred to me. For the record, django-tagging also doesn’t have a URLconf.

Lessons Learned #2: ContentTypes Let Your Models Be All-Knowing

The second thing I learned was that, using the ContentTypes framework within Django, it’s possible to create a model that will deal with data, and relationships to data, in a dynamic way, such that you don’t have to know what type of data your models will be working with at the time you create them.

For example, my taxonomy app can be used with my blog’s “Entry” model, my code-sharing app’s “Snippet” model, and my CMSs “Page” model. If I pass the app to you, you can use it for your news site’s “Story” model, your ad network’s “Ad” model, and your Twitter clone’s “Tweet” model. No problem. This is in the docs, but here’s what I’ve done:

class Taxonomy(models.Model):
 """A facility for creating custom content classification types"""
 type = models.CharField(max_length=50, unique=True)

class TaxonomyTerm(models.Model):
 """Terms are associated with a specific Taxonomy, and should be generically usable with any contenttype"""
 type = models.ForeignKey(Taxonomy)
 term = models.CharField(max_length=50)
 parent = models.ForeignKey('self', null=True,blank=True)

class TaxonomyMap(models.Model):
 """Mappings between content and any taxonomy types/terms used to classify it"""
 term        = models.ForeignKey(TaxonomyTerm, db_index=True)
 type        = models.ForeignKey(Taxonomy, db_index=True)
 content_type = models.ForeignKey(ContentType, verbose_name='content type', db_index=True)
 object_id      = models.PositiveIntegerField(db_index=True)   
 object         = generic.GenericForeignKey('content_type', 'object_id')

Note that I’ve removed some stuff from the model defs — what you see here are just the fields, which are the relevant bit for what I’m explaining.

The TaxonomyMap model (a model is a class definition, by the way) has foreign keys to map to a human readable ‘term’ and ‘type’ in the other models. TaxonomyMap is just to store mappings between content objects and taxonomies (lower-level details of this might change to make it cleaner/more efficient – I know it’s not perfect). So, how does my app know that I’m storing a mapping to an “Entry” from my blog app? How does it get the id for that Entry? What’s going on?

Well, Django stores a list of every content type used by Django and any installed apps, and I’ve made a foreign key to ContentType so I can access the content type of the object that’s being dealt with and get its ID. I also have a “GenericForeignKey” field, which essentially creates a “dynamic” foreign key to the table that represents the object that’s being dealt with, so if I’m dealing with an “Entry” object from “monk” (which is the name of my blog app), then the foreign key will point to “monk_entry”, which is the table that stores my blog entries. When you create a taxonomy, and a term, and associate them to a piece of content, the resulting rows in the affected tables look like this:

mysql> select * from taxonomy_taxonomy;
+----+--------------+
| id | type         |
+----+--------------+
|  1 | TestCategory |
+----+--------------+
1 row in set (0.01 sec)

mysql> select * from taxonomy_taxonomyterm;
+----+---------+----------+-----------+
| id | type_id | term     | parent_id |
+----+---------+----------+-----------+
|  1 |       1 | TestTerm |      NULL |
+----+---------+----------+-----------+
1 row in set (0.00 sec)

mysql> select * from taxonomy_taxonomymap;
+----+---------+---------+-----------------+-----------+
| id | term_id | type_id | content_type_id | object_id |
+----+---------+---------+-----------------+-----------+
|  1 |       1 |       1 |              10 |         2 |
+----+---------+---------+-----------------+-----------+
1 row in set (0.00 sec)

Note that the table for the model that’s using the taxonomy app is untouched. Only taxonomy tables are used.

Seeing this, you might think that it’d be hard to put a form in the admin interface for arbitrary content types to classify them with taxonomies. Not so — which brings me to more lessons I learned.

Lessons Learned #3: Collecting Data About a Model Without Extending the Model and Creating Database Badness

If you have a model (we’ll use “Entry” again), and it has a core set of attributes, but you want to associate data with instances of this model not represented in the model definition (like, say, a taxonomy, for an arbitrary example), you can add a form to the admin interface for that model in about 5 minutes. This rocks, for those who didn’t know, because the alternative would either involve really ugly code, or really ugly data (you’d have to store the taxonomies in the table for the model, creating either tons of duplicate data, or multi-valued fields… and you’d still have duplicate data).

Typically, it seems that the normal use case for this is to relate models in the admin interface that are part of the same application and are explicitly related through a direct foreign key reference. This might even be enforced in the case of “InlineModelAdmin” objects, but I haven’t dealt with those personally. However, while reading about “GenericInlineModelAdmin” objects, it occurred to me that it shouldn’t matter that the related items are from different apps. I tried it, and it worked. Here’s what I did:

from django.contrib import admin
from django.contrib.contenttypes import generic
from monk.models import Entry
from taxonomy.models import TaxonomyMap

class TaxonomyMapInline(generic.GenericTabularInline):
   model = TaxonomyMap

class EntryAdmin(admin.ModelAdmin):
   prepopulated_fields = { 'slug': ['title'] }
   inlines = [ TaxonomyMapInline, ]

admin.site.register(Entry, EntryAdmin)

Again, I’ve edited out the irrelevant bits. The above comes from my blog app’s admin.py file. What I did was created an “inline” called “TaxonomyMapInline”, and then associated that inline with the “EntryAdmin” ModelAdmin object using ModelAdmin’s ‘inlines’ attribute, which takes a Python list, which means you can keep adding more inlines all day long if you like.

The result is that, when I go to edit a blog entry, there’s now a form at the bottom that lets the user select a taxonomy type and term (i.e. “Category” “Django”), and associate it with the post. When I added the inline to the admin.py file, it was a test to see what would happen. Since TaxonomyMap doesn’t hold anything but numeric IDs, I assumed I would have to go back and manually map the IDs to human readable values. Not true. Apparently, if the field being presented in the admin form maps to a ForeignKey field, Django automagically does the lookup for you and presents the human-readable text! And, when you save, it converts everything back to numeric IDs before going to the database, so everything “just works”. So the work I thought I’d be doing myself was already done for me!