PyYaml with Aliases and Anchors

I didn’t know this little tidbit until yesterday and want to get it posted so I can refer to it later.

I have this YAML config file that’s kinda long and has a lot of duplication in it. This isn’t what I’m working on, but let’s just say that you have a bunch of backup targets defined in your YAML config file, and your program rocks because each backup target can be defined to go to a different destination. Awesome, right?

Well, it might be, but it might also just make your YAML config file grotesque (and error-prone). Here’s an example:

Backups:
    Home_Jonesy:
        host: foo
        dir: /Users/jonesy
        protocol: ssh
        keyloc: ~/.ssh/id_rsa.pub
        Destination:
            host: bar
            dir: /mnt/array23/homes/jonesy
            check_space: true
            min_space: 80G
            num_archives: 4
            compress: bzip2
    Home_Molly:
        host: eggs
        dir: /Users/molly
        protocol: sftp
        keyloc: ~/.ssh/id_rsa.pub
        Destination:
            host: bar
            dir: /mnt/array23/homes/jonesy
            check_space: true
            min_space: 80G
            num_archives: 4
            compress: bzip2

Now with two backups, this isn’t so bad. But if your environment has 100 backup targets and only one destination, or…. heck — even if there are three destinations — should you have to write out the definition of those same three destinations for each of 100 backup targets? What if you need to change how one of the destinations is connected to, or the name of a destination changes, or array23 dies?

Ideally, you’d be able to reference the same definition in as many places as you need it and have things “just work”, and if something needs to change, you just change it in one place. Enter anchors and aliases.

An anchor is defined just like anything else in YAML with the exception that you get to label the definition block using “&labelname”, and then you can (de)reference it elsewhere in your config with “*labelname”. So here’s how our above configuration would look:

BackupDestination-23: &Backup_To_ARRAY23
    host: bar
    dir: /mnt/array23/homes/jonesy
    check_space: true
    min_space: 80G
    num_archives: 4
    compress: bzip2
Backups:
    Home_Jonesy:
        host: foo
        dir: /Users/jonesy
        protocol: ssh
        keyloc: ~/.ssh/id_rsa.pub
        Destination: *Backup_To_ARRAY23
    Home_Molly:
        host: eggs
        dir: /Users/molly
        protocol: sftp
        keyloc: ~/.ssh/id_rsa.pub
        Destination: *Backup_To_ARRAY23

With only two backup targets, the benefit is small, but keep trying to imagine this config file with about 100 backup targets, and only one or two destinations. This removes a lot of duplication and makes things easier to change and maintain (and read!)

The cool thing about it is that if you already have code that reads the YAML config file, you don’t have to change it at all — PyYaml expands everything for you. Here’s a quick interpreter session:

>>> import yaml
>>> from pprint import pprint
>>> stream = file('foo.yaml', 'r')
>>> cfg = yaml.load(stream)
>>> pprint(cfg)
{'BackupDestination-23': {'check_space': True,
                          'compress': 'bzip2',
                          'dir': '/mnt/array23/homes/jonesy',
                          'host': 'bar',
                          'min_space': '80G',
                          'num_archives': 4},
 'Backups': {'Home_Jonesy': {'Destination': {'check_space': True,
                                             'compress': 'bzip2',
                                             'dir': '/mnt/array23/homes/jonesy',
                                             'host': 'bar',
                                             'min_space': '80G',
                                             'num_archives': 4},
                             'dir': '/Users/jonesy',
                             'host': 'foo',
                             'keyloc': '~/.ssh/id_rsa.pub',
                             'protocol': 'ssh'},
             'Home_Molly': {'Destination': {'check_space': True,
                                            'compress': 'bzip2',
                                            'dir': '/mnt/array23/homes/jonesy',
                                            'host': 'bar',
                                            'min_space': '80G',
                                            'num_archives': 4},
                            'dir': '/Users/molly',
                            'host': 'eggs',
                            'keyloc': '~/.ssh/id_rsa.pub',
                            'protocol': 'sftp'}}}

…And notice how everything has been expanded.

Enjoy!