<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Musings of an Anonymous Geek &#187; Hacks</title>
	<atom:link href="http://www.protocolostomy.com/category/hacks/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.protocolostomy.com</link>
	<description>Made with only the finest 1's and 0's</description>
	<lastBuildDate>Thu, 03 Nov 2011 04:08:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Nose and Coverage.py Reporting in Hudson</title>
		<link>http://www.protocolostomy.com/2010/12/02/nose-and-coverage-py-reporting-in-hudson/</link>
		<comments>http://www.protocolostomy.com/2010/12/02/nose-and-coverage-py-reporting-in-hudson/#comments</comments>
		<pubDate>Thu, 02 Dec 2010 14:21:39 +0000</pubDate>
		<dc:creator>bkjones</dc:creator>
				<category><![CDATA[Big Ideas]]></category>
		<category><![CDATA[Hacks]]></category>
		<category><![CDATA[Productivity]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Testing]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://www.protocolostomy.com/?p=851</guid>
		<description><![CDATA[I like Hudson. Sure, it&#8217;s written in Java, but let&#8217;s be honest, it kinda rocks. If you&#8217;re a Java developer, it&#8217;s admittedly worlds better because it integrates with seemingly every Java development tool out there, but we can do some cool things in Python too, and I thought I&#8217;d share a really simple setup to [...]]]></description>
			<content:encoded><![CDATA[<p>I like Hudson. Sure, it&#8217;s written in Java, but let&#8217;s be honest, it kinda rocks. If you&#8217;re a Java developer, it&#8217;s admittedly worlds better because it integrates with seemingly every Java development tool out there, but we can do some cool things in Python too, and I thought I&#8217;d share a really simple setup to get coverage.py&#8217;s HTML reports and nose&#8217;s xUnit-style reports into your Hudson interface.</p>
<p>I&#8217;m going to assume that you know what these tools are and have them installed. I&#8217;m working with a local install of Hudson for this demo, but it&#8217;s worth noting that I&#8217;ve come to find a local install of Hudson pretty useful, and it doesn&#8217;t really eat up too much CPU (so far). More on that in another post. Let&#8217;s get moving.</p>
<h3>Process Overview</h3>
<p>As mentioned, this process is really pretty easy. I&#8217;m only documenting it because I haven&#8217;t seen it documented before, and someone else might find it handy. So here it is in a nutshell:</p>
<ul>
<li>Install the <a href="http://wiki.hudson-ci.org/display/HUDSON/HTML+Publisher+Plugin">HTML Publisher</a> plugin</li>
<li>Create or alter a configuration for a &#8220;free-style software project&#8221;</li>
<li>Add a Build Step using the &#8216;Execute Shell&#8217; option, and enter a &#8216;nosetests&#8217; command, using its built-in support for xUnit-style test reports and coverage.py</li>
<li>In the post-build actions, check &#8216;Publish JUnit test result report&#8217; and &#8216;Publish HTML reports&#8217;, and enter the information Hudson needs to find nose&#8217;s xUnit report and the coverage.py HTML report.</li>
<li>Build, and enjoy.</li>
</ul>
<h3>Install the HTML Publisher Plugin</h3>
<p>From the dashboard, click &#8216;Manage Hudson&#8217;, and then &#8216;Manage Plugins&#8217;. Click on the &#8216;Available&#8217; tab to see the plugins available for installation. It&#8217;s a huge list, so I generally just hit &#8216;/&#8217; in Firefox or cmd-F in Chrome and search for &#8216;HTML Publisher Plugin&#8217;. Check the box, go to the bottom, and click &#8216;Install&#8217;. Hudson will let you know when it&#8217;s done installing, at which point you need to restart Hudson.</p>
<div id="attachment_855" class="wp-caption alignnone" style="width: 310px"><a href="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-11.01-AM.png"><img class="size-medium wp-image-855 " title="HTML Publisher Plugin" src="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-11.01-AM-300x164.png" alt="Install tab" width="300" height="164" /></a><p class="wp-caption-text">HTML Publisher Plugin: Check!</p></div>
<h3>Configure a &#8216;free-style software project&#8217;</h3>
<p>If you have an existing project already, click on it and then click the &#8216;Configure&#8217; link in the left column. Otherwise, click on &#8216;New Job&#8217;, and choose &#8216;Build a free-style software project&#8217; from the list of options. Give the job a name, and click &#8216;OK&#8217;.</p>
<div id="attachment_856" class="wp-caption alignnone" style="width: 310px"><a href="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-17.13-AM.png"><img class="size-medium wp-image-856 " title="build-project" src="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-17.13-AM-300x255.png" alt="Build a free-style software project." width="300" height="255" /></a><p class="wp-caption-text">You have to give the job a name to enable the &#39;ok&#39; button <img src='http://www.protocolostomy.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p></div>
<h3>Add a Build Step</h3>
<p>In the configuration screen for the job, which you should now be looking at, scroll down and click the button that says &#8216;Add build step&#8217;, and choose &#8216;Execute shell&#8217; from the resulting menu.</p>
<div id="attachment_857" class="wp-caption alignnone" style="width: 310px"><a href="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-27.18-AM.png"><img class="size-medium wp-image-857" title="add build" src="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-27.18-AM-300x145.png" alt="Add Build Step" width="300" height="145" /></a><p class="wp-caption-text">Execute shell. Mmmmm... shells.</p></div>
<p>This results in a &#8216;Command&#8217; textarea appearing, which is where you type the shell command to run. In that box, type this:</p>
<pre class="brush: bash; title: ; notranslate">
/usr/local/bin/nosetests --with-xunit --with-coverage --cover-package demo --cover-html -w tests
</pre>
<p>Of course, replace &#8216;demo&#8217; with the name of the package you want covered in your coverage tests to avoid the mess of having coverage.py try to seek out every module used in your entire application.</p>
<p>We&#8217;re telling Nose to generate an xUnit-style report, which by default will be put in the current directory in a file called &#8216;nosetests.xml&#8217;. We&#8217;re also asking for coverage analysis using coverage.py, and requesting an HTML report of the analysis. By default, this is placed in the current directory in &#8216;cover/index.html&#8217;.</p>
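<p>If you want to sanity-check the xUnit file before Hudson reads it, a few lines of Python will do. This is a minimal Python 3 sketch and not part of the Hudson setup: the &#8216;summarize&#8217; helper and the sample report are made up for illustration, and the element and attribute names (&#8216;testsuite&#8217;, &#8216;testcase&#8217;, &#8216;tests&#8217;, &#8216;failure&#8217;, &#8216;error&#8217;) assume the usual xUnit layout. In practice you would feed it the contents of nosetests.xml.</p>
<pre class="brush: python; title: ; notranslate">
import xml.etree.ElementTree as ET


def summarize(report_xml):
    """Return the test count and the names of failing tests in an xUnit report."""
    root = ET.fromstring(report_xml)
    failing = [
        case.get("name")
        for case in root.iter("testcase")
        # A failed or errored test carries a 'failure' or 'error' child element
        if case.find("failure") is not None or case.find("error") is not None
    ]
    return {"tests": int(root.get("tests", "0")), "failing": failing}


# Build a tiny stand-in report (hypothetical data) instead of reading nosetests.xml
suite = ET.Element("testsuite", name="nosetests", tests="3", errors="0", failures="1")
for name in ("test_add", "test_subtract", "test_divide"):
    case = ET.SubElement(suite, "testcase", classname="tests.test_demo", name=name)
    if name == "test_divide":
        ET.SubElement(case, "failure", message="ZeroDivisionError")

summary = summarize(ET.tostring(suite))
print(summary)
</pre>
<p>Hudson does the same kind of reading for you, of course. This is only handy when a build shows no results and you want to know whether the file is the problem.</p>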
<p><a href="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-44.17-AM.png"><img class="alignnone size-full wp-image-864" title="exec shell truncated" src="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-44.17-AM.png" alt="execute shell area" width="423" height="239" /></a></p>
<p>Now we need to set up our reports by telling Hudson we want them, and where to find them.</p>
<h3>Enable JUnit Reports</h3>
<p>In the &#8216;Post-Build Actions&#8217; area at the bottom of the page, check &#8216;Publish JUnit test result report&#8217;, and make it look like this:</p>
<p><a href="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-48.16-AM.png"><img class="alignnone size-full wp-image-866" title="Screen shot 2010-12-02 at ,Dec 2 -48.16 AM" src="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-48.16-AM.png" alt="" width="539" height="125" /></a></p>
<p>The &#8216;**&#8217; is part of the <a href="http://ant.apache.org/manual/Types/fileset.html">Ant Glob Syntax</a>, and matches any number of directories (including none), so a pattern like &#8216;**/nosetests.xml&#8217; finds the report anywhere under the workspace. Remember that we said earlier nose will publish, by default, to a file called &#8216;nosetests.xml&#8217; in the current working directory.</p>
<p>The current working directory is going to be the Hudson &#8216;workspace&#8217; for that job, linked to in the &#8216;workspace root&#8217; link you see in the above image. It should mostly be a checkout of your source code. Most everything happens relative to the workspace, which is why in my nosetests command you&#8217;ll notice I pass &#8216;-w tests&#8217; to tell nose to look in the &#8216;tests&#8217; subdirectory of the current working directory.</p>
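<p>To get a feel for what a recursive &#8216;**&#8217; pattern picks up relative to a workspace, here is a small Python 3 sketch. It uses pathlib&#8217;s glob as a stand-in for Ant&#8217;s matcher (an assumption on my part: the two are separate implementations that behave alike for a simple pattern like this one), and the temporary directory plays the role of the Hudson workspace.</p>
<pre class="brush: python; title: ; notranslate">
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)  # stand-in for the job's workspace root
    (workspace / "tests").mkdir()
    (workspace / "nosetests.xml").write_text("top-level report")
    (workspace / "tests" / "nosetests.xml").write_text("nested report")

    # '**' matches zero or more directory levels below the workspace
    matches = sorted(
        p.relative_to(workspace).as_posix()
        for p in workspace.glob("**/nosetests.xml")
    )

print(matches)
</pre>
<p>Both the top-level file and the one under &#8216;tests&#8217; should show up, which is why the pattern works no matter which directory nose ends up writing to.</p>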
<p>You could stop right here if you don&#8217;t track coverage. Just note that these reports don&#8217;t get particularly exciting until you&#8217;ve run a number of builds.</p>
<h3>Enable Coverage Reports</h3>
<p>Just under the JUnit reporting checkbox should be the Publish HTML Reports checkbox. The ordering of things can differ depending on the plugins you have installed, but it should at least still be in the Post-build Actions section of the page.</p>
<p>Check the box, and a form will appear. Make it look like this:</p>
<p><a href="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-58.52-AM.png"><img class="alignnone size-full wp-image-867" title="html report form" src="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-58.52-AM.png" alt="" width="954" height="116" /></a></p>
<p>By default, coverage.py will create a directory called &#8216;cover&#8217; and put its files in there (one page for each covered module, plus an index). It puts them in the directory you pass to nose with the &#8216;-w&#8217; flag. If you don&#8217;t use a &#8216;-w&#8217; flag&#8230; I dunno. I&#8217;d guess it puts it in the directory from where you run nose, in which case the above would become &#8216;**/cover&#8217; or just &#8216;cover&#8217; if this option doesn&#8217;t use Ant Glob Syntax.</p>
<h3>Go Check It Out!</h3>
<p>Now that you have everything put together, click on &#8216;Save&#8217;, and run some builds!</p>
<p>On the main page for your job, after you&#8217;ve run a build, you should see a &#8216;Coverage.py Report&#8217; link and a &#8216;Latest Test Result&#8217; link. After multiple builds, you should see a test result &#8216;Trend&#8217; chart on the job&#8217;s main page as well.</p>
<p><a href="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-08.56-AM.png"><img class="alignnone size-full wp-image-870" title="jobpage" src="http://www.protocolostomy.com/wp-content/uploads/2010/12/Screen-shot-2010-12-02-at-Dec-2-08.56-AM-e1291299370205.png" alt="job page" width="800" height="203" /></a></p>
<p>Almost everything on the page is clickable. The trend graph isn&#8217;t too enlightening until multiple builds have run, but I find the coverage.py reports a nice way to see at-a-glance what chunks of code need work. It&#8217;s way nicer than reading the line numbers output on the command line (though I sometimes use those too).</p>
<h3>How &#8217;bout you?</h3>
<p>If you&#8217;ve found other nice tricks in working with Hudson, share! I&#8217;ve been using Hudson for a while now, but that doesn&#8217;t mean I&#8217;m doing anything super cool with it &#8212; it just means I know enough to suspect I could be doing way cooler stuff with it that I haven&#8217;t gotten around to playing with. <img src='http://www.protocolostomy.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://www.protocolostomy.com/2010/12/02/nose-and-coverage-py-reporting-in-hudson/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Per-machine Bash History</title>
		<link>http://www.protocolostomy.com/2010/05/10/per-machine-bash-history/</link>
		<comments>http://www.protocolostomy.com/2010/05/10/per-machine-bash-history/#comments</comments>
		<pubDate>Mon, 10 May 2010 19:12:52 +0000</pubDate>
		<dc:creator>bkjones</dc:creator>
				<category><![CDATA[Hacks]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Productivity]]></category>
		<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.protocolostomy.com/?p=775</guid>
		<description><![CDATA[I do work on a lot of machines no matter what environment I&#8217;m working in, and a lot of the time each machine has a specific purpose. One thing that really annoys me when I work in an environment with NFS-mounted home directories is that if I log into a machine I haven&#8217;t used in [...]]]></description>
			<content:encoded><![CDATA[<p>I do work on a lot of machines no matter what environment I&#8217;m working in, and a lot of the time each machine has a specific purpose. One thing that really annoys me when I work in an environment with NFS-mounted home directories is that if I log into a machine I haven&#8217;t used in some time, none of the history specific to that machine is around anymore.</p>
<p>If I had a separate ~/.bash_history file on each machine, this would likely solve the problem. It&#8217;s pretty simple to do as it turns out. Just add the following lines to ~/.bashrc:</p>
<pre># Give each machine its own history file, keyed on its hostname
srvr=$(hostname)
export HISTFILE="${HOME}/.bash_history_${srvr}"
</pre>
<p>Don&#8217;t be alarmed when you source ~/.bashrc and you don&#8217;t see the file appear in your home directory. Unless you&#8217;ve configured things otherwise, history is only written at the end of a bash session. So go ahead and source bashrc, run a few commands, end your session, log back in, and the file should be there.</p>
<p>I&#8217;m not actually sure if this is going to be a great idea for everyone. If you work in an environment where you run the same commands from machine to machine, it might be better to just leave things alone. For me, I&#8217;m running different psql/mysql connection commands and stuff like that which differ depending on the machine I&#8217;m on and the connection perms it has.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.protocolostomy.com/2010/05/10/per-machine-bash-history/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Programmers that&#8230; can&#8217;t program.</title>
		<link>http://www.protocolostomy.com/2010/03/15/programmers-that-cant-program/</link>
		<comments>http://www.protocolostomy.com/2010/03/15/programmers-that-cant-program/#comments</comments>
		<pubDate>Tue, 16 Mar 2010 00:06:19 +0000</pubDate>
		<dc:creator>bkjones</dc:creator>
				<category><![CDATA[Big Ideas]]></category>
		<category><![CDATA[Hacks]]></category>
		<category><![CDATA[Other Cool Blogs]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.protocolostomy.com/?p=724</guid>
		<description><![CDATA[So, I happened across this post about hiring programmers, which references two other posts about hiring programmers. There seems to be a demand for blog posts about hiring programmers, but that&#8217;s not why I&#8217;m writing this. I&#8217;m writing because there was this sort of nagging irony that I couldn&#8217;t help but stumble upon. In a [...]]]></description>
			<content:encoded><![CDATA[<p>So, I happened across <a href="http://lateral.netmanagers.com.ar/weblog/posts/BB881.html">this post</a> about hiring programmers, which references two other posts about hiring programmers. There seems to be a demand for blog posts about hiring programmers, but that&#8217;s not why I&#8217;m writing this. I&#8217;m writing because there was this sort of nagging irony that I couldn&#8217;t help but stumble upon.</p>
<p>In a <a href="http://www.joelonsoftware.com/items/2005/01/27.html">blog post</a>, Joel Spolsky talks about the mathematical inaccuracies associated with claims of &#8220;only hiring the top 1%&#8221;. It seemed pretty obvious to me that whether or not you&#8217;re hiring the top 1% of all programmers is pretty much unknowable, and when managers say they hire &#8220;the top 1%&#8221;, I assume they&#8217;re talking about the top 1% of their applicants. Note too that I always thought it was idiotic to point this out, because, well, isn&#8217;t that what you&#8217;re SUPPOSED to do? You&#8217;re not very well going to aim for the middle &amp; hope for the best are you?</p>
<p>Apparently I&#8217;ve been giving too much credit to management. There I go giving people with ties on the benefit of the doubt again.</p>
<p>Then, in another <a href="http://www.codinghorror.com/blog/2007/02/why-cant-programmers-program.html">blog post</a>, Jeff Atwood talks about how it&#8217;s very difficult to even get interviews with programmers who can <em>actually program</em>. The problem is real.</p>
<p>The original blog post that pointed me at the two others is one by Roberto Alsina where he talks about his own methods for weeding out the non-programmers. He&#8217;s clearly seen the issue as well.</p>
<p>But if you open all three of these posts in separate tabs and read them, you&#8217;re likely to come away with the same basic problem I did:</p>
<ul>
<li>Who the hell are these managers who can&#8217;t figure out a dead simple statistics problem?</li>
<li>How can a person fairly inept at simple math be qualified to make a hiring decision for anything but a summer intern?</li>
</ul>
<p>That sorta blew my mind a little. But it blew my mind a lot when Atwood started describing the problems that interviewees *couldn&#8217;t* perform in an interview! One task described by <a href="http://imranontech.com/2007/01/24/using-fizzbuzz-to-find-developers-who-grok-coding/">Imran</a> was called a &#8216;FizzBuzz&#8217; question. Here&#8217;s one such question:</p>
<blockquote><p>Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.</p></blockquote>
<p>Here&#8217;s the part that blew my mind: He says, and I quote:</p>
<blockquote><p>Most good programmers should be able to write out on paper a program which does this in a under a couple of minutes.</p>
<p>Want to know something scary ? – the majority of comp sci graduates can’t. I’ve also seen self-proclaimed senior programmers take more than 10-15 minutes to write a solution.</p></blockquote>
<p>That&#8217;s amazing to me. I decided to quickly pop open a Python prompt and see if I could do it:</p>
<pre>
&gt;&gt;&gt; for i in range(1,101):
...     if (i % 3 == 0) and (i % 5 == 0):
...         print i,'FizzBuzz'
...     elif i % 3 == 0:
...         print i, 'Fizz'
...     elif i % 5 == 0:
...         print i, 'Buzz'
...     else:
...         print i
...
</pre>
<p>Note that I&#8217;ve taken the liberty of printing out the numbers in addition to the required words. I&#8217;m playing the role of interviewer and interviewee here, and wanted to be able to easily verify that things were correct, since there was no time for unit testing <img src='http://www.protocolostomy.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>Turns out it worked on the first try! That was pasted directly from my terminal screen. I didn&#8217;t time myself, but it took far less than 5 minutes. This leads to my other question, of course, which is &#8220;if you&#8217;re going to complain about CS degree holders not writing good code, maybe it&#8217;s time to open the doors to non-CS degree holders?&#8221;</p>
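<p>For comparison, here is a sketch that follows the quoted spec to the letter, with the words replacing the numbers instead of being printed next to them. It is written for Python 3, and the &#8216;fizzbuzz&#8217; function name is just my choice for illustration.</p>
<pre class="brush: python; title: ; notranslate">
def fizzbuzz(n):
    """Return the FizzBuzz word for n, or n itself as a string."""
    if n % 15 == 0:  # multiple of both three and five
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


lines = [fizzbuzz(i) for i in range(1, 101)]
print("\n".join(lines))
</pre>
<p>Returning the word instead of printing it keeps the logic easy to check without eyeballing a hundred lines of output.</p>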
]]></content:encoded>
			<wfw:commentRss>http://www.protocolostomy.com/2010/03/15/programmers-that-cant-program/feed/</wfw:commentRss>
		<slash:comments>23</slash:comments>
		</item>
		<item>
		<title>PyYaml with Aliases and Anchors</title>
		<link>http://www.protocolostomy.com/2009/12/22/pyyaml-with-aliases-and-anchors/</link>
		<comments>http://www.protocolostomy.com/2009/12/22/pyyaml-with-aliases-and-anchors/#comments</comments>
		<pubDate>Tue, 22 Dec 2009 13:10:11 +0000</pubDate>
		<dc:creator>bkjones</dc:creator>
				<category><![CDATA[Hacks]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[Sysadmin]]></category>
		<category><![CDATA[Technology]]></category>

		<guid isPermaLink="false">http://www.protocolostomy.com/?p=680</guid>
		<description><![CDATA[I didn&#8217;t know this little tidbit until yesterday and want to get it posted so I can refer to it later. I have this YAML config file that&#8217;s kinda long and has a lot of duplication in it. This isn&#8217;t what I&#8217;m working on, but let&#8217;s just say that you have a bunch of backup [...]]]></description>
			<content:encoded><![CDATA[<p>I didn&#8217;t know this little tidbit until yesterday and want to get it posted so I can refer to it later.</p>
<p>I have this YAML config file that&#8217;s kinda long and has a lot of duplication in it. This isn&#8217;t what I&#8217;m working on, but let&#8217;s just say that you have a bunch of backup targets defined in your YAML config file, and your program rocks because each backup target can be defined to go to a different destination. Awesome, right?</p>
<p>Well, it might be, but it might also just make your YAML config file grotesque (and error-prone). Here&#8217;s an example:</p>
<pre>Backups:
    Home_Jonesy:
        host: foo
        dir: /Users/jonesy
        protocol: ssh
        keyloc: ~/.ssh/id_rsa.pub
        Destination:
            host: bar
            dir: /mnt/array23/homes/jonesy
            check_space: true
            min_space: 80G
            num_archives: 4
            compress: bzip2
    Home_Molly:
        host: eggs
        dir: /Users/molly
        protocol: sftp
        keyloc: ~/.ssh/id_rsa.pub
        Destination:
            host: bar
            dir: /mnt/array23/homes/jonesy
            check_space: true
            min_space: 80G
            num_archives: 4
            compress: bzip2</pre>
<p>Now with two backups, this isn&#8217;t so bad. But if your environment has 100 backup targets and only one destination, or&#8230;. heck &#8212; even if there are three destinations &#8212; should you have to write out the definition of those same three destinations for each of 100 backup targets? What if you need to change how one of the destinations is connected to, or the name of a destination changes, or array23 dies?</p>
<p>Ideally, you&#8217;d be able to reference the same definition in as many places as you need it and have things &#8220;just work&#8221;, and if something needs to change, you just change it in one place. Enter anchors and aliases.</p>
<p>An anchor is defined just like anything else in YAML with the exception that you get to label the definition block using &#8220;&amp;labelname&#8221;, and then you can (de)reference it elsewhere in your config with &#8220;*labelname&#8221;. So here&#8217;s how our above configuration would look:</p>
<pre>BackupDestination-23: &amp;Backup_To_ARRAY23
    host: bar
    dir: /mnt/array23/homes/jonesy
    check_space: true
    min_space: 80G
    num_archives: 4
    compress: bzip2
Backups:
    Home_Jonesy:
        host: foo
        dir: /Users/jonesy
        protocol: ssh
        keyloc: ~/.ssh/id_rsa.pub
        Destination: *Backup_To_ARRAY23
    Home_Molly:
        host: eggs
        dir: /Users/molly
        protocol: sftp
        keyloc: ~/.ssh/id_rsa.pub
        Destination: *Backup_To_ARRAY23</pre>
<p>With only two backup targets, the benefit is small, but keep trying to imagine this config file with about 100 backup targets, and only one or two destinations. This removes a lot of duplication and makes things easier to change and maintain (and read!)</p>
<p>The cool thing about it is that if you already have code that reads the YAML config file, you don&#8217;t have to change it at all: PyYAML resolves the aliases for you. Here&#8217;s a quick interpreter session:</p>
<pre>&gt;&gt;&gt; import yaml
&gt;&gt;&gt; from pprint import pprint
&gt;&gt;&gt; stream = open('foo.yaml', 'r')
&gt;&gt;&gt; cfg = yaml.safe_load(stream)
&gt;&gt;&gt; pprint(cfg)
{'BackupDestination-23': {'check_space': True,
                          'compress': 'bzip2',
                          'dir': '/mnt/array23/homes/jonesy',
                          'host': 'bar',
                          'min_space': '80G',
                          'num_archives': 4},
 'Backups': {'Home_Jonesy': {'Destination': {'check_space': True,
                                             'compress': 'bzip2',
                                             'dir': '/mnt/array23/homes/jonesy',
                                             'host': 'bar',
                                             'min_space': '80G',
                                             'num_archives': 4},
                             'dir': '/Users/jonesy',
                             'host': 'foo',
                             'keyloc': '~/.ssh/id_rsa.pub',
                             'protocol': 'ssh'},
             'Home_Molly': {'Destination': {'check_space': True,
                                            'compress': 'bzip2',
                                            'dir': '/mnt/array23/homes/jonesy',
                                            'host': 'bar',
                                            'min_space': '80G',
                                            'num_archives': 4},
                            'dir': '/Users/molly',
                            'host': 'eggs',
                            'keyloc': '~/.ssh/id_rsa.pub',
                            'protocol': 'sftp'}}}</pre>
<p>&#8230;And notice how everything has been expanded.</p>
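<p>One thing the pretty-printed output hides: as far as I can tell, PyYAML does not copy the anchored block for each alias. It hands back the same Python object every time. Here is a minimal Python 3 sketch under that assumption, using a trimmed-down version of the config above as an inline string (it needs PyYAML installed):</p>
<pre class="brush: python; title: ; notranslate">
import yaml

CONFIG = """
BackupDestination-23: &amp;Backup_To_ARRAY23
    host: bar
    dir: /mnt/array23/homes/jonesy
Backups:
    Home_Jonesy:
        Destination: *Backup_To_ARRAY23
    Home_Molly:
        Destination: *Backup_To_ARRAY23
"""

cfg = yaml.safe_load(CONFIG)
jonesy = cfg["Backups"]["Home_Jonesy"]["Destination"]
molly = cfg["Backups"]["Home_Molly"]["Destination"]

# Check whether both aliases resolved to one shared dict
print(jonesy is molly)

# If they did, a change made through one reference shows up in the other
jonesy["host"] = "baz"
print(molly["host"])
</pre>
<p>That is fine for a config you only read. If your program modifies the loaded config per target, copy the destination first (copy.deepcopy would do).</p>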
<p>Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.protocolostomy.com/2009/12/22/pyyaml-with-aliases-and-anchors/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Python, PostgreSQL, and psycopg2&#8217;s Dusty Corners</title>
		<link>http://www.protocolostomy.com/2009/12/01/python-postgresql-and-psycopg2s-dusty-corners/</link>
		<comments>http://www.protocolostomy.com/2009/12/01/python-postgresql-and-psycopg2s-dusty-corners/#comments</comments>
		<pubDate>Wed, 02 Dec 2009 03:07:59 +0000</pubDate>
		<dc:creator>bkjones</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[Hacks]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[Sysadmin]]></category>

		<guid isPermaLink="false">http://www.protocolostomy.com/?p=655</guid>
		<description><![CDATA[Last time I wrote code with psycopg2 was around 2006, but I was reacquainted with it over the past couple of weeks, and I wanted to make some notes on a couple of features that are not well documented, imho. Portions of this post have been snipped from mailing list threads I was involved in. [...]]]></description>
			<content:encoded><![CDATA[<p>Last time I wrote code with psycopg2 was around 2006, but I was reacquainted with it over the past couple of weeks, and I wanted to make some notes on a couple of features that are not well documented, imho. Portions of this post have been snipped from mailing list threads I was involved in.</p>
<h3>Calling PostgreSQL Functions with psycopg2</h3>
<p>So you need to call a function. Me too. I had to call a function called &#8216;myapp.new_user&#8217;. It expects a bunch of input arguments. Here&#8217;s my first shot after misreading some piece of some example code somewhere:</p>
<pre class="brush: python; title: ; notranslate">
qdict = {'fname': self.fname, 'lname': self.lname, 'dob': self.dob, 'city': self.city, 'state': self.state, 'zip': self.zipcode}

# Anti-pattern: Python's % operator pastes the raw values into the SQL text, unquoted
sqlcall = &quot;&quot;&quot;SELECT * FROM myapp.new_user( %(fname)s, %(lname)s,
%(dob)s, %(city)s, %(state)s, %(zip)s)&quot;&quot;&quot; % qdict

curs.execute(sqlcall)
</pre>
<p>There&#8217;s no reason this should work, or that anyone should expect it to work. I just wanted to include it in case someone else made the same mistake. Sure, the proper arguments are put in their proper places in &#8216;sqlcall&#8217;, but they&#8217;re not quoted at all.</p>
<p>Of course, I foolishly tried going back and putting quotes around all of those named string formatting arguments, and of course that fails when you have something like a quoted &#8220;NULL&#8221; trying to move into a date column. It has other issues too, like being error-prone and a PITA, but hey, it was pre-coffee time.</p>
<p>What&#8217;s needed is a solution whereby psycopg2 takes care of the formatting for us, so that strings become strings, NULLs are passed in a way that PostgreSQL recognizes them, dates are passed in the proper format, and all that jazz.</p>
<p>My next attempt looked like this:</p>
<pre class="brush: python; title: ; notranslate">
curs.execute(&quot;&quot;&quot;SELECT * FROM myapp.new_user( %(fname)s, %(lname)s,
%(dob)s, %(city)s, %(state)s, %(zip)s)&quot;&quot;&quot;, qdict)
</pre>
<p>This is, according to some articles, blog posts, and at least one reply on the psycopg mailing list, &#8220;the right way&#8221; to call a function using psycopg2 with PostgreSQL. To the best of my knowledge, that&#8217;s not quite right. The only real difference between this attempt and the last is I&#8217;ve replaced the &#8220;%&#8221; with a comma, which turns what *was* a string formatting operation into a proper SELECT with a psycopg2-recognized parameter list. I thought this would get psycopg2 to &#8220;just work&#8221;, but no such luck. I still had some quoting issues.</p>
<p>I have no idea where I read this little tidbit about psycopg2 being able to convert between Python and PostgreSQL data types, but I did. Right around the same time I was thinking &#8220;it&#8217;s goofy to issue a SELECT to call a function that doesn&#8217;t really want to SELECT anything. Can&#8217;t callproc() do this?&#8221; Turns out callproc() is really the right way to do this (where &#8220;right&#8221; is defined by the DB-API which is the spec for writing a Python database module). Also turns out that psycopg2 can and will do the type conversions. Properly, even (in my experience so far).</p>
<p>So here&#8217;s what I got to work:</p>
<pre class="brush: python; title: ; notranslate">
callproc_params = [self.fname, self.lname, self.dob, self.city, self.state, self.zipcode]

curs.callproc('myapp.new_user', callproc_params)
</pre>
<p>This is great! Zero manual quoting or string formatting at all! And no &#8220;SELECT&#8221;. Just call the procedure and pass the parameters. The only thing I had to change in my code was to make my &#8216;self.dob&#8217; into a datetime.date() object, but that&#8217;s super easy, and after that psycopg2 takes care of the type conversion from a Python date to a PostgreSQL date. Tomorrow I&#8217;m actually going to try calling callproc() with a list object inside the second argument. Wish me luck!</p>
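<p>If you want to see the &#8220;let the driver do the quoting&#8221; idea without a PostgreSQL server handy, here is a minimal Python 3 sketch that uses the standard library&#8217;s sqlite3 module as a stand-in. That swap is purely my assumption for illustration: sqlite3 uses &#8216;:name&#8217; placeholders where psycopg2 uses &#8216;%(name)s&#8217;, and the &#8216;users&#8217; table and its values are made up.</p>
<pre class="brush: python; title: ; notranslate">
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (fname TEXT, lname TEXT, dob TEXT, city TEXT)")

params = {
    "fname": "Pat",
    "lname": "O'Neil",                       # embedded quote, no manual escaping
    "dob": date(1980, 2, 29).isoformat(),    # stored as text in this stand-in
    "city": None,                            # becomes a real NULL, not the string 'NULL'
}

# The values travel separately from the SQL text; the driver binds them
conn.execute("INSERT INTO users VALUES (:fname, :lname, :dob, :city)", params)

row = conn.execute("SELECT fname, lname, dob, city FROM users").fetchone()
print(row)
</pre>
<p>The apostrophe and the None both survive the round trip untouched, which is exactly what the hand-quoted string formatting attempts above could not manage.</p>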
<h3>A quick cursor gotcha</h3>
<p>I made a really goofy mistake. At the root of it, what I did was share a connection *and a cursor object* among all methods of a class I created to abstract database operations out of my code. So, I did something like this (this is not the exact code, and it&#8217;s untested. Treat it like pseudocode):</p>
<pre class="brush: python; title: ; notranslate">
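import psycopg2

class MyData(object):
   def __init__(self, dsn):
      self.conn = psycopg2.connect(dsn)
      # One cursor shared by every method: this is the mistake
      self.cursor = self.conn.cursor()

   def get_users_by_regdate(self, regdate, limit=100):
      # regdate is the parameter list handed straight to callproc()
      self.cursor.arraysize = limit
      self.cursor.callproc('myapp.uid_by_regdate', regdate)
      while True:
         # fetchmany() returns up to arraysize rows per call
         rows = self.cursor.fetchmany()
         if not rows:
            break
         for row in rows:
            yield row[0]

   def user_is_subscribed(self, uid):
      # Reuses self.cursor, discarding the result set being iterated above
      self.cursor.callproc('myapp.uid_subscribed', uid)
      result = self.cursor.fetchone()
      return result[0]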
</pre>
<p>Now, in the code that uses this class, I want to grab all of the users registered on a given date, and see if they&#8217;re subscribed to, say, a mailing list, an RSS feed, a service, or whatever. See if you can predict the issue I had when I executed this: </p>
<pre class="brush: python; title: ; notranslate">
    db = MyData(dsn)
    for id in db.get_users_by_regdate([joindate]):
        idcount += 1
        print idcount
        param = [id]
        if db.user_is_subscribed(param):
            print &quot;User subscribed&quot;
            skip_count += 1
            continue
        else:
            print &quot;Not good&quot;
            continue
</pre>
<p>Note that the above is test code. I don&#8217;t actually want to continue to the top of the loop regardless of what happens in production <img src='http://www.protocolostomy.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  </p>
<p>So what I found happening is that, if I just commented out the portion of the code that makes a database call *inside* the for loop, I could print &#8216;idcount&#8217; all the way up to thousands of results (however many results there were). But if I left it in, only 100 results made it to &#8216;db.user_is_subscribed&#8217;. </p>
<p>Hey, &#8216;100&#8217; is what I&#8217;d set curs.arraysize to! Hey, I&#8217;m using the *same cursor* to make both calls! And with the for loop, the cursor is being called upon to produce a second recordset while it&#8217;s still in the middle of producing the first one!</p>
<p>Tom Roberts, on the psycopg list, states the issue concisely: </p>
<blockquote><p>The cursor is stateful; it only contains information about the last<br />
query that was executed.  On your first call to &#8220;fetchmany&#8221;, you fetch a<br />
block of results from the original query, and cache them.  Then,<br />
db.user_is_subscribed calls &#8220;execute&#8221; again.  The cursor now throws away all<br />
of the information about your first query, and fetches a new set of<br />
results.  Presumably, user_is_subscribed then consumes that dataset and<br />
returns.  Now, the cursor is position at end of results.  The rows you<br />
cached get returned by your iterator, then you call fetchmany again, but<br />
there&#8217;s nothing left to fetch&#8230;</p>
<p>&#8230;So, the lesson is if you need a new recordset, you create a new cursor.</p></blockquote>
<p>Lesson learned. I still think it&#8217;d be nice if psycopg2 had more/better docs, though. </p>
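<p>Here is roughly what the fix looks like: one fresh cursor per recordset. This is a minimal Python 3 sketch that again uses sqlite3 as a stand-in for psycopg2 (my assumption, so that it runs without a database server), with a made-up &#8216;users&#8217; table and plain SELECTs in place of the stored functions.</p>
<pre class="brush: python; title: ; notranslate">
import sqlite3


class MyData:
    def __init__(self, conn):
        self.conn = conn  # share the connection, never the cursor

    def get_user_ids(self):
        cur = self.conn.cursor()  # fresh cursor for this recordset
        cur.execute("SELECT id FROM users ORDER BY id")
        for row in cur:
            yield row[0]

    def user_is_subscribed(self, uid):
        cur = self.conn.cursor()  # separate cursor, so the outer loop is untouched
        cur.execute("SELECT subscribed FROM users WHERE id = ?", (uid,))
        return bool(cur.fetchone()[0])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, subscribed INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(i, i % 2) for i in range(1, 251)])

db = MyData(conn)
seen = 0
subscribed = 0
for uid in db.get_user_ids():
    seen += 1
    if db.user_is_subscribed(uid):
        subscribed += 1

print(seen, subscribed)
</pre>
<p>Every row makes it through the loop this time, because the inner query never touches the cursor the loop is reading from.</p>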
]]></content:encoded>
			<wfw:commentRss>http://www.protocolostomy.com/2009/12/01/python-postgresql-and-psycopg2s-dusty-corners/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

