Archive for the ‘Productivity’ Category

Design Patterns in System Administration

Sunday, August 3rd, 2008

Most readers of my blog know that I consult, in addition to usually having a day job. I started my career working for a consulting firm, and couldn’t let go of the endless fascinating problems that exist in the “technological landscape”, or of the seemingly endless number of ways to solve them. I’ve learned a ton about how people and institutions approach technical problems in system design and, maybe more importantly, how they think about the problems and solutions.

I’ve worked in huge enterprises (several Fortune 100 companies), academia (cs.princeton.edu, to be more exact), government (gfdl.noaa.gov, for example), and a few startups and small businesses. I also grew up around small businesses at a time when technology was starting to become affordable enough to creep into even small offices (I helped run wire for my father’s first modem-connected office network around 1988 or so, and my mother’s office, admittedly much larger, had a mainframe and a few terminals, from which ASCII posters of JFK and MLK were printed and hung on my walls when I was as young as 7 or 8). Observing and working with people to solve technical problems continues to bring me a lot of joy, and to present plenty of challenges.

Over the years, I’ve done a decent bit of what I’ll loosely call “programming”. 10 years ago I might not have qualified that, but working for 6 years in support of graduate computer science research has a way of humbling a guy (and, really, for most grad students, actually *doing* 6 years of graduate research is probably just as humbling, if not more). One thing I’ve tried to do is keep up with trends in how programs are deployed, how the teams of workers in what are considered separate problem domains interact to get the applications to be useful to people, how the systems are organized, and how programs are designed, and finally, how to program…. um… “better” (for some undefined but surely long-winded definition of that term). As I’m starting to witness something of a convergence of programming and systems work (at least in my neck of the woods), programming is something I’m spending even more time doing, and learning.

Design patterns, whether in the context of extreme programming, agile methodologies, or whatever the project management philosophy is, appear to be extremely useful, but I’ve wondered why there doesn’t appear to be any movement in the system administration community toward defining some patterns for solving problems in the realm of systems infrastructure architecture. A few years back I stumbled upon infrastructures.org, which I think is an excellent general methodology for building infrastructures, but I think a fuller treatment of the topic could be had. Preferably one that addresses a broader set of problems prevalent in a wider variety of environments. For example, I found the tools and methodologies there to map perfectly in government and academic environments, and portions of that work can be mapped onto small business problems, but it leaves enterprise environments, and some larger government environments with some unanswered questions or unaddressed problems.

I don’t blame the folks at infrastructures.org — on the contrary, I applaud their work! But why has it been so difficult to find solutions to problems those nice folks just didn’t have, or didn’t have to focus on in their part of the organization?

So much of what we do is tribal knowledge, or knowledge earned “the hard way” — in the trenches, at 4am, on a Sunday, uphill, both ways… etc., but while many of these stories sound similar enough to discern a pattern, and while horror stories at conferences are universally met with “me too”, and “you should’ve done x, y and z, and it wouldn’t have been an issue”, I have yet to see these patterns codified in any meaningful way in a single work, or perhaps, an organized volume of works (no, mailing lists do *not* constitute an organized volume of works).

If something as complex and diverse as programming can have patterns applied to it, I have to believe that the same could hold true for building systems. If there were such a work, it could potentially serve as a de facto “best practices” reference — one that could be referred to by both technicians and higher-level decision makers, define a common language that both could understand, and help overcome some of the inevitable “people issues” that sysadmins (and, indeed, managers) often blame for a lack of forward movement.

Does such a work exist? Is this in the works now? Though I try to keep my finger on the pulse of the publishing market, I have yet to see any real commitment to the idea that a large swath of problems in systems can be solved using variants of pre-defined patterns. It’s not that we’re not using them, of course, and it’s not that there aren’t large numbers of us who could probably recite them off the top of our heads, but if you’re one of those people, you’re a “senior” system administrator (or better), and if that’s the case, imagine what your career might’ve been like if you had such a reference, and also, let me know what the “you” with 1 year of sysadmin experience would’ve loved to have, or what the “you” of today would love to see the junior folks reading.

OSCON Day 2: Launching a Startup in 3 Hours

Tuesday, July 22nd, 2008

Launching a Startup in 3 Hours was a great talk given by Andrew Hyde (of techstars.org) and Gavin Doughtie (of Google). Both of the speakers are heavily involved in the recent trend of doing “Startup Weekends”, and techstars.org is an organization that hosts startup weekends all around the US (and I think internationally as well - Andrew mentioned one in Germany if I heard correctly).

The first half of the talk was about the general concept of a startup weekend, the problems it avoids (“we’ve been working for 9 months and haven’t launched anything”), the problems it brings up (“If you’re not using Java, you’re an idiot, so count me out!!”), and lots of details about how to organize, how to assign roles, and some common tools they use (like Basecamp and whatever your IM of choice is). There was also talk of legal issues, how (basically) to think about forming the company with the people involved, and decisions that need to be made at a business level aside from just the coding.

The second half of the talk wasn’t a talk at all. Instead, people who had ideas stood up, presented their idea in a couple of sentences, and once the ideas were out there, we were told to break into groups and get to work! So people would get up and move over to the person whose idea they liked, and they’d start brainstorming. I decided to head out after about 30 minutes of observing and talking with people about ideas, but when I left, there were probably 6-8 groups of people engrossed in conversations, and the energy level was very high. Overall, it was a really exciting experience!

Day 1 of OSCON Begins, and More Tips for Conference-goers

Monday, July 21st, 2008

I got an early start. Too early. But I’m from the west coast, so my body thinks I slept in. I wandered around a bit, took a few pics which you can see at my Flickr OSCON set, and I discovered a couple of things that might be of interest:

  • The Starbucks in the conference center charges over $2 for a small cup of joe. There’s a Starbucks right across the street (you can see it from the breakfast area - seriously, it’s 5 seconds away), and they charge less than $2 for a medium (grande). That’s less than I pay at home.
  • The ATM outside the Starbucks charges $3 for cash. I’ll report back when I find a cheaper one, but most places seem to take plastic here.
  • Every computer involved in this conference, from registration to the video screens that dot the common areas, is running Windows XP. Just sayin’.
  • The light rail system is free to go just about anywhere except for the airport, so there’s no excuse not to get out and see Portland and take in the food and beer and stuff.
  • For beer-lovers, not only is there the Oregon Brewers Festival starting at the tail end of this conference, but there’s apparently another festival that we missed *last* weekend!! Keep that in mind when you’re planning to come to OSCON next year.

Show Me Your Python SysAdmin One-Liners!

Wednesday, July 16th, 2008

Ah, the lazyweb. Today, I’m putting together content for a class I’m teaching on basic Linux administration, but during my meeting with a group of trainees to determine the scope of the course, they requested that I completely skip any coverage of “perl -e” one-liners, and show them the Python equivalents. Of course, I found this page, which has a few, but I figured I’d put out the call for more, just to get a good collection of ideas, and a higher-level idea of how people are using Python for system administration for ‘quick-n-dirty’ jobs. If I get a bunch of interesting ones, I’ll collect them all somewhere for easy reference (or add them to the wiki linked above?), so link this callout wherever pythonistas can be found.

Oddly enough, my experience with Python has me going in the completely opposite direction: I don’t write as many one-liners as I did with perl. If it’s not obvious to me how to do something with sed, awk, grep, find, xargs, and the “regular” tools, I write a Python script. I’ve tried remembering some things I used nasty Perl one-liners for, but I guess they were sufficiently nasty that I’ve forgotten them.
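
To give a flavor of the kind of thing being asked for, here is a rough sketch of two common “perl -e” jobs done with “python3 -c” instead. It assumes a python3 interpreter on the PATH, and the function names first_field and sum_lines are made up for this example; treat it as one possible starting point rather than a canonical list.

```shell
#!/bin/bash
# Hypothetical wrappers around Python one-liners; assumes python3 is on the PATH.

# Print the first whitespace-separated field of each non-blank line,
# roughly like: perl -lane 'print $F[0]'
first_field() {
    python3 -c 'import sys; [print(l.split()[0]) for l in sys.stdin if l.strip()]'
}

# Add up one integer per line on stdin,
# roughly like: perl -lne '$s += $_; END { print $s }'
sum_lines() {
    python3 -c 'import sys; print(sum(int(l) for l in sys.stdin if l.strip()))'
}
```

Something like “first_field < access_log | sort | uniq -c” then gives a quick per-client request count, with the usual tools doing the rest of the work.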

By the way, if you’re a sysadmin who writes their tools using Python, do consider giving a talk at this year’s PyWorks conference in November!

Useful stuff - 2008 - first half

Friday, July 11th, 2008

Having a Google account is sometimes useful in ways you hadn’t planned for. For example, at a few different employers I’ve been at, I’ve had to prepare for reviews by providing a list of accomplishments to my supervisor. One decent tool for generating this list is email, though it can take some time. Another useful tool is the Web History feature of your Google account.

Though this isn’t necessarily indicative of everything I’ve accomplished in the first half of 2008 per se, it’s definitely indicative of the types of things I’ve generally been into so far this year, and it’s interesting to look back. What does your Web History say?

  • Gearman - this is used by some rather large web sites, notably Digg. It reminds me a little of having Torque and Maui, but geared toward more general-purpose applications. In fact, it was never clear to me that PBS/Maui couldn’t actually do this, but I didn’t get far enough into Gearman to really say that authoritatively.
  • How SimpleDB Differs from a Relational Database - Links off to some very useful takes on the “cloud” databases, which are truly fascinating creatures, but have a vastly different data management philosophy from the relational model we’re all used to.
  • Reblog - I found this in the footer of someone’s blog post. It’s kinda neat, but to be honest, I think you can do similar stuff using the Flock browser.
  • Google Finance APIs and Tools - did I ever mention that I had a Series 7 & 63 license two months after my 20th birthday? I love anything that I can think about for very long periods of time, where there’s lots and lots and LOTS of data to play with, where you can make correlations and answer questions nobody even thought to ask. Of course, soon after finding this page I found the actual Google Finance page, which answers an awful lot of potential questions. The stock screener is actually what I was looking to write myself, but with the data freely available, I’m sure it won’t be long before I find something else fun to do with it. I’m not a fan of Google’s “Feeds” model, but I’ve dealt with it before, and will do it again if it means getting at this data.
  • Bitpusher - it was recommended to me as an alternative to traditional dedicated server hosting. Worth a look.
  • S3 Firefox Organizer - This is a firefox plugin that provides an interface that looks a lot like an FTP GUI or something, but allows you to move files to and from “buckets” in Amazon’s S3 service.
  • Boto - A python library for writing programs that interact with the various Amazon Web Services. It’s not particularly well-documented, and it has a few quirks, but it is useful.
  • OmniGraffle - A Visio replacement for Apple OS X. I like it a lot better than Visio, actually. It has tons of contributed templates. You shouldn’t have any trouble making the switch. A little pricey, but I plunked down the cash, and have not been disappointed.
  • The Python Queue Module according to Doug - Doug Hellmann’s Python Module of the Week (PyMOTW) should be published in dead tree form some day. I happen to have some code that could make better use of queuing if it were a) written in Python, and b) used the Queue module. I was a little put off by the fact that every single tutorial I found on this module assumed you wanted to use threading, which I actually don’t, because I’m not smart enough…. though the last person I told that to said something to the effect of “the fact that you believe that means you’re smart enough”. Heh.
  • MySQL GROUP modifiers - turns out this isn’t what I needed for the problem I was trying to solve, but the “WITH ROLLUP” feature was new to me at the time I found it, and it’s kinda cool.
  • Wordpress “Subscribe to Comments” plugin - Baron suggested that it would be good to have this, and I had honestly not even thought about it. But looking around, this is the only plugin of its kind that I found, and it’s only tested up to WP 2.3x, and I’m on 2.5x. This is precisely why I hate plugins (as an end user, anyway. Loghetti supports plugins) ;-)
  • Lifeblogging - I had occasion to go back and flip through some of the volumes of journals I’ve kept since age 12, wondering if it might be time to digitize those in some form. I might digitize them, but they will *not* be public I don’t think. Way too embarrassing.
  • ldapmodrdn - for a buddy who hasn’t yet found all of the openldap command line tools. You can’t use ‘ldapmodify’ (to my knowledge) to *rename* an entry.
  • Django graphs - I haven’t yet tried this, because I’m still trying to learn Django in what little spare time I have, but it looks like there’s at least some effort towards this out there in the community. I have yet to see a newspaper that doesn’t have graphs *somewhere* (finance, sports, weather…), so I’m surprised Django doesn’t have something like this built-in.
  • URL Decode UDF for MySQL - I’ve used this. It works really well.
  • Erlang - hey, I’m game for anything. If I weren’t, I’d still be writing all of my code in Perl.
  • The difference between %iowait in sar and %util in iostat - I use both tools, and wanted the clarification because I was writing some graphing code in Python (using Timeplot, which rocks, by the way), and stumbled upon the question. Google to the rescue!
  • OSCON ‘08 - I’m going. Are you going? I’m also going to the Oregon Brewers Festival on the last day of OSCON, as I did in ‘06. Wonderful!
  • Explosion at one of my hosting providers - didn’t affect me, but… wow!
  • hypertable - *sigh* someday…when there’s time…
  • Small-scale hydro power - Yeah, I’m kind of a DIYer at heart. I do some woodworking, all my own plumbing, painting, flooring, I brew my own beer, I cook, I collect rain in big barrels, power sprinklers using pool runoff to give my lawn a jumpstart in spring… that kind of stuff. One day I noticed water coming out of a downspout fast enough to leap over one of my rain barrels and thought there must be some way to harness that power. Sadly, there really isn’t, so I did some research. It’s non-trivial.
  • You bet your garden - I also do my own gardening and related experiments.
  • RightScale Demo - WATCH YOUR VOLUME - a screencast showing off RightScale’s features. Impressive considering the work it would take me, a lone admin, to set something like this up. The learning curve involved in effectively/efficiently managing/scaling/monitoring/troubleshooting EC2 is non-trivial.
  • Homebrew Kegerator - Maybe if this startup is bought out I can actually afford this thing to put my homebrewed beer in. The 30-year-old spare fridge in the basement is getting a little… gamey.
  • The pound proxy daemon - I use this. It works well enough, but I’ve crashed it under load, too. I’ve also had at least one hosting provider misconfigure it on my behalf, and I had to go and tell them how to fix it :-/
  • Droid Sans Mono - a fantastic coding font. Installing this font is in my post-install routine for all of my desktops.
  • Generator tricks for systems programmers - David Beazley has made available a lot of Python source code and presentation slides from what I imagine was a great talk (if you’re a systems guy, which I am).
  • The Wide Finder Saga - I found this just as I was writing Loghetti. There are still some things in Mr. Lundh’s code that I haven’t implemented, but it was a fantastic lesson.
  • Using gnu sort for IP addresses - I’ve used sort in a lot of different ways over the years… but not for IP addresses. This is a nice hack for pulling this off with sort, but it doesn’t scale very well when you have millions of them, due to the sort utility’s ‘divide and conquer’ method of sorting.
  • Writing an Hadoop/MapReduce Program in Python - this got me over the hump.
  • Notes on using EC2/S3 - This got me over some other small humps.
  • BeautifulSoup - found while searching for the canonical way to screen scrape with Python. I’d done it a million times in Perl, and you can do it with httplib and regex and stuff in Python if you want, but this way is at least a million times nicer.
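
For reference, one common way to do the IP address sort mentioned in the list above is to split on dots and sort each octet numerically. This is a sketch of the general technique, not necessarily what the linked page uses, and sort_ips is a name made up for the example.

```shell
#!/bin/bash
# Hypothetical helper: sort dotted-quad IPv4 addresses from stdin in numeric order.
sort_ips() {
    # -t . splits each line on dots; each -k N,Nn key compares one octet as a number,
    # so 10.0.0.2 sorts before 10.0.0.10 (a plain text sort would get that backwards).
    LC_ALL=C sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n
}
```

Usage would look like “sort_ips < addresses.txt”, with one address per line.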

Well, that’s a decent enough summary I guess. As you can see, I’ve been doing a good bit of Python scripting. Most of my code these days is written in Python instead of Perl, in part because I was given the choice, and in part because Python fits my brain and makes me want to write more code, to push myself more. I’ve also been dealing with things involving “cloud” computing and “scalability” — like Hadoop, and EC2/S3. I haven’t done as much testing of the Google utility computing services, but I’ve used their various APIs for some things.

So what’s in your history?

Do I Even Care About the iPhone 3G?

Wednesday, July 9th, 2008

Steve Jobs is one of the best presenters you could ever hope to see. He’s great at tapping into that part of your brain that makes you just want whatever it is he’s holding. But this time, it was a little different.

You see, I already have an iPhone. I bought one in February. I didn’t buy the very first model that came out because it was lacking some stuff that was really important to me - most notably, it only had Apple apps, which was severely limiting, and IMAP support was limited to Yahoo! accounts, which was absurd. With those two obstacles out of the way, I found it useful enough to spend my employer’s money on, but not my own. In the end, it was a business decision, and I still think an iPhone is a better deal than a Blackberry hands down. Especially the new 3G, which has addressed some “enterprise” concerns. Thing is, I don’t care about any of that.

I want some really really really simple things that I haven’t heard anything about, and I want one thing that is perhaps slightly harder but essential.

The slightly-harder-but-essential thing is voice commands. I can hardly believe that we’re on the third generation of a phone without having voice commands. You can get a $30 Nokia made in 2002 that has voice commands for crying out loud. Without voice commands, it’s unclear to me how this phone is useful in a hands-free environment at all. Have I missed a feature somewhere? Is there something I can plug my iPhone into while I’m driving that will parse the voice commands and do the right thing with the iPhone? I know that if you have an Acura TL the car itself parses the voice commands, but I don’t know if there’s some generic thing that *doesn’t* cost $40k that’ll do the same basic job? Anyone?

The other stuff consists mainly of small application features:

  • The ability to bookmark or otherwise somehow save “Directions” in the Maps application. This way, if I’m driving, following the directions in Maps, and need to search for a gas station or coffee shop, I don’t then have to go back and punch in the information again to get my directions back.
  • Why the heck doesn’t mail let you read in landscape mode?!?!?!
  • I’d *REALLY* like to be able to send and receive photos in text messages. I don’t use it often, but when you need it, you need it.
  • The ‘.com’ shortcut should pretty much *always* be visible on the keyboard.
  • Make email alerts a per-account setting instead of only alerting for the default or for all accounts or whatever it is the iPhone does now. Let me treat email accounts like phone contacts and assign different alert settings for each account just like I set different ring tones for different contacts.
  • Let me bookmark phone numbers so I can just hit a button on my home screen to dial them (in the absence of voice commands).
  • Make the Bluetooth support do some neat trick that’ll make it actually worth turning on.
  • I do a lot of system administration, and I’d love a usable, locally-installed ssh client that I don’t have to perform surgery to install. I don’t want to hack my phone, really. I also refuse to use a web interface to access an ssh client. If you’re doing that, stop right now, and go change every password you have.

On the development front, it would actually be really nice if they supported maybe *one* parsed scripting language for iPhone development. Even if they did like AppEngine and provided a somewhat stripped version of Python it would be something I could use. But that’s a rant for another day. :)

Cloud computing hype overload

Monday, July 7th, 2008

I’ve been working with what I used to call “utility computing” tools for about 6-9 months. However, for about the past 2 months, I’ve been seeing the term “cloud computing” all over the place, and there is so much buzz surrounding it that it’s reaching that magical point best described using Alan Greenspan’s words: “Irrational Exuberance”.

When Alan Greenspan used those words to describe the attitudes of investors toward the markets, what he was basically saying was that there were people who didn’t really know what they were doing, putting more money than they ought to into things they knew relatively little about. Further, he was saying that the decisions people were making about where to put their money were a) bad, or at least b) not based on sound reasoning, or the ‘facts on the ground’.

This, I think, is where we are at with “cloud computing”. The blog post that put me over the edge is this one, for the record. I read Sean’s writings often enough, but this one strikes me as being a little off, a little sensationalistic, not based in reality, and a little misleading.

Maybe he just didn’t put enough qualifiers in there. His post might make more sense if he limited its scope and provided more facts, but I guess it’s just an opinion piece so he decided not to go that route, and that’s his prerogative I guess.

By limiting the scope, I mean he should’ve realized that there are millions of web sites currently scaling quite nicely without the use of cloud computing. In addition, some of the new ones that are having issues are also not using cloud computing, and when they hit bumps in the road, they make it through, and the great thing is that they also share their stories, and those stories indicate that a cloud (or, the current cloud offerings) wouldn’t have helped much (there’s lots of other evidence of that too). What would’ve helped is if they had paid more attention to:

  • monitoring
  • initial infrastructure design
  • their own app code and app design

These aren’t issues that cloud computing takes away. What’s more, cloud computing is something of a moving target: many of the solutions aren’t as mature as you’d want them to be if you’re betting the house on them (EC2 only recently got “elastic IPs” and persistent storage is still not there, AppEngine only supports Python and has some rather severe limitations on functionality of your app), and they introduce a potentially large learning curve both in terms of how the individual services work, as well as how the heck to make your app fit into the cloud solution of your choosing. Think SimpleDB scales? Well, it does, but it’s also not a relational database, and doesn’t guarantee…. much of anything, including data integrity. You can’t interface with it using the drivers, interfaces, and language you’re used to using, either, because it’s not just a MySQL wrapper or something - it’s a new beast entirely. Enjoy!

This is not to mention, of course, that some people have absolutely no choice but to scale without the help of the cloud, because corporate policy, common sense, or other forces mean that they can’t have their data passing through non-corporate-owned machines and/or networks. Also, Sean omits any mention of the cost factor, which is often a huge driver in getting startups to use these services, but may not really make the move “worth it” in some cases.

Anyway, in short, all I’m really saying is that it’s disingenuous to say that the future of web computing is “the cloud” because “only the cloud can scale”. That’s just silly. Non-cloud infrastructures can scale fine depending on the balance between the demands of the application and the funds available. The future of web computing will probably involve shared, utility computing architectures, but the future doesn’t depend on cloud computing.

This is how I want all project web sites to look…

Thursday, July 3rd, 2008

My brain has a set of rules that software project websites get tested against. Each time a project site fails to comply with a rule, I get ever-so-slightly more annoyed, and ever-so-slightly less likely to use the software in question (if there are alternatives, this is even maybe not so “slightly”). 

I thought I’d list these rules because I suspect others are like me: we’re extremely busy, we work too many hours, and are involved with too many projects to spend hours trying to figure out what some piece of code someone mentioned once in IRC actually does. 

But first, know that this site actually complies with just about every single rule there is, so it’s a great template to work from if your site needs brushing up. 

  • First and foremost, tell me, right away, what this thing does, the problem it solves, and (at a high level) the approach taken to solve the problem. 
  • Tell me the language it’s written in. If it’s open source, and it’s written in a language I hack in, *and* it solves a problem I need solved, maybe I can help out, or be encouraged that if something flakes, I can fix it, or at least speak the developer’s language if I have to describe the issue to the folks upstream. 
  • Tell me what OS is required, and preferably what OS/version is tested with. 
  • Give me a full list of dependencies with links to go get them, or give me a link to “Dependencies”, or to an install document that lists them. 
  • Tell me the current version, and the date it was released. Beta versions and dates are nice too. If there is a timed release schedule, tell me that. 
  • Keep the information up-to-date. I shouldn’t have to wonder if your software is going to work under OS X 10.5 or RHEL 5, or if your plugin will work under the latest version of Drupal/Django/Moodle/MySQL/Joomla/Firefox…
  • BONUS: a very simple architectural drawing that shows me exactly what components make up the whole. The one for CouchDB is as good as any I’ve ever seen (assuming it’s accurate). 
  • BONUS: if screenshots are applicable, use them. They communicate a million times more information using a million times less real estate and bandwidth. They can communicate things you didn’t even know you were communicating. Of course, that could be good or bad, but it keeps you honest, and customers like that :-) 

For kicks, here are a few things I see sometimes on project web sites that I wish they *wouldn’t* do:

  • DON’T require me to understand how something like Trac or some other tool works in order to get at the information about your software project. Navigation should not assume I’m a developer, it should assume I’m a prospective user who will leave if they can’t read the menu. If you want to use a project management tool to do your work, more power to you, but as a prospective customer, it’s none of my business — don’t drag me into your personal hell! I just want the software! 
  • DON’T be satisfied with the Sourceforge page as your project’s “homepage”. The problem with doing that is twofold: first, Sourceforge kinda sucks, and occasionally becomes unusable. Second, it doesn’t provide a simple way for you to give me information, nor a simple way for me to find it even if you produce said information using their tools. Also, it’s bad form. If you haven’t committed to the project enough to give it a proper site, well… 
  • DON’T put some kind of “Coming Soon” page with a bunch of information with *NO DATE*, because I’m going to go ahead and assume that this thing is vaporware, and that the “coming soon” post is 3 years old. Nothing in this world is more annoying than time-sensitive information being plastered on a web site with no date. 
  • DO NOT — I repeat — DO NOT force me to download a 20MB tarball to get at the documentation. That’s not how things work. I get to see what I’m downloading *before* I download it. You’ll save me some time, and save yourself some bandwidth, and you’ll have more accurate statistics about how many people download and use your software, because the numbers won’t be skewed by folks who were forced to download the package to get at the documentation. 

All of that said, I probably won’t use CouchDB, even though I love their project’s site. JavaScript makes my brain explode, so mixing it with something like a database, which to me is the digital embodiment of sanity itself, is… insane. But if you’re someone who can deal with this concoction, I encourage you to check out CouchDB. At the very least, you can figure out if it might be a fit for you without clicking from their home page a single time. That just rocks.

Simple S3 Log Archival

Tuesday, June 3rd, 2008

UPDATE: if anyone knows of a non-broken syntax highlighting plugin for wordpress that supports bash or some other shell syntax, let me know :-/

Apache logs, database backups, etc., on busy web sites, can get large. If you rotate logs or perform backups regularly, they can get large and numerous, and as we all know, large * numerous = expensive, or rapidly filling disk partitions, or both.

Amazon’s S3 service, along with a simple downloadable suite of tools, and a shell script or two can ease your life considerably. Here’s one way to do it:

  1. Get an Amazon Web Services account by going to the AWS website.
  2. Download the ‘aws’ command line tool from here and install it.
  3. Write a couple of shell scripts, and schedule them using cron.

Once you have your Amazon account, you’ll be able to get an access key and secret key. You can copy these to a file and aws will use them to authenticate operations against S3. The aws utility’s web site (in #2 above) has good documentation on how to get set up in a flash.

With items 1 and 2 out of the way, you’re just left with writing a shell script (or two) and scheduling them via cron. Here are some simple example scripts I used to get started (you can add more complex/site-specific stuff once you know it’s working).

The first one is just a simple log compression script that gzips the log files and moves them out of the directory where the active log files are. It has nothing to do with Amazon web services. You can use it on its own if you want:


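#!/bin/bash
# Compress rotated logs older than a day, then move them to the archive directory.

LOGDIR='/mnt/fs/logs/httplogs'
ARCHIVE='/mnt/fs/logs/httplogs/archive'

if cd "$LOGDIR"; then
    # Let find run gzip directly so odd filenames can't break the loop;
    # skip anything that is already compressed.
    find . -maxdepth 1 -type f -name "*_log.*" ! -name "*.gz" -mtime +1 -exec gzip {} \;

    # Move whatever gzip produced; the -e test covers the case where nothing matched.
    for f in "$LOGDIR"/*.gz; do
        [ -e "$f" ] && mv "$f" "$ARCHIVE"/
    done
else
    echo "Failed to cd to log directory" >&2
    exit 1
fi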

Before launching this in any kind of production environment, you might want to add some more features, like checking to make sure the archive partition has enough space before trying to copy things to it and stuff like that, but this is a decent start.
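
As one possible shape for that free-space check, here is a minimal sketch. The function name has_free_kb and the idea of passing the threshold in kilobytes are assumptions made for this example; the only tools it leans on are “df -P” and awk.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the filesystem holding directory $1
# has at least $2 kilobytes available.
has_free_kb() {
    local dir=$1 needed_kb=$2 avail_kb
    # df -P prints one POSIX-format line per filesystem and -k reports 1K blocks;
    # column 4 of the second line is the space available.
    avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR == 2 { print $4 }')
    # An empty value means df failed (for example, the directory doesn't exist).
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$needed_kb" ]
}
```

In the compression script, the move into the archive could then be wrapped in something like “if has_free_kb "$ARCHIVE" 1048576; then ... fi”, where the 1GB threshold is an arbitrary number to tune for your own logs.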

The second one is a wrapper around the aws ‘s3put’ command, and it moves stuff from the archive location to S3. It checks a return code, and then if things went ok, it deletes the local gzip files.


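#!/bin/bash
# Upload each archived log to S3, deleting the local copy only after a successful upload.

# Bail out if the archive directory is missing, so rm never runs somewhere unexpected.
cd /mnt/fs/logs/httplogs/archive || exit 1
for i in *.gz; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$i" ] || continue
    if s3put addthis-logs/ "$i"; then
        echo "Moved $i to s3"
        rm -f "$i"
    else
        echo "Failed to move $i to s3... Continuing"
    fi
done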

I wish there was a way in aws to check for the existence of an object in a bucket without it trying to cat the file to stdout, but I don’t think there is. This would be a more reliable check than just checking the return code. I’ll work on that at some point.

Scheduling all of this in cron is an exercise for the user. I purposely created two scripts to do this work, so I could run the compression script every day, but the archival script once every week or something. You could also write a third script that checks your disk space in your log partition and runs either or both of these other scripts if it gets too high.
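
A rough sketch of that third script, plus what the cron entries might look like, follows. The script paths, schedule times, the 90% threshold, and the function names are all assumptions made up for the example, so adjust them to wherever you actually install things.

```shell
#!/bin/bash
# Hypothetical "check_log_space.sh": run the other two scripts when the log
# partition gets too full. All paths and thresholds below are placeholders.

# Print the percentage of space used on the filesystem holding $1 (digits only).
used_pct() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage on the filesystem holding $1 is at or above $2 percent.
over_threshold() {
    local pct
    pct=$(used_pct "$1")
    [ -n "$pct" ] && [ "$pct" -ge "$2" ]
}

# Compress, then archive, but only when the partition is over the threshold.
check_and_archive() {
    if over_threshold "$1" "$2"; then
        /usr/local/bin/compress_logs.sh && /usr/local/bin/archive_logs_to_s3.sh
    fi
}

# In the real script you would end with a call such as:
#   check_and_archive /mnt/fs/logs 90
#
# Example crontab entries (crontab -e): compress daily, archive weekly, check hourly.
#   30 2 * * *  /usr/local/bin/compress_logs.sh
#   30 3 * * 0  /usr/local/bin/archive_logs_to_s3.sh
#   0  * * * *  /usr/local/bin/check_log_space.sh
```

Keeping the check in its own script means the daily and weekly jobs stay simple, and the hourly check only does real work when the partition is actually filling up.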

I used ‘aws’ because it was the first tool I found, by the way. I have only recently found ‘boto’, a Python-based utility that looks like it’s probably the equivalent of the Perl-based ‘aws’. I’m happy to have found that and look forward to giving it a shot!

Funny what you learn about yourself when you buy an iPhone

Tuesday, June 3rd, 2008

Not ripping off xkcd - this is seriously the best graphic I’ve ever generated.

This is *not* a ripoff of xkcd (though I read that regularly, and so should you) - this is seriously the best graphic I can come up with, and it does the job. Yesterday I looked at doing all kinds of stuff to my iPhone. I wanted to see if I could get Python and a full-fledged Django installation on my iPhone and create the first web 2.0 application created completely from the bathroom. Just kidding, but I wanted to do some pretty evil stuff. Turns out that, as of now, there isn’t a clean simple way to get a lot of stuff to work without hacking the iPhone in some way, or resorting to things that are completely ridiculous. I’m sorry, but there has to be an easier way to get SSH and a terminal on there. It’s just not that critical right now, and this thing was damned expensive.

I decided to wait it out and see what comes our way over the summer.