Way Better Python Indentation In Vim

By m0j0, November 13, 2009 10:27 pm

Just wanted to share this tip I just stumbled across:

If you code in Python and your ~/.vimrc file has ‘set smartindent’, change that to ‘set nosmartindent’ and add the following two lines (I’m told they can be combined into a one-liner, ‘filetype plugin indent on’, but I use two lines, so…)

filetype plugin on
filetype indent on

With that change, the indentation support for Python is much better. What I need to do now is write a little vim-fu that will tell vim to decrease the line indent if I enter two newlines or something, since there are no block delimiters (like braces or brackets) to tell vim about the block level.

Python, Creating XML, Recursion, and Order

By m0j0, November 4, 2009 9:05 pm

I love being challenged every day. Today I ran across a challenge that has several solutions. However, most of them are hacks, and I never feel like I really solved the problem when I implement a hack. There’s also that eerie feeling that this hack is going to bite me later.

I have an API I need to talk to. It’s pure XML-over-HTTP (henceforth XHTTP). Actually, the first issue I had was just dealing with XHTTP. All of the APIs I’ve dealt with in the past either were XML-RPC or SOAP, which have ready-made libraries, or used GET parameters and the like, which are pretty simple to deal with. I’ve never had to actually construct an HTTP request and just POST raw XML.

It’s as easy as it should be, really. I code in Python, so once you actually have an XML message ready to send, you can use urllib2 to send it in about 2 lines of code.
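
For anyone reading this on a current Python, here is a minimal sketch of that "POST raw XML" step. It assumes Python 3, where urllib2's functionality lives in urllib.request; the URL, the payload, and the build_xml_post helper name are made up for illustration and are not part of the real API.

```python
import urllib.request


def build_xml_post(url, xml_text):
    """Build (but do not send) a POST request whose body is raw XML."""
    body = xml_text.encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical endpoint and payload, for illustration only.
request = build_xml_post("http://api.example.com/query", "<super_special/>")

# Sending it is the "about 2 lines" part; uncomment to actually hit the network:
# with urllib.request.urlopen(request) as response:
#     reply = response.read()
```

Building the request separately from sending it makes it easy to inspect the headers and body before anything goes over the wire.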

The more interesting part is putting the request together. I thought I had this beat. I decided to use the xml.dom.minidom module and create my Document object by going through the DOMImplementation object, because it was the only thing I found that allowed me to actually specify a DTD programmatically instead of hackishly tacking on hard-coded text. No big deal.

Now that I had a document, I needed to add elements to it. Some of these XML queries I’m sending can get too long to write a function that manually creates all the elements, then adds them to the document, then creates the text nodes, then adds them to the proper element… it’s amazingly tedious to do. I really wish Python had something like PHP’s XMLWriter, which lets you create an element and its text in one line of code.

Tedium drives me nuts, so rather than code this all out, I decided to create a dictionary that mirrored the structure of my query, with the data for the text nodes filled in by variables.

query_params = {'super_special_query':
                   {
                      'credentials': {'username': user, 'password': password, 'data_realm': realm},
                      'result_params': {'num_results': setsize, 'order_by': order_by},
                      query_type: query_dict
                   }
               }

from xml.dom.minidom import getDOMImplementation

def makeDoc():
    impl = getDOMImplementation()
    # createDocumentType(qualifiedName, publicId, systemId)
    dt = impl.createDocumentType("super_special", None, 'super_special.dtd')
    doc = impl.createDocument(None, "super_special", dt)
    return doc

def makeQuery(doc, query_params, tag=None):
    """
        @doc is an xml.dom.minidom.Document object
        @query_params is a dictionary structure that mirrors the structure of the xml.
        @tag used in recursion to keep track of the node to append things to next time through.

    """

    if tag is None:
        root = doc.documentElement
    else:
        root = tag

    # .items() works on both Python 2 and Python 3 dictionaries.
    for key, value in query_params.items():
        tag = doc.createElement(key)
        root.appendChild(tag)
        if isinstance(value, dict):
            # Nested dictionary: recurse, using the new element as the parent.
            makeQuery(doc, value, tag)
        else:
            # Leaf value: add it as a text node (minidom expects a string here).
            tag_txt = doc.createTextNode(value)
            tag.appendChild(tag_txt)

    return doc.toxml()

doc = makeDoc()
qxml = makeQuery(doc, query_params)

This is simplistic, really. I don’t need to deal with attributes in my queries, for example. But it is generic enough that if I need to send different types of queries, all that’s required is creating another dictionary to represent it, and passing that through the same makeQuery function to create the query.

Initial testing indicated success, but that’s why you can’t rely on only simple initial tests. Switching things up immediately exposed a problem: The API server validated my query against a DTD that enforced strict ordering of the elements, and Python dictionaries do not have the same notion of “order” that you and I do.

So there’s the code. If nothing else, it’s a less-contrived example of what you might actually use recursion for. Tomorrow I have to figure out how to enforce the ordering. One idea is to have a separate list to consult for the ordering of the elements. It requires an extra outer loop to go through the list, get the name of the next tag, then use that value to ask the dictionary for any related values. Seemed like a good way to go, but I had a bit of difficulty figuring out how to make that work at all. Maybe with fresh eyes in the AM it’ll be more obvious — that happens a lot, and is always a nice surprise.
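
Here is one way the separate-ordering-list idea could look. Treat it as a sketch under assumptions, not the fix that was eventually used: it is written for Python 3, make_ordered_query and ELEMENT_ORDER are hypothetical names, and the ordering and values below are invented stand-ins for whatever the real DTD requires. On Python 3.7 and later, plain dictionaries preserve insertion order (and collections.OrderedDict exists from 2.7 on), so simply building the dictionary in DTD order would also work there.

```python
from xml.dom.minidom import getDOMImplementation


def make_doc():
    impl = getDOMImplementation()
    dt = impl.createDocumentType("super_special", None, "super_special.dtd")
    return impl.createDocument(None, "super_special", dt)


def make_ordered_query(doc, params, order, parent=None):
    """Like makeQuery, but emit children in the sequence given by `order`.

    `order` maps a parent tag name to the list of its child tag names,
    in the sequence the DTD expects.
    """
    if parent is None:
        parent = doc.documentElement
    names = order.get(parent.tagName, [])
    # Outer loop over the ordering list first; any keys the list does not
    # mention are appended afterwards in sorted order so nothing is dropped.
    keys = [k for k in names if k in params]
    keys += sorted(k for k in params if k not in names)
    for key in keys:
        tag = doc.createElement(key)
        parent.appendChild(tag)
        value = params[key]
        if isinstance(value, dict):
            make_ordered_query(doc, value, order, tag)
        else:
            tag.appendChild(doc.createTextNode(str(value)))
    return doc.toxml()


# Hypothetical ordering and values, deliberately written out of order below.
ELEMENT_ORDER = {
    "super_special": ["super_special_query"],
    "super_special_query": ["credentials", "result_params"],
    "credentials": ["username", "password", "data_realm"],
    "result_params": ["num_results", "order_by"],
}
query_params = {
    "super_special_query": {
        "result_params": {"order_by": "date", "num_results": 10},
        "credentials": {"password": "secret", "data_realm": "test", "username": "m0j0"},
    }
}
qxml = make_ordered_query(make_doc(), query_params, ELEMENT_ORDER)
```

The dictionary still carries the data and the list only carries the sequence, so a new query type needs one more dictionary plus one more entry in the ordering table.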

Ideas, as always, are hereby solicited!

Shell Scripting: Bash Arrays

By m0j0, January 15, 2009 11:50 pm

I’m actually not a huge fan of shell scripting, in spite of the fact that I’ve been doing it for years, and am fairly adept at it. I guess because the shell wasn’t really intended to be used for programming per se, it has evolved into something that sorta kinda looks like a programming language from a distance, but gets to be really ugly and full of inconsistencies and spooky weirdness when viewed up close. This is why I now recode in Python where appropriate and practical, and just about all new code I write is in Python as well.

One of my least favorite things about Bash scripting is arrays, so here are a few notes for those who are forced to deal with them in bash. 

First, to declare an array variable, you can assign directly to a variable name, like this: 

myarr=('foo' 'bar' 'baz')

Or, you can use the ‘declare’ bash built-in: 

declare -a myarr=('foo' 'bar' 'baz')

The ‘-a’ flag says you want to declare an array. Notice that when you assign elements to an array like this, you separate the elements with spaces, not commas. 

Arrays in bash are zero-indexed, so to echo the value of the first element of myarr, we do this: 

echo ${myarr[0]}

Now that you have an array, and it has values, at some point you’ll want to loop over it and do something with each value. There’s a little bit of confusion for the uninitiated in this area. For whatever reason, there is more than one way to list out all of the elements in an array. What’s more, the two different ways act differently if they are used inside of double quotes (wtf?). To illustrate, cut-n-paste this to a script, and then run the script: 

#!/bin/bash
myarr=('foo' 'bar' 'baz')
echo ${myarr[*]}
echo ${myarr[@]}
echo "${myarr[*]}"
echo "${myarr[@]}" # looks just like the previous line's output
for i in "${myarr[*]}"; do # echoes one line containing all three elements
   echo $i
done
for i in "${myarr[@]}"; do  # echoes one line for each element of the array.
   echo $i
done

Odd but true. Inside double quotes, the “@” expands each element of the array to its own “word”, while the “*” expands the entire set of elements to a single word. 

Another oddity — to get just a count of the elements in the array, you do this: 

echo ${#myarr[*]} 

Of course, this also works: 

echo ${#myarr[@]}

And the funny thing here is, these two methods do not appear to produce different results when inside of double quotes. I’d be hard pressed, of course, to figure out a use for counting the entire set of array elements as “1”, but it still seems a little inconsistent. 

Also note that you don’t have to count the elements in the array – you can count the length of any element in the array, too: 

echo ${#myarr[0]} 

That’ll return 3 for the array we defined above, since the first element, ‘foo’, is three characters long. 

Have fun!

WordPress 2.7 – Ahhhhhh!

By m0j0, December 11, 2008 8:12 am

I guess WordPress doesn’t consider the changes they’ve made in 2.7 (released today) to be big enough to warrant a change to the major version number (which would make it 3.0). However, there are a few features now built-in that I’ve been dreaming about for so long that simply incrementing the second number seems to sell this version short. At least they named it after one of my favorite jazz musicians. This release is called “Coltrane”. Nice.

My top two feature requests: Check!

First and foremost, the number one thing on my list of desired features is now a reality: I can make bulk changes to the categories of my posts. So, when I add a category to WordPress, and then realize that lots of my old posts really belong there, I don’t have to go searching around and changing them by hand. I still might take a stab at doing back-end automation here, by scripting a tool that’ll search the content of all of my posts, and if the content has, say, 2 out of 3 terms in my search criteria, it’ll add the post to the category, using whatever database trickery is necessary. However, this solves almost all of my needs (save my need to hack things, sometimes for its own sake).

The other feature I’ve been wanting for a long time is also now a reality: replying to comments without having to go to the post page to do it. You can now moderate and reply to comments right in the dashboard.

This, for me, is huge. I’ve been waiting for these two particular features since about 2005.

More Baked-in Goodness

Some other niceties are now built-in that used to be addon modules in WordPress, which is great, because I’m always worried about third-party modules breaking and being abandoned as new WP releases come out. The nicest for me, as someone who maintains their own WP install, is the automated WP upgrade. Used to be an addon, now built in.

Another nice feature, if you *are* someone who doesn’t mind third party modules, is that now you can browse available modules, and install them, without leaving the wp interface.

Yes, another complete redesign

The admin interface has been completely overhauled, again. The last time they did this, a buddy and I discussed it, and although he felt one or two things were nicer, I felt that they had not addressed the biggest problems with the interface. Well, they fixed it by doing something I didn’t actually expect: they admitted defeat.

Instead of overhauling the interface, they’ve empowered the user to do it for themselves. Want the editor to fit the width of the browser window? No problem. Never use all of those features in the editing interface? Get rid of them. Only just noticed all those news items in the dashboard? Make them more prominent. You can do all of this by dragging and dropping things around, or collapsing them to ‘icon-only’ view.

I am writing this in 2.7, and in the editor interface, I definitely feel like more of what I need is readily available instead of buried somewhere in the countless blocks and sections and whatnot – which reminds me that there’s also a new (and quite nice) menu interface – also a part of the interface you can customize.

Check out the video and notes on the WordPress site. The tour video does a great job of giving a quick rundown of the new features I’ve mentioned here, and lots and lots of features I *didn’t* cover.

MySQL Problem and Solution Posts: r0ck.

By m0j0, November 18, 2008 4:30 pm

Taming MySQL is… challenging. Especially in very large, fast-growth, ‘always-on’ environments. It’s one of those things where you seemingly can never know all there is to know about it. That’s why I really like coming across posts like this one from FreshBooks that describes a very real problem that was affecting their users, how they dealt with it, why *that* failed, and what the final fix was. Post a link to your favorite MySQL Problem and Solution post in the comments (oh yeah, and “subscribe to comments” should be working now!)

Microsoft Makes Progress, but Still Misses the Point

By m0j0, July 28, 2008 12:50 am

I’m still digesting certain parts of OSCON 2008, but I think I’ve finally settled on a conclusion for my thoughts on Sam Ramji’s presentation of Microsoft’s contributions to open source projects, and their progress toward rethinking how they do business to be more friendly toward competing platforms and technologies, so that other platforms can integrate with Microsoft products more easily.

I think the talk should’ve been entitled “What Microsoft Has Done for You Lately”. To Microsoft’s credit, they have, in fact, done things that would’ve been unimaginable just a few years ago. However, I think they still sort of miss the point, or else they get it and just don’t know how to do anything about it.

One must admit that Microsoft has made strides in making various technologies run in a Windows environment. PHP can be run on IIS, and it’s a supported language in their development tools. They’ve opened up the SMB/CIFS spec so that the Samba team can create more robust tools to integrate Windows and UNIX environments, and they’ve become a major sponsor of the Apache Software Foundation. Things like this are nice, but I can’t really accept them as any kind of long-term, reliable solution. Sure, they’ve opened up the SMB spec, but what happens when Microsoft decides to change the spec, or they decide to deprecate SMB altogether?

The answer is that we’re left in the dust, and that leads to my point. The major difference between opening up a spec and opening up the source code is that if you open the spec and then decide to go in another direction, we’re left in the lurch. If you open the source code and decide to change directions, the community building tools around that functionality can decide for themselves to either change directions or fork and exec, so to speak. The real point here is that Microsoft is under no obligation to involve any of the communities within the Open Source world in discussions about their direction with regards to any of the technologies they use. They’re a closed-source, proprietary company, with a fiduciary responsibility to look out for their investors and their bottom line. If they see an opportunity to make more money because more customers want to see them do something that requires something other than SMB, they’re free to do that, and if I were a shareholder, I think I’d want them to do it, because I’d want my stock price to go higher.

I have yet to see a corporate entity involve the community in their direction with regards to technology. I have seen lots of hand-waving, a lot of lip service, and a lot of people speaking at conferences, mostly trying to hide really big elephants behind a well-worded (but otherwise empty) speech, at least vetted, if not written by, marketing and PR folks. Talks like this are pretty transparent.

I think the issue is a hard one. How do you involve a community mostly unconcerned about financial matters (or, at least, mostly not involved with them directly on a regular basis), in a decision-making process that necessarily involves coming to a solution that is “profitable”? Well, actually, I think the issue starts at a higher level than that. The real trouble is a situation which was beautifully illustrated in a talk by Robert “r0ml” Lefkowitz at this year’s OSCON: the corporations are viewing these interactions as taking place between “thinkers” and “doers”, instead of two “thinker” parties which happen to have different philosophies about how things should be done. Given the numerous cultural differences between the groups, one could forgive the corporate types for making the error (by the way, if anyone finds the “Praxis/Techne” talk by r0ml online in video format – post the link! There *was* a video camera there!)

Of course, specific to Microsoft, there are other issues besides simple matters of tech-level software interaction. There’s the issue of software patents, and Microsoft’s “promise” not to sue. There’s also the issue of standards, and Microsoft’s attempts to either own them, or destroy those which it can’t own. These are deep, cultural issues at Microsoft that no single person giving a talk at OSCON is in a position to solve (or to convince me they’ve solved). If Microsoft really does want to interact with the community and interoperate with our software, they should probably be prepared to spend years and dollars just to earn some level of trust. I sincerely hope that they are sincerely trying to open up, but I can’t help but have my doubts.

OSCON Day 1: The BoF Board, for your perusal

By m0j0, July 21, 2008 3:23 pm

I’ve posted a picture of the BoF board for day 1. Click on it to see bigger sizes. The full size image (maybe smaller) is perfectly suitable for reading at your leisure. I’ll update this if/when I see significant changes to it:

[Image: IMG_4477.JPG, photo of the OSCON day 1 BoF board]

Notes on Book Shopping from a Tech Bibliophile

By m0j0, June 3, 2008 9:00 pm

Hi. My name is Brian, and I’m a tech bibliophile.

I have owned more books covering more technologies than I care to admit. Some of my more technical friends have stood in awe of the number of tech books I own. I am also constantly rotating old books that almost *can’t* be useful anymore out of my collection because there’s just no room to keep them all, and it would be an almost embarrassingly large collection if not for the fact that I have no shame or guilt associated with my need for dead trees.

If you need further proof:

  • I have, on more than one occasion, suggested to my wife that we take a walk around our local mall so I could browse the computer section of the book store, not to buy, but just to keep up with the new titles and stuff.
  • Ok, I usually buy.
  • I also go into book stores whenever I’m out of town to get a comparison of what seems to be popular in different areas of the state/country/world.
  • I just got a head rush because I just remembered that, since I’m attending OSCON in Portland, OR this year, I’ll get to go back to Powell’s, which, mind you, has a huge, city-block-sized store, which is very nice, but *also* has an entire store dedicated to geekery that rivals anything like it that I’ve ever seen, and contains a computer museum! (You can see some shots of it in my flickr set from OSCON ’06)
  • I once owned a book about VBScript.

I have also co-authored a book for O’Reilly, and in addition to my day job (I’m the director of IT for AddThis.com), I also work for a publisher, MTA, the publisher of Python Magazine, php|architect, as well as a line of books. Oh yeah. I’m into it. It’s bad.

I’ve learned quite a bit about buying books, and some of that learning came from unexpected places. There’s even more that I don’t know, but at least now I know that I don’t know it, and can try to figure out more stuff :-)

So here are a few things to keep in mind when you need to buy a technical book, or one just tugs at your impulse buy strings.

Give Any New Version 6 Months Before Buying a Book About It.

The first books about PHP 5 were dreadful. I never, ever return books to a book store, even if I don’t particularly care for them, but I returned a book about PHP 5 because the level of inadequacy was just insulting to me as a consumer. This was quite some time ago (when PHP 5 books first hit the shelves), and thinking about it now I’m still amazed at how terrible that book was. Of course, PHP 5 is just one example. Way, way back in the day (1998-9 or so?) when the first books about Java 2 hit the shelves (some might remember that booksellers actually put stickers over the part of the title that said “1.2” when it was renamed “2”), I had the same experience.

It’s not exclusive to languages either. When the first MySQL books came out that said “covers MySQL 5”, they just barely covered MySQL 5. In fact, there’s a new edition of High Performance MySQL coming out that is *going* to say “covers MySQL 5.1” on it, and it’s not really going to cover much about 5.1, so says one of the book’s authors (whose honesty I greatly appreciate, by the way – I’d love to see that from the various book publishers).

At the OS level, I’m mostly a Linux guy, and at this point I wouldn’t take a book about a specific version of a specific distribution of Linux if you paid me to take it. Those books are mostly rehashes of the last version of the book put together as marketing objects. I know, because when the “<distro><version> Bible” series first came out, I read them (I think RedHat was the only distro covered initially), and I followed up with later versions of the books, and was always disappointed. Nowadays, I don’t know how you can think that a book about something as fast moving as Fedora Core is going to be useful. Maybe if you’re learning it for the first time something like this can work out, but if you’re looking to exploit new features, you’re really better off just reading the release notes and changelog.

Lesson learned. Books take time to write, to edit, to format, to print, to distribute, and to get on the shelves. Keep that in mind when you see a book about Python 3000 on the shelves within days of a GA release of Python 3000. It’s likely that that book was completely written and in an editor’s hands 3 months ago, and writing for that project began probably 9 months ago… 9 months before Python 3000 was a reality in this example. Some changes can be accounted for during the writing process, but a book that is released 6 months after the release of a new technology is likely to be built on more solid ground (of course, this is only part of assessing the quality of the book – but I suspect it’s often overlooked).

I’d also like to note that this probably wasn’t the case quite so much in the days when, for each language or technology or application or whatever, there were far fewer titles in print on the topic, and an authoritative title was more easily identified. Nowadays, the number of books about Ruby is dizzying to witness on the shelves of your local retailer. I just don’t think there was a market to support that kind of sensationalistic publishing model back when, say, C++ hit the scene. Maybe I’m mistaken there and some more… distinguished folks can enlighten all of us.

Take reviews with several grains of salt

Book reviews are lame, unless you know the source. When I say “know”, I don’t mean “have heard of”. I mean “know” in the sense that you have some idea what this reviewer is working with on a day-to-day basis, you know what their leanings are within the technological landscape, and you recognize that person as an authority on some topic at least loosely related to the book being reviewed.

I wouldn’t put much faith in the reviews on Amazon unless it is an established title that’s in its second edition. First edition books that all of a sudden have 20 reviews on Amazon within the first week of the release are probably reviews done by other authors who work for the same publisher, or who have some other motivation for writing the review.

You can learn to identify lame reviews or astroturfing on sight (now that you’re aware of it, it’s not all that hard to recognize), so be on the lookout. If you can, google the reviewer by name. Some of those folks work for the same publisher, and should likely just be discarded. I hate astroturfing, but I guess the publishers feel like they have to do it to compete with everyone else who is doing it and creating buzz around their titles. Sad.

By the way, astroturfing in this context means sending everyone you know (and/or works for you, or wants to) to do reviews, talk up the book, link to the book’s web site, or the author’s blog (where his book is probably displayed prominently) or run ads on their blog, or mention the book on irc, digg, del.icio.us, slashdot, etc. If you get enough people to do this, it gives the impression that there’s a lot of buzz and “grass roots” enthusiasm around the book. Except the “grass” is fake. Hence “astroturfing”. This is the kind of thing that Digg fights against all the time. Mostly unsuccessfully. It goes like this:

  1. Big Media Inc publishes article on big web site.
  2. Big Media emails all editors, writers, bloggers, designers, etc., to go blog, talk, post, link, submit the article everywhere they can.
  3. Some of these people have multiple accounts on each service you can possibly submit links to, some have multiple blogs, they link to other peoples’ blogs who are also talking up the article… you get the idea.
  4. Big Media’s article is read by thousands, for no really good reason other than they happen to be good astroturfers.

…But I digress. Just take reviews with a grain of salt. Same goes for big numbers on Digg and other like services.

Look for “Timeless Tomes”

The K&R book on C is a timeless tome. The GoF book on Design Patterns is a timeless tome. Stevens on TCP/IP is a timeless tome. C.J. Date’s early Intro to Database Systems is a timeless tome. These books came out a shockingly long time ago considering how often they are referenced and recommended and handed down through generations of technologists. If you need a solid foundation in some technology like this, you should look for books on the topic that have stood the test of time.

However, time isn’t always your friend, and some of these tomes are enormous. That’s why there are books like “Learn Java in 24 Hours”. If you go after this type of book, fine. I have tons of books like this. Just know that going through it does NOT mean you “know” Java. See here for details.

Timeless tomes seem harder to find now that there are stores with 150,000 titles in stock. They get lost in the noise. They’re out there, though. I have a built-in Amazon storefront on LinuxLaboratory.org that I try to keep updated with books I have read and found genuinely useful. I’m a little behind on that, but the books there are a mix of huge tomes (Understanding and Deploying LDAP is enormous), and useful reference or “contextual” books that explain how to use a technology in a particular context (Perl for System Administration, for example, is a good book). The next book I need to add there is “The Art of SQL”, which completely rocks and I highly recommend if you *already know* SQL.

Look at the Copyright Date

Technology moves at breakneck speed. Some books that are still on the shelves that say “PHP and MySQL” cover versions that aren’t even supported anymore. Oracle 8i books are still around. Some books about Apache only make passing references to Apache 2. It would take some time to sit around flipping through pages to figure out if the version you need information about is covered. If you have some familiarity with the subject, checking the copyright date is a quick reference that can let you know if this book is the one you need. It can also help you avoid the dreaded “written before the technology was GA” problem mentioned above. If you know that FooLang 24 came out in February of 2008, the book in your hands that says “FooLang 24” on the cover should not have a 2007 copyright, ideally.

Be Wary of Growth in Second Editions

First: there are “Volumes” and there are “Editions”. A second volume is a completely different book from the first volume. A second *edition* is an updating of the first edition. It’s the same basic material. Or… that’s how it used to be. Nowadays, marketing sometimes dictates that new editions should include whole new sections about new and exciting buzzwords of the day and stuff like that. Have you seen the most recent edition of “Programming Python”? It’s probably the thickest technology book I own, even beating out Understanding and Deploying LDAP Directories. I have no idea if anything in there was put upon the author by O’Reilly – and I’m not making accusations (I’ve worked for O’Reilly and have no reason to believe they’re guilty of this practice) – I’m just saying that the first edition was probably around half the size of the second.

For what it’s worth, I own the latest edition of Programming Python, and am not sorry I bought it. In my editing work for Python Magazine, I came across code that used seemingly every conceivable Python module, and I had to be able to quickly reference and read up on stuff that was in unfamiliar territory. Of course, we have tech editors (who rock, by the way), but I still needed to make sure the text was explaining things in a way that made sense and didn’t contradict the code (or vice versa). That book covers a ton of stuff, and I was glad to have it.

I’ve worked with a good number of publishers, and I have definitely been encouraged to make mention of different things I had no interest in writing about, because it was good for Google rankings, or blog buzz, or tag clouds, or whatever. I have friends in tech publishing circles (and tech authors) who have confirmed that this *does* happen.

Understand that publishers, no matter how granola they look, run businesses, and businesses need to grow and make money, which is an enormous feat to pull off in publishing. Eventually, they hire marketing people, priorities can conflict, and bad things can happen. This is not a diatribe against the publishers. It’s a guide for readers and technical bibliophiles.

My $.02.

As usual, the more information the better, so share your thoughts!!

Simple S3 Log Archival

By m0j0, June 3, 2008 4:04 pm

UPDATE: if anyone knows of a non-broken syntax highlighting plugin for wordpress that supports bash or some other shell syntax, let me know :-/

Apache logs, database backups, etc., on busy web sites, can get large. If you rotate logs or perform backups regularly, they can get large and numerous, and as we all know, large * numerous = expensive, or rapidly filling disk partitions, or both.

Amazon’s S3 service, along with a simple downloadable suite of tools, and a shell script or two can ease your life considerably. Here’s one way to do it:

  1. Get an Amazon Web Services account by going to the AWS website.
  2. Download the ‘aws’ command line tool from here and install it.
  3. Write a couple of shell scripts, and schedule them using cron.

Once you have your Amazon account, you’ll be able to get an access key and secret key. You can copy these to a file and aws will use them to authenticate operations against S3. The aws utility’s web site (in #2 above) has good documentation on how to get set up in a flash.

With items 1 and 2 out of the way, you’re just left with writing a shell script (or two) and scheduling them via cron. Here are some simple example scripts I used to get started (you can add more complex/site-specific stuff once you know it’s working).

The first one is just a simple log compression script that gzips the log files and moves them out of the directory where the active log files are. It has nothing to do with Amazon web services. You can use it on its own if you want:

#!/bin/bash

LOGDIR='/mnt/fs/logs/httplogs'
ARCHIVE='/mnt/fs/logs/httplogs/archive'

if cd "$LOGDIR"; then
    # Compress the rotated logs in this directory that find's -mtime +1 matches.
    find . -maxdepth 1 -name "*_log.*" -mtime +1 -exec gzip {} \;

    # Move the compressed logs out of the active log directory.
    mv "$LOGDIR"/*.gz "$ARCHIVE"/
else
    echo "Failed to cd to log directory"
fi

Before launching this in any kind of production environment, you might want to add some more features, like checking to make sure the archive partition has enough space before trying to copy things to it and stuff like that, but this is a decent start.
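
As a rough sketch of that free-space check, here it is in Python rather than bash (more my speed these days anyway). It assumes Python 3.3 or later for shutil.disk_usage; the helper names and the 10% headroom default are made up for illustration, so tune them to your own partitions.

```python
import os
import shutil


def total_size(paths):
    """Sum the sizes, in bytes, of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def has_room(dest_dir, needed_bytes, headroom=0.10):
    """Return True if dest_dir's filesystem can take needed_bytes and still
    keep `headroom` (a fraction of total capacity) free afterwards."""
    usage = shutil.disk_usage(dest_dir)
    return usage.free - needed_bytes >= usage.total * headroom


# Example use with the directories from the script above (not run here):
#   import glob
#   logs = glob.glob('/mnt/fs/logs/httplogs/*.gz')
#   if has_room('/mnt/fs/logs/httplogs/archive', total_size(logs)):
#       ...  # safe to move the files
```

Checking against a fraction of total capacity, instead of a bare "is there any space left" test, leaves a margin for the logs that are still being written.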

The second one is a wrapper around the aws ‘s3put’ command, and it moves stuff from the archive location to S3. It checks a return code, and then if things went ok, it deletes the local gzip files.

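#!/bin/bash

cd /mnt/fs/logs/httplogs/archive || exit 1
for i in *.gz; do
    # Skip the literal pattern when there are no .gz files to upload.
    [ -e "$i" ] || continue
    if s3put addthis-logs/ "$i"; then
        echo "Moved $i to s3"
        rm -f "$i"    # only delete the local copy after a successful upload
    else
        echo "Failed to move $i to s3... Continuing"
    fi
done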

I wish there was a way in aws to check for the existence of an object in a bucket without it trying to cat the file to stdout, but I don’t think there is. This would be a more reliable check than just checking the return code. I’ll work on that at some point.

Scheduling all of this in cron is an exercise for the user. I purposely created two scripts to do this work, so I could run the compression script every day, but the archival script once every week or something. You could also write a third script that checks your disk space in your log partition and runs either or both of these other scripts if it gets too high.

I used ‘aws’ because it was the first tool I found, by the way. I have only recently found ‘boto’, a Python-based utility that looks like it’s probably the equivalent of the Perl-based ‘aws’. I’m happy to have found that and look forward to giving it a shot!
