Archive for the ‘Web Services’ Category

In the few years since Twitter’s launch, they have shrunk the number of ways you can interact with it, shrunk the number of hours in the day when you can reliably get or send messages through it, and now shrunk the number of useful web-based services by two, with the announcement that stikkit and I Want Sandy will be shutting down as a result of Twitter’s purchase of Values of N, the creators of both sites.

I only just started trying to use Twitter again in the past week or so, and my desktop client has already recorded at least 3 or 4 outages. Between that and now this news, I think I’m just going to give up. What are they planning to do, add Sandy-like features to Twitter? Why? To attract more users, ensuring that their availability will sink to zero nines? Blech.

I’ve been using Google Reader since it was created. I really love the *idea* of Google Reader. I like that scrolling through the posts marks them as read. I like that you can toggle between list and expanded views of the posts. I like that you can search within a feed or across all feeds (though selecting multiple specific feeds would be great).

All of that said, I’d like to explore other avenues, because I don’t like that there’s, like, zero flexibility in how the Google Reader interface is configured. My problem starts with large fonts…

I use relatively large fonts. If you increase the font size twice from the default in Firefox on a Mac (pressing cmd-+ twice), and you have more than just a couple of feeds, you wind up with a really horrible side pane: the bottom half of it needs a scroll bar, the text wraps, and it just looks terrible. What makes this really REALLY REALLY annoying is that:

  1. I don’t use the features included in the *top* part of the side pane, ever, at all (like ‘trends’ and stuff), and
  2. You can’t resize or disable that part of the side pane.

I’ve used folders and some other features to try to alleviate the issue, but it’s just a compromise, and I’d rather not do that if something else would work better for me. I’ve glanced quickly at a couple of other readers, but I thought I’d get some input from the lazyweb to see what your thoughts are. Is there a browser-based feed reader that has some of Google’s niceties, but perhaps with a slightly nicer, more configurable interface? Out of curiosity, are you using a Mac-compatible fat-client reader that just totally r0cks in some way? If so, let me know in the comments.

Boto is a Python library for interacting with Amazon’s web services. I’ve used it in the past, and am currently using it for an ‘s3get’ implementation based on a simple example I found buried in a post on Patrick Altman’s blog.

While testing my code, I noticed I was getting import errors from boto/connection.py, because I didn’t have a module on my system named ‘hashlib’. Then I found an svn trunk commit that clued me in to the fact that I wasn’t supposed to have hashlib, because I was running a pre-2.5 version of Python. They had put in a fix for pre-2.5 users, but somehow it wasn’t being obeyed.

Then I noticed that the import errors weren’t from utils.py, where the fix was committed, but from connection.py, which was explicitly importing the module itself. Closer inspection revealed that it was also importing utils.py, which itself imports hashlib. I commented out the explicit import in connection.py (and, later, in boto/s3/key.py), and stopped getting import errors.
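The general shape of a pre-2.5 compatibility fix is a guarded import: try `hashlib`, and fall back to the older standalone module if it isn’t there. Here is a minimal sketch of that pattern, assuming SHA-1 is the digest you need. It is not boto’s actual `utils.py` code, and `sha1_hex` is a hypothetical helper name. It shows why a second, bare `import hashlib` elsewhere (as in `connection.py`) defeats the guard on old interpreters.

```python
# Guarded import: prefer hashlib (Python 2.5+), fall back to the old 'sha'
# module that pre-2.5 interpreters shipped. On Python 3 the fallback branch
# never runs, because hashlib is always present.
try:
    from hashlib import sha1 as _new_sha1
except ImportError:
    import sha
    _new_sha1 = sha.new


def sha1_hex(data):
    """Return the hex SHA-1 digest of `data` (hypothetical helper)."""
    digest = _new_sha1()
    digest.update(data)
    return digest.hexdigest()
```

Other modules would then import `sha1_hex` (or `_new_sha1`) from this one module instead of importing `hashlib` themselves, so the fallback lives in exactly one place.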

If you’re still having issues with Patrick’s s3get.py code, it’s probably because you need to change this line:

if name == 'main':
    main()

To this:

if __name__ == '__main__':
    main()
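For reference, here is a minimal sketch of what an s3get-style script can look like with boto 2.x. It is not Patrick’s code: the `s3://bucket/key` argument format and the helper names (`parse_s3_url`, `download`) are assumptions for illustration, and it expects AWS credentials to be configured where boto can find them. `boto` is imported inside `download()` so the argument parsing works even without it, and `main()` is meant to be called from the `if __name__ == '__main__':` guard shown above.

```python
import sys


def parse_s3_url(url):
    """Split 's3://bucket/path/to/key' into ('bucket', 'path/to/key')."""
    prefix = "s3://"
    if not url.startswith(prefix):
        raise ValueError("expected an s3:// URL, got %r" % (url,))
    bucket, _, key = url[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError("URL needs both a bucket and a key: %r" % (url,))
    return bucket, key


def download(bucket_name, key_name, dest):
    """Fetch one S3 object to a local file using boto 2.x."""
    import boto  # lazy import: parse_s3_url() works even without boto installed
    conn = boto.connect_s3()  # reads credentials from the environment or boto config
    key = conn.get_bucket(bucket_name).get_key(key_name)
    if key is None:
        raise SystemExit("no such key: s3://%s/%s" % (bucket_name, key_name))
    key.get_contents_to_filename(dest)


def main(argv=None):
    """Usage: s3get.py s3://bucket/key local-file"""
    args = sys.argv[1:] if argv is None else argv
    if len(args) != 2:
        raise SystemExit("usage: s3get.py s3://bucket/key local-file")
    bucket_name, key_name = parse_s3_url(args[0])
    download(bucket_name, key_name, args[1])
```

Invoked as `python s3get.py s3://mybucket/backups/db.tbz db.tbz`, this would fetch that (hypothetical) object into `db.tbz` in the current directory.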

I’ve been on both sides of the remote worker relationship. On the manager side, I’ve managed some good-sized projects using an all-remote work force. Indeed, I’ve hired, managed, fired, and promoted workers without ever knowing what they look like. On the worker side, I do most of my work remotely, and I have for some time now. Judging by the amount of repeat business I get, I’d say that I’m more than acceptably productive working remotely.

In dealing with various clients, recruiters, prospective employers, business owners, and talking to friends who manage people for a living, I’ve heard pretty much every excuse/reason there is for not wanting to deal with a remote work force. I’ve heard and experienced successes with remote workers as well, and they all have a few key things in common, which are missing from the stories of failure. I’ll talk about them in a minute.

First, I want to say that I’m not some kind of fanboy who thinks remote workers are the answer to every problem. There are valid reasons for not having remote workers. For example, it’d be hard to build cars with a remote work force. Some things (some!) just require a physical presence. Whoever maintains the printers at your company really has to be around to change out ink cartridges and stuff like that.

There are certain classes of jobs, though, that are well-suited to working remotely. There are even classes of jobs that are necessarily performed remotely to some degree (field sales and support technicians, for example) that could be made 100% remote with the proper tools and processes in place.

So what makes a remote worker success story different from a story of failure?

Always be prepared…

The number one difference I’ve seen between success and failure in managing a remote work force is that successful managers spent the time to prepare the managers, the team, the department, the organization, and the remote workers themselves to work remotely.

If you don’t prepare for a remote work force, you will fail miserably. That’s why I’m a big advocate of treating “Let’s go remote!” as an internal project with goals and milestones, just like any other project. Preparing an organization to manage a remote work force takes a good deal of forethought, with a focus on communication and collaboration tools, reporting, accountability, scheduling, etc. In addition, you have to prepare the remote workers themselves, to ensure they know what’s expected of them in terms of reporting their status, scheduling, communication, etc. They also need to know *about* the tools they’ll be expected to use from home, and *how to use* them.

You have to plan this. You have to prepare, or you’re going to be like the HR manager who told me their company no longer allows for remote workers because “we tried it once and the guy made a complete mess of things”. When I asked the HR manager why he attributed that to the geographic location of the worker, he said “good point, he could just as well have made a mess here in the office”. You need good workers no matter where they’re going to work. The workers need expectations and goals from the manager, and the manager needs feedback and communication (and results!) from the worker. Tools help to facilitate these things. This is already a long post, so I’ll probably make a tools list in another post.

Communicate, and set expectations

Before the tools come other higher-level decisions and communication. For example, one problem I’ve heard more than once about remote workers is “we can’t hire a remote worker full-time, because then everyone will want to work from home”. As if they didn’t already all want to work from home! Everyone would love to have the option! Even if they didn’t take advantage of it, they’d consider it a really cool perk! They’d tell all of their friends about it, because it would make them jealous, and guess who their friends will contact first when they start to look for other opportunities?

You have to start somewhere, and you can’t just swing the barn doors open and let everyone go their own way on day 1. If you have an existing corporate structure in place, with assets and services and regular meetings and the like, then you have to decide who would benefit most, and soonest, from working remotely, make them the pilot group, and manage the expectations of the rest of the organization while the pilot group prepares to move to a remote workspace.

1, 10, 100, 1000

A common software application rollout strategy is to make it accessible to 1 user, then 10, then 100, then 1000, then… move up from there. In preparing your organization or department, you might consider a similar strategy.

I work for a client right now where I’m the “1”. If I can work effectively with the rest of the team (who are in the office), produce results, remain accessible as needed during working hours, manage my team’s expectations with regard to my presence (appointments happen), and generally be an asset to the team, then management may decide that remote work can succeed on a larger scale, even if ‘larger’ means 2 instead of 1. It might also be useful to do a ‘remote rotation’ so that glitches can be caught early, before physical presence in the office is made optional.

Success, of course, means getting together with the team and figuring out which tools will best emulate an office working environment. We use IRC for 99% of our communication and fall back to email when we need to cc managers. We have a wiki for documentation and status updates and a trouble ticket system, and everyone has everyone else’s phone number, BlackBerry PIN, or whatever. We’re a technical group doing system administration, and it’s working wonderfully.

“But if the sysadmins work from home, the developers will want to work from home!” Maybe so. That’s where you have to manage expectations, and communicate with your workers to let them know that the company’s ‘office optional’ project is in an early alpha stage, that it’s being tested on the group most familiar with the technologies involved, and most capable of exploiting those technologies successfully to produce results. Once the geeks work out the shortcomings, and management is able to evaluate the effectiveness of the plan, the tests will become more widespread.

Really, it’s not a whole lot different from doing anything else that affects the whole company: changing payroll providers, healthcare options, software and desktop hardware upgrades and replacements… it just takes communication. The process has to be managed, just like every other process.

There’s more than one way to do it!

There’s no one solution out there. When I joined php|architect Magazine in 2003, it was run by Marco Tabini, and I was a remote editor. A couple of months after joining, I became editor in chief, and was in charge of remotely managing the magazine. I did it differently from Marco, but he still remained involved and engaged through good communication.

Python Magazine was created and managed by me, and for the entire lifespan of the magazine, I have not seen anyone else involved in its production in person. Ever. Design, production, web site admin, executive administration, tech editors, authors, accountants… time lines, budgets and planning documents… all remote, and mostly delegated. I started the magazine with the thought that at some point someone more engaged in the community and with Python should take charge — I was just a “temp” to get the vision off the ground. Sure enough, when I handed the magazine over to Doug Hellmann, he did things differently from me, and it’s working out wonderfully for him as well!

Everyone has their own management style. Don’t think that just because your management style is a little unique you can’t handle remote workers. Good managers are creative, and aren’t afraid to execute on creative solutions.

My Drupal Reunion

I started using Drupal maybe 3-4 years ago. At the time I wasn’t all that impressed. I liked it better than Joomla (Mambo, at that time), and it was a little more featureful than PHP-Nuke. But even back then I hated that this thing was making some sweeping, grand assumptions about what I would be using my Drupal site for. I used Drupal for LinuxLaboratory.org, and it was ok. I left Drupal once, to give MediaWiki a shot, but the truth is I didn’t want a wiki, so I went out and tested a bunch of other applications, and wound up back at Drupal. The 5.5 release was quite a bit better, and it got the job done.

About 2 weeks ago (maybe less?) I downloaded version 6.6. I poked. I prodded. I looked for new themes and found lots of them, and they were pretty cool. I looked for theme- and module-building tutorials, and there were lots of them, and entire books had even been published on each of those topics, some specifically for version 6 of Drupal. I looked for modules, and found a few useful ones that actually showed a trend of following the Drupal releases pretty closely. I also found that a couple of things I had used as modules in earlier releases were now built in.

I fiddled on and off for a few days and was able to put together a site for my company that’s way, way better than the WordPress site that was there before. I’m also redoing the main LinuxLaboratory.org site using Drupal.

What about Django?

I know that lots of you were encouraging me to keep moving ahead with Django. I *will* be moving ahead with Django at some point, but what I found is that doing example projects using the dev server and deploying a real application using Apache are such vastly different beasts that doing the former doesn’t really qualify you to do the latter. When I had my site ready to go and working on my locally installed dev server, I found myself completely lost when it came time to get it working on my WebFaction account. It really shouldn’t be that hard, but it is. Or it was for me.

You can all take comfort in knowing that I still hate PHP and consider it a necessary evil. For the moment, though, I have a couple of projects involving PHP coming up. By the time those projects end, I hope I can be more skilled with Django, and with Django deployment. I’m not even going to mess with the dev server anymore. It’s just a damn tease. I’m going to sit down and spend some time with Django on Apache with mod_* and finally come up with answers to all the questions I had that nobody anywhere seemed to have any reasonable answers to. When I figure them out, I’ll post here and you can all flame me or learn something new, perhaps depending on your own skill level :-)

In the meantime, while I don’t typically do book reviews, I’d recommend that anyone using Django 1.x stay away from the book “Practical Django Projects”. It was written for a pre-1.0 version of Django, so you’ll be tripped up from the very first sample app, and it doesn’t get better from there. If you want to learn from the book (and there’s learning to be had from it), download 0.96.x, and use that to go through the book. When you’re done with the book, read the release notes for Django 1.0. You’ll have to make some alterations before moving your apps to 1.0, but overall you’ll be just fine.

I’m not someone who wakes up every day and looks at how my blog is ranked by all of the various services. I check out my WordPress stats, but that’s really about it. However, someone went and did some of the work for me, and they’ve decided that, of the blogs that they read or that were suggested to them, this blog ranks #20 in a listing of 25.

I’m really flattered, but wonder if it’s an indicator that this is a quality blog, or that they should aim higher in their blog reading ;-P Either way, listing 25 bloggers in a flattering way is a fantastic marketing technique, because most of us are probably egomaniacal enough to say “Hey! Look!” and link back to the list on *your* blog, resulting in lots of traffic. Kudos, and thanks, Mobile Maven!

I have a small EC2 instance running with a 25GB EBS volume attached. It has a database on it that I need to manipulate by doing things like dropping indexes and creating new ones. This is on rather large (multi-GB, millions of rows) tables. After one DROP INDEX operation ran all day without finishing, I killed it and tried to see what was going on. Here are the results of the first 10 minutes of testing:

-bash-3.2# dd if=/dev/zero of=/vol/128.txt bs=128k count=1000
1000+0 records in
1000+0 records out
131072000 bytes (131 MB) copied, 0.818328 seconds, 160 MB/s

This looks great. I’d love to get 160MB/s all the time. But wait! There’s more!

-bash-3.2# dd if=/dev/zero of=/vol/128.txt bs=128k count=100000
dd: writing `/vol/128.txt': No space left on device
86729+0 records in
86728+0 records out
11367641088 bytes (11 GB) copied, 268.191 seconds, 42.4 MB/s

Ok, well… that’s completely miserable. Let’s try something in between.

-bash-3.2# dd if=/dev/zero of=/vol/128.txt bs=128k count=10000
10000+0 records in
10000+0 records out
1310720000 bytes (1.3 GB) copied, 15.4684 seconds, 84.7 MB/s

So the performance gets cut in half when the number of 128k blocks is increased 10x. This kinda sucks. I’ll keep plugging along, but if anyone has hints or clues, let me know. If this is the way it’s going to be, then this is no place to run a production, IO-intensive (100,000s and maybe millions of writes per day, on top of reads) database.
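One thing worth ruling out before blaming EBS alone: plain `dd` without GNU dd’s `conv=fdatasync` reports how fast data reached the kernel’s page cache, not the disk, so the 131 MB run may have finished largely in RAM, while the 11 GB run could not. The sketch below repeats the same test in Python with an `fsync()` before the timer stops, so every run is charged for actually reaching the device. The function name and defaults are assumptions chosen to mirror `bs=128k`; the MB figure uses 10^6 bytes, the same unit dd prints.

```python
import os
import time


def write_throughput(path, block_size=128 * 1024, count=1000, sync=True):
    """Write `count` zero-filled blocks to `path`, like dd if=/dev/zero.

    Returns (bytes_written, MB_per_second), where MB is 10**6 bytes, the unit
    dd reports. With sync=True the file is fsync'd before the clock stops, so
    data still sitting in the page cache is not counted as written.
    """
    block = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        if sync:
            f.flush()             # push Python's buffer to the OS...
            os.fsync(f.fileno())  # ...and the OS's dirty pages to the device
    elapsed = time.perf_counter() - start
    written = block_size * count
    return written, written / elapsed / 1e6
```

Running `write_throughput("/vol/128.txt", count=1000)` and then `count=10000` gives a like-for-like comparison of the runs above; if the synced numbers for both sizes land close together, the earlier gap was probably mostly caching rather than EBS slowing down.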

Short Version: You can find a fantastic video here about bundling customized AMIs and registering them with Amazon so that you can launch as many instances of your new AMI as you want. The video is so good that I don’t bother writing out the steps to do the bundling (it would be pretty darn long). These are some notes about launching an AMI, customizing it, and mounting an EBS volume to it (the video linked above doesn’t cover EBS). Also, check out ElasticFox, a very good GUI for doing simple EC2 operations. It’s nice if you’re just getting started or doing some simple tests.

There are two ways you can go about creating a custom machine image (AMI) for use with Amazon EC2: you can create an image locally by dd’ing to a file, creating a filesystem on it, mounting it with “-o loop”, and bootstrapping the whole thing yourself, or you can grab an existing AMI that will serve as a “good enough” base for you to make your customizations, then bundle the customized image.

I’ll be talking about the latter option, where you identify a “good enough” image, customize it for your needs, and save that as your AMI. Unless you’re doing some kind of highly specialized installation, or are a control freak, you shouldn’t really need to start from scratch. I was just building a test image, and wanted a CentOS 5.2 base installation.

Here’s the command you can use to browse the AMIs you have access to (they’re either public, or they’re yours):

$ ec2dim -a

If that command looks funny to you, it’s likely because you’re used to seeing the really long versions of the AWS commands. Amazon also provides shorter versions of the commands. No, really, have a look! The long version of this command is:

$ ec2-describe-images -a

Too long for my taste, but it’s nice to know it’s there.

So, rather than start from scratch, I grabbed a base image that was close enough for my needs, and customized it. It’s a 5.1 base image, pretty well stripped of things that I don’t need, and a few that I do, but that’s ok. I’d rather start with less than more.

So step one is to launch an instance of the AMI I’ve chosen to be my ‘base’. Simple enough to do:

$ ec2run ami-0459bc6d -k ec2-keypair

And that’s pretty much it. It takes a couple of minutes (literally) for the machine to actually become available. You can check whether it’s still in the “pending” state or available by running ‘ec2din’. Without arguments, that’ll show you the status of any instances you have pending or running. Once the instance is running, you’ll be able to glean the hostname from the information provided.
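Rather than re-running ‘ec2din’ by hand, the wait can be scripted. The following is a minimal Python sketch: a generic polling loop, plus commented-out wiring that assumes boto 2.x (its `run_instances()`, `Instance.update()`, `state`, and `public_dns_name`). The helper name, the five-second interval, and the ten-minute timeout are arbitrary assumptions.

```python
import time


def wait_for_state(get_state, target="running", interval=5.0, timeout=600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Call get_state() until it returns `target`.

    Sleeps `interval` seconds between checks and raises TimeoutError if
    `timeout` seconds pass first. `sleep` and `clock` are injectable so the
    loop can be tested without actually waiting.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state == target:
            return state
        if clock() >= deadline:
            raise TimeoutError("still %r after %.0f seconds" % (state, timeout))
        sleep(interval)


# Hypothetical wiring with boto 2.x (not executed here; needs credentials):
#
#   import boto.ec2
#   conn = boto.ec2.connect_to_region("us-east-1")
#   inst = conn.run_instances("ami-0459bc6d", key_name="ec2-keypair").instances[0]
#
#   def current_state():
#       inst.update()          # refresh the instance's attributes from EC2
#       return inst.state      # e.g. "pending", then "running"
#
#   wait_for_state(current_state)
#   print(inst.public_dns_name)
```

Injecting `sleep` and `clock` keeps the loop itself independent of EC2, so the same helper could also wait on a volume’s status.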

An important note at this point: Don’t confuse “image” with “instance”. For the OO types in the crowd, an “image” is like a class. It does nothing by itself until you instantiate it and create an “instance” of it. For sysadmins, the “image” is like a PXE boot image, which does nothing until you boot it, thereby creating an “instance”.

The reason I used “PXE” and “class” in the above is the implication they carry: you can create as many instances as you want from a single class definition. You can boot as many machines as you want from a single PXE boot image. Likewise, you can launch as many Amazon EC2 instances from an image as you want.

So, in the time it took you to read those last two paragraphs, your instance is probably running. I now grab the hostname for my instance, and ssh to it using my keypair:

$ ssh -i ec2-keypair root@<hostname>

Now that I’m in, I can customize the environment, and then “bundle” it, which will create a new AMI with all of my customizations. With the instance in question, I installed a LAMP stack, and a few other sundry tools I need to perform my testing. I also ran “yum -y upgrade” which will go off and upgrade the machine to CentOS 5.2.

One thing I want to do with this instance is test out the process for creating an EBS volume. The two pieces of information I need to do this are the size of the volume I want to create, and the availability “zone” I want to create it in. You can figure out which zone your instance is running in using ‘ec2din’ on your workstation (not in your instance). I took that information and created my volume in the same zone using the ‘ec2addvol’ command. If you don’t have that command on your workstation, then you don’t have the latest version of the Amazon command line tools. Here’s the command I ran:

$ ec2addvol -z us-east-1b -s 25

To see how it went, run ‘ec2dvol’ by itself and it’ll show you the status of all of your volumes, as well as the unique ID assigned to your volume, which you’ll need in order to attach the volume to your instance. To do the ‘attachment’, you need the ID of the volume, the ID of the instance (use ‘ec2din’), and a device name under which the volume will appear on your instance. Here’s what I ran (on my workstation):

$ ec2attvol -d /dev/sdx -i i-xxxxxxxx -v vol-xxxxxxxx

Now you can go back to the shell on your instance, create a file system on the device, create a mount point, mount the device, add it to fstab, and, as they say in the UK, “Bob’s yer uncle”. By the time I wrote this post, I had already shut down my instance, but here are the commands (caveat emptor: this is from memory):

# mkfs.ext3 /dev/sdx
# mkdir /vol
# mount /dev/sdx /vol

If that all works ok, you can add a line to /etc/fstab so that it’ll be mounted at boot time, but I haven’t yet figured out how to attach a volume to an instance at boot time. The mount doesn’t work if you don’t attach the volume to the instance first. You’ll get a “device doesn’t exist” error if you try it. Clues hereby solicited. I assume I could probably use ‘boto’ and some Python code to get this done, but doing the same with a shell script wrapper around the Amazon tools might also be possible — but I don’t know how reliable that would be, because you’re at the mercy of Amazon and how they decide their tools should present the data (and *if* they provide the data you need for a particular operation down the road).
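Here is a rough sketch of the boto route, assuming boto 2.x, credentials available on the instance, and that the script runs at boot from something like rc.local rather than relying on /etc/fstab (which is processed before the volume could be attached). The helper names, default device, mount point, and region are assumptions; the instance can look up its own ID from the EC2 metadata service at http://169.254.169.254/latest/meta-data/instance-id. Attachment is asynchronous, so the sketch polls for the device node before mounting.

```python
import os
import subprocess
import time


def attach_volume(volume_id, instance_id, device, region="us-east-1"):
    """Ask EC2 to attach an EBS volume (hypothetical helper; boto 2.x)."""
    import boto.ec2  # imported lazily so the rest of the module works without boto
    conn = boto.ec2.connect_to_region(region)
    conn.attach_volume(volume_id, instance_id, device)


def wait_for_path(path, timeout=120.0, interval=2.0):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


def attach_and_mount(volume_id, instance_id, device="/dev/sdx", mount_point="/vol"):
    """Attach the volume, wait for its device node, then mount it (run as root)."""
    attach_volume(volume_id, instance_id, device)
    # The API call returns before the device exists, so wait for it to show up.
    if not wait_for_path(device):
        raise RuntimeError("%s never appeared; check the volume's attachment state" % device)
    if not os.path.isdir(mount_point):
        os.makedirs(mount_point)
    subprocess.check_call(["mount", device, mount_point])
```

If you also keep an fstab entry for the volume, marking it `noauto` should stop the normal boot-time mount from tripping over a device that isn’t attached yet.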

So now I have an EBS volume and an instance. The volume is attached to the instance, and I can do things with it. I’m testing some database stuff, so I copied a database over to the now-mounted volume with a simple ‘scp mydb.tbz root@<instance>:/vol/.’

Once my database is there, I can attach the volume to pretty much whatever instance I want, which is nice, because I can run the same database and the same database code on different instance sizes and see how each affects performance, which gives me more performance data to work with. For production purposes, I’ll have to look more closely at the IO metrics, play with attaching multiple volumes and spreading out the IO, and I also want to test the ‘snapshot’ capabilities. It’s also nice to know that if I needed to launch this in production (there are no plans to do so, but you never know), I could upgrade the database “hardware” more or less instantly :-D

If anyone has code or tools to help automate the management of all of this stuff, please send links! If I come up with any myself, I’ll most likely post it here.

Now that I have a customized AMI with all of my packages installed and my config changes made, I need to bundle it so that I can boot as many instances of this particular configuration as I want. An important note about bundling this *particular* image is that you MUST run ‘depmod -a; modprobe loop’ before bundling, because the bundling tool automates the manual process of bundling an image, which involves mounting a file as a volume, and that requires a loopback device.

The video I used to do the bundling is here, and if you can live through the disgustingly bad burps and chirps in the (Flash version) audio, it’s an excellent tutorial for bundling custom AMIs. While the process *is* pretty straightforward, it involves a number of steps, and the video goes through all of them, and it worked perfectly the first time through.

Hi all,

The schedule for PyWorks has been posted! I’m really excited about three things:

1) there are some really cool talks that I’m looking forward to attending. There are a couple of sysadmin-related talks, AppEngine, TurboGears, Django, and an area I’ve been especially slow to move into: testing (I know, shame on me). There’s lots more, so be sure to check it out.

2) the conference scheduling process is over ;-)

3) I get to meet a lot of people face-to-face whom I’ve worked with in the past developing Python Magazine articles, or interacted with on IRC, etc. One thing I like about conferences surrounding open source technologies is that I get to thank people face-to-face for the sweat they poured into some of the tools I use regularly. Mark Ramm, Kevin Dangoor, Michael Foord, Brandon Rhodes, and a collection of Python Magazine authors will be speaking there, and other Python Magazine folks and generally familiar faces will be in attendance.

Enjoy!

For those still unaware, PyWorks will be held in Atlanta, Nov. 12-14, 2008. It’s sponsored by MTA, the publisher of Python Magazine, as well as php|architect. In fact, the php|works conference will be held simultaneously with PyWorks, and attendees of one are free to access talks in the other at will. There will also be a “center track” that will cover some more generic topics of interest to developers without regard to the language in use. Check it out!

I have a lot of interaction with publishing types. I write a lot, and I edit some, and I do tech reviews and stuff for some publishers, and I co-authored a book, and I’ve worked on two magazines, and a newspaper, and I’m generally fascinated by the technical book market and stuff like that. I’m also someone who is lucky enough that his job is also his hobby. I work in technology, and am always doing something technology related at home in my spare time. Needless to say, I read tons upon tons of technical books.

I almost never post book reviews, in spite of the fact that I read all of these books. Why? Well, to be honest, I couldn’t tell you. It just hasn’t occurred to me to write one. It could be because I don’t really value book reviews much myself, I guess. I mean, if there’s a really obvious consensus across a huge number of reviews, I might be swayed. But in general, I find that book reviews are too often the target of astroturfing campaigns.

If there’s a tech book you’d like a review of that deals with things I’m generally into, let me know and I’ll post a review, if I’ve read it (or want to read it). Here are subjects I’m likely to have read books about in the past couple of years:

  • Linux, UNIX, and administration thereof
  • Python (all levels — I just read pretty much whatever is out there)
  • web 2.0 APIs (mostly Google and Amazon)
  • Any book about any service that can be run in a *x environment (DNS, Apache, DHCP, Jabber, and most other things that open a port)
  • Anything related to generic SQL, database design, or (more specifically) MySQL and PostgreSQL
  • HPC (cluster computing)
  • Generic programming, software, computer science, or high-level systems design books
  • Digital photography (I have a Canon Digital Rebel, if that helps — I do *not* use Photoshop)
  • PHP
  • Maybe some other stuff I’m forgetting