Archive for the ‘Big Ideas’ Category

I’ve been on both sides of the remote worker relationship. On the manager side, I’ve managed some good-sized projects using an all-remote work force. Indeed, I’ve hired, managed, fired, and promoted workers without ever knowing what they look like. On the worker side, I do most of my work remotely, and I have for some time now. Judging by the amount of repeat business I get, I’d say that I’m more than acceptably productive working remotely.

In dealing with various clients, recruiters, prospective employers, business owners, and talking to friends who manage people for a living, I’ve heard pretty much every excuse/reason there is for not wanting to deal with a remote work force. I’ve heard and experienced successes with remote workers as well, and they all have a few key things in common, which are missing from the stories of failure. I’ll talk about them in a minute.

I first want to just say that I’m not some kind of fanboy who thinks remote workers are the answer to every problem. There are valid reasons for not having remote workers. For example, it’d be hard to build cars with a remote work force. Some things (some!) just require a physical presence. Whoever maintains the printers at your company really has to be around to change out ink cartridges and stuff like that.

There are certain classes of jobs, though, that are well-suited to working remotely. There are even classes of jobs that are necessarily performed remotely to some degree (field sales and support technicians for example), that could be made 100% remote with the proper tools and processes in place.

So what makes a remote worker success story different from a story of failure?

Always be prepared…

The number one difference I’ve seen between success and failure in managing a remote work force is that successful managers spent the time to prepare the managers, the team, the department, the organization, and the remote workers themselves to work remotely.

If you don’t prepare for a remote work force, you will fail miserably. As a result, I’m a big advocate of treating “Let’s go remote!” as an internal project with goals and milestones just like any other project. Preparing an organization to manage a remote work force takes a good deal of forethought, with a focus on communication and collaboration tools, reporting, accountability, scheduling, etc. In addition, you have to prepare the remote workers themselves, to ensure they know what’s expected of them in terms of reporting their status, scheduling, communication, etc. They also need to know *about*, and *how to use*, the tools they’ll be expected to use from home.

You have to plan this. You have to prepare, or you’re going to be like the HR manager who told me their company no longer allows for remote workers because “we tried it once and the guy made a complete mess of things”. When I asked the HR manager why he attributed that to the geographic location of the worker, he said “good point, he could just as well have made a mess here in the office”. You need good workers no matter where they’re going to work. The workers need expectations and goals from the manager, and the manager needs feedback and communication (and results!) from the worker. Tools help to facilitate these things. This is already a long post, so I’ll probably make a tools list in another post.

Communicate, and set expectations

Before the tools come other higher-level decisions and communication. For example, one problem I’ve heard more than once about remote workers is “we can’t hire a remote worker full-time, because then everyone will want to work from home”. As if they didn’t already all want to work from home! Everyone would love to have the option! Even if they didn’t take advantage of it, they’d consider it a really cool perk! They’d tell all of their friends about it, because it would make them jealous, and guess who their friends will contact first when they start to look for other opportunities?

You have to start somewhere, and you can’t just swing the barn doors open and let everyone go their own way on day 1. If you have an existing corporate structure in place with assets and services and regular meetings and the like, then you have to decide who can benefit the most from a remote situation the soonest, make them the pilot group, and manage the expectations of the rest of the organization while the pilot group prepares to move to a remote workspace.

1, 10, 100, 1000

A common software application rollout strategy is to make it accessible to 1 user, then 10, then 100, then 1000, then… move up from there. In preparing your organization or department, you might consider a similar strategy.
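As a rough illustration of that staged approach, here is a minimal sketch in Python. The function names (`rollout_stages`, `run_rollout`) and the `stage_ok` callback are made up for this example; how you decide that a stage "worked" (results, accessibility, team feedback) is entirely up to you, and is the hard part.

```python
def rollout_stages(total_people, factor=10):
    """Yield cohort sizes 1, 10, 100, ... and finish with total_people.

    `factor` is an assumption for illustration; nothing says each step
    has to be exactly ten times the previous one.
    """
    size = 1
    while size < total_people:
        yield size
        size *= factor
    yield total_people


def run_rollout(total_people, stage_ok):
    """Walk the stages in order, stopping at the first one that fails.

    `stage_ok(size)` is a hypothetical callback that returns True when
    the pilot at that size went well enough to widen it. Returns the
    largest cohort size that passed, or 0 if even the first one failed.
    """
    reached = 0
    for size in rollout_stages(total_people):
        if not stage_ok(size):
            break
        reached = size
    return reached


if __name__ == "__main__":
    # A department of 2,500 people would be piloted in these steps.
    print(list(rollout_stages(2500)))
```

The point of the callback is that each step is a gate: nobody moves to the next cohort until someone has evaluated the current one.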

I work for a client right now where I’m the “1”. If I can work effectively with the rest of the team (in the office), if I can produce results, remain accessible as-needed during working hours, manage the expectations of my team with regard to my presence (appointments happen), and overall be an asset to the team, then the management may decide that it can work on some larger scale - even if ‘larger’ means 2 instead of 1. It might also be useful to do a ‘remote rotation’ so that glitches can be caught early before making a physical presence in the office optional.

Success, of course, means getting together with the team and figuring out what tools will be used to best emulate an office working environment. We use IRC for 99% of our communication, falling back to email when we need to cc managers. We have a wiki for documentation and status updates, we have a trouble ticket system, and everyone has everyone else’s phone number, blackberry PIN, or whatever. We’re a technical group doing system administration. It’s working wonderfully.

“But if the sysadmins work from home, the developers will want to work from home!” Maybe so. That’s where you have to manage expectations, and communicate with your workers to let them know that the company’s ‘office optional’ project is in an early alpha stage, that it’s being tested on the group most familiar with the technologies involved, and most capable of exploiting those technologies successfully to produce results. Once the geeks work out the shortcomings, and management is able to evaluate the effectiveness of the plan, the tests will become more widespread.

Really, it’s not a whole lot different from doing anything else that affects the whole company: changing payroll providers, healthcare options, software and desktop hardware upgrades and replacements… it just takes communication. The process has to be managed, just like every other process.

There’s more than one way to do it!

There’s no one solution out there. When I joined php|architect Magazine in 2003, it was run by Marco Tabini, and I was a remote editor. A couple of months after joining, I became editor in chief, and was in charge of remotely managing the magazine. I did it differently from Marco, but he still remained involved and engaged through good communication.

Python Magazine was created and managed by me, and for the entire lifespan of the magazine, I have not seen anyone else involved in its production in person. Ever. Design, production, web site admin, executive administration, tech editors, authors, accountants… time lines, budgets and planning documents… all remote, and mostly delegated. I started the magazine with the thought that at some point someone more engaged in the community and with Python should take charge — I was just a “temp” to get the vision off the ground. Sure enough, when I handed the magazine over to Doug Hellmann, he did things differently from me, and it’s working out wonderfully for him as well!

Everyone has their own management style. Don’t think that just because your management style is a little unique you can’t handle remote workers. Good managers are creative, and aren’t afraid to execute on creative solutions.

Most readers of my blog know that I consult, in addition to usually having a day job. I started my career working for a consulting firm, and couldn’t let go of the endless fascinating problems that exist in the “technological landscape”, and in addition, the seemingly endless numbers of ways to solve them. I’ve learned more than tons about how people, and institutions, approach technical problems in system design, and maybe more importantly, how they think about the problems and solutions.

I’ve worked in huge enterprises (several Fortune 100 companies), academia (cs.princeton.edu, to be more exact), government (gfdl.noaa.gov, for example), and a few startups and small businesses. I also grew up around small business around the time that technology was starting to become affordable enough to creep into even small offices (I helped run wire for my father’s first modem-connected office network around 1988 or so, and my mother’s office — admittedly much larger — had a mainframe and a few terminals, from which ascii posters of JFK and MLK were printed and hung on my walls when I was as young as 7 or 8). Observing and working with people to solve technical problems continues to bring me a lot of joy, and present plenty of challenges.

Over the years, I’ve done a decent bit of what I’ll loosely call “programming”. 10 years ago I might not have qualified that, but working for 6 years in support of graduate computer science research has a way of humbling a guy (and, really, for most grad students, actually *doing* 6 years of graduate research is probably just as humbling, if not more). One thing I’ve tried to do is keep up with trends in how programs are deployed, how the teams of workers in what are considered separate problem domains interact to get the applications to be useful to people, how the systems are organized, and how programs are designed, and finally, how to program…. um… “better” (for some undefined but surely long-winded definition of that term). As I’m starting to witness something of a convergence of programming and systems work (at least in my neck of the woods), programming is something I’m spending even more time doing, and learning.

Design patterns, whether in the context of extreme programming, agile methodologies, or whatever the project management philosophy is, appear to be extremely useful, but I’ve wondered why there doesn’t appear to be any movement in the system administration community toward defining some patterns for solving problems in the realm of systems infrastructure architecture. A few years back I stumbled upon infrastructures.org, which I think is an excellent general methodology for building infrastructures, but I think a fuller treatment of the topic could be had. Preferably one that addresses a broader set of problems prevalent in a wider variety of environments. For example, I found the tools and methodologies there to map perfectly in government and academic environments, and portions of that work can be mapped onto small business problems, but it leaves enterprise environments, and some larger government environments with some unanswered questions or unaddressed problems.

I don’t blame the folks at infrastructures.org — on the contrary, I applaud their work! But why has it been so difficult to find solutions to problems those nice folks just didn’t have, or didn’t have to focus on in their part of the organization?

So much of what we do is tribal knowledge, or knowledge earned “the hard way” — in the trenches, at 4am, on a Sunday, uphill, both ways… etc., but while many of these stories sound similar enough to discern a pattern, and while horror stories at conferences are universally met with “me too”, and “you should’ve done x, y and z, and it wouldn’t have been an issue”, I have yet to see these patterns codified in any meaningful way in a single work, or perhaps, an organized volume of works (no, mailing lists do *not* constitute an organized volume of works).

If something as complex and diverse as programming can have patterns applied to it, I have to believe that the same could hold true for building systems. If there were such a work, it could potentially serve as a de facto “best practices” reference — one that could be referred to by both technicians and higher-level decision makers, define a common language that both could understand, and help overcome some of the inevitable “people issues” that sysadmins (and, indeed, managers) often blame for a lack of forward movement.

Does such a work exist? Is this in the works now? Though I try to keep my finger on the pulse of the publishing market, I have yet to see any real commitment to the idea that a large swath of problems in systems can be solved using variants of pre-defined patterns. It’s not that we’re not using them, of course, and it’s not that there aren’t large numbers of us who could probably recite them off the top of our heads, but if you’re one of those people, you’re a “senior” system administrator (or better), and if that’s the case, imagine what your career might’ve been like if you had such a reference, and also, let me know what the “you” with 1 year of sysadmin experience would’ve loved to have, or what the “you” of today would love to see the junior folks reading.

My brain has a set of rules that software project websites get tested against. Each time a project site fails to comply with a rule, I get ever-so-slightly more annoyed, and ever-so-slightly less likely to use the software in question (if there are alternatives, this is even maybe not so “slightly”). 

I thought I’d list these rules because I suspect others are like me: we’re extremely busy, we work too many hours, and are involved with too many projects to spend hours trying to figure out what some piece of code someone mentioned once in IRC actually does. 

But first, know that this site actually complies with just about every single rule there is, so it’s a great template to work from if your site needs brushing up. 

  • First and foremost, tell me, right away, what this thing does, the problem it solves, and (at a high level) the approach taken to solve the problem. 
  • Tell me the language it’s written in. If it’s open source, and it’s written in a language I hack in, *and* it solves a problem I need solved, maybe I can help out, or be encouraged that if something flakes, I can fix it, or at least speak the developer’s language if I have to describe the issue to the folks upstream. 
  • Tell me what OS is required, and preferably what OS/version is tested with. 
  • Give me a full list of dependencies with links to go get them, or give me a link to “Dependencies”, or to an install document that lists them. 
  • Tell me the current version, and the date it was released. Beta versions and dates are nice too. If there is a timed release schedule, tell me that. 
  • Keep the information up-to-date. I shouldn’t have to wonder if your software is going to work under OS X 10.5 or RHEL 5, or if your plugin will work under the latest version of Drupal/Django/Moodle/MySQL/Joomla/Firefox…
  • BONUS: a very simple architectural drawing that shows me exactly what components make up the whole. The one for CouchDB is as good as any I’ve ever seen (assuming it’s accurate). 
  • BONUS: if screenshots are applicable, use them. They communicate a million times more information using a million times less real estate and bandwidth. They can communicate things you didn’t even know you were communicating. Of course, that could be good or bad, but it keeps you honest, and customers like that :-) 
For kicks, here are a few things I see sometimes on project web sites that I wish they *wouldn’t* do: 
  • DON’T require me to understand how something like Trac or some other tool works in order to get at the information about your software project. Navigation should not assume I’m a developer, it should assume I’m a prospective user who will leave if they can’t read the menu. If you want to use a project management tool to do your work, more power to you, but as a prospective customer, it’s none of my business — don’t drag me into your personal hell! I just want the software! 
  • DON’T be satisfied with the SourceForge page as your project’s “homepage”. The problem with doing that is twofold: first, SourceForge kinda sucks, and occasionally becomes unusable. Second, it doesn’t provide a simple way for you to give me information, nor a simple way for me to find it even if you produce said information using their tools. Also, it’s bad form. If you haven’t committed to the project enough to give it a proper site, well… 
  • DON’T put some kind of “Coming Soon” page with a bunch of information with *NO DATE*, because I’m going to go ahead and assume that this thing is vaporware, and that the “coming soon” post is 3 years old. Nothing in this world is more annoying than time-sensitive information being plastered on a web site with no date. 
  • DO NOT — I repeat — DO NOT force me to download a 20MB tarball to get at the documentation. That’s not how things work. I get to see what I’m downloading *before* I download it. You’ll save me some time, and save yourself some bandwidth, and you’ll have more accurate statistics about how many people download and use your software, because the numbers won’t be skewed by folks who were forced to download the package to get at the documentation. 
All of that said, I probably won’t use CouchDB, even though I love their project’s site. JavaScript makes my brain explode, so mixing it with something like a database, which to me is the digital embodiment of sanity itself, is… insane. But if you’re someone who can deal with this concoction, I encourage you to check out CouchDB. At the very least, you can figure out if it might be a fit for you without clicking from their home page a single time. That just rocks. 

Yes, the same folks who bring you Python Magazine and php|architect magazine (and several other things, like online training, a full line of books, and more conferences), are hosting our first ever Python conference! You can see more about it, and the Call for Papers, at the conference site.

The hotel which once hosted php|works in Atlanta is actually large enough to host both php|works *and* PyWorks in the same venue, so this year the two conferences will be held at the same time, and the plan/hope is to be able to have talks that are generic enough to be of use to either audience, like talks about scaling MySQL, or SVN management, or Hadoop, or Amazon Web Services, or something like that. In any case, the attendees of either conference will be allowed to cross over to attend talks at the other if they so choose, which I think is pretty cool. Maybe we can add more languages in future years and just have it be called “LANG ’08” or something. See the PyWorks Call for Papers if you have ideas!

I’m really excited to get down to Atlanta and meet the guys I’ve interacted with down there from the Python community, including Doug Hellmann, whom I’ve worked pretty closely with over the past several months. There’s also a thriving Python community in that area, I hear. Looking forward to it!

UPDATE - 2008-06-23 - A member of O’Reilly’s editing team commented that this privilege has *NOT* been discontinued, and all O’Reilly authors should receive a free Safari account. Thanks a bunch, Mary, for the clarification (see comments for more).

I learned from one of the authors of the recently released second (read: first, squared) edition of High Performance MySQL that O’Reilly apparently did away with the idea of giving O’Reilly book authors free Safari accounts. Lame.

I do not know why in the world they would discontinue this offering for authors. Perhaps they’re not aware, but a great many of the O’Reilly authors are also bloggers. Tech bloggers. Some of them write on the O’Reilly blogs themselves, but almost all of them blog outside of that arena as well. And guess what they blog about? Well, lots of stuff, but there’s plenty of blogging about “something I learned”, or “this book rocks”, etc. Heck, we even blog about products we use — I’ve even blogged about Safari… *today* even!

In a world where people are paid to blog about products, it surprises me that O’Reilly wouldn’t offer people who are already actively blogging in and around their content, and who have actually formally joined the O’Reilly family, the opportunity to become users, and thereby advocates, of their other offerings.

I am an O’Reilly author, and have a free Safari account (we’ll see how long that lasts after this post goes live). I can think of *plenty* of instances where I’ve recommended that people who don’t have an account try to get one, or try to get their employer to get them (or their whole site) an account. I consult (as do TONS of O’Reilly authors), and I’ve also recommended to my clients that they get Safari accounts for their technical staff. Had O’Reilly not offered me the free account, that would never have happened. I am confident that the amount of money grossed by O’Reilly due to my big mouth since 2005 is approaching 6 figures, if it hasn’t exceeded that already.

Not to mention the fact that having the account is a very real, very sincerely felt way to make the authors feel appreciated, because lord knows we don’t write books for the money.

So, Tim, do you think you could find it in your budget to give the guys with probably the most popular MySQL Performance blog (and probably consulting outfit as well) free Safari accounts? Please? If they agree to put some badge on their blog or something? (I’d gladly do that as well).

Here’s to hoping they see the light, fellas.

I’m going to the O’Reilly Open Source Convention (OSCON) again this year. I went in 2006 as well, and had a blast, in addition to learning quite a bit, and meeting tons of people whom I’ve been acquainted with online for a long time. That was 2 years ago. Since then I’ve been acquainted with lots *more* people online, and I’m hoping I’ll meet at least some of them this year.

If you’re not going to OSCON, you’re not only missing out on a great technical conference that will leave you physically tired from all of the activity and at the same time unable to sleep from the ideas sparked by the day’s events, you’re also missing the Oregon Brewers Festival, which takes place just as OSCON is wrapping up.

I have a medium-sized home brewery that a buddy and I built from scratch. Over the years we’ve brewed and tasted all kinds of beer. But you can’t get all beers everywhere, so traveling is a good opportunity to taste wild and exotic beers, or just local beers you can’t get at home. It’s odd, but while you can get easy access to beers from Germany, Belgium, Poland, England, Scotland, and Ireland, you would be hard pressed to find a good number of great beers from the West Coast of the US on the East Coast of the US. And the West Coast has a lot happening, beer-wise.

Beer festivals are also where some brewers pull out all the stops. In ’06 I went with a buddy and actually had not one, but TWO different watermelon beers - a variety I had not even heard of until I showed up at the counter. One was pretty good, the other tasted like Watermelon Bubblicious, but the experience was fantastic. Ever had rock candy made from hops? Pretty good I tell you!

Anyway, I was thinking of getting a larger group together to attend this year’s Brew Fest, so if any geeks out there have an interest in beer, let me know. And if you DON’T have an interest in beer, you should DEFINITELY let me know. I’ve converted numerous friends and family who say they don’t like beer to becoming more familiar with styles they actually will go out and buy, unprovoked, voluntarily! Saying you don’t like beer is like saying you don’t like food. There are just too many kinds of beer to say you don’t like beer. Maybe you don’t like hops, in which case you might like hefeweizen, but have probably never heard of it. Maybe you don’t like really fizzy beer, in which case you might like various Belgian ales, a Barleywine, a porter, or any beer with a less fizzy, more creamy, or less prevalent head on it.

Anyway, I’m going, and it’s fun. If you have an interest, do join in, whether you go with a group I put together or not!

Startups are pretty fascinating. I work for a startup, and one of my good friends works for another startup. I’ve also worked for 2 other startups, one during the first “bubble”, and another one a few years later. Oh my, how the world of web startups has changed in that time!

1999: You must have funding

The first startup I was ever involved in was a web startup. It was an online retailer. They were starting from nothing. My friend (a former coworker from an earlier job) had saved for years to get this idea off the ground. He was able to get a few servers, some PCs for the developers he hired, and he got the cheapest office space in all of NYC (but it still managed to be a really cool space, in a way that only NYC can pull off), and he hosted every single service required to run the web site in-house. If I recall correctly, he had a web and database server on one machine, and I believe the primary DNS server was on an old desktop machine he had lying around the house. This gave him the ability to build a complete, 100%-functional prototype, and use it to shop for funding.

It worked. They got funding, they bought more and bigger servers. They got UPSes for them (yay!), they got more cooling, a nicer office, and they launched the site, pretty much exactly as the prototype existed, and things were pretty stable. Unfortunately, the VCs who took seats on the board after the first round of financing didn’t understand the notion of “The Long Tail”, so they eventually went under, but that’s not the point.

The point is, that was 8 or 9 years ago. It cost him quite a good bit of his hard-earned savings just to get to a place where he could build a prototype. A prototype! He only really knew Microsoft products, and buying licenses for Microsoft SQL Server, and the developer’s tools (I forgot what they were using as an IDE, but they were a ColdFusion shop) was quite a chunk of money. My friend really only had enough money to put together a prototype, and they were playing “beat the clock”, trying to get a prototype done, and shop for (and get) funding, before the money ran out, because they couldn’t afford the hardware, power, cooling, big-boy internet connection, and the rest of what goes into a production infrastructure. The Prototype->VC->Production methodology was pretty typical at the time.

2003: Generate Some Revenue

In 2003, a couple of years after the bubble burst, I was involved in another startup. This one was 100% self funded, but has been rather successful since. By this time, dedicated hosting was just affordable enough that it was doable for a startup that had some revenue being generated, and that’s what my friend did. He also outsourced DNS completely (through his registrar, if memory serves), but he still hosted his own email, backup, and some other services in-house. He had plenty of hiccups and outages in the first year, but overall it ran pretty well considering all of the things he *didn’t* have to be concerned with, like power, cooling, internet uplinks, cable trays, etc. The world was becoming a friendlier place for startups.

2008: Do it all, on the cheap

Nowadays, the world is a completely different place for startups, and a lot of this is due to the rich set of free (or very cheap) resources available to startups that make it possible for them to do a production launch without the VC funding that used to be required just to get the hardware purchased.

In 2008 you can outsource DNS for relatively little money, and it’ll have the benefit of being globally distributed and redundant beyond what you’re likely to build yourself. You can get Google Apps to host your email and share calendars and documents. You can store backups on Amazon’s S3. You can use something like Eclipse, Komodo Edit, or a number of language-specific environments like Wing IDE or Zend Studio to do “real development” (whatever your definition of that is) for free or relatively cheap. You can also get a free database that is reasonably well-stocked with features and capabilities, a free web server that runs 65%+ of the internet sites in existence, and if you have the know-how (or can get it), you can actually host anything you want, including your entire production infrastructure (within reason, and subject to some caveats) on Amazon’s EC2, for a cost which is tied to what you use, which is cheaper in a lot of cases than either buying or leasing a dedicated server. Multisourcing has arrived!

In looking at this progression from “you must have funding”, to “you’re going to need to generate a little revenue”, to “do it all, on the cheap”, the really obvious question this all raises is:

“Now what?”

Well, this whole 2008 situation is making things better, but… how do I put this… “It’s not soup yet”.

First of all, there is no single platform where you can realistically do everything. Google’s AppEngine is nice, but it has its limitations. For example, you don’t have any control over the web servers that run it, so you can’t, say, add an Apache mod_rewrite rule, or use a 301 redirect, or process your log files, etc. Troubleshooting an application based solely on input from people who are having issues with your app would be difficult.

Amazon’s service gives you far more control, and if you need it, that’s great, but it completely changes how you architect a solution. I think that some of these things are good changes, and are things we should all be thinking about anyway, but Amazon forces you to make decisions about how to plan for failure from the moment you decide to go this route, even if it’s for prototyping, because until persistent storage on EC2 is a reality available to the entire user base, whenever an EC2 instance disappears, so does every bit of the data you added to it. You’ll have to start from scratch when you bring up another instance. You’re also going to have to add more scripts and utilities to your toolbelt to manage the instances. What happens when one disappears? How do you fail over to another instance? How can you migrate an IP address to the running instance from the failed one? How you do all of these things, in addition to just building and installing systems, is different, and that means a learning curve, and that means overhead in the form of time (and expense, since you have to pay for the Amazon services to use them to learn on).
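To make those questions a little more concrete, here is a minimal sketch of the kind of script that ends up in the toolbelt. It is not tied to any real Amazon API: `is_healthy`, `launch_instance`, `restore_data` and `move_address` are hypothetical callables you would have to write yourself against whatever tools your provider gives you, and the instance IDs are just strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Watchdog:
    """Replace the active instance when it stops answering.

    Every callable here is a placeholder (an assumption for this
    sketch), not a real cloud API call.
    """
    is_healthy: Callable[[str], bool]      # does this instance respond?
    launch_instance: Callable[[], str]     # start a fresh one, return its id
    restore_data: Callable[[str], None]    # reload data from your own backups
    move_address: Callable[[str], None]    # repoint the public address
    active: str                            # id of the instance in service
    history: List[Tuple[str, str]] = field(default_factory=list)

    def check(self) -> str:
        """Run one health check; fail over if needed. Returns the active id."""
        if self.is_healthy(self.active):
            return self.active
        # The old instance and its local data are gone, so the
        # replacement starts from scratch and is rebuilt from backups
        # before traffic is pointed at it.
        replacement = self.launch_instance()
        self.restore_data(replacement)
        self.move_address(replacement)
        self.history.append((self.active, replacement))
        self.active = replacement
        return self.active
```

In practice you would call `check()` from cron or a loop. The ordering matters: data is restored before the address moves, so users never land on an empty instance.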

There are also now “grid” solutions that virtualize and abstract all of your infrastructure, but give you familiar interfaces through which to manage them. One that I’ve used with some success is AppLogic, but other services like GoGrid and MediaTemple have offerings that emphasize different aspects of this niche “Infrastructure-as-a-service” market. Choose very carefully, and really think about what you’ll want to do with your infrastructure, how you want to manage it, monitor it, in addition to how you’ll deliver your application, and also think about how you’ll be flexible within the confines of a grid solution before you commit, because the gotchas can be kind of big and hairy.

None of these are whole solutions. However, any of them could, potentially, some day, become what we would now call a “perfect solution”. But it still wouldn’t be perfect in the eyes of the people who are building and deploying applications that are having to scale into realms known seemingly only inside some brain vault that says “Google” on it. What those of us outside of that vault would like is not only Google-like scalability, but:

  • global distribution, without having to pledge our souls in exchange for Akamai services. It’s great that I can build an infrastructure on EC2 or GoGrid, but I’d like to deploy it to 10 different geographic locations and still control it centrally.
  • the ability to tightly integrate things like caching distribution network services with the rest of our infrastructure (because CDNs are great at serving, but not so much at metrics)
  • SAN-like (not NFS-like) access to all storage from any/all systems, without sacrificing the IO performance needed to scale a database properly.
  • As an admin, I want access to all logs from all services I outsource, no matter who hosts it. I don’t believe I can access, for example, our Google Apps logs, but maybe I’ve forgotten to click a tab somewhere.
  • A *RELATIONAL* database that scales like BigTable or SimpleDB

There’s more to it than this, even, but I’ve stopped short to make a point that needs making. Namely, that these are hard problems. These are problems that PhD candidates in computer science departments do research on. I understand that. The database issue is one that is of particular interest to me, and which I think is one of the hardest issues (not only because of its relationship to the storage issue, by the way). Data in the cloud, for the masses, as we’ve seen, involves making quite a few rather grandiose assumptions about how your schema looks. Since that’s not realistic, the alternative is to flatten the look of the data, and take a lot of the variables out of the equation, so they don’t have to make *any* assumptions about how you’ll use/organize the data. “Let’s make it not matter”. Genius, even if it causes me pain. But I digress…

The idea here is just to give some people a clue what direction (I think) people are headed in.

These are also very low-level wants. At a much, much, much higher level, I’d like to see one main, major thing happen with all of these services:

  • Get systems administrators involved in defining how these things are done

I’m not saying that because I want everything to stay the same and think a system administrator will be my voice in that or something. I do *NOT* want things to stay the same, believe me. I’m saying it because it seems pretty obvious to me that the people putting these things together are engineers, and not systems administrators. Engineers are the people you call when you want to figure out how to make something that is currently 16GB fit into 32MB of RAM. They are not the people you call when you want to provide a service/interface/grid/offering/whatever that allows systems folks to build what amounts to a company’s IT infrastructure on a grid/instance/App/whatever.

Here are a couple of examples:

When I first launched an AppLogic grid, a couple of things jumped out at me. The partitions on the components I launched were 90% full upon first boot, they had no swap partition, and there was no consistency between OS builds, so you can’t assume that a fix on one machine can be blown out via dsh or clusterssh to the rest. The components were designed to be as small as possible, so as to use as little of the user’s allotted resources as possible. In addition, mount points created in the GUI management interface and then mapped to a component… don’t cause the component to have any clue what you just did, which raises the question “umm… why did I bother using the GUI to map this thing to this component if I just have to edit /etc/fstab and mount it in the running instance myself anyway?” Back to consistency: this is unlike what happens when you, say, allocate more RAM or storage, or define a new network interface on the device in the GUI.
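The "90% full on first boot" surprise is the kind of thing a first-boot sanity check can catch. Here is a minimal sketch in Python using the standard library’s `shutil.disk_usage`; the function names and the 90% threshold are my own assumptions, and the number may differ slightly from what `df` reports because of reserved blocks.

```python
import shutil


def percent_full(total_bytes: int, used_bytes: int) -> float:
    """Percentage of a filesystem that is in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def has_headroom(path: str, threshold: float = 90.0) -> bool:
    """Return True if the filesystem holding `path` is below the threshold.

    The 90% default is an assumed cutoff, not anything a grid vendor defines.
    """
    usage = shutil.disk_usage(path)
    return percent_full(usage.total, usage.used) < threshold
```

Run against each mount point on a freshly launched component, a check like this would flag the too-small partitions before anything gets installed on them. It says nothing about the missing swap or the unmapped mount points, which need their own checks.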

There is no part of EC2 or S3 that looks like a sysadmin was involved in building it. It’s a programmer’s platform, from what I can tell. For programmers, by programmers. Luckily, I have enough background in programming that I kind of “get it”, but while I might be able to convince myself that there are parallels between how I approach programming and building infrastructures, it still represents a non-trivial context switch for me to move from working deeply at one to working deeply at the other, so mixing the two as a necessity for moving forward is less than nice.

There is no “database in the cloud” service that looks remotely like there was a database systems person involved at all, that I can tell. I’ll confess to not having used BigTable or SimpleDB, but the reason is that I can’t figure out how to make them useful to me at the moment. These tools are not relational, and my data, though it’s been somewhat denormalized for scaling purposes (compromises taken carefully and begrudgingly - I’d rather change database products, but it’s not in the cards), is nonetheless relational. I’ve considered all kinds of object schemes for data in the past, and I still think that there’s some data for which that can work well, but it’s not currently a solution for me. Once you look at the overhead in managing something like EC2, S3, AppLogic, etc., the very last thing you need is the overhead of a changing data storage/manipulation paradigm.

Should I be hiring systems folks, or developers? Both? Ugh. Just when I thought you could finally get away with building a startup with nothing more than an idea, a sysadmin and a coder, here they go roping me back into hiring a team of developers… to manage the systems… and the data. No good (and I mean *NO GOOD*) can come of developers managing data. I know, I’ve seen ‘em do it.

All of that said, I use all of this stuff. Multisourcing is here to stay - at least until someone figures a whole bunch of stuff out to make unisourcing a viable alternative for systems folks, or they collectively redefine what a “systems person” is, which is an extremely real possibility, but is probably quite a ways off. My $.02. Flame at will ;-)

I went to a very good panel discussion yesterday hosted by the Center for Information Technology Policy at Princeton University. There has been a conference going on there that covers a lot of the overlap between technology, law, and journalism, and the panel discussion yesterday, “Data Mining, Visualization, and Interactivity”, was even more enlightening than I had anticipated.

The panel members included Matt Hurst, of Microsoft Live Labs, Kevin Anderson, blog editor for The Guardian, and David Blei, a professor at the Computer Science Dept., Princeton University. This made for a very lively discussion, covering a wide range of perspectives about social media, “what is news?”, how technology is changing how people interact with information (including news), how the news game is changing as a result (which was far more fascinating than it sounds), and how this unfathomably enormous stream of bits, enabled by lots of open APIs, feeds, and other data streams can be managed, mined, reduced, and presented in some value-added way (part of the value being the sheer reduction in noise).

Cool Tools for Finding News

Some of the tools presented by the panelists were new to me, and aside from being great tools for bloggers and other content publishers, there are some excellent examples of how to make effective use of the data you have access to through APIs like the Digg API.

BlogPulse

This was presented by Matt Hurst. It’s pretty neat: a tool that essentially charts blog buzz of a given phrase over time, and it even lets you compare multiple phrases, which is really interesting as well. Check it out here.

I’d like to know more about how it derives the metrics, but in doing a couple of quick comparisons using the tool, it seems to line up to some degree with simple comparisons of the number of search results for different phrases on sites like Technorati and Bloglines. Interestingly, even though there appears to be lots more data available at Technorati, in my very limited experimenting, the percent difference between search results for any two phrases appears to be similar, indicating that Bloglines may be a representative sampling of Technorati data. More experimentation, of course, would be needed to lend any credibility whatsoever to that claim. It’s probably irrelevant, because you can’t ask either service for any kind of historical data regarding search results :)
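For anyone who wants to repeat the comparison, here is a small Python sketch of the arithmetic. I’m assuming the symmetric form of “percent difference” (the gap divided by the mean of the two counts); other definitions would work too, as long as you use the same one for both services.

```python
def percent_difference(count_a: int, count_b: int) -> float:
    """Symmetric percent difference between two search result counts.

    Defined here (as an assumption) as |a - b| divided by the mean of a and b.
    """
    if count_a < 0 or count_b < 0:
        raise ValueError("result counts cannot be negative")
    mean = (count_a + count_b) / 2
    if mean == 0:
        return 0.0  # both phrases returned nothing, so there is no gap
    return 100.0 * abs(count_a - count_b) / mean
```

With made-up counts of 1,500 and 500 results on a large index, and 150 and 50 on a smaller one, both pairs come out to a 100% difference even though the raw numbers are ten times apart. That is the pattern that would suggest the smaller index is a fair sample of the larger one.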

Twistori

This has the potential to be really interesting. Right now, it lets you pick from several different terms, like “love”, “wish”, “think” and “feel”, and after clicking one of those, it’ll start producing a constantly updating stream of twitters that contain those words. If this experiment is successful, I would imagine they’d eventually enable the same service for arbitrary keywords, which would be really powerful, and quite a lot of fun!

Tweetwheel

Oh how boring my life according to twitter is. I’m still in the schizophrenic stage of settling on a live ‘update your friends on what you’re doing whether they care or not’ service. Facebook, myspace, twitter, jaiku… there are too many. I’m trying out the imified route now to consolidate all the cruft. According to tweetwheel, there are more places to update my status at any given moment than there are people who give a damn what my status is.

Anyway, tweetwheel shows how you’re connected to people through twitter. If you have lots of followers and follow lots of people, the wheel is really exciting to look at, as displayed by Kevin Anderson, who has a much more “robust” wheel than mine. At some point I’d like to see this idea expanded to cover the other services like Facebook and even LinkedIn.

Digg Labs

You have to go to the Digg Labs site and see what people are doing with the Digg API. There are too many awesome utilities to cover them all here. It almost makes me wish I did fancy Flash UI stuff instead of back end data mining and infrastructure administration.

At a higher level…

Most of the discussion about social media seems to be about measuring buzz created by bloggers (at least where news/content publishing is concerned). However, although things have shifted dramatically in a ‘consumers are producers’ direction, causing people to start rethinking the definition of news, this shift is caused as much by consumers who are still *only* consuming as anyone else, and I didn’t see much in the way of tools that measure the interest of those people in any meaningful way. Perhaps the consensus is that the bloggers are a representative sampling of the wider internet readership? I don’t know. I would disagree with that if it were the case.

I work for AddThis.com, which seeks to provide publishers of news and all kinds of other content with statistics that help them figure out not just what pages people happen to be landing on, but which ones they have elected to take a greater interest in, either by emailing it to a friend, adding it to their favorites, or posting it to digg, delicious, or some other service. Maybe some day there will be an AddThis API that’ll let you easily do even more interesting things with social media.

I seem to have found a pattern in my own internal workings. In the fall, I work furiously and get a lot done. Around the time of the winter holidays, I almost always do major personal web site changes and upgrades according to a mental list I’ve compiled over the previous year.

In the spring, I shake off the winter (I’m not a fan of winter), I brew my first batch of beer for the season (which symbolizes the end of winter, because I brew outdoors), and my brain starts to be flooded with new ideas. They range from the simplistic (maybe we should consider replacing windows in the house this year), to the slightly odd (why isn’t there a bluetooth setup that pairs two devices and alerts you if they get out of range, so if my daughter strays too far…), to the really useful (I should really take on that woodworking project to build that bookcase we desperately need), to the GEEKY!

This year I seem to be having a lot of geeky ideas. The difference is that, this year, I finally feel empowered enough to go after some of them. One idea that has come up is building an online brewer’s workshop. I would just build a GUI to do this for myself, but then I’d have to deal with which widget set to use, which platforms to support, and whatever else. Also, the final step in the evolution of a lot of GUIs is webification anyway. So I *think* this might be a job for Python, and I *think* I might try to do this using Django, which is fully supported by my web host (finally - see yesterday’s post)!

Brewing is one of those things that you can make as complex as you care to get. I started brewing with a buddy using a Coleman picnic cooler, a few buckets, and some odds and ends from the kitchen. Now I have a full three-keg system, with pumps, plate chillers (small plate heat exchangers), fancy false bottoms, cool valves and tubing, and it involves relatively little manual labor. And that complexity can infect recipe development as well. Hops add bitterness by leaching alpha acids into the wort (the liquid that is not yet beer). Hop utilization calculations can be non-trivial and depend on many other factors in your system. Other characteristics depend heavily on the percent of available sugars you’re able to extract from the grains, and on your ability to keep a mash at a given temperature for a fixed period of time. This is easier to predict if you know, for example, the thermal mass of the vessels involved, and how much heat will be lost when you combine water and grain and stir. There are also proteins at work in the mash which can gum things up enough to make draining the liquid off a chore, so knowing what water/grain ratio to use is also important. And how quickly can you bring wort from boiling down to a temperature more friendly to yeast at the end of the cycle?

That’s a small fraction of the considerations you *could* make when brewing. I didn’t even touch on pH and water characteristics, or yeast attenuation! Needless to say, brewing with any consistency would be a great challenge and take a good bit more preparation without some tool to help you figure out how much water you’ll need, how many ounces of hops for how long, and how much grain you need to mash (and for how long), etc. There are lots of tools to help brewers out with this kind of stuff (ProMash is a popular one). The problem I have is that these tools are mostly commercial, proprietary, platform-specific ventures. I’d like to put one on the web that is at least “good enough”, and free for anyone to use. I’m open source that way (I’m happy to release the source as well).
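To give a feel for what a “good enough” calculator would compute, here is a rough Python sketch of two of the calculations mentioned above. I’m assuming Tinseth’s widely used formula for hop utilization and the common homebrew rule of thumb for strike water temperature (US units, ignoring the thermal mass of the vessel). Neither is necessarily what ProMash or any other tool uses, and the function names are my own.

```python
import math


def tinseth_utilization(wort_gravity: float, boil_minutes: float) -> float:
    """Fraction of alpha acids utilized, per Tinseth's formula."""
    # Denser wort extracts less bitterness ("bigness factor").
    bigness = 1.65 * 0.000125 ** (wort_gravity - 1.0)
    # Utilization rises with boil time and levels off.
    boil_time = (1.0 - math.exp(-0.04 * boil_minutes)) / 4.15
    return bigness * boil_time


def tinseth_ibu(alpha_acid_pct: float, hop_ounces: float, batch_gallons: float,
                wort_gravity: float, boil_minutes: float) -> float:
    """Estimated bitterness (IBU) from a single hop addition."""
    # 7490 converts ounces per gallon to milligrams per liter.
    mg_per_liter = (alpha_acid_pct / 100.0) * hop_ounces * 7490.0 / batch_gallons
    return tinseth_utilization(wort_gravity, boil_minutes) * mg_per_liter


def strike_water_temp_f(grain_temp_f: float, target_mash_temp_f: float,
                        quarts_per_pound: float) -> float:
    """Water temperature (deg F) needed to land on the target mash temperature.

    Rule-of-thumb version: assumes a pre-heated vessel, so no heat is lost to it.
    """
    return (0.2 / quarts_per_pound) * (target_mash_temp_f - grain_temp_f) + target_mash_temp_f
```

As an example, one ounce of 10% alpha acid hops boiled for 60 minutes in five gallons of 1.050 wort works out to roughly 35 IBU, and mashing 65°F grain at 1.5 quarts per pound for a 152°F rest calls for water at about 164°F. A real tool would layer system-specific corrections (vessel thermal mass, heat loss while stirring) on top of this.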

Another tool I’d love to see is one that would let me manage my consulting business online. If BestPractical’s RT had a good PayPal plugin that would let you charge per ticket or charge for a bundle of so many tickets or something, that’d be a good start, but I’ve mucked with the code for RT (it’s written mostly in Perl), and it wasn’t a pleasant experience. This wouldn’t be a complete solution either, because most of my work is *not* simple support tickets, it’s large projects. For those I’d like people to be able to pay invoices online. There’s lots more I’d like to add on top of that, but that’s the general gist of it, and in the past I’ve been unable to find a really good solution, where “really good” is a completely nebulous term barely defined in my own head. :)

In addition to those ideas, I registered a couple of domains over the past year, and I hope to do some cool things with them as well if I ever get some time away from work and consulting. Oh yeah - I’ll also continue working on loghetti! Keep an eye out for updates. Maybe some people reading this have similar interests and would like to collaborate. Ciao for now!

Every. Single. Day. This. Happens.

I read a *LOT* of online technical documentation. Come to think of it, I read a *LOT* of documentation offline as well. I also occasionally read things like blogs and comments and stuff. In all of my reading, I have found that the most prevalent mistake made by the writer in terms of grammar and spelling is using the word “loose” in place of “lose”. So here’s the rule:

“Lose” is a verb, as in “I will lose my job if I do that”, or “Please lose my number”. Other forms include “losing” as in “I’m losing my mind”, and the ever-popular “loser”, as in “That Anonymous Coward is such a loser”.

On the other hand, the word “loose” is an adjective, as in “he’s got a few screws loose” or “that development team has somewhat loose morals - I’ve seen them at conferences”, or “loosely coupled”.

How about this: if you want to describe loss, lose the extra “o” — copyright me, today. :-D