Archive for the ‘Sysadmin’ Category

I brought my MacBook Pro in for a warranty repair yesterday around noon. Since then I’ve been using a Lenovo T61 to get basic work done, and also to see if any progress has been made in the area of Linux support for my laptop. I bought this laptop specifically because a website said that it was very well supported by Linux distributions “out of the box”, including video and wireless. I was sure to make hardware choices that didn’t require special third-party drivers… I’ve been doing this for 10 years, so I have some understanding of how to buy a laptop that I plan to put Linux on. Well, this time I apparently failed.

First, I had Ubuntu installed, and I was never able to keep the wireless card working consistently. To be honest, Ubuntu is the best distro I’ve had on this thing so far. Next, I gave OpenSUSE 11 a shot, and there’s been no end to the issues. Of course, it started with the wireless card. I have an Intel 3945ABG wireless card, according to lspci and dmesg output. In fact, here’s my lspci output right here:

00:00.0 Host bridge: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub (rev 0c)
00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)
00:02.1 Display controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)
00:19.0 Ethernet controller: Intel Corporation 82566MM Gigabit Network Connection (rev 03)
00:1a.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #5 (rev 03)
00:1a.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801H (ICH8 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 03)
00:1c.1 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 2 (rev 03)
00:1c.2 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 3 (rev 03)
00:1c.3 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 4 (rev 03)
00:1c.4 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 5 (rev 03)
00:1d.0 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB Controller: Intel Corporation 82801H (ICH8 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB Controller: Intel Corporation 82801H (ICH8 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev f3)
00:1f.0 ISA bridge: Intel Corporation 82801HBM (ICH8M-E) LPC Interface Controller (rev 03)
00:1f.1 IDE interface: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) IDE Controller (rev 03)
00:1f.2 SATA controller: Intel Corporation 82801HBM/HEM (ICH8M/ICH8M-E) SATA AHCI Controller (rev 03)
00:1f.3 SMBus: Intel Corporation 82801H (ICH8 Family) SMBus Controller (rev 03)
03:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG Network Connection (rev 02)
15:00.0 CardBus bridge: Ricoh Co Ltd RL5c476 II (rev ba)
15:00.1 FireWire (IEEE 1394): Ricoh Co Ltd R5C832 IEEE 1394 Controller (rev 04)
15:00.2 SD Host controller: Ricoh Co Ltd R5C822 SD/SDIO/MMC/MS/MSPro Host Adapter (rev 21)
15:00.3 System peripheral: Ricoh Co Ltd R5C843 MMC Host Controller (rev ff)
15:00.4 System peripheral: Ricoh Co Ltd R5C592 Memory Stick Bus Host Adapter (rev 11)
15:00.5 System peripheral: Ricoh Co Ltd xD-Picture Card Controller (rev 11)

I’m running the KDE4 desktop, and tried using the default NetworkManager icon that’s in the systray to get things working. From what I saw there, it appeared that my card wasn’t scanning. I put in my network details manually, and tried to connect, and it failed with no errors. In the NetworkManager log there was lots of output, but nothing particularly useful. It just said the association took too long and that it was now marking that connection as ‘invalid’. Great. So here I am, trying to use Linux on the desktop, and only 5 minutes after the very first system boot, I’m tailing log files and debugging, and basically playing sysadmin, which is exactly what I don’t want to be doing on my desktop system. Restart NetworkManager, see what dhclient is doing, reboot, check /etc/modprobe.d, lsmod…. fail. Now what?

Well, I opened kwifimanager, and it said that I had indeed associated with an access point. So… I *am* scanning? Hmm. I had no IP address, so I figured I had probably fat-fingered my WEP settings somewhere. Tailing /var/log/messages agrees, saying WEP decryption is failing. So I double-check everything, all looks normal and correct to me, I try again, and No Bueno. *sigh*.

Finally, I reverted to command-line tactics, and ran this little line:

iwconfig wlan0 essid <myssid> key <mykey>

Magically, it works, where all of the GUI nonsense had failed. Now here’s a question: how the hell do you get this to “just work” at boot time? Well, I had about 10 emails to send to clients, so I put that question off and fired up a browser and…. fail. WTF?

I had an IP address, pinged my router, pinged another host on the network, all good. Pinged an external IP I know by heart, fail. Ugh. Ran ‘cat /etc/resolv.conf’: empty. Apparently, dhclient never wrote out the information it got from my router. The file also didn’t update when I set the domain in NetworkManager to ‘home’, because it still said ‘search site’. I added the proper lines in there, and tried again in the browser… fail. Now what?!?

Ran ‘netstat -rn’. I don’t have a default gateway. *sigh*…

route add default gw 192.168.1.1

And I finally have internet access.
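For what it’s worth, the two checks that finally found the problem (no usable entries in resolv.conf, no default route) are easy to script. What follows is only a rough sketch, not something I ran during the session above: the function names are made up for illustration, the route parser assumes the Linux /proc/net/route layout, and the address decoding assumes a little-endian machine such as this x86 laptop.

```python
import socket
import struct


def nameservers(resolv_conf_text):
    """Return the nameserver addresses listed in resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def default_gateway(route_table_text):
    """Return the default gateway from /proc/net/route-style text, or None."""
    for line in route_table_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        destination, gateway = fields[1], fields[2]
        flags = int(fields[3], 16)
        # Destination 00000000 with the gateway flag (0x2) set is the default route.
        if destination == "00000000" and flags & 0x2:
            # Assumption: little-endian host, so the hex digits are byte-reversed.
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None
```

Feed the first function the contents of /etc/resolv.conf and the second the contents of /proc/net/route. An empty list or a None is the scripted version of the two “fail” moments above.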

Of course, I can’t work 24 hours a day, so I went to bed, and left my laptop running so I could get right back to work in the morning. Or not.

I had foolishly chosen to use an OpenGL screensaver. Overnight, it completely locked up the machine, rendering it useless without forcibly rebooting it. So much for getting right back to work.

Well, let’s see if I can get some of these issues fixed by updating the software, since I’m now at least connected to the internet (of course, after the forced reboot, I had to do the iwconfig->route add routine again). Ran the updater, picked some extra repositories, and it went off to set things up. Unfortunately, it also prompted me to import probably 50 or so GPG keys. Annoying. More annoying is that, after all of that, it failed to update any of my software, even though it told me there were updates available. Why, you ask? Here’s what I got…

Failed to mount cd:///?devices=/dev/sr0 on /var/adm/mount/AP_0x00000001: No medium found (mount: No medium found)

Click ok. Get same error again. Click ok. Get slightly different error…

Unexpected exception. Failed to mount cd:///?devices=/dev/sr0 on /var/adm/mount/AP_0x00000001: No medium found (mount: No medium found)

Click Ok, get another message…

Please file a bug report about this. See http://en.opensuse.org/Zypper#Troubleshooting for instructions.

I go there, the URL isn’t valid. I find the Troubleshooting page on my own, and there’s a bunch of generic troubleshooting information there. More command line sysadmin-ish stuff in there. Just the kind of stuff I don’t need to be spending otherwise billable time on. I give up and decide that I’ll just deal with it in its broken-ass state for the next 10 hours or so until I can get my beloved MacBook Pro back.

I’ve been using Google Reader since it was created. I really love the *idea* of Google Reader. I like that scrolling through the posts marks them as read. I like that you can toggle between list and expanded views of the posts. I like that you can search within a feed or across all feeds (though selecting multiple specific feeds would be great).

All of that said, I’d like to explore other avenues, because I don’t like that there’s, like, zero flexibility in how the Google Reader interface is configured. My problem starts with large fonts…

I use relatively large fonts. If you increase the font twice up from the default size in Firefox on a Mac (using the cmd-+ keystroke, twice), and you have more than just a couple of feeds, you wind up with this really horrible side pane with the bottom half of it requiring a scroll bar, and the text wraps, and it just looks terrible. What makes this really REALLY REALLY annoying is that:

  1. I don’t use the features included in the *top* part of the side pane, ever, at all (like ‘trends’ and stuff), and
  2. You can’t resize or disable that part of the side pane.

I’ve used folders and some other features to try to alleviate the issue, but it’s just a compromise, and I’d rather not do that if something else would work better for me. I’ve had a couple of quick glances at just a couple of other readers, but I thought I’d get some input from the lazyweb to see what your thoughts are. Is there a browser-based feed reader that has some of Google’s niceties, but perhaps with a little bit nicer/more configurable interface? Out of curiosity, are you using a Mac-compatible fat-client reader that just totally r0cks in some way? If so, let me know in the comments.

Boto is a Python library for interacting with Amazon’s web services. I’ve used it in the past, and am currently using it for an ‘s3get’ implementation based on a simple example I found buried in a post on Patrick Altman’s blog.

While testing my code, I noticed I was getting import errors from boto/connection.py, because I didn’t have a module on my system named ‘hashlib’. Then I found an svn trunk commit that clued me in to the fact that I wasn’t supposed to have hashlib, because I was running a pre-2.5 version of Python. They had put in a fix for pre-2.5 users, but somehow it wasn’t being obeyed.

Then I noticed that the import errors weren’t from utils.py, where the fix was committed, but from connection.py, which was explicitly importing the module itself. Closer inspection revealed that it was also importing utils.py, which itself imports hashlib. I commented out the explicit import in connection.py (and, later, in boto/s3/key.py), and stopped getting import errors.
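For anyone who hasn’t seen this kind of fix before, a version-tolerant import generally looks something like the sketch below. To be clear, this is not boto’s actual code: the `new_md5` and `hex_digest` names are made up for illustration, and I’m only showing MD5.

```python
# Prefer hashlib (Python 2.5 and later); fall back to the older
# standalone md5 module on earlier interpreters.
try:
    from hashlib import md5 as new_md5
except ImportError:
    # Pre-2.5 Python: hashlib does not exist, but the md5 module does.
    from md5 import new as new_md5


def hex_digest(data):
    """Return the hex MD5 digest of a byte string."""
    return new_md5(data).hexdigest()
```

The catch is that a fallback like this only helps if every module goes through it. One stray `import hashlib` elsewhere in the package is enough to bring the ImportError back, which is what the explicit import in connection.py was doing.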

If you’re still having issues with Patrick’s s3get.py code, it’s probably because you need to change this line:

if name == 'main':
       main()

To this:

if __name__ == '__main__':
    main()

Taming MySQL is… challenging. Especially in very large, fast-growth, ‘always-on’ environments. It’s one of those things where you seemingly can never know all there is to know about it. That’s why I really like coming across posts like this one from FreshBooks that describes a very real problem that was affecting their users, how they dealt with it, why *that* failed, and what the final fix was. Post a link to your favorite MySQL Problem and Solution post in the comments (oh yeah, and “subscribe to comments” should be working now!)

I’ve been on both sides of the remote worker relationship. On the manager side, I’ve managed some good-sized projects using an all-remote work force. Indeed, I’ve hired, managed, fired, and promoted workers without ever knowing what they look like. On the worker side, I do most of my work remotely, and I have for some time now. Judging by the amount of repeat business I get, I’d say that I’m more than acceptably productive working remotely.

In dealing with various clients, recruiters, prospective employers, business owners, and talking to friends who manage people for a living, I’ve heard pretty much every excuse/reason there is for not wanting to deal with a remote work force. I’ve heard and experienced successes with remote workers as well, and they all have a few key things in common, which are missing from the stories of failure. I’ll talk about them in a minute.

I first want to just say that I’m not some kind of fanboy who thinks remote workers are the answer to every problem. There are valid reasons for not having remote workers. For example, it’d be hard to build cars with a remote work force. Some things (some!) just require a physical presence. Whoever maintains the printers at your company really has to be around to change out ink cartridges and stuff like that.

There are certain classes of jobs, though, that are well-suited to working remotely. There are even classes of jobs that are necessarily performed remotely to some degree (field sales and support technicians for example), that could be made 100% remote with the proper tools and processes in place.

So what makes a remote worker success story different from a story of failure?

Always be prepared…

The number one difference I’ve seen between success and failure in managing a remote work force is that successful managers spent the time to prepare the managers, the team, the department, the organization, and the remote workers themselves to work remotely.

If you don’t prepare for a remote work force, you will fail miserably. As a result, I’m a big advocate of treating “Let’s go remote!” as an internal project with goals and milestones just like any other project. Preparing an organization to manage a remote work force takes a good deal of forethought, with a focus on communication and collaboration tools, reporting, accountability, scheduling, etc. In addition, you have to prepare the remote workers themselves, to ensure they know what’s expected of them in terms of reporting their status, scheduling, communication, etc. They also need to know *about*, and *how to use*, the tools they’ll be expected to use from home.

You have to plan this. You have to prepare, or you’re going to be like the HR manager who told me their company no longer allows for remote workers because “we tried it once and the guy made a complete mess of things”. When I asked the HR manager why he attributed that to the geographic location of the worker, he said “good point, he could just as well have made a mess here in the office”. You need good workers no matter where they’re going to work. The workers need expectations and goals from the manager, and the manager needs feedback and communication (and results!) from the worker. Tools help to facilitate these things. This is already a long post, so I’ll probably make a tools list in another post.

Communicate, and set expectations

Before the tools come other higher-level decisions and communication. For example, one problem I’ve heard more than once about remote workers is “we can’t hire a remote worker full-time, because then everyone will want to work from home”. As if they didn’t already all want to work from home! Everyone would love to have the option! Even if they didn’t take advantage of it, they’d consider it a really cool perk! They’d tell all of their friends about it, because it would make them jealous, and guess who their friends will contact first when they start to look for other opportunities?

You have to start somewhere, and you can’t just swing the barn doors open and let everyone go their own way on day 1. If you have an existing corporate structure in place with assets and services and regular meetings and the like, then you have to decide who can make the most benefit from a remote situation the soonest, make them the pilot group, and manage the expectations of the rest of the organization while the pilot group prepares to move to a remote workspace.

1, 10, 100, 1000

A common software application rollout strategy is to make it accessible to 1 user, then 10, then 100, then 1000, then… move up from there. In preparing your organization or department, you might consider a similar strategy.

I work for a client right now where I’m the “1”. If I can work effectively with the rest of the team (in the office), if I can produce results, remain accessible as-needed during working hours, manage the expectations of my team with regard to my presence (appointments happen), and overall be an asset to the team, then the management may decide that it can work on some larger scale - even if ‘larger’ means 2 instead of 1. It might also be useful to do a ‘remote rotation’ so that glitches can be caught early before making a physical presence in the office optional.

Success, of course, means getting together with the team and figuring out what tools will be used to best emulate an office working environment. We use IRC for 99% of our communication, falling back to email when we need to cc managers, we have a wiki for documentation and status updates, we have a trouble ticket system, everyone has everyone else’s phone number, blackberry PIN, or whatever. We’re a technical group doing system administration. It’s working wonderfully.

“But if the sysadmins work from home, the developers will want to work from home!” Maybe so. That’s where you have to manage expectations, and communicate with your workers to let them know that the company’s ‘office optional’ project is in an early alpha stage, that it’s being tested on the group most familiar with the technologies involved, and most capable of exploiting those technologies successfully to produce results. Once the geeks work out the shortcomings, and management is able to evaluate the effectiveness of the plan, the tests will become more widespread.

Really, it’s not a whole lot different from doing anything else that affects the whole company: changing payroll providers, healthcare options, software and desktop hardware upgrades and replacements… it just takes communication. The process has to be managed, just like every other process.

There’s more than one way to do it!

There’s no one solution out there. When I joined php|architect Magazine in 2003, it was run by Marco Tabini, and I was a remote editor. A couple of months after joining, I became editor in chief, and was in charge of remotely managing the magazine. I did it differently from Marco, but he still remained involved and engaged through good communication.

Python Magazine was created and managed by me, and for the entire lifespan of the magazine, I have not seen anyone else involved in its production in person. Ever. Design, production, web site admin, executive administration, tech editors, authors, accountants… time lines, budgets and planning documents… all remote, and mostly delegated. I started the magazine with the thought that at some point someone more engaged in the community and with Python should take charge — I was just a “temp” to get the vision off the ground. Sure enough, when I handed the magazine over to Doug Hellmann, he did things differently from me, and it’s working out wonderfully for him as well!

Everyone has their own management style. Don’t think that just because your management style is a little unique you can’t handle remote workers. Good managers are creative, and aren’t afraid to execute on creative solutions.

I’m not someone who wakes up every day and looks at how my blog is ranked by all of the various services. I check out my WordPress stats, but that’s really about it. However, someone went and did some of the work for me, and they’ve decided that, of the blogs that they read or that were suggested to them, this blog ranks #20 in a listing of 25.

I’m really flattered, but wonder if it’s an indicator that this is a quality blog, or that they should aim higher in their blog reading ;-P  Either way, listing 25 bloggers in a flattering way is a fantastic marketing technique, because most of us are probably egomaniacal enough to say “Hey! Look!” and link back to the list on *your* blog, resulting in lots of traffic. Kudos, and thanks Mobile Maven!

So, I’ve been a full-time freelance consultant now for almost 3 months, and I’m happy to say that business is going well so far. I’ll be coming off of a 3-month contract at the end of this month, and I’ve already had some success in building the business in that time. I’ve also had a really important “lesson learned”. I have clients and projects to carry me through the next few months, and a few contracts that are still “pending”. Here’s a list of things that are working, and a couple that aren’t:

What’s working…

  • Face time: Being able to meet people one-on-one has been invaluable in ways I hadn’t even imagined. I spend all kinds of time in front of a computer, answering emails and such. The conversations are very different when you can hear a voice, read body language, and make a personal connection with people. They feel more comfortable with me, and I feel more comfortable with them as well.
  • FreshBooks: I tested out 3 or 4 different systems for invoicing, time tracking, estimates, and the like, and I settled on FreshBooks, and I’m very happy with it. It’s not the most AJAX-y slicker-than-snot-on-a-doorknob web2.0 gooey application on the planet, but it’s quite mature, well-supported, bug free so far, and it helps me maintain a professional image *and* track sent/paid invoices. It admits defeat in certain areas, like project collaboration, but for things outside of its core competency, it integrates with existing tools like BaseCamp.
  • Google Apps for Your Domain: I got my business “brochure” web site up and running in no time (though I really want to create a new one with more features, probably using Django). With the domain registration and web site out of the way, the next thing to do was set up email. I’ve deployed Google Apps before, and was surprised at how much I used features I didn’t put much value in at first. Long story short, it took me less than an hour to set up Google Apps, change my DNS records, set up the SPF record, and verify to Google that I owned the domain. After that, everything ‘just worked’.
  • Networking: Almost 100% of my income can be traced to a friend, associate, someone I’ve helped in some way, or a former employer. I read a blog post once that networking doesn’t make that big of a difference. Turns out it was written by someone who hit a dry spell, and then tried to dive into networking, as if it were some kind of quick fix. Don’t do that. Don’t network because you have to. Don’t network as an “angle” to get more business. I’ve done the exact opposite: instead of consulting and then networking to help my business, I have *always* networked, and the friends I’ve made by just being social and being involved in things that interest me gave me a lot of encouragement to “go solo”.
  • Digital Ubiquity: I have accounts on LinkedIn, facebook, twitter, brightkite, yelp, guru, myspace, and jaiku. I’m a regular on probably 10-12 different IRC channels. I’m on several different mailing lists at any given point in time. I write on my blog here, which is syndicated in a few places, and I write for other sites and publications as well. I make an effort to contribute code to the projects I use, or otherwise release code I write in the course of doing my work, either on my blog, my old web site (well, it’s still mine, but it’s old), or on a project hosting site. The income I have which is not traceable to someone I know is traceable to one of these online presences. One client found me because he was looking for a local presence that could complement his business, so he found me on Guru.com by searching for people in his area. That’s actually the only good Guru has ever really done me, to be honest, but it was worth it! Others have found me on O’Reilly, Linux.com, or my blog.

What’s Not Working…

  • Trying to be a bargain: I have a client who is also someone I consider a friend. A long time ago he told me to give him a rate that was high enough that I could make money, but low enough that he could get some of his mounting projects done. I gave him a really low rate, and you know what? It turned out to be a complete disservice to him. I didn’t do him any favors at all. By giving him a steeply discounted rate, I was basically forced to turn down all of his projects, because even the relatively cheap contracts that came in were still paying quite a lot more than I was charging him. I wound up with enough work that I could no longer justify doing work at the lower rate. In a way, it’s good news for me, but it’s bad news for my friend, and it wasn’t intentional at all.
  • Working at “hotspots”: I have a great working environment at home, but occasionally I used to try to work outside the house for a change of scenery. It was a bust. Panera would be great, except that they only allow 30 minutes of internet access between 11am and 1:30pm. I can’t be disconnected for that long, so if I go there, it’s at 7:30am, I have breakfast there, and then I’m cut off at 11:30. At that point, I leave. I’m not going to eat lunch for 2 hours, and I’m not going to kill time waiting for the ‘net to come back. Borders isn’t bad as a working environment, but they charge a lot for ‘net access, and the food there has never been anything short of horrid. Other places are farther away, or are too loud, or don’t have enough power outlets, or…. whatever. I haven’t found a reliable spot yet. I’ll let you know if I do.
  • Recruiters: In the past few months, I’ve received *AT LEAST* 100 “opportunities” from recruiters. Useless. They fail to read even the first sentence on my CV or profile. I’m a hands-on system/db admin and trainer and project manager in NJ. They’re contacting me for 6 month senior Java developer positions in Massachusetts. No, I’m not kidding. Not only have I not received a suitable opportunity from a recruiter, I haven’t even seen a single one that was in the ballpark enough to respond to. The problem with recruiters is that it’s all automated, and the automation systems SUCK. As a result, they end up becoming what amounts to a spam shop: they fire off 5000 emails to fill one position in the hopes that 5 people respond.

What’s working for you? What’s gone horribly wrong?

Note that I’m talking about using these tools in some kind of professional way, and more specifically, I’m talking about using Excel as a database, and using VPS hosting to host “professional” web sites. By “professional”, I mean something other than your personal blog, picture gallery, or other relatively inconsequential site.

Excel is not a database

Here’s the thing: Excel isn’t a database. Most people who don’t work in IT don’t seem to understand this, and they’re deathly afraid to actually communicate with anyone in IT, so they take matters into their own hands, and create problems so big that IT is forced to get involved, because at some point this spreadsheet becomes “critical” to some business function. Then IT gets even more bitter toward the non-IT folk, validating some of the reasons the non-IT folk went that route in the first place, and virtually guaranteeing that they won’t come to the IT group next time either.

So, if you don’t work in IT and are not a geek, know this: Excel is not a database. Excel is not meant to manage data on a long-term basis. For everything you can do with Excel, there is almost certainly a better tool for the job. This isn’t to say that Excel is good for *nothing*, just that it’s generally not good in places where data needs to be managed over the longer term, shared with others, and relied upon for day-to-day operations of a business or department.

Find someone in IT who seems nice and “deals with databases”, and ask them what their thoughts are on the topic. Then tell them the *actual problem you’re trying to solve*, and ask how they would approach it. You’re not likely to hear “Excel” in the reply unless Excel is so rampant in your company that it’s become a corporate standard for creating data fiefdoms, which would be bad.

A VPS is Not “Professional Grade”. Ignore Adverts to the Contrary

No, really - I mean it. I’ve done plenty of consulting for companies who need some kind of fire put out for one of their web sites. Not long into the conversation I learn (for about 50% of the calls I get) that the site is externally hosted on a VPS. Occasionally I get people whose sites are, or are supposed to be, hosted on dedicated servers, but the actual VPS/dedicated server isn’t really the whole issue. The issue is with how these things are configured, and your ability to do what you need with them.

Marketing for VPS and dedicated server hosting often says “full root access” somewhere in the list of features. There are also specs like the CPU speed, amount of RAM, and bandwidth limits. All of these come together to give the unwitting customer the notion that they’re getting full root access to some kind of behemoth server with all kinds of resources. However, things go downhill when you see things like cPanel, Plesk, or anything else that looks like “easy management through web-based administrative interface”. Again, this is probably fine for something that gets 100 hits per month or so and isn’t critical. The minute you can attach a cost to the problems that can arise with your site, you need to ditch these hosting plans.

Why? There are numerous reasons, but I’ll start with three:

  1. There’s typically no failover or “high availability”: if one machine goes down, or one VPS on the same hardware goes nuts, you’ve just ceased to exist on the internet at all.
  2. The CPU and RAM advertised is used mostly by the bloated software used to automate the management and monitoring of the systems (in other words, it’s used by your hosting provider, not your own application).
  3. The system configurations I’ve seen in these environments border on the absurd, and since the end user is managing all of this through a web interface, the only folks left to blame are the providers. So when you have problems, they’re guaranteed to be extra-challenging to solve.

What kinds of system configuration issues? Well, how about every service turned on, every port open (and not filtered) by default? How about downright broken service configurations, ranging from named.conf (DNS) configs specifying features that *can’t* work as configured, to crippled package management tools that disallow package modifications because doing so would break the monitoring/management tools, to php.ini files that turn on display_errors and turn *off* log_errors? In general, logging configurations are poor or worse, making problem-solving an uphill battle. Every time I log into a VPS I am shocked and appalled at what I find. Even if it’s $5 a month, it’s not worth it.
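The php.ini problem, at least, is easy to check for. Here is a minimal sketch of such a check. The `risky_php_settings` name is made up, the parsing is deliberately naive (it treats everything after a `;` as a comment and ignores sections), and it only flags values that are explicitly set, so take it as an illustration rather than a real audit tool.

```python
def risky_php_settings(ini_text):
    """Flag display_errors switched on and log_errors switched off."""
    settings = {}
    for raw in ini_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if "=" not in line:
            continue  # section headers and blank lines
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip().strip('"').lower()

    enabled = ("1", "on", "true", "yes")
    disabled = ("0", "off", "false", "no", "none", "")
    problems = []
    if settings.get("display_errors") in enabled:
        problems.append("display_errors is on")
    # Only complain when log_errors is explicitly turned off.
    if settings.get("log_errors") in disabled:
        problems.append("log_errors is off")
    return problems
```

Pass it the text of a php.ini file. A non-empty list means errors are being shown to visitors, or not being written to a log, or both.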

Think about it: if you have a VPS and you have database corruption, what happens? You call support, who will probably just confirm or deny that actions performed by them or their automated routines had anything to do with the corruption (if they were forced to uncleanly reboot the machine, for example, that might explain things). Usually, they’ll say they don’t have any record of any events on the server that might be an issue, and you’ll need to fix it yourself (that’s what you wanted “full root access” for in the first place, right?).

So, you get a system or database guy to look into things. He’ll find that there are no logs, broken configurations, and when he tries to make a change, it’s either overwritten by these wacky automated management routines, or it breaks some part of the web-based management interface. He’ll also find that, while your web site uses about 128MB of the 512MB of available RAM, the host is running software that takes up double that amount of RAM. Wow, what a deal you got!

All of these issues, by the way, can also occur on dedicated servers, but what sets VPS services apart is the performance: it is, at the very best, unpredictable, and often bad. Some hosts try to market their way around this by charging you more money for “low-density” VPS “solutions”. Don’t buy it. It’s not a density issue. Even if you only share the hardware that runs your VPS with *one* other VPS, if that other VPS goes crazy and starts performing huge amounts of disk reads and writes, your site, even if there are only 3 people looking at it, is going to be slow.

The solution? Well, evaluate whether or not you really need the control a VPS gives you. If you’re just running WordPress, a simple CMS, or a brochure web site, you almost certainly don’t need a VPS. Get a web hosting plan. They often offer one-click installations of WordPress and CMSes like PostNuke, PHP-Nuke, Joomla, Drupal, etc., along with phpMyAdmin for doing database operations. LinuxLaboratory.org runs on Drupal and MySQL, and houses a bunch of articles I’ve written over the years about Linux, System/DB Administration, etc. It also presents a feed of the content on this blog, and it’s been running on a simple, cheap web hosting plan for probably 7 or 8 years now. My uptime is better than the sites of friends of mine who decided they needed the control of a VPS. Same goes for this blog (though it’s a different provider). Heck, my beer blog runs on a *free* web hosting solution at DreamHost. It’s not super fast, but aside from that it serves its purpose well, and they have one-click installations for just about everything.

If you need to launch some kind of site that requires things not offered by a web hosting plan, then chances are you’re developing the site, or have some budget or staff for helping you set up, manage, and troubleshoot the services you’ll run there. Check out Amazon EC2 and Google App Engine, and look into dedicated hosting to see if any of those meet your needs.

If you have an IT department, you could, of course, try to work with them on a solution. This is almost always the best solution over the long haul.

I posted here last week about using ReportLab to generate very simple reports with a chart and code sample. I got several requests for screenshots, but what fun is that? Here’s the PDF… gfe.pdf

I’ve been doing a little reporting project, and I’ve been searching around for quite some time for a good graphing and charting solution for general-purpose use. I had come across ReportLab before, but it just looked so huge and convoluted to me, given the simplicity of what I wanted at the time, that I moved on. This time was different.

This time I needed a lot of the capabilities of ReportLab. I needed to generate PDFs (this is not a web-based project), I needed to generate charts, and I wanted the reports I was generating to contain various types of text objects in addition to the charts and such.

I took the cliff-dive into the depths of the ReportLab documentation. I discovered three things:

  1. There is quite a lot of documentation.
  2. ReportLab is quite a capable library.
  3. The documentation actually belies the simplicity of the library.

It’s a decent bit easier than it looks in the documentation, so I thought I’d take you through an example. This example is dead simple, but I still think it’s a little more practical than what I was able to find. The ReportLab documentation refers to what sounds like a great reference example, but the problem is that the tarball I downloaded didn’t contain the files it was making reference to :(

I started out by investigating one of the small example projects in the “demo” directory of the ReportLab directory. It was called “gadflypaper” (ironically, written by Aaron Watters. I worked in the cube outside of his office for several months last year. Hi Aaron!). Aaron’s example was very simple, and a great starting point for understanding how to put together a very basic document. It’s not infested with abstractions: just a few simple functions, and a lot of text. I ripped out a lot of the text until I had just an example of each function in action, and then set to work.

The Basic Process

To simplify the work of doing page layout minutiae, I (like the example) used PLATYPUS, which is built into ReportLab and abstracts away some of the low-level layout details. If you *want* low-level control, however, you can do whatever you want with the pdfgen module, also included (and PLATYPUS is basically a layer built from it).

With PLATYPUS, you get access to a bunch of prebuilt layout-related objects, representing things like paragraphs, tables, frames, and other things. You also have access to page templates, so that dealing with things like frame placement is a little easier.

So, to give you a rundown of the high-level steps:

  1. Choose a page template, and use it to create a document object.
  2. Create your “flowables” (paragraphs, charts, images, etc.), and put them all into a list object. In the ReportLab documentation, this list is often named “story”.
  3. Pass the list object to the build() method of the document object you created in step 1.

Phase 1: Let’s Get Something Working

As a first phase, let’s just make sure we can do the simplest of documents. Here’s some code that should work if you have a good installation of ReportLab (I’m using whatever was the latest version in early October, 2008.) Note that we’ll be cleaning this up and simplifying it as we go along.


#!/usr/bin/env python

from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.rl_config import defaultPageSize
from reportlab.lib.units import inch

PAGE_HEIGHT=defaultPageSize[1]
styles = getSampleStyleSheet()
Title = Paragraph("Generating Reports with Python", styles["Heading1"])
Author = Paragraph("Brian K. Jones", styles["Normal"])
URL = Paragraph("http://www.protocolostomy.com", styles["Normal"])
email = Paragraph("bkjones +_at_+ gmail.com", styles["Normal"])
Abstract = Paragraph("""This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab.""", styles["Normal"])

Elements = [Title, Author, URL, email, Abstract]

def go():
   doc = SimpleDocTemplate('gfe.pdf')
   doc.build(Elements)

go()

Not a lot of actual code here. It’s mostly variable assignments. The variables are mostly just strings, but because I want to control how they’re arranged, I need to make them “Flowables”. Remember that PLATYPUS puts together a document by processing a list of Flowable objects and drawing them onto the document. So all of our strings are “Paragraph” objects. You’ll note, too, that Paragraph objects can be styled using definitions accessed from getSampleStyleSheet, which returns a ‘stylesheet’ object. If you create one of these at the Python interpreter and call the resulting object’s ‘list()’ method, you’ll see what styles are available, and you’ll also see what attributes each style has. Try running this code to make sure things work. Change the strings if you like :)

Phase 2: Simple Cleanup

I haven’t yet created insane layers of abstraction in my own code, because I’ve been working on deadlines and doing things that are relatively simple. This will inevitably change :)  However, there are some things you can do to make life a bit simpler and cleaner.


#!/usr/bin/env python

from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.rl_config import defaultPageSize
from reportlab.lib.units import inch

PAGE_HEIGHT=defaultPageSize[1]
styles = getSampleStyleSheet()
Title = "Generating Reports with Python"
Author = "Brian K. Jones"
URL = "http://www.protocolostomy.com"
email = "bkjones@gmail.com"
Abstract = """This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab."""
Elements=[]
HeaderStyle = styles["Heading1"]
ParaStyle = styles["Normal"]
PreStyle = styles["Code"]

def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    Elements.append(s)
    para = klass(txt, style)
    Elements.append(para)

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)

header(Title)
header(Author, sep=0.1, style=ParaStyle)
header(URL, sep=0.1, style=ParaStyle)
header(email, sep=0.1, style=ParaStyle)
header("ABSTRACT")
p(Abstract)

go()

So, this is still simple. Simplistic, even. All I did was move the repetitive bits to functions. The ‘header’ and ‘p’ functions are (for now) unaltered from the gadflypaper demo. The good part here is that strings can be defined as ‘just strings’. Paragraphs and headers are just plain old string variables, and then at the bottom I just call the ‘header’ and ‘p’ functions and pass in the variables. The order in which I call the functions determines the order my document will appear in.

Phase 3: Returning Flowables

There’s kind of an issue with the way these functions work, at least for my needs. The problem is that they just go ahead and add things to the “Elements” list automagically. This might be ok for some quick and dirty tasks, but in my case I found that I needed more control. Things were crossing page boundaries where I didn’t want them to, and if I want to add formatting or apply built-in functionality, I can’t do it on a per-object basis without loading up the argument list.

I also wanted to have a relatively easy way to move *sections* of reports around, where a section might consist of a heading, a paragraph, and a source code listing — three different “Flowable” objects. So I altered these functions to make them return flowables instead of just adding things to the Elements list for me:


#!/usr/bin/env python

from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.rl_config import defaultPageSize
from reportlab.lib.units import inch

PAGE_HEIGHT=defaultPageSize[1]
styles = getSampleStyleSheet()
Title = "Generating Reports with Python"
Author = "Brian K. Jones"
URL = "http://www.protocolostomy.com"
email = "bkjones@gmail.com"
Abstract = """This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab."""
Elements=[]
HeaderStyle = styles["Heading1"]
ParaStyle = styles["Normal"]
PreStyle = styles["Code"]

def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)

mytitle = header(Title)
myname = header(Author, sep=0.1, style=ParaStyle)
mysite = header(URL, sep=0.1, style=ParaStyle)
mymail = header(email, sep=0.1, style=ParaStyle)
abstract_title = header("ABSTRACT")
myabstract = p(Abstract)
head_info = [mytitle, myname, mysite, mymail, abstract_title, myabstract]
Elements.extend(head_info)

code_title = header("Basic code to produce output")
code_explain = p("""This is a snippet of code. It's an example using the Preformatted flowable object, which
                 makes it easy to put code into your documents. Enjoy!""")
code_source = pre("""
def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)
    """)
codesection = [code_title, code_explain, code_source]
src = KeepTogether(codesection)
Elements.append(src)
go()

So, this isn’t too bad. It’s still procedural code. I’ll revamp it in another post to use objects, but for those readers who are still learning all of this, it might help to leave out the abstraction for now. What I liked about the gadflypaper demo was that it was quick and dirty. You could read it line by line, top to bottom, and understand what just happened without jumping back and forth between main() code and object code.

As you can see, I’m using KeepTogether (which is itself a flowable that wraps a list of other flowables) in two different ways. In the functions, I use it to bundle each spacer with the text it belongs to, so I don’t have to go back later and manually add spacer elements to the Elements list. Then, toward the bottom, I create a preformatted code snippet, and I use KeepTogether to make sure that all parts of the code section stay together without flowing across a page boundary. There are other options you can use to customize how your document deals with ‘orphan’ and ‘widow’ elements as well, so definitely check out the documentation for that (or keep reading this blog. I’ll get to it eventually).

So what’s left?

Phase 4: The Grand Finale

The rest of the code I add is to connect to a database, make a query, and then pass the data returned from the database to a function that creates a chart. I add the chart to the Elements, and we’re in business!


#!/usr/bin/env python
import MySQLdb
import sys
import string
from reportlab.graphics.shapes import Drawing
from reportlab.graphics.charts.linecharts import HorizontalLineChart
from reportlab.graphics.widgets.markers import makeMarker  # needed by graphout()
from reportlab.platypus import *
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.rl_config import defaultPageSize
from reportlab.lib.units import inch

dbhost = 'localhost'
dbname = 'httplog'
dbuser = 'jonesy'
dbpasswd = 'mypassword'

PAGE_HEIGHT=defaultPageSize[1]
styles = getSampleStyleSheet()
Title = "Generating Reports with Python"
Author = "Brian K. Jones"
URL = "http://www.protocolostomy.com"
email = "bkjones@gmail.com"
Abstract = """This is a simple example document that illustrates how to put together a basic PDF with a chart.
I used the PLATYPUS library, which is part of ReportLab, and the charting capabilities built into ReportLab."""
Elements=[]
HeaderStyle = styles["Heading1"]
ParaStyle = styles["Normal"]
PreStyle = styles["Code"]

def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def connect():
   try:
      conn1 = MySQLdb.connect(host = dbhost, user = dbuser, passwd = dbpasswd, db = dbname)
      return conn1
   except MySQLdb.Error as e:
      print("Error %d: %s" % (e.args[0], e.args[1]))
      sys.exit (1)

def getcursor(conn):
   cursor = conn.cursor()
   return cursor

def totalevents_hourly(rcursor):
    rcursor.execute("""select hour, count(*) as hits from hits group by hour;""")
    return rcursor

def graphout(catnames, data):
    drawing = Drawing(400, 200)
    lc = HorizontalLineChart()
    lc.x = 30
    lc.y = 50
    lc.height = 125
    lc.width = 350
    lc.data = data
    catNames = catnames
    lc.categoryAxis.categoryNames = catNames
    lc.categoryAxis.labels.boxAnchor = 'n'
    lc.valueAxis.valueMin = 0
    lc.valueAxis.valueMax = 1500
    lc.valueAxis.valueStep = 300
    lc.lines[0].strokeWidth = 2
    lc.lines[0].symbol = makeMarker('FilledCircle') # added to make filled circles.
    lc.lines[1].strokeWidth = 1.5
    drawing.add(lc)
    return drawing

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)

mytitle = header(Title)
myname = header(Author, sep=0.1, style=ParaStyle)
mysite = header(URL, sep=0.1, style=ParaStyle)
mymail = header(email, sep=0.1, style=ParaStyle)
abstract_title = header("ABSTRACT")
myabstract = p(Abstract)
head_info = [mytitle, myname, mysite, mymail, abstract_title, myabstract]
Elements.extend(head_info)

code_title = header("Basic code to produce output")
code_explain = p("""This is a snippet of code. It's an example using the Preformatted flowable object, which
                 makes it easy to put code into your documents. Enjoy!""")
code_source = pre("""
def header(txt, style=HeaderStyle, klass=Paragraph, sep=0.3):
    s = Spacer(0.2*inch, sep*inch)
    para = klass(txt, style)
    sect = [s, para]
    result = KeepTogether(sect)
    return result

def p(txt):
    return header(txt, style=ParaStyle, sep=0.1)

def pre(txt):
    s = Spacer(0.1*inch, 0.1*inch)
    p = Preformatted(txt, PreStyle)
    precomps = [s,p]
    result = KeepTogether(precomps)
    return result

def go():
    doc = SimpleDocTemplate('gfe.pdf')
    doc.build(Elements)
    """)
codesection = [code_title, code_explain, code_source]
src = KeepTogether(codesection)
Elements.append(src)

hourly_title = header("Hits logged, per hour")
hourly_explain = p("""This shows aggregate hits across a 24-hour period. """)

conn = connect()
cur = getcursor(conn)
te_hourly = totalevents_hourly(cur)
catnames = []
data = []
values = []
for row in te_hourly:
   catnames.append(str(row[0]))
   values.append(row[1])

data.append(values)
hourly_chart = graphout(catnames, data)
hourly_section = [hourly_title, hourly_explain, hourly_chart]
Elements.extend(hourly_section)

go()

So, I’ve muddied things up a bit. If you’ve written database code before, you can just look past it all. I don’t do anything magical there. In fact, the chart creation isn’t magical either. I’m sure there’s even a cleaner way to do it - but this works for the moment.

I get a connection object, use it to get a cursor, then pass the cursor to the query function, which runs the query and passes back the same cursor, now holding the results: te_hourly. The chart I’m going to create needs ‘category’ names to label the x-axis, and then values to plot against the y-axis. In my case, the hour is row[0] and the total hits for that hour are in row[1]. I build my catnames and values lists, wrap values in the data list (the chart expects a list of series, even if there’s only one), and then create “hourly_chart” by passing catnames and data to the graphout function. Finally, I add the chart, along with its title and explanation, to the Elements list. Done!
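
If you want to see the shape of those lists without setting up a MySQL server, here is a minimal sketch of the same loop. It swaps in the standard library's sqlite3 module and an in-memory table purely for illustration. The hits table layout and the sample rows are assumptions (the real schema isn't shown in this post), and I've added an "order by hour" so the labels come out in a predictable order.

```python
import sqlite3

# In-memory stand-in for the 'httplog' database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("create table hits (hour integer, url text)")
conn.executemany(
    "insert into hits (hour, url) values (?, ?)",
    [(0, "/"), (0, "/about"), (1, "/"), (2, "/"), (2, "/feed"), (2, "/")],
)

cur = conn.cursor()
cur.execute("select hour, count(*) as hits from hits group by hour order by hour")

catnames = []   # one label per category (here, the hour)
values = []     # one number per category (here, the hit count)
for row in cur:             # sqlite3 cursors can be iterated row by row
    catnames.append(str(row[0]))
    values.append(row[1])

# The chart's data attribute is a list of series, so a single line
# still gets wrapped in an outer list.
data = [values]

print(catnames)
print(data)
conn.close()
```

The point is just that catnames ends up as a flat list of strings, while data is a list containing one list of numbers per line you want drawn.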

For its part, the graphout function is mostly just a bunch of parameters I need to configure my HorizontalLineChart object. Once the chart is all set to go, I need to add it onto my Drawing object, and return the Drawing flowable object.
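
One thing to watch in graphout is that valueMax and valueStep are hard-coded to 1500 and 300, which suits the data here but may not suit yours. Below is one possible way to derive them from the data instead. The helper name axis_bounds and its rounding rule are invented for this example and are not something ReportLab provides.

```python
import math

def axis_bounds(series, steps=5):
    """Return (value_max, value_step) covering every value in `series`.

    `series` is a list of lists, the same shape as the chart's data.
    The step is rounded up to a multiple of 10 so tick labels stay tidy.
    """
    peak = max(max(values) for values in series)
    step = int(math.ceil(peak / float(steps) / 10.0)) * 10
    step = max(step, 10)            # never return a zero step
    return step * steps, step


# Sample numbers are made up for illustration.
value_max, value_step = axis_bounds([[120, 480, 1310, 950]])
print(value_max, value_step)
```

Inside graphout, the two results would be assigned to lc.valueAxis.valueMax and lc.valueAxis.valueStep in place of the fixed numbers.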

Not yet what I’d call “Beautiful Code”, but it works, and it’s likely to help some other folks wade through the ‘getting started’ hump with ReportLab. Hope it was useful.