Archive for the ‘Scripting’ Category

PyWorks Conference Schedule Posted

Wednesday, August 27th, 2008

Hi all,

The schedule for PyWorks has been posted! I’m really excited about three things:

1) there are some really cool talks that I’m looking forward to attending. There are a couple of sysadmin-related talks, AppEngine, TurboGears, Django, and an area I’ve been especially slow to move into: testing (I know, shame on me). There’s lots more so be sure to check it out.

2) the conference scheduling process is over ;-)

3) I get to meet a lot of people face-to-face that I’ve worked with in the past on Python Magazine developing articles, or interacted with on IRC, etc. One thing I like about conferences surrounding open source technologies is that you get to thank people face-to-face for the sweat they poured into some of the tools you use regularly. Mark Ramm, Kevin Dangoor, Michael Foord, Brandon Rhodes, and a collection of Python Magazine authors will be speaking there, and other Python Magazine folks and generally familiar faces will be in attendance.

Enjoy!

For those still unaware, PyWorks will be held in Atlanta, Nov. 12-14, 2008. It’s sponsored by MTA, the publisher of Python Magazine, as well as php|architect. In fact, the php|works conference will be held simultaneously with PyWorks, and attendees of one are free to access talks in the other at will. There will also be a “center track” that will cover some more generic topics of interest to developers without regard to the language in use. Check it out!

The promise of Drizzle

Sunday, July 27th, 2008

I got to actually speak to Brian Aker for maybe a total of 5 minutes after his micro-presentation about Drizzle, which took place at the Sun booth at OSCON 2008. I was a bit nervous to ask what questions I had out loud, because the things I had wondered about were things I really didn’t see too much discussion about out in the intarweb. I’m happy to report that, if Brian Aker is to be considered any kind of authority (hint: he is), my ideas are not completely ridiculous, so maybe I’ll start talking a bit more about them.

UPDATE: lest anyone get the wrong idea, Brian Aker did, in fact, state that views are not on the short list of priority items for Drizzle, but he did say that views are one of the features he finds most useful, and that they’d probably be higher on any future priority list than, say, stored procedures. So, take my notes below about views with a grain of salt. It’s not necessarily “coming”.

My three ideas were these:

  1. Materialized views: my experience with views in MySQL is that they just plain old don’t scale well compared to other database systems I’ve used. I used Sybase in 2000 and views scaled better for me then than MySQL views do now, and I’m using them in mostly the same way (which is to say that I’m not using them to do evil things - I’m using them in the way most of the database community agrees they should be used). In the past, I thought materialized views were “nice to have”, but now that I’m working with much larger data sets, without a need for my reporting to be 100% real-time, materialized views would be great. To be honest, I win either way with Drizzle in all likelihood, as Brian Aker has proclaimed that views in Drizzle will not look like views in MySQL. He confirmed that materialized views would be a great thing to have a closer look at, and I was happy with that reply.
  2. “Query Fragments”: I didn’t know they were called query fragments. What I explained to Brian Aker was that I wanted to harvest subsets of cached result sets. So, for example, if I do a date range query (I do that a lot), and the result set is cached, and then my app does another query which is identical save for that the date range is a subset of the one in the cached result set, I’d like to grab that data from the cache. My actual question to Brian was “can this be built in such a way that this would be a reliable, trustworthy result coming from some middle tier component?” And that’s when he told me about query fragments. He said that this idea was not at all crazy, and was also worthy of further discussion.
  3. Vertical data-padding. Well… that’s what I’m calling it. Here’s the logic: I have lots of temporal data. Tons of it. My queries are largely things like “for this user, show me all of the foo’s bar’d, grouped by day, where day is between these two days”. MySQL is good at these kinds of queries (assuming you index well and, basically, that you’ve read “High Performance MySQL” a couple of times… per edition), but there’s something missing. When I get the result set back, I don’t get a record for any of the days on which no foo’s were bar’d. This is perfectly within the realm of reasonable behavior, but I’ve never known MySQL to let things like being reasonable get in the way of helping their users, so my suggestion was this: MySQL, in order to do the comparisons for the “WHERE” clause, *MUST* know what dates fall within the range in, say, a “BETWEEN” condition. It would then seem logical to have some way to tell MySQL “If one of those dates has no value, return a NULL” (or 0, or an empty string, or something). I don’t know the real name for this proposed feature, but I call it “vertical data-padding” because you’re padding columnar data, which, in my mind, is visualized vertically. By the same logic, I would refer to something like “GROUP_CONCAT” as horizontal data padding. I explained to Brian that one way I’ve seen this handled is to have a lookup table of static dates that gets joined to the main data table. You do a left outer join with the date lookup table on the left, and you get back a row for every date whether there’s data in the right-hand table or not. This works when ‘n’ is small (like everything else), but it’s horrendous when you have, say, 10 million rows to deal with. Then you’re in what I call the “No Bueno Zone”. Brian seemed interested in the problem, and I’ll be discussing that with him further when his life settles down a bit (he’s been at OSCON, and he’s still settling in at Sun).
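For anyone who hasn’t seen the date lookup table workaround from item 3, here is a minimal sketch of it. It uses Python’s built-in sqlite3 module with an in-memory database rather than MySQL, purely so that it runs anywhere. The table names (calendar, events), the column names, and the sample rows are all made up for illustration.

```python
import sqlite3
from datetime import date, timedelta


def daily_counts(conn, user, start, end):
    """Return one (day, count) row for every day in [start, end]."""
    query = """
        SELECT c.day, COUNT(e.id)
        FROM calendar AS c
        LEFT OUTER JOIN events AS e
          ON e.day = c.day AND e.user = ?
        WHERE c.day BETWEEN ? AND ?
        GROUP BY c.day
        ORDER BY c.day
    """
    # The user filter lives in the ON clause: putting it in WHERE would
    # throw away the padded rows for days that have no events.
    return conn.execute(query, (user, start, end)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (day TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user TEXT, day TEXT)")

# Hypothetical data: one week of calendar rows, events on two of the days.
first = date(2008, 7, 1)
conn.executemany(
    "INSERT INTO calendar (day) VALUES (?)",
    [((first + timedelta(days=n)).isoformat(),) for n in range(7)],
)
conn.executemany(
    "INSERT INTO events (user, day) VALUES (?, ?)",
    [("jonesy", "2008-07-01"), ("jonesy", "2008-07-01"), ("jonesy", "2008-07-04")],
)

rows = daily_counts(conn, "jonesy", "2008-07-02", "2008-07-05")
for day, count in rows:
    print(day, count)
```

COUNT(e.id) only counts rows that actually matched, so the empty days come back as 0 instead of disappearing. The sketch shows the shape of the workaround only; it says nothing about how this join behaves at the 10-million-row scale described above.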

I want to thank Brian Aker for his enthusiastic attitude toward helping others, and for all of the work he pours into all of this stuff. I also want to say that, for me, Drizzle is really exciting, not only because the feature set is more or less cherry picked to map onto what I do for a living, but also because it represents an opportunity to get ideas in the door before a lot of legacy cruft makes it impossible to implement these somewhat idealistic features without rocking the boat for the millions of users already launching it into production.

OSCON Day 2: Launching a Startup in 3 Hours

Tuesday, July 22nd, 2008

Launching a Startup in 3 Hours was a great talk given by Andrew Hyde (of techstars.org) and Gavin Doughtie (of Google). Both of the speakers are heavily involved in the recent trend of doing “Startup Weekends”, and techstars.org is an organization that hosts startup weekends all around the US (and I think internationally as well - Andrew mentioned one in Germany if I heard correctly).

The first half of the talk was about the general concept of a startup weekend, the problems it avoids (“we’ve been working for 9 months and haven’t launched anything”), the problems it brings up (“If you’re not using Java, you’re an idiot, so count me out!!”), and lots of details about how to organize, how to assign roles, and some common tools they use (like Basecamp and whatever your IM of choice is). There was also talk of legal issues, how (basically) to think about forming the company with the people involved, and decisions that need to be made at a business level aside from just the coding.

The second half of the talk wasn’t a talk at all. Instead, people who had ideas stood up, presented their idea in a couple of sentences, and once the ideas were out there, we were told to break into groups and get to work! So people would get up and move over to the person whose idea they liked, and they’d start brainstorming. I decided to head out after about 30 minutes of observing and talking with people about ideas, but when I left, there were probably 6-8 groups of people engrossed in conversations, and the energy level was very high. Overall, it was a really exciting experience!

OSCON Evening 1 Begins, and More Portland Tips

Monday, July 21st, 2008

The evening plans didn’t wait for talks to be done. The IRC channel (#oscon on irc.freenode.net) was alive with talk of prospects for dinner and drinks after the conference. I myself was torn between a group going out for Lebanese and another going to Henry’s, but opted to go with my buddies from home to Henry’s.

It was worth it. If you haven’t been, Henry’s Tavern boasts 100 beers and hard ciders on tap (oddly, the beer list is the only menu *not* online - guess it changes too frequently). There are a ton of local beers that you can’t even get on the east coast just waiting for you to try, but there are also some rare treats, like the Belgian Lambic beers, which you don’t often see on tap. The food is a little pricey, but is really good, and the staff is very friendly. A couple of us were in a rush to get back by 7 for the BoF sessions, and when we asked the waitress how easy it was to catch a cab, she immediately informed us that she would have the hostess call one for us. About 2 minutes later we were in a cab on our way back (we wouldn’t have made it back in time if we had to walk back to catch the light rail).

I was not one of those rushing to a BoF, so I did a little poking around the area near the convention center. It was getting dark, and I didn’t want to stray too far, but I did find a couple of points of interest. First, there’s a bank right across the street from the convention center. I’d be willing to bet that the ATM there charges less than the $3 the ATM inside the center charges.
Beyond that is a paintball place. It was closed by the time I found it, and I don’t know if they run every day, or anything else, but interested parties might find it open during the lunch breaks or something if you wanted to check it out. The paintball place is located behind a building that is directly across the street from the conv. center. If you see the bank, it’s on the other side of the side street the bank sits on.

Tonight appears to be low-key from what I can tell. There’s currently no chatter on irc, the hotel bar had a few people chatting, and I might go down to catch the rush of people as they return from dinner and BoF sessions. Stay tuned tomorrow for more!

OSCON Day 1 Comes to a Close

Monday, July 21st, 2008

I think I have pictures of most of the basic parts of the conference at my OSCON Flickr set, and I thoroughly enjoyed day 1 of the conference. Of course, while *day* 1 is over, *night* 1 has yet to even begin. There are lots of BoF sessions, and maybe even more smaller meetups going on, as smaller groups take to discussing things over dinner and a beer or three.

I have to say that I occasionally pop into IRC channels for conferences I’m not even attending, and I pay attention to the mood there because I’m involved a bit in conference planning as part of my work with Python Magazine (I’m helping to organize the PyWorks conference in November). This conference seems to have a pretty happy audience, if IRC chatter is any indication (and it usually is). Sure, there are a couple of weak spots in the wireless network, there are some fuzzy projectors, and there was a little confusion regarding breakfast this morning, but the important bits have been well-covered by the OSCON organizers and the “boots on the ground” here on site. Kudos to them all.

This afternoon I hopped to a couple of different talks: one on Memcached and MySQL, and the other on A/B testing. Both contained good content. Of course, I’m a systems guy primarily, so I sort of wanted more of an overview of memcached from the point of view of an admin who is deploying it rather than a developer implementing their code around it. I still got plenty of value out of that talk, and this *is* really more of an open source *developer’s* conference, so the expectations of 99% of the people in the room were met, I’m sure.

A/B testing is just not an exciting topic, and I would imagine that people’s bosses made them go to that talk whether they liked it or not. Not to say the talk wasn’t good - the parts I saw (I came in after the break) were good, and I learned from it, and that was the goal. If you’re a QA/QC person, I’m sure the talk was riveting, and there were a lot of good ideas and things I’d never considered flying by in the slides.

Overall, Day 1 is a win. I’ll cover more about this evening’s events in the pre-breakfast hours tomorrow. Stay tuned!

OSCON Day 1: The BoF Board, for your perusal

Monday, July 21st, 2008

I’ve posted a picture of the BoF board for day 1. Click on it to see bigger sizes. The full size image (maybe smaller) is perfectly suitable for reading at your leisure. I’ll update this if/when I see significant changes to it:

IMG_4477.JPG

Day 1 of OSCON Begins, and More Tips for Conference-goers

Monday, July 21st, 2008

I got an early start. Too early. But I’m from the east coast, so my body thinks I slept in. I wandered around a bit, took a few pics which you can see at my Flickr OSCON set, and I discovered a couple of things that might be of interest:

  • The Starbucks in the conference center charges over $2 for a small cup of joe. There’s a Starbucks right across the street (you can see it from the breakfast area - seriously, it’s 5 seconds away), and they charge less than $2 for a medium (grande). That’s less than I pay at home.
  • The ATM outside the Starbucks charges $3 for cash. I’ll report back when I find a cheaper one, but most places seem to take plastic here.
  • Every computer involved in this conference, from registration to the video screens that dot the common areas, is running Windows XP. Just sayin’.
  • The light rail system is free to go just about anywhere except for the airport, so there’s no excuse not to get out and see Portland and take in the food and beer and stuff.
  • For beer-lovers, not only is there the Oregon Brewers Festival starting at the tail end of this conference, but there’s apparently another festival that we missed *last* weekend!! Keep that in mind when you’re planning to come to OSCON next year.

A Few Tips for OSCON Attendees

Sunday, July 20th, 2008

If you’re attending the 10th Annual Open Source Convention, I’ve compiled just a few tips for you on this, “day 0” of the event:

  • Don’t check bags. Everything is slower if you check bags, and if you’re packing more than three shirts, you’re crazy, because if history is any indicator, you’re going to be bombarded with shirts over the course of the week. One maximum size (22″ x 14″ + 9″) suitcase, and a bookbag with a laptop pocket is all I brought, and I’m confident I’ll have all I need. I’ll report back if things change :)
  • Request a room away from the ice machine. They can be loud. This year my room is the last room at the end of a long hallway. Ahhhhhhhh….
  • Don’t bring toiletries of any kind: you can’t bring a lot of them on board, and I’d rather just avoid it altogether and buy stuff when I get to my destination. Don’t use the Hotel store though - there’s a Dollar store about 2 blocks from the Lloyd Center Doubletree Hotel (on the back end of the Lloyd Center mall), and they probably have everything you’ll need. If not, walk another block north to the Safeway, and you can get anything, though I didn’t find any travel-sized stuff.
  • Show up to registration early: I’m leaving shortly for registration. Registration moves pretty quickly even if you go on Monday morning, but on Sunday night (from 5-7pm) there’s a nice, jovial, laid-back mood around the registration areas.
  • When you’re in Portland, know that you’re in an area that is something of a mecca for beer. Even if you don’t like beer, I urge you to join friends and at least have a look at the beers available. You’re in an area where even the hotel bar has an ok beer selection. Saying you don’t like beer is like saying you don’t like food. If “beer” to you means Coors Light (or similar), you have no idea what beer is - but that’s ok, because you’re now in a place that can grant you a PhD in beer snobbery in a matter of a weekend. Really. Take advantage of it!! (a hint: many people who “don’t like beer” really just don’t like the bitterness that comes from hops. Ask a bartender for a sample of their finest wheat beer. I’ll bet you’ll be hooked).
  • Don’t stay in your hotel room if you can help it. Engage. Look at the whiteboard that is probably in the registration area as I type this. Find the conference web site, irc channels, wikis, and everything else that you can. 75% of the value of coming to OSCON is finding and meeting people you’ll be in contact with well after you leave. It’s a commercial conference, yes — but it’s a community atmosphere.
  • Plan your day. You can try to plan everything you’re going to attend before you get here, but it probably won’t work very well, because you’ll inevitably hear someone talking about something else and decide to attend that instead. What might work better is if you try to plan the night before — but not after the parties — probably sometime between the last session of the day and dinner. At least have an idea what you’re doing the next day, because parsing the program on-the-fly is, imho, difficult, especially when ten people you know walk by and say hi and stuff.
  • Try to plan lunch in the city. This can be a little difficult, but you can hop on the light rail for free as soon as the conference breaks for lunch, and be downtown in no time. Last time I attended, I only made it out for two lunches downtown, and I’m kind of a foodie, so I would’ve liked to sample more of the local fare. Try to keep away from the chains (you can get that at home) and be adventurous!!

A Quick Look at ElementTree (and a bit about ‘sar’)

Friday, July 18th, 2008

I’m working on a new project that will be open sourced if I can ever get it to be generically useful. It’s called “sarviz”, and it’s a visualization tool for output from the “sar” UNIX system reporting utility. I know tools like this exist, but please read on, as I’m looking to do something a bit different from what I’ve seen.

A quick, simple explanation of sar

System administrators typically run sar as a cron job, and each day sar will generate a report that lists the values of various system counters for a specified time interval throughout the day. So you end up with a text file that lists, for example, the cpu iowait value every 10 minutes throughout the day. There are maybe a dozen different categories of counters enabled by default, and more that aren’t (like disk-related counters). Anyway, you wind up with a text file that looks something like this:

23:30:01          CPU     %user     %nice   %system   %iowait    %steal     %idle
23:40:02          all      0.32      0.00      0.32      6.57      0.49     92.29
23:40:02            0      0.32      0.00      0.32      6.57      0.49     92.29
23:50:01          all      0.74      0.00      0.82      7.14      0.55     90.76
23:50:01            0      0.74      0.00      0.82      7.14      0.55     90.76
Average:          all      0.82      0.00      0.72     13.54      0.78     84.14
Average:            0      0.82      0.00      0.72     13.54      0.78     84.14

This is just a small part of one section of the file (this box has only one cpu, which is why the ‘all’ and ‘0’ numbers are the same, btw). The whole file on one server, running with default configurations, is 4000 lines long.

There’s a ton of great information in here, but… it all looks like the above. There’s no graphical output to be had. This is bad, because it would be nice to use this (historical) monitoring output for things like capacity planning, problem tracking, etc. You would, of course, want to couple this type of monitoring with something else that’ll do real-time monitoring, alerts, dependencies, escalation, etc.

So I want to write an application that’ll generate graphs of all of this stuff. Furthermore, I thought it would be cool to do something like what planetplanet does, which is to say that I want sarviz to run as a cron job, parse all of this stuff, and generate static html files, with an index.html that’ll make it really easy to browse this information either by host, by date, by resource… whatever. Later on I can add features to actually do even more useful stuff like longer-term trending of resource usage (by aggregating across various ‘sar’ output files), and more.

Sar is not alone

Sar comes with some friends, and it turns out they can be extremely useful. The best one for my purposes here is called ‘sadf’, and it is used to basically format the sar output to make it more useful for programmatic processing. It can output the information in CSV format, or make it ready for insertion into a relational database, but what I’m currently using for sarviz (and it’s early, so this could change) is the XML output capability. With XML output, I won’t have to deal with parsing out column headers, scanning an entire file for information from a single sar run, dealing with the blank lines sar uses by default to make it easier to read on a console, etc. So with sadf I can get output that looks like this:

<timestamp date="2008-06-15" time="07:10:01" interval="600">
<processes per="second" proc="0.93"/>
<context-switch per="second" cswch="221.50"/>
<cpu-load>
<cpu number="all" user="1.77" nice="0.00" system="0.56" iowait="0.04" steal="0.08" idle="97.55"/>
<cpu number="0" user="1.77" nice="0.00" system="0.56" iowait="0.04" steal="0.08" idle="97.55"/>
</cpu-load> ….

This is a bit nicer to deal with, and I was excited to use Python’s (now built-in) ElementTree module to do something from scratch after having dealt with it being somewhat abstracted in the Python tools for the GData API (which I used to write a command line client for Google Spreadsheets, for example).

Doing Simple Things with ElementTree

Well, as it turns out, I had kind of a hard time getting started doing what I thought were simple things with ElementTree, so I want to post a few examples of how I did them so that I and others have something to refer to online.

The first thing to know about ElementTree is that there are Element objects, and ElementTree objects. ElementTree objects are made up of a hierarchical collection of Element objects, and Element objects are the things from which you can actually get the attributes you’re likely to want. For whatever reason, I was a little confused starting out, because I wanted to get an ElementTree object and then ask that object to “scan the tree and give me all of the ‘time’ attributes of the ‘timestamp’ elements in the tree”. You might be able to do this with a one-liner, but I never found a document that said how.

So here’s how to load in an XML file, parse it, and return all of the timestamp elements in that tree (or, rather, this is how I did it, which seems reasonable):

strudel:sa jonesy$ python
Python 2.5.1 (r251:54863, Jan 17 2008, 19:35:17)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.etree import ElementTree as ET
>>> tree = ET.parse("sa15.xml")
>>> for ts in tree.findall("host/statistics/timestamp"):
...     isotime = ts.attrib["date"] + "T" + ts.attrib["time"]
...     print isotime

2008-06-16T05:00:01
2008-06-16T05:10:01
2008-06-16T05:20:01
2008-06-16T05:30:01
2008-06-16T05:40:01
2008-06-16T05:50:01
2008-06-16T06:00:01
2008-06-16T06:10:01
2008-06-16T06:20:01
2008-06-16T06:30:01
2008-06-16T06:40:01
2008-06-16T06:50:01
….

So, I imported the ElementTree module, fed my xml file to a method called “parse()”, and that gives me an ElementTree object. In that tree, I then ask for the timestamp elements which are under the root element at “host/statistics/timestamp”. You can then see that I create an ISO8601-formatted timestamp by asking for the “date” and “time” attributes of the timestamp element, and put a “T” between them. I would’ve used something like “T”.join, but there are other attributes in that element, and I only needed two, so I took the easy way out here instead of creating a list first and then doing the join on the list.
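For what it’s worth, a list comprehension gets you close to the one-liner I was originally hunting for. This is only a sketch: it is written for Python 3 (the sessions in this post are Python 2.5), it parses a small inline sample instead of sa15.xml, and the root element name and the nodename attribute in the sample are assumptions about what sadf emits. The only part that matters is the “host/statistics/timestamp” path, which is the same one used above. The function name iso_timestamps is made up for the example.

```python
from xml.etree import ElementTree as ET

# Hypothetical stand-in for the sadf XML file; only the
# host/statistics/timestamp nesting is taken from the session above.
SAMPLE = """\
<sysstat>
  <host nodename="strudel">
    <statistics>
      <timestamp date="2008-06-16" time="05:00:01" interval="600"/>
      <timestamp date="2008-06-16" time="05:10:01" interval="600"/>
    </statistics>
  </host>
</sysstat>
"""


def iso_timestamps(root):
    """Collect an ISO 8601 string for every timestamp element under root."""
    return [
        ts.get("date") + "T" + ts.get("time")
        for ts in root.findall("host/statistics/timestamp")
    ]


root = ET.fromstring(SAMPLE)  # returns the root Element directly
stamps = iso_timestamps(root)
print(stamps)
```

Element.get() reads a single attribute just like the attrib dictionary does, except that it returns None instead of raising KeyError when the attribute is missing.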

Of course, my real interest in the timestamps isn’t to print them, but to get the statistics for each sar run (represented by a timestamp, since sar records statistics for regular time intervals). So now let’s grab the 1-, 5-, and 15-minute load averages according to sar. I want all of this printed on one line along with the timestamp, because this output is going to be graphed using Timeplot, and that’s how Timeplot wants the data. Here goes:


>>> for ts in tree.findall("host/statistics/timestamp"):
...     isotime = ts.attrib["date"] + "T" + ts.attrib["time"]
...     for q in ts.findall("queue"):
...         qstat = [isotime, q.attrib["ldavg-1"], q.attrib["ldavg-5"], q.attrib["ldavg-15"]]
...         print ",".join(qstat)

2008-06-16T05:10:01,0.05,0.12,0.09
2008-06-16T05:20:01,0.03,0.06,0.07
2008-06-16T05:30:01,0.02,0.02,0.03
2008-06-16T05:40:01,0.02,0.06,0.03
2008-06-16T05:50:01,0.03,0.06,0.03
2008-06-16T06:00:01,0.04,0.03,0.00
2008-06-16T06:10:01,0.02,0.06,0.03
2008-06-16T06:20:01,0.06,0.10,0.04
2008-06-16T06:30:01,0.13,0.11,0.06
2008-06-16T06:40:01,0.16,0.12,0.08
2008-06-16T06:50:01,0.04,0.06,0.06

The thing to note here, in case it escaped your eyeball, is that the second call to ‘findall’ feeds an argument relative to the ‘ts’ object rather than the ‘tree’ object.
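Here is the same nested lookup wrapped up as a small function that builds the comma-separated text in one go. Again, this is a sketch under assumptions: it is Python 3, the inline XML is a trimmed, hypothetical stand-in for the real sadf output, and the function names are invented for the example. The first timestamp in the sample deliberately has no queue element, to show that the inner findall simply yields nothing for it.

```python
import csv
import io
from xml.etree import ElementTree as ET

# Hypothetical sample; the ldavg-* attribute names match the session above.
SAMPLE = """\
<sysstat>
  <host nodename="strudel">
    <statistics>
      <timestamp date="2008-06-16" time="05:00:01" interval="600"/>
      <timestamp date="2008-06-16" time="05:10:01" interval="600">
        <queue ldavg-1="0.05" ldavg-5="0.12" ldavg-15="0.09"/>
      </timestamp>
      <timestamp date="2008-06-16" time="05:20:01" interval="600">
        <queue ldavg-1="0.03" ldavg-5="0.06" ldavg-15="0.07"/>
      </timestamp>
    </statistics>
  </host>
</sysstat>
"""


def load_average_rows(root):
    """Yield [isotime, ldavg-1, ldavg-5, ldavg-15] for each queue element."""
    for ts in root.findall("host/statistics/timestamp"):
        isotime = "T".join((ts.get("date"), ts.get("time")))
        # This path is relative to the timestamp element, not the tree.
        for q in ts.findall("queue"):
            yield [isotime, q.get("ldavg-1"), q.get("ldavg-5"), q.get("ldavg-15")]


def to_csv(root):
    """Render the load average rows as comma-separated text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(load_average_rows(root))
    return buf.getvalue()


root = ET.fromstring(SAMPLE)
print(to_csv(root), end="")
```

Using the csv module instead of a plain join is a design choice rather than a requirement here; it would only matter if a value ever contained a comma or a quote.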

This data is ready for Timeplot, and now it’s just a matter of somehow generating the files with the appropriate HTML and JavaScript in them to present the information. I have absolutely no clue how to use dynamic variables from Python to easily generate static HTML and JavaScript, so what I have in that area of my code is not something I want to share, out of sheer embarrassment. If someone has done that, let me know. PlanetPlanet does not output JavaScript, best I can tell, but it does output HTML, so I’ll be checking that part of the code out (probably uses BeautifulSoup I guess?). Input on that is hereby solicited!
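One low-tech option that needs nothing outside the standard library is string.Template. The sketch below is not what sarviz does, and it makes no attempt to reproduce the Timeplot API: the page layout, the dataUrl JavaScript variable, and the render_page function are all invented for illustration. It only shows how Python values can be dropped into a static HTML and JavaScript skeleton.

```python
from html import escape
from string import Template

# Hypothetical page skeleton; $name placeholders are filled in by substitute().
PAGE = Template("""\
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<div id="$div_id"></div>
<script type="text/javascript">
var dataUrl = "$data_url";  // to be read by whatever plotting code the page loads
</script>
</body>
</html>
""")


def render_page(host, day, data_url):
    """Return the static HTML for one host and one day as a string."""
    return PAGE.substitute(
        title=escape("Load average for %s on %s" % (host, day)),
        div_id=escape("plot-%s" % host, quote=True),
        # Assumes data_url is a plain relative path with no quotes in it.
        data_url=data_url,
    )


html_text = render_page("strudel", "2008-06-16", "strudel-load.csv")
print(html_text)
```

From there, a cron job could write the returned string out to a file per host and per day. Template.substitute() raises KeyError when a placeholder is left unfilled, which is handy for catching typos in the skeleton.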

Show Me Your Python SysAdmin One-Liners!

Wednesday, July 16th, 2008

Ah, the lazyweb. Today, I’m putting together content for a class I’m teaching on basic Linux administration, but during my meeting with a group of trainees to determine the scope of the course, they requested that I completely skip any coverage of “perl -e” one-liners, and show them the Python equivalents. Of course, I found this page, which has a few, but I figured I’d put out the call for more, just to get a good collection of ideas, and a higher-level idea of how people are using Python for system administration for ‘quick-n-dirty’ jobs. If I get a bunch of interesting ones, I’ll collect them all somewhere for easy reference (or add them to the wiki linked above?), so link this callout wherever pythonistas can be found.

Oddly enough, my experience with Python has me going in the completely opposite direction: I don’t write as many one-liners as I did with perl. If it’s not obvious to me how to do something with sed, awk, grep, find, xargs, and the “regular” tools, I write a Python script. I’ve tried remembering some things I used nasty Perl one-liners for, but I guess they were sufficiently nasty that I’ve forgotten them.

By the way, if you’re a sysadmin who writes their tools using Python, do consider giving a talk at this year’s PyWorks conference in November!