Category Archives: Database

PyTPMOTW: PsycoPG2

What is this module for?

Interacting with a PostgreSQL database in Python.

What is PostgreSQL?

PostgreSQL is an open source relational database product. It has some more advanced features, like built-in networking-related and GIS-related datatypes, the ability to script stored functions in multiple languages (including Python), etc. If you have never heard of PostgreSQL, get out from under your rock!

Making Contact

Using the psycopg2 module to connect to a PostgreSQL database couldn't be simpler. You call the module's connect() function, passing in either the individual arguments required to make contact (dbname, user, etc.), or one long "DSN" string, like this:

dsn = "host=localhost port=6000 dbname=testdb user=jonesy"
conn = psycopg2.connect(dsn)
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

The DSN value is a space-delimited collection of key=value pairs, which I construct before handing it to the psycopg2.connect() function. Once we have a connection object, the very first thing I do is set the connection's isolation level to 'autocommit', so that INSERT and UPDATE transactions are committed automatically without my having to call conn.commit() after each one. There are several isolation levels defined in the psycopg2.extensions package, and they're defined in 'extensions' because they go beyond what is defined in the DB API 2.0 spec that is typically used as a reference in creating Python database modules.
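For example, here's a minimal sketch of one way to build that DSN string from a dict of parameters (the host, port, and credentials here are made up for illustration):

import psycopg2

# Hypothetical connection parameters; substitute your own.
params = {'host': 'localhost', 'port': 6000, 'dbname': 'testdb', 'user': 'jonesy'}
dsn = " ".join("%s=%s" % (k, v) for k, v in params.items())
conn = psycopg2.connect(dsn)
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)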

Simple Queries and Type Conversion

In order to get anything out of the database, we have to know how to talk to it. Of course this means writing some SQL, but it also means sending query arguments in a format understood by the database. I’m happy to report that psycopg2 does a pretty good job of making things “just work” when it comes to converting your input into PostgreSQL types, and converting the output directly into Python types for easy manipulation in your code. That said, understanding how to properly use these features can be a bit confusing at first, so let me address the source of a lot of early confusion right away:

cur = conn.cursor()
cur.execute("""SELECT id, fname, lname, balance FROM accounts WHERE balance > %s""", (min_balance,))

Chances are, min_balance is an integer, but we're using '%s' anyway. Why? Because this isn't really you telling Python to do a string formatting operation; it's you telling psycopg2 to convert the incoming data using the default psycopg2 method, which converts integers into the PostgreSQL INT type. Note, too, that execute() expects its parameters in a sequence, which is why min_balance rides in a one-element tuple above. So, you can use "%s" in the execute() method to properly convert integers, strings, dates, datetimes, timedeltas, lists, tuples and most other native Python types to a corresponding PostgreSQL type. There are adapters built into psycopg2 as well if you need more control over the type conversion process.
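To make that concrete, here's a hedged sketch passing a few different Python types through the same '%s' placeholder; the events table and its columns are invented for the example:

import datetime

# Strings become text, dates become DATE, and Python lists become
# PostgreSQL arrays, all through the same "%s" placeholder.
cur.execute("INSERT INTO events (name, created, tags) VALUES (%s, %s, %s)",
            ("signup", datetime.date(2009, 9, 15), ["web", "trial"]))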

Cursors

Psycopg2 makes it pretty easy to get your results back in a format the receiving code can easily deal with. For example, the projects I work on tend to use the RealDictCursor type, because the code tends to require accessing the parts of the result set rows by name rather than by index (or just via blind looping). Here's how to set up and use a RealDictCursor:

import psycopg2.extras

curs = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
curs.execute("SELECT id, name FROM users")
rs = curs.fetchall()
for row in rs:
    print row['id'], row['name']

It’s possible you have two sections of code that’ll rip apart a result set, and one needs by-name access, and the other just wants to loop blindly or access by index number. If that’s the case, just replace ‘RealDictCursor’ with ‘DictCursor’, and you can have it both ways!
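Here's a quick sketch of that dual access using DictCursor, assuming the same users table as above:

import psycopg2.extras

curs = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
curs.execute("SELECT id, name FROM users")
for row in curs.fetchall():
    by_name = row['name']    # dict-style access
    by_index = row[1]        # positional access works on the same row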

Another nice thing about psycopg2 is the cursor.query attribute and the cursor.mogrify() method. Mogrify lets you see how a query will look after all input variables are bound, but before the query is sent to the server. Cursor.query holds the exact query that was actually sent over the wire. I use cursor.query in my logging output all the time to catch out-of-order parameters, mismatched input types, etc. Here's an example:

try:
    curs.callproc('myschema.myprocedure', callproc_params)
except Exception as out:
    print out
    print curs.query
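And here's a sketch of mogrify(), which hands back the query string with all parameters bound without executing anything; I'm reusing the accounts table from earlier for the example query:

sql = curs.mogrify("""SELECT id FROM accounts WHERE balance > %s AND lname = %s""",
                   (500, 'Jones'))
print sql    # the exact string that would go over the wire, handy for logging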

Calling Stored Functions

Stored procedures, or 'functions' in PostgreSQL-speak, can be immensely useful in large, complex applications where you want to enforce business rules in a single place outside the domain of the main application developers. It can also, in some cases, be more efficient to put functionality in the database than in the main application code. In addition, if you're hiring developers, they should develop in the standard language for your environment, not SQL: the SQL should be written by database administrators and database developers, and exposed to the application developers as needed, so all the application developers have to do is call the newly exposed function. Here's how to call a function using psycopg2:

callproc_params = [uname, fname, lname, uid]
cur.callproc("myschema.myproc", callproc_params)

The first argument to ‘callproc()’ is the name of the stored procedure, and the second argument is a sequence holding the input parameters to the function. The input parameters should be in the order that the stored procedure expects them, and I’ve found after quite a bit of usage that the module typically is able to convert the types perfectly well without my intervention, with one exception…

The UUID Array

PostgreSQL has built-in support for lots of interesting data types, like INET types for supporting IP addresses and CIDR network blocks, and GIS-related data types. It also supports a type that is an array of UUIDs. This comes in handy if you use a UUID to identify items and want to store an array of them to associate with an order, or you use UUIDs to track messages and want to store an array of them together to represent a message thread or conversation. Getting a UUID array into the database is really not difficult: if you have a list of UUID strings, you do a quick conversion, call one registration function, and then use the array like any other input parameter:

import uuid
import psycopg2.extras

# Convert the UUID strings to uuid.UUID objects, then register the UUID
# adapter so psycopg2 knows how to pass them to PostgreSQL.
my_uuid_arr = [uuid.UUID(i) for i in my_uuid_arr]
psycopg2.extras.register_uuid()

callproc_params = [
    myvar1,
    myvar2,
    my_uuid_arr,
]

curs.callproc('myschema.myproc', callproc_params)

Connection Status

It’s not a given that your database connection lives on from query to query, and you shouldn’t really just assume that because you did a query a fraction of a second ago that it’s still around now. Actually, to speak about things more Pythonically, you *should* assume the connection is still there, but be ready for failure, and check the connection status to diagnose and help get things back on track. You can check the ‘status’ attribute of your connection object. Here’s one way you might do it:

    @property
    def active_dbconn(self):
        return self.conn.status in [psycopg2.extensions.STATUS_READY, psycopg2.extensions.STATUS_BEGIN]

So, I'm assuming here that you have some object holding a connection object that it refers to as 'self.conn'. This one-liner uses the built-in @property decorator, so the other methods in the class can check the connection status before attempting a query:

if self.active_dbconn:
    try:
        curs.execute(...)
    except Exception as out:
        logging.error("Houston we have a problem")

Or you can flip that around like this:

try:
    curs.execute(...)
except Exception as out:
    if not self.active_dbconn:
        logging.error("Execution failed because your connection is dead")
    else:
        logging.error("Execution failed in spite of live connection: %s" % out)

Read On…

A database is a large, complex beast. There’s no way to cover the entirety of a database or a module that talks to it in a simple blog post, but I hope I’ve been able to show some of the more common features, and maybe one or two other items of interest. If you want to know more, I’m happy to report that, after a LONG time of being unmaintained, the project has recently sprung back to life and is pretty well-documented these days. Check it out!

Python, PostgreSQL, and psycopg2's Dusty Corners

The last time I wrote code with psycopg2 was around 2006, but I was reacquainted with it over the past couple of weeks, and I wanted to make some notes on a couple of features that are not well documented, imho. Portions of this post have been snipped from mailing list threads I was involved in.

Calling PostgreSQL Functions with psycopg2

So you need to call a function. Me too. I had to call a function called ‘myapp.new_user’. It expects a bunch of input arguments. Here’s my first shot after misreading some piece of some example code somewhere:

qdict = {'fname': self.fname, 'lname': self.lname, 'dob': self.dob, 'city': self.city, 'state': self.state, 'zip': self.zipcode}

sqlcall = """SELECT * FROM myapp.new_user( %(fname)s, %(lname)s,
%(dob)s, %(city)s, %(state)s, %(zip)s""" % qdict

curs.execute(sqlcall)

There’s no reason this should work, or that anyone should expect it to work. I just wanted to include it in case someone else made the same mistake. Sure, the proper arguments are put in their proper places in ‘sqlcall’, but they’re not quoted at all.

Of course, I foolishly tried going back and putting quotes around all of those named string formatting arguments, and of course that fails when you have something like a quoted “NULL” trying to move into a date column. It has other issues too, like being error-prone and a PITA, but hey, it was pre-coffee time.
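If it isn't obvious why, a quick sketch shows the trouble: Python's string formatting just pastes in the text representation, so a None headed for a date column becomes the literal word 'None' inside quotes:

>>> "INSERT INTO people (dob) VALUES ('%(dob)s')" % {'dob': None}
"INSERT INTO people (dob) VALUES ('None')"

PostgreSQL won't accept the string 'None' as a date, so the statement fails.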

What’s needed is a solution whereby psycopg2 takes care of the formatting for us, so that strings become strings, NULLs are passed in a way that PostgreSQL recognizes them, dates are passed in the proper format, and all that jazz.

My next attempt looked like this:

curs.execute("""SELECT * FROM myapp.new_user( %(fname)s, %(lname)s,
%(dob)s, %(city)s, %(state)s, %(zip)s""", qdict)

This is, according to some articles, blog posts, and at least one reply on the psycopg mailing list, "the right way" to call a function using psycopg2 with PostgreSQL. I'm here to tell you that, to the best of my knowledge, this is not correct. The only real difference between this attempt and the last is that I've replaced the "%" with a comma, which turns what *was* a string formatting operation into a proper SELECT with a psycopg2-recognized parameter list. I thought this would get psycopg2 to "just work", but no such luck. I still had some quoting issues.

I have no idea where I read this little tidbit about psycopg2 being able to convert between Python and PostgreSQL data types, but I did. Right around the same time I was thinking “it’s goofy to issue a SELECT to call a function that doesn’t really want to SELECT anything. Can’t callproc() do this?” Turns out callproc() is really the right way to do this (where “right” is defined by the DB-API which is the spec for writing a Python database module). Also turns out that psycopg2 can and will do the type conversions. Properly, even (in my experience so far).

So here’s what I got to work:

callproc_params = [self.fname, self.lname, self.dob, self.city, self.state, self.zipcode]

curs.callproc('myapp.new_user', callproc_params)

This is great! Zero manual quoting or string formatting at all! And no “SELECT”. Just call the procedure and pass the parameters. The only thing I had to change in my code was to make my ‘self.dob’ into a datetime.date() object, but that’s super easy, and after that psycopg2 takes care of the type conversion from a Python date to a PostgreSQL date. Tomorrow I’m actually going to try calling callproc() with a list object inside the second argument. Wish me luck!

A quick cursor gotcha

I made a really goofy mistake. At the root of it, what I did was share a connection *and a cursor object* among all methods of a class I created to abstract database operations out of my code. So, I did something like this (this is not the exact code, and it’s untested. Treat it like pseudocode):

import psycopg2

class MyData(object):
    def __init__(self, dsn):
        # Connection and cursor are both created once, here.
        self.conn = psycopg2.connect(dsn)
        self.cursor = self.conn.cursor()

    def get_users_by_regdate(self, regdate, limit=100):
        self.cursor.arraysize = limit
        self.cursor.callproc('myapp.uid_by_regdate', regdate)
        while True:
            result = self.cursor.fetchmany()
            if not result:
                break
            for row in result:
                yield row[0]

    def user_is_subscribed(self, uid):
        self.cursor.callproc('myapp.uid_subscribed', uid)
        result = self.cursor.fetchone()
        val = result[0]
        return val

Now, in the code that uses this class, I want to grab all of the users registered on a given date, and see if they’re subscribed to, say, a mailing list, an RSS feed, a service, or whatever. See if you can predict the issue I had when I executed this:

db = MyData(dsn)
idcount = 0
skip_count = 0
for id in db.get_users_by_regdate([joindate]):
    idcount += 1
    print idcount
    param = [id]
    if db.user_is_subscribed(param):
        print "User subscribed"
        skip_count += 1
        continue
    else:
        print "Not good"
        continue

Note that the above is test code. I don’t actually want to continue to the top of the loop regardless of what happens in production :)

So what I found happening is that, if I just commented out the portion of the code that makes a database call *inside* the for loop, I could print ‘idcount’ all the way up to thousands of results (however many results there were). But if I left it in, only 100 results made it to ‘db.user_is_subscribed’.

Hey, '100' is what I'd set curs.arraysize to! Hey, I'm using the *same cursor* to make both calls! And with the for loop, the cursor is being called upon to produce a second recordset while it's still trying to produce the first one!

Tom Roberts, on the psycopg list, states the issue concisely:

The cursor is stateful; it only contains information about the last query that was executed. On your first call to "fetchmany", you fetch a block of results from the original query, and cache them. Then, db.user_is_subscribed calls "execute" again. The cursor now throws away all of the information about your first query, and fetches a new set of results. Presumably, user_is_subscribed then consumes that dataset and returns. Now, the cursor is positioned at end of results. The rows you cached get returned by your iterator, then you call fetchmany again, but there's nothing left to fetch…

…So, the lesson is if you need a new recordset, you create a new cursor.
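So the fix to my class is small: create the cursor inside the method that produces the recordset. Here's a sketch of the repaired generator, under the same pseudocode assumptions as above:

    def get_users_by_regdate(self, regdate, limit=100):
        curs = self.conn.cursor()    # a fresh cursor, private to this recordset
        curs.arraysize = limit
        curs.callproc('myapp.uid_by_regdate', regdate)
        while True:
            result = curs.fetchmany()
            if not result:
                break
            for row in result:
                yield row[0]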

Lesson learned. I still think it’d be nice if psycopg2 had more/better docs, though.

New Job, Car, Baby, and Other News

New Baby!

I know this is my geek blog, but geeks have kids too, so first I want to announce the birth of our second daughter, Sadie, who was born on September 15th. She’s now over a month old. This is the first time I’ve stayed up late enough to blog about her. Everyone is healthy, if slightly sleep-deprived :)

New Job!

The day before Sadie’s birth, I got a call with an offer for a job. A *full-time* job, as a Senior Operations Developer for MyYearbook.com. After learning about the cool and very geeky things going on at MyYearbook during the interview process, I couldn’t turn it down. I started on October 5, and it’s been a blast digging into all of the cool stuff going on there. While I’m certainly doing my fair share of PHP code review, maintenance, and general coding, I’m also getting plenty of hours in working out the Python side of my brain. I’m finding that while it’s easier switching gears than I had anticipated, I do make some really funny minor syntax errors, like using dot notation to access object attributes in PHP ;-P

What I find super exciting is something that might turn some people's stomachs: at the end of my first week, I sat back and looked at my monitors to find roughly 15 tabs in Firefox open to pages explaining various tools I'd never gotten to use, protocols I've never heard of, etc. I had my laptop and desktop both configured with 2 virtual machines for testing and playing with new stuff. I had something north of 25 terminal windows open, and 8 files open in Komodo Edit.

Now THAT, THAT is FUN!

The projects I’m working on run the gamut from code cleanups that nobody else has had time to do (a good tool for getting my brain wrapped around various parts of the code base), to working on scalability solutions and new offerings involving my background in coding *and* system administration. It’s like someone cherry-picked a Bay Area startup and dropped it randomly 30 minutes from my house.

My own business is officially “not taking new clients”. I have some regular clients that I still do work for, so my “regulars” are still being served, but they’ve all been put on notice that I’m unavailable until the new year.

New Car!

I’m less excited about the new car, really. I used to drive a Jeep Liberty, and I loved it. However, in early September, before Sadie’s arrival, it became clear to me that putting two car seats in that beast wasn’t going to happen. The Jeep is great for drivers, and it has some cargo space. It’s not a great vehicle for passengers, though.

At the same time, I was running a business (this was before the job offer came along), and I was finding myself slightly uncomfortable delivering rather serious business proposals in a well-used 2003 Jeep. So, I needed something that could fit my young family (my oldest is 2 yrs), and that was presentable to clients. So, I got a Lexus ES350.

I like most things about the car, except for the audio system. It seems schizophrenic to me to have like 6 sound ‘zones’ to isolate the audio to certain sets of speakers, but then controls like bass and treble only go from 0 to 5. Huh? And the sound always sounds like it’s lying on the floor for some reason. It’s not at all immersive. The sound system on my Jeep completely kicked ass. I miss it. A lot.

Other News

I’ve submitted an article to Python Magazine about my (relatively) recent work with Django and my (temporarily stalled) overhaul of LinuxLaboratory.org, and my experiences with various learning resources related to Django. If you’re looking to get into Django, it’s probably a good read.

I’ve been getting into some areas of Python that were previously dark, dusty corners, so hopefully I’ll be writing more about Python here, because writing about something helps me to solidify things in my own brain. Short of that, it serves as a future reference point in case it didn’t get solidified enough :)

My sister launched The Dance Jones, a blog where she talks about fitness, balance, dance, and stuff I should probably pay much more attention to (I’m close to declaring war on my gut). Also, if you ever wanted to know how to shoulder shimmy (and who hasn’t wanted to do that?), you should check it out :)

LinuxLaboratory woes, Drupal -> Django?

Ugh…

So, today I tried browsing to one of my sites, linuxlaboratory.org, and found a 403 "Forbidden" error. When I called support, they said it was a "billing issue". Well, I pay my bills, and I haven't received any new credit cards, so I'm not sure what that's about. Further, they haven't contacted me in any way, shape, or form in a very long time, and I've had the same email addresses for years now. The last time they failed to contact me, it was because they were sending all of the mail to "root@localhost" on the web server.

What’s more, the tech support guy, having determined that this wasn’t a technical but an administrative problem, transferred me to a sales person who was not there. I left a message. That was 3 hours ago. So I took matters into my own hands and changed the name server records to my webfaction account, and linuxlaboratory.org now points to an old test version of the site that uses Drupal.

It’s Over Between Us…

Drupal holds the record for the CMS that has run LinuxLaboratory the longest. Since its launch in 2001, LinuxLaboratory has used all of the major, and some of the minor open source PHP CMSes. Drupal gave me something very close to what I wanted, out of the box. Nowadays, Drupal is even nicer since they redid some of the back end APIs and attracted theme and module developers to the project. I’ve even done some coding in Drupal myself, and have to say that it really is a breeze.

But the problem is this: I'm a consultant, trainer, and author/editor. I am an experienced system admin, database admin, and infrastructure architect who makes a living solving other people's problems. I really can't afford to run my sites on something with super-high maintenance overhead. With Drupal releasing new versions with major security fixes once per month on average, and no automated update mechanism (and no built-in automated backup either), it becomes pretty cumbersome just to keep it updated.

This is in addition to my experiences trying to do e-commerce with Drupal. I tried to use one plugin, but soon found myself in dependency hell — a situation I’m not used to being in unless I’m on a command line somewhere. So, out with Drupal. I know it well and I’m sure I’ll find a use for it somewhere in my travels, but not now, and not for this.

Is Django the Future of LinuxLaboratory?

So I’m thinking of giving Django another shot. In fact, I thought I might try something new and interesting. Maybe I’ll build my Django app right in front of everyone, so that anyone who is interested can follow along, and so people can give me feedback and tips along the way. It also lets me share with people who have questions about a feature I’m implementing or something like that.

For fanboys of <insert technology here>, know this: I’m a technology whore. I consume technology like some people consume oxygen. I love technology, and I get on kicks, and every now and then, a “kick” turns into a more permanent part of my tool chest. Python is one such example. I’ve done lots with Python, but have never really made friends with it for web development. I got a webfaction account specifically because they support Python (and Django). I’ve done nothing with it. Now I think I might.

But not to worry! I own lots of domains that are sitting idle right now, and I’m considering doing a Ruby on Rails app for one of them, and I’m dying to do more with Lua. There’s only so much time!

Webfaction Django Users: Advice Hereby Solicited

So if you're a webfaction customer using Django, please share your tips with me about the best way to deploy it. I've used nothing but PHP apps so far, and found that rather than use the one-click installs webfaction provides, it's a lot easier to just choose the generic "CGI/PHP" app type and install the code myself. This allows me to, for example, install and update WordPress using SVN. Is Django a similar story, or does webfaction actually have an auto-upgrade mechanism for this? How are you keeping Django up to date?

Thanks!

I’m Offering Pro-Bono Consulting

I started my company about a year ago, but I’ve been doing consulting for a long time. In fact, my first job in the IT industry was working for a consulting firm. Before that, starting as far back as grade school, I was involved in a lot of volunteer civic and community service activities. I admire companies who get involved in their communities, or even outside of their communities, wherever help is needed.

As part of my business plan, I’ve put in place a policy of accepting one pro-bono consulting project per year. So far, I haven’t gotten any requests for free consulting work, so here’s my public shout out to let you know what types of services are available:

1. Speaking or Training. My specialties are things like advanced Linux administration and SQL, but I'm perfectly capable of delivering content for people who just need to know how the internet works, or want to know more about social media. Training, funny enough, has been the bulk of my business for the past year.

2. I can help with MySQL performance tuning on *nix systems, including finding hotspots related to the design of the database itself, or how your application code interacts with the database. If it happens that your MySQL server is performing poorly due to an underpowered system, I can also pinpoint which resource is dragging on the performance of your database.

3. If you just need random scripts written to perform *nix system administration tasks, I can consult with you about the requirements and write them for you. Note that while I can script in several languages, my preference for anything longer than 40 lines of code is Python.

4. I can build PCs, install networks, set up firewalls and wireless routers, and handle all of the normal "office IT" functions, but note that my consulting is Linux consulting. I don't work with Windows (well, I do, but not for free) ;-)

5. If there’s some other thing you’ve seen me blog about here, chances are I’ll be willing to perform a pro-bono consulting engagement to do it for you, or show you how to approach a problem, a large project, a migration, automation, monitoring, security or whatever.

Unless you happen to live within commuting distance to Princeton, NJ, work will be done remotely :)

Please email your request to jonesy at owladvisors dot com. Include your organization’s name, your contact info, and as much detail about the project and what your organization does as possible. The decision of which project to take on will be based solely on the information in your request!

Activity Lapse: I blame Twitter

To all my geek/nerd friends in the blogosphere: I’ll be posting updates on Fedora Directory Server, my Linux training courses, and more in the coming weeks, but I wanted to let you know that I’ve recently been stricken with… umm… Twitter. I’m @bkjones on twitter, so if you’re into beer, brewing, billiards, cooking, guitar/music, linux, system administration, perl, shell, python, php, databases, sql, or anything like that, lemme know, or follow me!

Fedora Directory Server on RHEL 4 and 5, Pt. 1

The last time I had to do a NIS->LDAP migration, it was in a heterogeneous environment with Solaris and Linux boxes, and it was around 2004 or so. Although I hit some rough patches adjusting to changes in how FDS is packaged, the community was awesome, and helped me get back up to speed in no time. We shouldn't forget that the community was what drove me from OpenLDAP to FDS in the first place.

But I digress. The purpose of this article (the first of a series) is to share with you some technical information about how to get things going. How, exactly, do you get RHEL 4 and RHEL 5 to use Fedora Directory Server's data to support NSS and PAM for user information and authentication, and autofs for automounting directories? There are documents on this, written by people who clearly do (or did) care, but at times they can be a little disjointed, a little outdated, and require some tweaking.

This document talks specifically about installing the fedora-ds-1.1.2-1.fc6 package on RHEL 5.2, populating the People and Groups trees, and testing that it actually works. Later posts will deal with getting RHEL 4 and 5 clients to talk to it for various purposes, using TLS (with certificate verification, btw).

If your real issue is understanding how LDAP data works, why it looks the way it does, or you need a refresher, I would urge you to look at two other articles I wrote for O’Reilly, devoted completely to the topic: here, and here.

Get it installed

There is no precompiled binary package of Fedora Directory Server built specifically for Red Hat Enterprise Linux (because Red Hat, of course, provides that, with support, for a fee). If you want to run FDS for free on a RHEL server, the installation process is somewhat non-trivial. First, you must add a couple of new package repositories to your yum configuration:

cd /etc/yum.repos.d/
sudo wget http://directory.fedoraproject.org/sources/idmcommon.repo
sudo wget http://directory.fedoraproject.org/sources/dirsrv.repo

Then, you'll need to import a couple of keys in order to verify signatures of the packages we'll install later:

sudo rpm --import \
  http://archives.fedoraproject.org/pub/archive/fedora/linux/core/6/i386/os/RPM-GPG-KEY-fedora
sudo rpm --import \
  http://archives.fedoraproject.org/pub/archive/fedora/linux/core/6/i386/os/RPM-GPG-KEY-Fedora-Extras

Next, install some prerequisite packages (you could do this first – these come from standard repositories, not the new ones we added):

sudo yum install svrcore mozldap perl-Mozilla-LDAP libicu

You’ll need jss, and I wasn’t able to get it via a repository, so I downloaded it using a URL directly:

sudo rpm -ivh http://download.fedoraproject.org/pub/fedora/linux/extras/6/x86_64/jss-4.2.5-1.fc6.x86_64.rpm

Next, install ldapjdk (used by the FDS console application), and finally, the directory server itself:

sudo yum install ldapjdk
sudo yum install fedora-ds

With these packages installed, the next thing to check is that permissions are set up correctly; otherwise, the initial setup script will fail:

sudo chown -R nobody:nobody /var/lock/dirsrv; sudo chmod -R u=rwX,go= /var/lock/dirsrv
sudo chown nobody:nobody /var/run/dirsrv; sudo chmod -R u=rwX,go= /var/run/dirsrv

Finally, run the setup script which was installed with the fedora-ds package:

sudo /usr/sbin/setup-ds-admin.pl

Populating the Directory

The directory initially consists of a top-level entry representing the domain, and by default FDS creates two "organizational units" for you, subtrees representing "People" and "Groups". I'll create an LDIF file for the Groups first, but there's no reason to go in any particular order. We're just adding data, and LDAP isn't relational: you can add People objects who are members of Groups that aren't in the tree yet. Here's my LDIF file for the groups:

dn: cn=wheel,ou=Groups,dc=example,dc=com
objectClass: posixGroup
objectClass: top
cn: wheel
gidNumber: 1000
memberUid: jonesy
memberUid: tasha
memberUid: molly 

dn: cn=eng,ou=Groups,dc=example,dc=com
objectClass: posixGroup
objectClass: top
cn: eng
gidNumber: 1001

For the moment, only 'wheel' contains any actual members. No biggie: you can add members to groups, or add more groups, later, whenever you want. Once the clients are configured, there's no restarting of anything to get them to pick up changes to the LDAP data.

It’s easy to use the OpenLDAP tools to add data to FDS, but I’m going to use the FDS-supplied tool here to insert this data:

/usr/lib64/mozldap/ldapmodify -a -D "cn=Directory Manager" -w - -h localhost -p 389 \
  -f ~/groups.ldif -c

If you’re familiar with the OpenLDAP tools, this probably doesn’t look too scary. The OpenLDAP tools require a ‘-x’ flag to bypass SASL. Aside from that, pretty straightforward.

To populate the "People" tree in FDS, or any other LDAP product, I wrote a really cheesy awk script that I can pipe the contents of /etc/passwd or 'ypcat passwd' through and get good results with only minor tweaking.
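Something in this spirit works as a stand-in for that awk script; here's a rough Python sketch that reads passwd-format lines on stdin and emits one People entry per user. The base DN and the objectClass list are assumptions you'd adjust for your own tree:

import sys

BASE = "ou=People,dc=example,dc=com"    # adjust for your suffix

for line in sys.stdin:
    # passwd format: user:password:uid:gid:gecos:home:shell
    user, _, uid, gid, gecos, home, shell = line.strip().split(':')
    print "dn: uid=%s,%s" % (user, BASE)
    print "objectClass: top"
    print "objectClass: account"
    print "objectClass: posixAccount"
    print "uid: %s" % user
    print "cn: %s" % (gecos or user)
    print "uidNumber: %s" % uid
    print "gidNumber: %s" % gid
    print "homeDirectory: %s" % home
    print "loginShell: %s" % shell
    print

Redirect the output to a file called 'people.ldif', and then you can populate your "People" tree: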

/usr/lib64/mozldap/ldapmodify -a -D "cn=Directory Manager" -w - -h localhost -p 389 \
  -f ~/people.ldif

At any time, you can verify that your FDS installation is returning results by running a query like this:

/usr/lib64/mozldap/ldapsearch -b dc=example,dc=com objectclass=organizationalUnit

I have a few more posts to follow this one: one on getting SSL/TLS working (either one, perhaps both), creating a root CA, and setting things up with certutil; another on getting the RHEL 4 and 5 clients to use LDAP; and a separate one on configuring autofs to talk to LDAP, which is a little different between RHEL 4 and 5. Subscribe to this blog in your reader to be updated as those posts come out over the next 2 weeks.

Teaching a Course on Profiling and Debugging in Linux

Dear Lazyweb,

So, I’ve been in Chicago for a week teaching a beginner and an intermediate course on using and administering Linux machines. This week, I’ll teach an intermediate and an advanced course on Linux, and the advanced course will cover profiling and debugging. The main tools I’m covering will be valgrind and oprofile, though I’ll be going over lots of other stuff, like iostat, vmstat, strace, what’s under /proc, and some more basic stuff like sending signals and the like.

So what makes me a bit nervous is, being that the advanced students are mostly CS-degree-holding system developers, they'll probably be expecting me to know very low-level details of how things are implemented at the system/kernel level. I'd love to know more about that myself, and actively try to increase my knowledge in that area! Alas, most of my experience with low-level tools like this is in the context of trying to understand how things like MySQL do their jobs.

They may also turn up their noses at the admin-centric coverage that I believe is actually very important in order to get a complete view of the system and to reduce duplication of effort. Of course, I'll use a bit of time at the beginning of day 1 to properly set the expectation, and we'll see how they respond. As they say in the hospitality industry, presentation is everything.

The portion of the course that covers valgrind and oprofile won’t be until Thursday, or perhaps even Friday, so I figured I’d take this opportunity to ping the lazyweb and find out a couple of things:

  • What tools do you use in conjunction with valgrind and/or oprofile?
  • What kinds of problems are you solving with these and similar tools?
  • What most annoys you about these and similar tools?
  • Do you use these tools for development, administration, or both?
  • If you have cool links, share!
  • If you’ve been able to make effective use of oprofile inside of a vmware instance, share (because my thinking is that this probably *should* be nearly impossible unless vmware actually simulates the hardware counters oprofile needs access to!)
  • This one is just for me, not the course: are there any demos/tutorials on using valgrind with Python? I’ve seen the standard suppression file, but it still seems like profiling a Python script would be difficult being that you are actually going to be profiling the interpreter (or so it seems).

Thanks!

2009: Waiting to Exhale

Lots of blogs list a bunch of stuff that happened in the year just past, and I have done a year-in-review post before, but in looking back at posts on this blog and elsewhere, what strikes me most is not the big achievements that took place in technology in 2008, but rather the questions that remain unanswered. So much got started in 2008 — I’m really excited to see what happens with it all in 2009!

Cloud Computing

Technically, the various utility or ‘cloud’ computing initiatives started prior to 2008, but in my observation, they gained more traction in 2008 than at any other time. At the beginning of 2008, I was using Amazon’s S3, and testing to expand into more wide use of EC2 during my time as Technology Director for AddThis.com (pre-buyout). I was also investigating tons of other technologies that take different approaches to the higher-level problem these things all try to solve: owning, and housing (and cooling… and powering…) equipment. Professionally, I’ve used or tested heavily AppLogic, GoGrid, and all of the Amazon services. Personally, I’ve also tried Google App Engine.

2008 was a banner year for getting people to start tinkering with these technologies, and we've seen the launch of 'helper' services like RightScale, which puts a very pretty (and quite powerful) face on the Amazon services. The question now is whether the cost-benefit analyses, and the security and availability story, are going to be compelling enough to lure in more and bigger users. I think 2009 is going to be the year that makes or breaks some of these initiatives.

The other question I have about cloud computing, which I’ve been asking since the last half of 2007, is “where does all of this leave the sysadmin?” It seems to me that a great many of the services being trotted out for users to play with seek to provide either user-level GUI interfaces, or low-level developer-centric interfaces to solve problems that historically have been the purview of system administrators. I’ve been wondering if it will force sysadmins to become more dev-centric, developers to become more system-savvy, if it will force more interaction between the two camps, or if it means death to sysadmins on some level, to some degree, or for some purposes.

I really think there’s a lot of hype surrounding the services, but I also think there’s enough good work being done here that 2009 could begin to reveal a sea change in how services are delivered and deployed on the web.

Drizzle

If you're working in the web 2.0, uber-scaling space, and you're using MySQL, chances are your relationship with your database is less ideal than it was when you were using it to run your blog or your recipe database. As you try to scale MySQL through various means, you find that there are lots of things that could be handled better to make MySQL scale more gracefully. Some extra internal accounting and instrumentation would also be nice. In many cases, it would also be nice to just cut out all of the crap you know you're not going to use. If you're looking at sharding, it would be good if there were a database that was born after the notion of sharding became widely understood.

Drizzle is a project started by some MySQL gurus to take a great experimental leap toward what could become a beacon in the dark sea of high scalability. At the very least, it will serve as a foundation for future work in creating databases that are more flexible, more manageable, and, more easily scaled. Of course, it’s also likely that Drizzle will be tied more closely to a slightly narrower audience, but I can say from experience that had the ideals of the Drizzle team been fully realized in an open source product prior to 2008, I may not have even installed MySQL in the first place. I had at least a passing familiarity with what I was getting myself into, and pulled the trigger to use MySQL based on criteria that deviated somewhat from pure technological merit. ;-)

I don't believe Drizzle has announced any kind of timeline for releases. I wouldn't expect them to. Instead, the first release will probably be announced on blogs in various places with links to downloads or something. The Cirrus Milestone for the project seems to focus quite a bit on cleanup, standardization, and things that, to prospective deployers, are relatively uninteresting. But I think 2009 will at least see Drizzle getting to the point where it can support more developers, and make more progress, more quickly. In 2009, I think we'll see people doing testing with Drizzle with more serious goals in mind than just tinkering, and I think in 2010 we'll see production deployments. Call me crazy – it's my prediction.

Microsoft

Windows market share on the desktop, it was recently reported by IDC, has dropped below 90% for the first time in something like 15 years, to 89.6%. Mac users now represent 9.1% of the market, and the rest is owned by Linux, at a paltry 0.9%.

It would seem that OS X has eaten away a few percentage points from Windows, and done perhaps more damage to the Linux space. I have no data to back that up at the moment – I’m going by the enormous shift from Linux to OS X between OSCON 2006 and OSCON 2008. I’ll let you know what I see at LISA 2009, which I plan to attend.

But what about Microsoft? Sure, they’re the company IT wonks love to hate, but the question of how their apparent (marketed) direction will affect their products and business is one that truly fascinates me. Microsoft has become the Herbert Hoover of American software companies, while Apple is FDR, perceived as having saved many of us from the utter depression and despair of the Hoover years (insert joke about sucking here).

Microsoft is enormous. It moves horribly slowly. It has shown a stubbornness in the past that would seem difficult for something so large to shake off. Their products reflect this big, slow obstinacy. What end users need is a software company that is going to lead its users in the direction they're all moving in already on their own. It can no longer be about "allowing users" to do things (Ballmer has used such phrasing in the past). It needs to be about enabling and empowering, and getting the hell out of the user's way.

The big question I think 2009 will answer is whether or not Ray Ozzie can effect change in either the culture or the mechanics of how Microsoft does business (either one is likely to have a drastic effect on the other).

Python 3.0

It’s here already. I, for one, am quite excited about it. I think that GvR, Alex Martelli, Steve Holden, and others have put forth a very admirable effort to communicate with users and developers about what changes are imminent, what they mean, and how to prepare to move forward. I think 2009 is going to require 100% of the communication effort expended in 2008 in order to continue to rally the troops. I don’t know, but would imagine that the powers that be can see that as well, and so it will be. Assuming I’m right there, adoption will increase in the community, and the community buzz resulting from the wider adoption will begin to take some of the pressure off of the really big names, who quite honestly have craploads of other things to work on!

I believe that by summer 2009 we’ll see Python 2.6 migrations happening more rapidly, and a year out from that point we’ll start to see the wave of 3.0 migrations building to more tsunami-like proportions.

Another question: is there sufficient new adoption of Python going on to register 3.0 on the usage scale? Probably not now, but hopefully in 2009…

USA Gets a CTO

I’ve read a few articles about this, but all I’ve read really just amounts to noise and speculation. What, exactly, will the CTO be charged with? I’ve seen Ed Felten floated as a candidate for the position, but he’s not a person who’s going to want to run in and try to herd cats to try to standardize their desktop computing platform. I think if the CTO position is going to take charge of the things Felten has already shown a keen interest in (namely, high-level IT policy, the effect of technology on society, privacy and security… as it relates to the former two items, etc), then there could be nobody better for the job. Princeton’s Center for Information Technology Policy is one of the few places (maybe the only place) I’d actually take a pay cut to join ;-P

I imagine that 2009 will answer the questions surrounding the nation’s very first CTO.

It’s The Economy!

I’m a freelance technology consultant and trainer. Anyone who is making a living freelancing is probably wondering about the state of the economy, no matter where they live (incidentally, I live in the US). The numbers aren’t good. The S&P is down something like 41% this year – the largest drop on record. The state of the markets in general, along with the failing of the banks and their subsequent appearance in Senate committee hearings, as well as the deflationary spiral in the housing market (and predicted more general deflationary spiral) invoke images of bread lines and soup kitchens… or at least very little work for freelancers.

Personally, I have a lot to lose if things *really* go south to the degree that they did in the 1930s, but I have to say that I don't think it'll happen. If you're worried about this becoming the next Great Depression and are really losing sleep over it, I recommend you read a book called "The Great Depression" by Robert S. McElvaine. There are probably tons of books you can read, but this is one I happen to like. It's full of both fact and opinion, but the opinions are well-reasoned, and loudly advertised as being opinions (you're not likely to find a book about any topic relating to economics that isn't full of opinions anyway).

What I think you’ll find is that, while there are a lot of parallels between now and then, there are lots of things that *aren’t* parallel as well (partly as a result of the depression – for example, the US is no longer on the gold standard, and both banks and securities trading are infinitely more regulated now). Also, not all of the parallels are bad. For example, things began to improve (though slightly at first) almost the day a new Democratic leader replaced the outgoing Republican regime.

My advice (which I hope I can follow myself): If the market numbers bother you, don’t look. Service your customers, don’t burn any bridges, rebuild the ones you can, build new ones where you can, and above all, Do Good Work. When you don’t have work, market, volunteer, and build your network and friendships. Don’t eat lunch alone, as they say.

What are you wondering about?

My list is necessarily one-sided. A person can be into only so many things at once. What kinds of tech-related questions are you searching for answers on as we enter the new year?

What do you find lacking/awesome in tech training classes?

Dear lazyweb, 

Over the past year, I've spoken to a few clients about performing on-site training for their staff in things like Linux administration, SQL, PHP, etc. I've also gotten a few training contracts as a result, and those contracts have gone quite well, and I have some repeat business already! I really, really enjoy that line of work (and my consulting work keeps my skill set sharp and ensures I won't get 'stale').

What I think my current clients like is that they already know my work and are confident in my knowledge of the areas I’m training in, and they love that I’ll create custom content for them instead of having static, inflexible, prepackaged classes. 

Technical people, though, are extremely, excruciatingly scrutinizing. We're a lot that likes to find problems with things, because we like to fix problems. We also (some of us, at least) believe that anything worth doing is worth doing right, and that's my goal. So, although I'm part of that scrutinizing, problem-solving crowd, I'm aware that I don't have a monopoly on valuable opinions regarding how training is put together, delivered, etc.

So, if you have had experiences, good or bad, with in-person training classes, or if something in one of those classes stood out to you, or something won’t leave your brain about your experience, I’d love to hear it!