Finding Needles With ‘sort’ and ‘uniq’

I had to do this recently, and so I thought it would be useful to share this for two reasons:

  1. Someone else may need to do it and find this technique useful
  2. Someone else may know a better way of doing this

Quick ‘n’ dirty explanation: you have two lists. One list is a superset of the other list. You want to identify all of the items that exist *only* in the larger list. Here’s how you do that:

cat small_list >> large_list; sort large_list | uniq -u

Note that ‘uniq -u’ is not the same as ‘sort -u’. The former will display only the lines in the file that occur once. The latter displays all lines in the file, *once*, regardless of how many times they occur in the file.
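
Here is a minimal sketch of the difference, using throwaway data. The helper name ‘only_in_larger’ and the file names are made up for illustration. It assumes neither list contains internal duplicates, since a repeated line would be dropped by ‘uniq -u’ even if it exists only in the larger list. Passing both files straight to ‘sort’ also avoids appending to the larger list.

```shell
#!/bin/sh
# Hypothetical helper: print lines that occur in only one of the two files.
# Assumes the first list is a superset of the second and that neither
# file repeats a line internally.
only_in_larger() {
    # sort accepts several files, so the larger list is never modified
    sort "$1" "$2" | uniq -u
}

# Throwaway demo data (file names are made up for illustration)
demo_dir=$(mktemp -d)
printf '%s\n' alice bob carol dave > "$demo_dir/large_list"
printf '%s\n' bob dave > "$demo_dir/small_list"

# uniq -u: only the lines that occur once (alice and carol)
only_in_larger "$demo_dir/large_list" "$demo_dir/small_list"

# sort -u: every distinct line, once each (alice, bob, carol, dave)
sort -u "$demo_dir/large_list" "$demo_dir/small_list"
```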

Longer example explanation: I have an LDAP server, and at some point we added an objectclass and associated attribute to every user account. However, new accounts weren’t being *created* with the objectclass and attribute. At some point, I figured out that there was some inconsistency between account objects, and figured I had better get a list of accounts that didn’t have the objectclass and attribute so I could correct the situation. Problem is, you can’t negate a search using the standard ‘ldapsearch’ command line tools. So I can’t ask for all objects where ‘objectclass != myobjectclass’ or something.

What I did was two ldapsearches. One for all of the objects in that part of the tree, and then another for all objects in that part of the tree with the objectclass in place. Of course, the former list is a superset of the latter, and then we do ‘cat subset >> superset; sort superset | uniq -u’ – and that will be the list of people who do *not* have the objectclass associated with their account entry in the directory server.
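
The LDAP case can be sketched roughly as follows. The helper name ‘dn_difference’, the base DN, and the sample entries are placeholders, not the real ones, and the ‘ldapsearch’ invocations appear only as comments because the exact options depend on your server. The sketch assumes each DN fits on one line of the saved output; wrapped or encoded DNs would need extra handling first. As an aside, LDAP filter syntax does define a NOT operator, so a filter such as ‘(!(objectclass=myobjectclass))’ may produce the same list in a single search where the tools support it.

```shell
#!/bin/sh
# Hypothetical helper: given two saved search results in LDIF form,
# print the dn: lines that appear in only one of them.
dn_difference() {
    # keep only the dn: lines from both files, then drop every DN seen twice
    cat "$1" "$2" | grep '^dn:' | sort | uniq -u
}

# The two files would come from searches along these lines (assumed
# base DN, filters, and file names):
#   ldapsearch -x -LLL -b 'ou=people,dc=example,dc=com' '(objectclass=*)' dn > all.ldif
#   ldapsearch -x -LLL -b 'ou=people,dc=example,dc=com' '(objectclass=myobjectclass)' dn > tagged.ldif

# Stand-in data so the sketch runs without a directory server
work=$(mktemp -d)
printf 'dn: uid=ann,ou=people,dc=example,dc=com\n\ndn: uid=raj,ou=people,dc=example,dc=com\n\n' > "$work/all.ldif"
printf 'dn: uid=ann,ou=people,dc=example,dc=com\n\n' > "$work/tagged.ldif"

# Prints the one entry that lacks the objectclass (uid=raj)
dn_difference "$work/all.ldif" "$work/tagged.ldif"
```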

View Vindication

So, about 9 months ago, I worked with a team of researchers. They were building a pretty hardcore global distributed system, and associated management infrastructure. My job was to simply advise them on issues revolving around how they use their database back end. For the most part, I just made suggestions here and there about normalizing their schema and helping them adjust a few queries to deal with the normalization. But there was this one thing….

I told them that if they had certain views of the data that were used all over the place by their applications, they might consider creating “views” in the database to provide that view of the data without denormalizing the data in the base tables. The team quickly caught on, and I left them to it. A few days later, it was clear that I had created a monster. One of the team members had created some views that were… unique. He was using them as a means of storing not only complex queries, but also some logic to manipulate the data, and format it for him. And, of course, since different functions might use the same basic data, just formatted or manipulated in a different way, there must’ve been lots of these views. And… well, I think you get the picture.

I pointed out that this was bad news on so many levels that to move in this direction was just No Bueno(tm) all over the place. There was like a week-long email thread involving most of the development team at the time, and in the end, I told them that if they still wanted to go this route, after everything I had told them about database design and usage in general, then at least they were making that decision with the knowledge of the possible consequences, and so my job, really, was done. I knew I had at least convinced one of the guys, and he fought tooth and nail with the other guy (they sat right next to each other). I figured they’d come to some compromise and find a sane way to move forward. My time with them was over and I moved head first right into another project.

Since that time, there was a little turnover in the group. A couple of guys left, and a *few* new guys were brought on. Just a couple of weeks ago, I spoke to one of them about doing a database cluster to work around some performance issues. Then, I got an email from another guy saying they were trying to find the source of some database slowness. This is months after I left the group, so I never dreamed that this had anything to do with “the views”.

Then I got this IM from some nick I didn’t recognize saying “after all these months of fighting, we’ve finally proven that you were right all along”. The guy called me by name, but I didn’t know the nick, so I said “um. What?”. And he proceeded to tell me that, after removing the “views of death” that were put in place in spite of the many super-long email-based database lectures I had sent, their database performance woes were gone. The load on their database server went from “maxed out” to under 5% immediately. The load graph was astounding.

I had no idea, and in fact could not fathom, that these views would ever make it to production. Vindication is good.

For the record, it’s not views themselves that are bad. They’re *good*. In fact, I’m the one who suggested using views! It was the misuse of views that killed the database.

Three tips to keep you focused

If you’ve read my previous posts relating to time management, you might realize by now that I tend to approach it from the opposite direction of a lot of other information sources. My philosophy is that it is easier (for me) to identify things that represent a mismanagement of time and find creative solutions to those problems than it is to collect ‘rules of thumb’ and try to wedge them into your daily routine.

Below I’ve outlined three ways that you can get time wasting activities out of your face and into the background so that your *real* work stays in the forefront of your environment and your brain. I hope you find them useful!

Have your mail check in with you

How many times do you check your mail before lunch? 10? 20? 50 times? It’s probably too often. If you’re not using some mechanism to alert you of mail that really *requires* your attention, you might be wasting more time than necessary.

“But how much time is that really going to save me? It only takes a second to look at my mail!” Well, I guess that might be true, but the context switch of detaching your brain from the current activity, looking at your email, maybe clicking through various folders or checking mail from multiple accounts, and then getting back to your original task can be quite expensive. While the physical act of checking mail might take only a minute, getting your brain back up to speed on where you left off can take a bit longer.

I’ve employed a couple of tools to help make sure I’m not checking mail unless I need to.

  1. Filters. Just about all email clients have some way of shuffling messages into different folders or tagging them based on the subject, sender, some other header, or even the content of the message. In an ideal world, you’d use procmail to handle filters, specifically because it’s not tied to a single client, and so any client you use will have the same view of your email, which is wonderful. What has really helped me is after setting up filters to shuffle things off into the various folders, I created a folder called “Priority”. Anything that warrants dragging me away from work goes in there, and then I use an alerting system that only monitors that one folder. Which brings us to….
  2. Alerts. No matter what OS you’re on, no matter what mail client you use, there’s a way to get alerts working for you. Email alerts have evolved from simply saying ‘hey, there’s new mail’. Nowadays, you should be able to find one that will show you some portion of the incoming message and give you some options on what to do.
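
If your mail happens to land in Maildir folders, the “only watch Priority” idea can be sketched in a few lines of shell. This is an assumption about your setup, not a description of mine: the ‘count_new_mail’ helper, the ‘PRIORITY_DIR’ variable, and the folder path are all hypothetical. It relies only on the Maildir convention that messages no client has looked at yet sit in the folder’s ‘new’ subdirectory.

```shell
#!/bin/sh
# Hypothetical helper: count unread messages in one Maildir folder.
# In the Maildir layout, messages not yet seen by a client sit in "new/".
count_new_mail() {
    folder=$1
    if [ -d "$folder/new" ]; then
        # one file per message; count regular files only
        find "$folder/new" -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Assumed location of the "Priority" folder; adjust to your own setup
priority_dir="${PRIORITY_DIR:-$HOME/Maildir/.Priority}"

n=$(count_new_mail "$priority_dir")
if [ "$n" -gt 0 ]; then
    echo "Priority mail waiting: $n"
fi
```

Run from cron or a panel applet, something like this stays silent unless the one folder that matters has something in it.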

Make friends with virtual desktops

This is especially important on a laptop that doesn’t have as much screen real estate as most desktop systems. Popular Linux desktop environments all have virtual desktops, and on the Mac, I highly recommend installing Virtue Desktops to be able to use this functionality on the Mac platform. And I don’t recommend sticking with default settings for your virtual desktops, because it leads to wasting more time. For example, if I put my code editor on one desktop, and keep my email, music player, and IRC client on another desktop, then that means checking mail also throws all of these other time wasters up in my face!

I keep a separate desktop just for mail and calendar applications. Another one called “Chat” holds my group’s Jabber chat room session and an IRC client. I also keep separate desktops called “Coding: Work” and “Coding: Play”, and “Browser: Work” and “Browser: Play”. In the end, I hardly ever actually launch applications, or spend time minimizing or maximizing applications. If I’m in “Coding: Work” and decide to take a break, I might go to “Browser: Play” to check my news feed aggregator, or I might go to “Chat” to see what’s up with my digital buddies.

This might seem like overkill at first. I have 12 virtual desktops! But after a couple of weeks of tweaking things to how you work, you’ll find that it saves you an immense amount of time by making things easier to switch between, which makes context switching a much less expensive operation. Sysadmins are interrupt driven – we can’t control that. But we can control how it affects our work, and this is one way to do that.

Put more at your fingertips

Having separate desktops for certain things can actually cause more frequent context switching. For example, I have a separate desktop for my music player, which plays throughout the day. If I had to go to that desktop every time I decided I wanted to skip a particular song, that would waste a lot of time. Instead, I put controls for the music player in my task bar. Now if I want to skip a song, I just click a button and go back to work. It’s a non-event compared to going to the desktop and being distracted by art work, other songs in the list, etc., etc.

Maybe there are other things that waste your time that you could put at your fingertips. Some people have to know the weather at all times, or they need to have a dictionary handy, or whatever. If it’s something that you can make a non-event without making your environment distractingly cluttered, do it!

Vi Key Bindings for your Mac Apps

Well, I found Komodo Editor, and I’m using that right now for coding Python. Its Vi key bindings are a little quirky sometimes because of the interplay with the popup hints and stuff, but it hasn’t driven me totally crazy yet. However, I just found out that there’s a way to get Vi key bindings on *any* application on a Mac that uses the Cocoa Text System. So now I’m starting to fiddle a bit with XCode, because with this little plugin in place, and since XCode uses the Cocoa Text System, XCode now has Vi key bindings on my system. :-D

What I *haven’t* found (yet) is a list of applications that use the Cocoa Text System, which would be nicer than launching all of the applications on my system and testing them. It’s not a consistent thing, it seems. XCode uses CTS, but Mail doesn’t, so it’s not like you can just say “all Apple apps use it” or something.

Google Calendar Syncing

So, I’m kinda tired of trying to find a solution to this. What I want is a non-commercial, freely available application (NOT service) that will sync bidirectionally between Google Calendar and Apple iCal, Evolution, and whatever Mozilla calls its calendar today (Sunbird?).

I’ve used Spanning Sync, which worked well enough, but I never liked that my data traversed their servers. Then, they decided to charge me for the privilege of sending my data through their servers, which I didn’t want to do in the first place.

Then I looked at gsync, but I decided not to bother with it because they have commercial aspirations as well, and while they say it’ll only be $20, and it’s a standalone application, I don’t really want to get all set up with it and be let down when they decide it’s worth much more or something.

So I downloaded gcaldaemon, which is an open source, freely available, standalone application, but after about a day of fiddling with it, I couldn’t get it to do much of anything that was useful. What’s more, it’s written in Java, so I’m not going to go mucking about with the source code (I don’t code Java).

I’ve decided to figure out a solution on my own, as a pet project to help me get used to programming in Python and using the various Google APIs. I just started last night, and I’ve gotten as far as logging in and getting a list of my calendars. I have an extremely long way to go, but between this project and another one I’m doing at work, I should become pretty adept at using Python, and it’s been great fun!

It should be noted that there’s already a Python library for the Google Calendar API. I downloaded it, and I’m using some of that code as example code to get me going, but I’m not using that library as of right now, because I’m more interested in learning than I am in getting a working product immediately. Maybe I’ll do something useful and will be able to contribute code back to that project. Maybe after some time of mucking with this I’ll see the light and decide to use the API. Either way, whatever code I produce will be available to whoever wants it in some form.

Code Editor Goodness: Komodo Editor

Geez…. for a sysadmin I sure seem to write a lot of code. In the past year I’ve written an assignment type for Moodle in PHP, cobbled together an API in Perl to manage various LDAP resources, and I’ve just completed a prototype for an XML-RPC server that will be an interface to our data warehouse (which I designed, and wrote ETL scripts for in Perl, awk, and shell). Whew!

With all of this going on, a good editor is necessary. I’m a sysadmin *first*, and that is where my training is, so naturally I use vi. I understand that “real” programmers use Emacs, but it really doesn’t fit my brain, and if you use vi for any length of time, I believe it becomes dang near impossible to convert without a frontal lobotomy. :-o

Though I like vi a lot for day-to-day administration, I’m not always a fan of how it handles programming, and it seems to take a lot of work to get it to work the way you want it to. My biggest pet peeve about vi is when you enter insert mode and then paste in some code that was already indented. Vi likes to indent it again. Usually I can get around this by setting all of the *indent settings to “no”. For example “:set noautoindent”. However, I set everything I could find yesterday and it was still doing some goofy things with my pasted in code.

Once I got my code in there, and manually removed all of the stupid indentations, I realized another thing: I hate vi’s Python syntax highlighting, and it doesn’t handle Python indentation in a particularly “smart” way. For this reason, I generally use JEdit for programming, but there are no Vi key bindings for JEdit, so I went looking for a new editor to see what I could find. What the heck, it was Sunday. What else did I have to do?

I scored. Komodo Editor is a free-of-charge code editor that runs on Windows, Mac and Linux, is as customizable as you’re likely to ever need it to be, and….. wait for it…. it has a Vi mode!

Since my current project is working with Python, it’s also super, super nice to have the indentation guides, and since I’m a Python newbie, it’s also fantastic to have some of the gentle reminders that I’ve indented wrong, or forgotten the colon after my def. I’ve also made use of the basic-but-useful file comparison interface, which does a diff on files of your choosing. It also does some things that I liked about JEdit, like handling file changes on disk in an intelligent (and flexible, should your definition of ‘intelligent’ differ from Komodo’s) way.

The simple code folding is as one would expect in most graphical editors, and it’s a tad nicer than JEdit, though I wish that there was a keyboard shortcut to collapse/expand the current block of code. I also wish I could define language-specific syntax highlighting based on a regex or something like that. For example, I’d like to color the word “self” in Python differently from what Komodo Editor calls “identifiers”. I also wish that I could find a way to split the window vertically in Komodo. I’ll miss that feature of JEdit. It might warrant me keeping JEdit around for some things.

For the record, I also tried SubEthaEdit, Smultron, a couple of Vim plugins, XCode, and I downloaded and tried (and failed) to get SPE running, too. Komodo fit my brain best. I recommend it if you want some of the graphical goodness of an IDE but don’t want to lose your Vi key bindings. Enjoy!

Trying to make friends with Python… again

I like the idea of Python. I have diverse interests, technically, and I like to think that there’s a language out there that I can use to write a small script, a large website, a stored procedure, or a distributed system. The same language that is used to write a very large chunk of systems code on Red Hat systems can also be used to make pretty graphical interfaces. I like that it’s cross-platform.

My trouble with Python has been twofold: time, and support. I actually *have* read the introductory tutorial, but it was in 2002. I’ve forgotten just about all of it. I have a copy of the printed Python Reference Library, but it’s from 2000 (if memory serves). I own *both* editions of “Learning Python”, because by the time I got around to reading the first edition, the second edition made it completely obsolete. The other side of the time issue was making time to actually do something useful with the language so as to cement the fundamentals into my brain. That’s sometimes difficult when you’re a sysadmin and don’t really program for a living.

On the support side, I’ve had a lot of problems. Every time I go to do something with Python, I have no idea which route to take. There are so many frameworks and modules that have overlapping problem scopes that it’s hard for me to make a decision. What’s worse, nobody seems to know which module or framework is the canonical way of doing things. I guess things are still young enough to be schizophrenic. With Perl, when they say “there’s more than one way to do it”, that’s speaking more about the syntax of the language than the modules you might use (though it speaks to that, too, somewhat). With Python, the syntax is the (relatively) stable part – it’s choosing modules that can be a challenge.

Right now I’m building an XML-RPC server and a small test client. The client calls functions on the server, and in response, the server queries a PostgreSQL database and returns the results. I got a simple prototype working with real data yesterday, but it took me a long time to figure out exactly which module should be used to talk to PostgreSQL from Python, and which module should be used for implementing the XML-RPC server. I’m comfortable with psycopg2 for the database calls, but I’m using SimpleXMLRPCServer for the server implementation, and I’m just waiting for one of its limitations to bite me. However, Twisted doesn’t seem like it’s quite soup yet in this particular area, and using xmlrpclib to implement a server seems silly with a ready-made solution already built in (I know a project that does that, maybe because SimpleXMLRPCServer didn’t exist at the time they started?).

So, wish me luck. If you have any input on what you’ve done in this area with Python, fill me in! Also, if you’re an admin who uses Python and knows of a good reference site for simple day-to-day UNIX admin scripting in Python, let me know that too!

More news for Spanning Sync Refugees

First, there are lots of people who are pretty outraged by the new Spanning Sync pricing of $25/year for a subscription service or $65 for a one-time license. The people who are the most outraged are those who are intimately familiar with how buggy it is because they were beta testers. I’m in that camp myself. I no longer use Spanning Sync.

Second, I found this post talking about future pricing of gSync, which is currently in beta and plans to go commercial, but there are two important distinctions:

  1. There’s no central server involved. gSync connects directly to Google Calendar with no intermediary.
  2. They only plan to charge $20 for the download.

Finally, check out this quote from Charlie Wood of the Spanning Sync team:

“For example, another poster on this group (see http://groups.google.com/group/spanningsync/msg/429d64a0f961092f) explained that he thinks, “Spanning Sync is a great product,” but that he is, “unfortunately, a supporter of open source or free software,” and therefore won’t be buying a subscription. My point is that regardless of the price of the service (unless it was free), he wouldn’t have ever been a customer of ours.”

This shows a complete lack of understanding about what open source and free software are about. To be clear: NEITHER THE OPEN SOURCE NOR THE FREE SOFTWARE COMMUNITIES SPECIFY THAT SOFTWARE SHOULD NOT BE A COMMERCIAL, MONEY-MAKING PRODUCT.

From the Free Software Foundation site:

You may have paid money to get copies of free software, or you may have obtained copies at no charge. But regardless of how you got your copies, you always have the freedom to copy and change the software, even to sell copies.

“Free software” does not mean “non-commercial”. A free program must be available for commercial use, commercial development, and commercial distribution. Commercial development of free software is no longer unusual; such free commercial software is very important.

And, from the Open Source Initiative website:

“How do I make money on software if I can’t sell my code?

You can sell your code. Red Hat does it all the time.”

Also, *I* am a supporter of free *and* open source software, and regularly pay for software, as do most people who have to get actual work done using tools for which there is no free/open alternative. “Free and Open” does NOT mean “no money changes hands”.

Please, if you’re a software developer, put some due diligence into this, and if you’re a free/open source software supporter, try to work with the community on better relaying the message, because after, like, 20 years, people should’ve started to get this by now.

Safety Precautions When Using the ‘rm’ Command

Usually, if I have a bunch of files that need to go away, I’ll see what I can do to avoid using ‘rm’. Many times, I can move a directory containing the files out of the way, or I can make a backup directory and move the files there. However, at some point, those files are just taking up space, and need to be removed with ‘rm’. I treat this with a lot of caution.

The first thing I do before running the ‘rm’ command is run ‘which rm’. I’m in an environment where some utilities are in a mounted directory, and they duplicate what’s on the local system. I want to know what I’m using.

If I’m on an unfamiliar system, I run ‘man rm’, make sure that the man page refers to the binary from ‘which rm’, and then check to see how it handles symlinks. I have yet to see an ‘rm’ that follows symlinks and removes things referenced by them, but I don’t make assumptions. I used to make assumptions like this until one day I ran ‘chown’ on a Solaris system without ‘-h’ and systems all around the department started having issues because they suddenly couldn’t access what they needed to get their work done. :-/
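
When reading the man page isn’t enough to settle my nerves, a scratch-directory test is cheap. The sketch below is one way to do it, not part of my usual ritual as described here. The function name ‘check_rm_symlink’ is made up, and it uses ‘command -v’ rather than ‘which’ because the shell builtin also reports aliases and functions.

```shell
#!/bin/sh
# Show what "rm" resolves to in this shell (an alias, a function, or a path)
command -v rm

# Hypothetical check: in a scratch directory, confirm that removing a
# symlink leaves the file it points to alone. Returns 0 if so.
check_rm_symlink() {
    scratch=$(mktemp -d) || return 2
    echo 'keep me' > "$scratch/target.txt"
    ln -s "$scratch/target.txt" "$scratch/link.txt"

    rm "$scratch/link.txt"

    # the target must still be a regular file and the link must be gone
    [ -f "$scratch/target.txt" ] && [ ! -L "$scratch/link.txt" ]
    result=$?

    rm -r "$scratch"    # clean up the scratch directory itself
    return $result
}

if check_rm_symlink; then
    echo "rm removed the link only; the target was left intact"
else
    echo "unexpected result: do not trust this rm around symlinks"
fi
```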

At this point, I used to *type* ‘rm -i’ using the full path to the directory I wanted to work on (which was confirmed using ‘pwd -P’ just to be safe).

Then I’d take my hands off the keyboard and just sit for a moment. I always do this, no matter how stressful the situation. It’s a weird meditative thing. Running the wrong command, or the right one incorrectly, will only make your day worse, no matter how bad it already is. Sit back, close your eyes, and think about what you’re about to do. Then open your eyes, take note of the directory you’re in, take note of the files in there, take note of what user you’re running as, take note of the command you’re running, and inspect it character-by-character. Assuming everything was good, I’d hit enter.

The other day I thought of another safety precaution that, while it changes my ritual, might also save me some time. I had to delete all of the PHP files in a directory. In order to ensure that only the intended files got removed, I cd’d to the directory, ran ‘ls -l *.php’ and inspected the output. Carefully. Yep – those are all the files I wanted to delete, so then I did this (in a bash shell, but it works in tcsh, csh, and ksh as well):

^ls -l^rm -f

And that’s it. It removes the possibility of having a typo in the *argument* part of the command, which, when rm is involved, is often what gets you in trouble. If you’ve never seen this notation before, it’s a way to repeat the same command line you just ran, substituting what’s after the first caret with what’s after the second caret. So if I do this:

ls -l *.php

and get the proper output, running

^ls -l^rm -f

will cause this to be run:

rm -f *.php
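
The caret trick relies on interactive history, so it isn’t something to lean on inside a script. A rough equivalent for scripts is sketched below, under the assumption that a simple y/N prompt is an acceptable review step. The function name ‘list_then_remove’ is hypothetical. The idea is the same: expand the pattern once, look at the result, and then remove exactly those names.

```shell
#!/bin/sh
# Hypothetical helper: list the files with a given suffix in the current
# directory, then remove exactly the names that were listed.
list_then_remove() {
    suffix=$1
    set -- *."$suffix"          # expand the glob once into "$@"
    if [ ! -e "$1" ]; then      # an unmatched glob stays a literal pattern
        echo "no *.$suffix files here" >&2
        return 1
    fi
    ls -l -- "$@"               # review step: the same names rm will get
    printf 'Remove these %s file(s)? [y/N] ' "$#"
    read -r answer
    if [ "$answer" = "y" ]; then
        rm -f -- "$@"
    else
        echo "nothing removed"
    fi
}

# Usage, from inside the directory in question:
#   list_then_remove php
```

The ‘--’ before the file names keeps a file whose name starts with a dash from being read as an option.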

Hope this helps!
