Finding Needles With ‘sort’ and ‘uniq’ | Musings of an Anonymous Geek

I had to do this recently, and so I thought it would be useful to share this for two reasons:

Someone else may need to do it and find this technique useful
Someone else may know a better way of doing this

Quick ‘n’ dirty explanation: you have two lists. One list is a superset of the other list. You want to identify all of the items that exist *only* in the larger list. Here’s how you do that:

cat small_list >> large_list; sort large_list | uniq -u
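One thing to be aware of: the one-liner above appends to the larger file, so that file is changed afterwards. Here is a rough sketch of a variant that leaves both files alone by handing both of them to ‘sort’ directly. The file names and sample entries are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data; file names and contents are placeholders.
workdir=$(mktemp -d)
printf '%s\n' alice bob carol dave > "$workdir/large_list"
printf '%s\n' bob dave > "$workdir/small_list"

# sort reads both files and merges them into one sorted stream,
# so neither file is modified. uniq -u then keeps only the lines
# that occur exactly once, i.e. the ones found only in large_list.
only_in_large=$(sort "$workdir/large_list" "$workdir/small_list" | uniq -u)
printf '%s\n' "$only_in_large"

rm -r "$workdir"
```

Either way, this only works if neither list has duplicate lines of its own: a line that appears twice in the larger list and never in the smaller one would be dropped by ‘uniq -u’.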

Note that ‘uniq -u’ is not the same as ‘sort -u’. The former will display only the lines in the file that occur once. The latter displays all lines in the file, *once*, regardless of how many times they occur in the file.
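A tiny illustration of that difference, using made-up input where ‘apple’ appears twice:

```shell
#!/bin/sh
# Made-up sample input: "apple" occurs twice, the others once.
sample=$(printf '%s\n' apple banana apple cherry)

# uniq -u: only the lines that occur exactly once (apple is dropped entirely).
once_only=$(printf '%s\n' "$sample" | sort | uniq -u)

# sort -u: every distinct line, printed once (apple is kept, a single time).
deduplicated=$(printf '%s\n' "$sample" | sort -u)

echo "uniq -u:"
printf '%s\n' "$once_only"
echo "sort -u:"
printf '%s\n' "$deduplicated"
```

Note that ‘uniq’ only compares adjacent lines, which is why the input goes through ‘sort’ first.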

Longer example explanation: I have an LDAP server, and at some point we added an objectclass and associated attribute to every user account. However, new accounts weren’t being *created* with the objectclass and attribute. At some point, I figured out that there was some inconsistency between account objects, and figured I had better get a list of accounts that didn’t have the objectclass and attribute so I could correct the situation. Problem is, LDAP search filters have no ‘!=’ operator, so with the standard ‘ldapsearch’ command line tools I can’t ask for all objects where ‘objectclass != myobjectclass’ or something. (The filter syntax does have a NOT operator, as in ‘(!(objectclass=myobjectclass))’, which is another route to the same list.)

What I did was two ldapsearches. One for all of the objects in that part of the tree, and then another for all objects in that part of the tree with the objectclass in place. Of course, the former list is a superset of the latter, and then we do ‘cat subset >> superset; sort superset | uniq -u’ – and that will be the list of people who do *not* have the objectclass associated with their account entry in the directory server.
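Roughly, the whole thing might look like the sketch below. The base DN, the objectclass name, and the sample entries are all placeholders, and the ‘ldapsearch’ invocations in the comments are only one plausible way to produce the two lists (OpenLDAP-style options assumed). Stand-in output is used so the sketch runs without a directory server.

```shell
#!/bin/sh
# In practice the two lists would come from ldapsearch, for example
# (assumed invocations; base DN and objectclass are placeholders):
#   ldapsearch -x -LLL -b "ou=people,dc=example,dc=com" "(objectclass=*)" dn
#   ldapsearch -x -LLL -b "ou=people,dc=example,dc=com" "(objectclass=myobjectclass)" dn

# Stand-in for the superset: every entry in that part of the tree.
all_entries='dn: uid=alice,ou=people,dc=example,dc=com

dn: uid=bob,ou=people,dc=example,dc=com

dn: uid=carol,ou=people,dc=example,dc=com'

# Stand-in for the subset: entries that already have the objectclass.
with_class='dn: uid=alice,ou=people,dc=example,dc=com

dn: uid=carol,ou=people,dc=example,dc=com'

# Combine both lists, keep only the dn: lines so the blank separator
# lines are not compared, then print the lines that occur exactly once.
missing_class=$(printf '%s\n%s\n' "$all_entries" "$with_class" \
  | grep '^dn: ' | sort | uniq -u)
printf '%s\n' "$missing_class"
```

With this sample data, the only entry left over is bob, the one account missing the objectclass.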
