Taming MySQL is… challenging. Especially in very large, fast-growth, ‘always-on’ environments. It’s one of those things where you seemingly can never know all there is to know about it. That’s why I really like coming across posts like this one from FreshBooks that describes a very real problem that was affecting their users, how they dealt with it, why *that* failed, and what the final fix was. Post a link to your favorite MySQL Problem and Solution post in the comments (oh yeah, and “subscribe to comments” should be working now!)
Archive for the ‘Uncategorized’ Category
I’m still digesting certain parts of OSCON 2008, but I think I’ve finally settled on a conclusion for my thoughts on Sam Ramji’s presentation of Microsoft’s contributions to open source projects, and their progress toward rethinking how they do business to be more friendly toward competing platforms and technologies, so that other platforms can integrate with Microsoft products more easily.
I think the talk should’ve been entitled “What Microsoft Has Done for You Lately”. To Microsoft’s credit, they have, in fact, done things that would’ve been unimaginable just a few years ago. However, I think they sort of still miss the point — or they didn’t and just don’t know how to do anything about it.
One must admit that Microsoft has made strides in making various technologies run in a Windows environment. PHP can be run on IIS, and it’s a supported language in their development tools. They’ve opened up the SMB/CIFS spec so that the Samba team can create more robust tools to integrate Windows and UNIX environments, and they’ve become a major sponsor of the Apache Software Foundation. Things like this are nice, but I can’t really accept them as any kind of long-term, reliable solution. Sure, they’ve opened up the SMB spec, but what happens when Microsoft decides to change the spec, or they decide to deprecate SMB altogether?
The answer is that we’re left in the dust, and that leads to my point. The major difference between opening up a spec and opening up the source code is that if you open the spec and then decide to go in another direction, we’re left in the lurch. If you open the source code and decide to change directions, the community building tools around that functionality can decide for themselves to either change directions or fork and exec, so to speak. The real point here is that Microsoft is under no obligation to involve any of the communities within the Open Source world in discussions about their direction with regards to any of the technologies they use. They’re a closed-source, proprietary company, with a fiduciary responsibility to look out for their investors and their bottom line. If they see an opportunity to make more money because more customers want to see them do something that requires something other than SMB, they’re free to do that, and if I were a shareholder, I think I’d want them to do it, because I’d want my stock price to go higher.
I have yet to see a corporate entity involve the community in their direction with regards to technology. I have seen lots of hand-waving, a lot of lip service, and a lot of people speaking at conferences, mostly trying to hide really big elephants behind a well-worded (but otherwise empty) speech, at least vetted, if not written, by marketing and PR folks. Talks like this are pretty transparent.
I think the issue is a hard one. How do you involve a community mostly unconcerned about financial matters (or, at least, mostly not involved with them directly on a regular basis), in a decision-making process that necessarily involves coming to a solution that is “profitable”? Well, actually, I think the issue starts at a higher level than that. The real trouble is a situation which was beautifully illustrated in a talk by Robert “r0ml” Lefkowitz at this year’s OSCON: the corporations are viewing these interactions as taking place between “thinkers” and “doers”, instead of two “thinker” parties which happen to have different philosophies about how things should be done. Given the numerous cultural differences between the groups, one could forgive the corporate types for making the error (by the way, if anyone finds the “Praxis/Techne” talk by r0ml online in video format - post the link! There *was* a video camera there!)
Of course, specific to Microsoft, there are other issues besides simple matters of tech-level software interaction. There’s the issue of software patents, and Microsoft’s “promise” not to sue. There’s also the issue of standards, and Microsoft’s attempts to either own them, or destroy those which it can’t own. These are issues that are deep, cultural issues at Microsoft that no single person giving a talk at OSCON is in a position to solve (or to convince me they’ve solved). If Microsoft really does want to interact with the community and interoperate with our software, they should probably be prepared to spend years and dollars just to earn some level of trust. I sincerely hope that they are sincerely trying to open up, but I can’t help but have my doubts.
Hi. My name is Brian, and I’m a tech bibliophile.
I have owned more books covering more technologies than I care to admit. Some of my more technical friends have stood in awe of the number of tech books I own. I am also constantly rotating old books that almost *can’t* be useful anymore out of my collection because there’s just no room to keep them all, and it would be an almost embarrassingly large collection if not for the fact that I have no shame or guilt associated with my need for dead trees.
If you need further proof:
- I have, on more than one occasion, suggested to my wife that we take a walk around our local mall so I could browse the computer section of the book store, not to buy, but just to keep up with the new titles and stuff.
- Ok, I usually buy.
- I also go into book stores whenever I’m out of town to get a comparison of what seems to be popular in different areas of the state/country/world.
- I just got a head rush because I just remembered that, since I’m attending OSCON in Portland, OR this year, I’ll get to go back to Powell’s, which, mind you, has a huge, city-block-sized store, which is very nice, but *also* has an entire store dedicated to geekery that rivals anything like it that I’ve ever seen, and contains a computer museum! (You can see some shots of it in my flickr set from OSCON ‘06)
- I once owned a book about VBScript.
I have also co-authored a book for O’Reilly, and in addition to my day job (I’m the director of IT for AddThis.com), I also work for a publisher, MTA, the publisher of Python Magazine, php|architect, as well as a line of books. Oh yeah. I’m into it. It’s bad.
I’ve learned quite a bit about buying books, and some of that learning came from unexpected places. There’s even more that I don’t know, but at least now I know that I don’t know it, and can try to figure out more stuff.
So here are a few things to keep in mind when you need to buy a technical book, or one just tugs at your impulse buy strings.
Give Any New Version 6 Months Before Buying a Book About It.
The first books about PHP 5 were dreadful. I never, ever return books to a book store, even if I don’t particularly care for them, but I returned a book about PHP 5 because the level of inadequacy was just insulting to me as a consumer. This was quite some time ago (when PHP 5 books first hit the shelves), and thinking about it now I’m still amazed at how terrible that book was. Of course, PHP 5 is just one example. Way, way back in the day (1998-9 or so?) when the first books about Java 2 hit the shelves (some might remember that booksellers actually put stickers over the part of the title that said “1.2” when it was renamed “2”), I had the same experience.
It’s not exclusive to languages either. When the first MySQL books came out that said “covers mysql 5”, they just barely covered MySQL 5. In fact, there’s a new edition of High Performance MySQL coming out that is *going* to say “covers MySQL 5.1” on it, and it’s not really going to cover much about 5.1, so says one of the book’s authors (whose honesty I greatly appreciate, by the way - I’d love to see that from the various book publishers).
At the OS level, I’m mostly a Linux guy, and at this point I wouldn’t take a book about a specific version of a specific distribution of Linux if you paid me to take it. Those books are mostly rehashes of the last version of the book put together as marketing objects. I know, because when the “<distro><version> Bible” series first came out, I read them (I think RedHat was the only distro covered initially), and I followed up with later versions of the books, and was always disappointed. Nowadays, I don’t know how you can think that a book about something as fast moving as Fedora Core is going to be useful. Maybe if you’re learning it for the first time something like this can work out, but if you’re looking to exploit new features, you’re really better off just reading the release notes and changelog.
Lesson learned. Books take time to write, to edit, to format, to print, to distribute, and to get on the shelves. Keep that in mind when you see a book about Python 3000 on the shelves within days of a GA release of Python 3000. It’s likely that that book was completely written and in an editor’s hands 3 months ago, and writing for that project began probably 9 months ago… 9 months before Python 3000 was a reality in this example. Some changes can be accounted for during the writing process, but a book that is released 6 months after the release of a new technology is likely to be built on more solid ground (of course, this is only part of assessing the quality of the book - but I suspect it’s often overlooked).
I’d also like to note that this probably wasn’t the case quite so much in the days when, for each language or technology or application or whatever, there were far fewer titles in print on the topic, and an authoritative title was more easily identified. Nowadays, the number of books about Ruby is dizzying to witness on the shelves of your local retailer. I just don’t think there was a market to support that kind of sensationalistic publishing model back when, say, C++ hit the scene. Maybe I’m mistaken there and some more… distinguished folks can enlighten all of us.
Take reviews with several grains of salt
Book reviews are lame, unless you know the source. When I say “know”, I don’t mean “have heard of”. I mean “know” in the sense that you have some idea what this reviewer is working with on a day-to-day basis, you know what their leanings are within the technological landscape, and you recognize that person as an authority on some topic at least loosely related to the book being reviewed.
I wouldn’t put much faith in the reviews on Amazon unless it is an established title that’s in its second edition. First edition books that all of a sudden have 20 reviews on Amazon within the first week of the release are probably reviews done by other authors who work for the same publisher, or who have some other motivation for writing the review.
You can learn to identify lame reviews or astroturfing on sight (now that you’re aware of it, it’s not all that hard to recognize), so be on the lookout. If you can, google the reviewer by name. Some of those folks work for the same publisher, and should likely just be discarded. I hate astroturfing, but I guess the publishers feel like they have to do it to compete with everyone else who is doing it and creating buzz around their titles. Sad.
By the way, astroturfing in this context means sending everyone you know (and/or works for you, or wants to) to do reviews, talk up the book, link to the book’s web site, or the author’s blog (where his book is probably displayed prominently) or run ads on their blog, or mention the book on irc, digg, del.icio.us, slashdot, etc. If you get enough people to do this, it gives the impression that there’s a lot of buzz and “grass roots” enthusiasm around the book. Except the “grass” is fake. Hence “astroturfing”. This is the kind of thing that Digg fights against all the time. Mostly unsuccessfully. It goes like this:
- Big Media Inc publishes article on big web site.
- Big Media emails all editors, writers, bloggers, designers, etc., to go blog, talk, post, link, submit the article everywhere they can.
- Some of these people have multiple accounts on each service you can possibly submit links to, some have multiple blogs, they link to other peoples’ blogs who are also talking up the article… you get the idea.
- Big Media’s article is read by thousands, for no really good reason other than they happen to be good astroturfers.
…But I digress. Just take reviews with a grain of salt. Same goes for big numbers on Digg and other like services.
Look for “Timeless Tomes”
The K&R book on C is a timeless tome. The GoF book on Design Patterns is a timeless tome. Stevens on TCP/IP is a timeless tome. C.J. Date’s early Intro to Database Systems is a timeless tome. These books came out a shockingly long time ago considering how often they are referenced and recommended and handed down through generations of technologists. If you need a solid foundation in some technology like this, you should look for books on the topic that have stood the test of time.
However, time isn’t always your friend, and some of these tomes are enormous. That’s why there are books like “Learn Java in 24 Hours”. If you go after this type of book, fine. I have tons of books like this. Just know that going through it does NOT mean you “know” Java. See here for details.
Timeless tomes seem harder to find now that there are stores with 150,000 titles in stock. They get lost in the noise. They’re out there, though. I have a built-in Amazon storefront on LinuxLaboratory.org that I try to keep updated with books I have read and found genuinely useful. I’m a little behind on that, but the books there are a mix of huge tomes (Understanding and Deploying LDAP is enormous), and useful reference or “contextual” books that explain how to use a technology in a particular context (Perl for System Administration, for example, is a good book). The next book I need to add there is “The Art of SQL”, which completely rocks and I highly recommend if you *already know* SQL.
Look at the Copyright Date
Technology moves at breakneck speed. Some books that are still on the shelves that say “PHP and MySQL” cover versions that aren’t even supported anymore. Oracle 8i books are still around. Some books about Apache only make passing references to Apache 2. It would take some time to sit around flipping through pages to figure out if the version you need information about is covered. If you have some familiarity with the subject, checking the copyright date is a quick reference that can let you know if this book is the one you need. It can also help you avoid the dreaded “written before the technology was GA” problem mentioned above. If you know that FooLang 24 came out in February of 2008, the book in your hands that says “FooLang 24” on the cover should not have a 2007 copyright, ideally.
Be Wary of Growth in Second Editions
First: there are “Volumes” and there are “Editions”. A second volume is a completely different book from the first volume. A second *edition* is an updating of the first edition. It’s the same basic material. Or… that’s how it used to be. Nowadays, marketing sometimes dictates that new editions should include whole new sections about new and exciting buzzwords of the day and stuff like that. Have you seen the most recent edition of “Programming Python”? It’s probably the thickest technology book I own, even beating out Understanding and Deploying LDAP Directories. I have no idea if anything in there was put upon the author by O’Reilly - and I’m not making accusations (I’ve worked for O’Reilly and have no reason to believe they’re guilty of this practice) - I’m just saying that the first edition was probably around half the size of the second.
For what it’s worth, I own the latest edition of Programming Python, and am not sorry I bought it. In my editing work for Python Magazine, I came across code that used seemingly every conceivable Python module, and I had to be able to quickly reference and read up on stuff that was in unfamiliar territory. Of course, we have tech editors (who rock, by the way), but I still needed to make sure the text was explaining things in a way that made sense and didn’t contradict the code (or vice versa). That book covers a ton of stuff, and I was glad to have it.
I’ve worked with a good number of publishers, and I have definitely been encouraged to make mention of different things I had no interest in writing about, because it was good for Google rankings, or blog buzz, or tag clouds, or whatever. I have friends in tech publishing circles (and tech authors) who have confirmed that this *does* happen.
Understand that publishers, no matter how granola they look, run businesses, and businesses need to grow and make money, which is an enormous feat to pull off in publishing. Eventually, they hire marketing people, and priorities can conflict, and bad things can happen. This is not a diatribe against the publishers. It’s a guide for the reader and technical bibliophiles.
My $.02.
As usual, the more information the better, so share your thoughts!!
UPDATE: if anyone knows of a non-broken syntax highlighting plugin for wordpress that supports bash or some other shell syntax, let me know :-/
Apache logs, database backups, etc., on busy web sites, can get large. If you rotate logs or perform backups regularly, they can get large and numerous, and as we all know, large * numerous = expensive, or rapidly filling disk partitions, or both.
Amazon’s S3 service, along with a simple downloadable suite of tools, and a shell script or two can ease your life considerably. Here’s one way to do it:
- Get an Amazon Web Services account by going to the AWS website.
- Download the ‘aws’ command line tool from here and install it.
- Write a couple of shell scripts, and schedule them using cron.
Once you have your Amazon account, you’ll be able to get an access key and secret key. You can copy these to a file and aws will use them to authenticate operations against S3. The aws utility’s web site (in #2 above) has good documentation on how to get set up in a flash.
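Here’s a rough sketch of the credentials step. If I’m remembering the aws documentation correctly, the tool looks for a file called .awssecret in your home directory, with the access key on the first line and the secret key on the second. Treat that filename and layout as an assumption and check the tool’s own docs. The function name and the placeholder keys are made up for illustration.

```shell
#!/bin/bash
# Sketch: store AWS credentials in a file only the owner can read.
# ASSUMPTION: the 'aws' tool reads ~/.awssecret (access key on line 1,
# secret key on line 2). Verify against the tool's own documentation.

write_secret_file() {
    local file="$1" access_key="$2" secret_key="$3"
    # Tighten the umask in a subshell so the file is never world-readable.
    ( umask 077 && printf '%s\n%s\n' "$access_key" "$secret_key" > "$file" )
    chmod 600 "$file"
}

# Example (placeholder values, not real keys):
# write_secret_file "$HOME/.awssecret" "YOUR_ACCESS_KEY_ID" "YOUR_SECRET_ACCESS_KEY"
```

Whatever the file ends up being called, keep it out of version control and readable only by the account that runs the cron jobs.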
With items 1 and 2 out of the way, you’re just left with writing a shell script (or two) and scheduling them via cron. Here are some simple example scripts I used to get started (you can add more complex/site-specific stuff once you know it’s working).
The first one is just a simple log compression script that gzips the log files and moves them out of the directory where the active log files are. It has nothing to do with Amazon web services. You can use it on its own if you want:
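
    #!/bin/bash
    # Compress rotated logs older than a day and move them to the archive.
    LOGDIR='/mnt/fs/logs/httplogs'
    ARCHIVE='/mnt/fs/logs/httplogs/archive'

    cd "$LOGDIR"
    if [ $? -eq 0 ]; then
        # -maxdepth 1 keeps find out of the archive subdirectory.
        for i in `find . -maxdepth 1 -name "*_log.*" -mtime +1`; do
            gzip "$i"
        done
        mv "$LOGDIR"/*.gz "$ARCHIVE"/.
    else
        echo "Failed to cd to log directory"
    fi
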
Before launching this in any kind of production environment, you might want to add some more features, like checking to make sure the archive partition has enough space before trying to copy things to it and stuff like that, but this is a decent start.
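The free-space check could look something like this. It’s a minimal sketch, not something I’ve run in production: the function name and the 1 GB figure in the example are arbitrary, and it assumes the POSIX output format of df.

```shell
#!/bin/bash
# Sketch: succeed only if the filesystem holding DIR has at least
# NEEDED_KB kilobytes available.

has_free_kb() {
    local dir="$1" needed_kb="$2" avail_kb
    # df -P prints one POSIX-format line per filesystem; -k reports 1K blocks.
    # Column 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$needed_kb" ]
}

# Example: only move archives when roughly 1 GB is free on the archive
# partition. LOGDIR and ARCHIVE are the variables from the script above.
# if has_free_kb "$ARCHIVE" 1048576; then
#     mv "$LOGDIR"/*.gz "$ARCHIVE"/.
# fi
```

If df can’t read the directory at all, the function fails, which is the safe answer for a “can I copy here?” question.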
The second one is a wrapper around the aws ‘s3put’ command, and it moves stuff from the archive location to S3. It checks a return code, and then if things went ok, it deletes the local gzip files.
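
    #!/bin/bash
    # Upload each archived log to S3 and delete the local copy on success.
    # Bail out if the cd fails so we never delete files somewhere else.
    cd /mnt/fs/logs/httplogs/archive || exit 1
    for i in `ls *.gz`; do
        s3put addthis-logs/ "$i"
        if [ $? -eq 0 ]; then
            echo "Moved $i to s3"
            rm -f "$i"
            continue
        else
            echo "Failed to move $i to s3... Continuing"
        fi
    done
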
I wish there was a way in aws to check for the existence of an object in a bucket without it trying to cat the file to stdout, but I don’t think there is. This would be a more reliable check than just checking the return code. I’ll work on that at some point.
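One possible workaround, sketched under a big assumption: if you can get a listing of the bucket with one object name per line, you can check for an exact match before deleting anything locally. I haven’t verified what the aws tool’s listing output looks like, so the sketch below only covers the matching half and reads the listing from stdin. The function name and the listing file in the example are hypothetical.

```shell
#!/bin/bash
# Sketch: decide whether an object name appears in a bucket listing.
# ASSUMPTION: the listing arrives on stdin with one object name per line.
# How you produce that listing depends on your tool and is not shown here.

object_in_listing() {
    local name="$1"
    # -F: fixed string, -x: whole-line match, -q: exit status only.
    grep -Fxq -- "$name"
}

# Example with a hypothetical listing saved to a file beforehand:
# if object_in_listing "access_log.1.gz" < /tmp/bucket-listing.txt; then
#     rm -f "access_log.1.gz"
# fi
```

The whole-line match matters: without -x, a partial name like access_log.1 would also count as present.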
Scheduling all of this in cron is an exercise for the user. I purposely created two scripts to do this work, so I could run the compression script every day, but the archival script once every week or something. You could also write a third script that checks your disk space in your log partition and runs either or both of these other scripts if it gets too high.
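For what it’s worth, that third script might look roughly like this. It’s a sketch under assumptions: the script paths, the 80% threshold, and the cron times in the comments are all placeholders I made up, so adjust them for your own setup.

```shell
#!/bin/bash
# Sketch: run the compression and archival scripts when the log
# partition passes a usage threshold.
# ASSUMPTION: the script paths and the 80% threshold are placeholders.

LOGDIR='/mnt/fs/logs/httplogs'
THRESHOLD=80
COMPRESS_SCRIPT='/usr/local/bin/compress_logs.sh'   # hypothetical path
ARCHIVE_SCRIPT='/usr/local/bin/archive_to_s3.sh'    # hypothetical path

# Print the percentage of space used on the filesystem holding $1.
usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Run both scripts, in order, if usage is above the threshold.
check_and_run() {
    local pct
    pct=$(usage_pct "$LOGDIR")
    if [ -n "$pct" ] && [ "$pct" -gt "$THRESHOLD" ]; then
        "$COMPRESS_SCRIPT" && "$ARCHIVE_SCRIPT"
    fi
}

# Uncomment the next line when installing this as a standalone script:
# check_and_run

# Example crontab entries (times are arbitrary):
#   30 2 * * *  /usr/local/bin/compress_logs.sh    # daily
#   30 3 * * 0  /usr/local/bin/archive_to_s3.sh    # weekly, on Sunday
#   0  * * * *  /usr/local/bin/check_log_space.sh  # hourly watchdog
```

Chaining the two scripts with && means the upload only runs if the compression step exited cleanly.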
I used ‘aws’ because it was the first tool I found, by the way. I have only recently found ‘boto’, a Python-based utility that looks like it’s probably the equivalent of the Perl-based ‘aws’. I’m happy to have found that and look forward to giving it a shot!
