Clone a table in MySQL without killing your server

So, I recently ran into one of those situations where a customer complains that his MySQL database is slow, and “it worked great until about two weeks ago”. The first thing I look at in these situations is not the queries or any code, but the indexes. Once I determined that the indexes were almost nonsensical, I had a look at the queries themselves, which proved that the indexes were, in fact, nonsensical. Running the queries as written in the code, from a mysql shell, with EXPLAIN, I was able to further confirm that the indexes (most of them, anyway) were never used.

Easy right? Just reindex the table!

NOTE: I’m going to skip the talk about all of the database design issues on this gig and just go straight to solving the immediate problem at hand. Just know that I had nothing to do with the design.

But, supposing this table has 15 million rows and is running on a machine with 2GB of RAM and only a  single (well, mirrored) drive? It will take forever to reindex that table, and the machine can’t be made completely unavailable at any time, either by driving the load up so high that the Linux OOM killer reaps the mysql process, or by putting a lock on the table for, oh, 3 days or so :-)

The solution is to create a brand new table, which I’ll call “good” using the “SHOW CREATE TABLE” output from the bad table, which I’ll call “bad”. I do this right in the shell, actually. I run “SHOW CREATE TABLE bad”, cut and paste the output, remove the part that defines the indexes, rename the table to “good”, and bam! New, empty table.

Of course, you still have to define your new indexes, so run whatever statements are needed to do that. You might even want to populate it with some test data (say, 10000 rows from the bad table) to test your new indexes are being used as expected (or that they can be by altering the query and getting back the same results… but faster).

Once done, it’s time to start copying rows from bad to good. I’ve written a shell script to help with that part. It’s designed to run on a Linux host running MySQL.

The variables at the top of the script are pretty self-explanatory, except to note that there are separate NEWDB and OLDDB variables in case your new table also lives in a new database. The INCREMENT is the number of rows you want to copy over at a time. If you set INCREMENT to 1000, it’ll copy 1000 rows, check the load average, and if it’s under MAXLOAD, it’ll copy over another 1000 rows. It also keeps track of the number of rows in each database as it goes, since writes are still happening on the bad table while this is going on.

So here’s my nibbler script, in shell. I would’ve written it in Python, but it wasn’t my machine, and I couldn’t install the python mysql module :-/

#!/bin/bash

###
### Written by Brian K. Jones (bkjones@gmail.com)
### 
### Takes an increment, old db, and new db, and copies rows from olddb to newdb. 
### Along the way, it'll check system load and sleep if it's too high. 
### There's too much hard-coding right now, but it's a useful template, and 
### has been tested. The script takes no CLI arguments. 
###

INCR=10000
NEWDB=shiny
OLDDB=busted
OLDTABLE=bad
NEWTABLE=good
MAXLOAD=3
DBUSER=mydbuser
DBPASS=mydbpass

rows_old=`mysql -N -D ${OLDDB} -u ${DBUSER} -p${DBPASS} -e "select count(*) from ${OLDTABLE}"`
echo "rows_old is now ${rows_old}" 

rows_new=`mysql -N -D ${NEWDB} -u ${DBUSER} -p${DBPASS} -e "select count(*) from ${NEWTABLE}"`     ## num. rows in new table
echo "rows_new is now ${rows_new}" 

for (( rows_new=$rows_new; rows_new < $rows_old; rows_new+=$INCR )); do
        if [ $((rows_old - (rows_new + INCR))) -gt 0 ]; then         ## Check to see if there are at least $INCR rows left to copy over. 
            mysql -N -D ${NEWDB} -u ${DBUSER} -p${DBPASS} -e "INSERT INTO ${NEWTABLE} SELECT * FROM ${OLDDB}.${OLDTABLE} LIMIT ${rows_new},${INCR}"
      
            while [ "`awk -v max=${MAXLOAD} '$1 > max {print "TRUE"}' /proc/loadavg`" = "TRUE" ]; do 
               echo "sleeping due to load > ${MAXLOAD}"
               sleep 30
            done
            # we update rows_old because it'll be growing while this script runs. 
            rows_old=`mysql -N -D ${OLDDB} -u ${DBUSER} -p${DBPASS} -e "select count(*) from ${OLDTABLE}" `
            rows_new=`mysql -N -D ${NEWDB} -u ${DBUSER} -p${DBPASS} -e"select count(*) from ${NEWTABLE}"`
            time=`date +%R`
            echo "${time} -- rows_new = ${rows_new}, rows_old = ${rows_old}"  

        else                           ## There are < $INCR rows left. Select remaining rows. 
            remaining=$((rows_old - rows_new))
            mysql -N -D ${NEWDB} -u ${DBUSER} -p${DBPASS} -e "INSERT INTO ${NEWTABLE} SELECT * FROM ${OLDDB}.${OLDTABLE} LIMIT ${rows_new},${remaining}" 
            echo "All done!" 
            exit
        fi
done

  • Matt J.

    Handy idea. Thanks.

  • danielj

    There are some drawbacks to this technique…
    It only makes sense in the rather unlikely case that the data in the source table is not modified while the copy is in process.
    Another problem will occur when using this with foreign key constraints. Copying the table means that afterwards a renaming must take place so the table “good” will be renamed to “bad”, something like RENAME TABLE bad TO bad_legacy, good TO bad. Should there be tables with foreign keys refering to “bad” – the one initially called bad – these will now refer to bad_legacy. Example:
    mysql> CREATE TABLE tab_parent( id INT NOT NULL PRIMARY KEY ) Engine=Innodb;
    Query OK, 0 rows affected (0.01 sec)

    mysql> CREATE TABLE tab_child( id INT NOT NULL PRIMARY KEY, parent_id INT NOT NULL, FOREIGN KEY( parent_id ) REFERENCES tab_parent( id ) ) Engine=Innodb;
    Query OK, 0 rows affected (0.01 sec)

    mysql> CREATE TABLE tab_parent_copy( id INT NOT NULL PRIMARY KEY ) Engine=Innodb;
    Query OK, 0 rows affected (0.01 sec)

    mysql> RENAME TABLE tab_parent TO tab_parent_legacy, tab_parent_copy TO tab_parent;
    Query OK, 0 rows affected (0.01 sec)

    mysql> SHOW CREATE TABLE tab_child;
    +———–+———————————————————————————————————
    | Table | Create Table
    +———–+———————————————————————————————————
    | tab_child | CREATE TABLE `tab_child` (
    `id` int(11) NOT NULL,
    `parent_id` int(11) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `parent_id` (`parent_id`),
    CONSTRAINT `tab_child_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `tab_parent_legacy` (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german2_ci |
    +———–+———————————————————————————————————
    1 row in set (0.00 sec)

    So while this technique might not kill your server it is very likely to kill your data.

  • http://hiredgnu.co.za/ Ryan

    Cool script. Will give it a try.

    Before you run the script, though, you don’t need to copy and paste the output from “SHOW CREATE TABLE badtable”.

    The following will create a new table without any indexes:

    “CREATE TABLE new AS SELECT * FROM old LIMIT 1″

    (if you don’t want any data in the new table just truncate it.

  • Skip

    An even simpler way to create the new table would be:

    CREATE TABLE new LIKE old;

  • http://oksoft.antville.org Shantanu Oak

    For medium sized tables, use outfile. Test case…
    mysql> SELECT * INTO OUTFILE ‘shantanu.txt’ FIELDS TERMINATED BY ‘|’ OPTIONALLY ENCLOSED BY ‘”‘ LINES TERMINATED BY ‘\n’ FROM bad;
    Query OK, 2681793 rows affected (59.72 sec)

    mysql> alter table good disable keys;
    Query OK, 0 rows affected (0.00 sec)

    mysql> LOAD DATA INFILE ‘shantanu.txt’ INTO TABLE good
    FIELDS TERMINATED BY ‘|’ OPTIONALLY ENCLOSED BY ‘”‘;
    Query OK, 2681793 rows affected (3 min 49.91 sec)
    Records: 2681793 Deleted: 0 Skipped: 0 Warnings: 0

    mysql> alter table good enable keys;
    Query OK, 0 rows affected (11 min 1.68 sec)

    You can use Lock on source table so that writes will temporarily halt but can continue once the data is out.
    LOCK TABLES bad WRITE;
    UNLOCK TABLES;

  • http://www.seunosewa.com/ Seun Osewa

    Methinks the best way to deal with unused indexes is to just drop them. It doesn’t take any time at all because indexes are stored separately.

  • http://hiredgnu.co.za/ Ryan

    @Skip

    Yeah but the “CREATE TABLE new LIKE old;” method will create the new table with the same indexes as the old table. The idea in this case seems to be to create the table without any indexes. “CREATE TABLE new AS SELECT * from old;” will create new with the same schema as old, but without any indexes.

  • http://wleeper.com Bill

    I recently looked this article back up because I am a new company and needed this script again. Great script with one exception. The limit command starts to deteriorate in performance when you have larger tables (at least with MyISAM which we were using). I had a database with 40M rows and after a million or two performance came to a crawl. I don’t have access to the script anymore, but at the time I rewrote it to use the Primary key column instead. You do run into dead spots if they table has had deletes but for the most part it is quick. Be sure to order on the primary key if you use that method.