<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: A Couple of MySQL Performance Tips</title>
	<atom:link href="http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/</link>
	<description>Made with only the finest 1's and 0's</description>
	<pubDate>Fri, 10 Oct 2008 23:45:30 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
		<item>
		<title>By: Tom Passin</title>
		<link>http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/#comment-1078</link>
		<dc:creator>Tom Passin</dc:creator>
		<pubDate>Tue, 13 May 2008 16:57:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/#comment-1078</guid>
		<description>I used to work a lot with SQL Anywhere and had a lot of queries with multi-way joins. Of course, they tended to be very slow, v-e-r-y s-l-o-w in some cases. I found that the existing indexes often weren't being used. It turned out that I could get the optimizer to use them by adding apparently redundant conditions to the WHERE clause.

In other words, if you can discover how to get the optimizer to help you (which may take some trickery), you can turn an O(n^2) or worse query into something quite reasonable. I don't know how MySQL behaves here; I haven't needed to write similar queries since I've been using it.</description>
		<content:encoded><![CDATA[<p>I used to work a lot with SQL Anywhere and had a lot of queries with multi-way joins. Of course, they tended to be very slow, v-e-r-y s-l-o-w in some cases. I found that the existing indexes often weren&#8217;t being used. It turned out that I could get the optimizer to use them by adding apparently redundant conditions to the WHERE clause.</p>
<p>In other words, if you can discover how to get the optimizer to help you (which may take some trickery), you can turn an O(n<sup>2</sup>) or worse query into something quite reasonable. I don&#8217;t know how MySQL behaves here; I haven&#8217;t needed to write similar queries since I&#8217;ve been using it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ira Pfeifer</title>
		<link>http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/#comment-1077</link>
		<dc:creator>Ira Pfeifer</dc:creator>
		<pubDate>Tue, 13 May 2008 16:42:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/#comment-1077</guid>
		<description>Using clustered indices raises a few other issues as well. For starters, you can only have one clustered index per table. This may seem self-evident, but you'd be surprised how many developers don't realize it.

Besides the memory issue, clustered indices can cause another performance problem: page splits. I'll try to explain as briefly as possible:

The typical clustered index on a table is on the primary key, which is usually an INT IDENTITY column. That key increases monotonically, so every new row goes after all existing rows. As a result, each data page is filled before a new one is created, so the fewest possible new pages are created and the fewest IOs are performed.

If you put a clustered index on something else, the physical data needs to be kept in that order.  So say you've put a clustered index on UserId.

UserId  Data
1          a
1          b
1          c
2          a
2          d

If you insert (1,d), that row has to go between (1,c) and (2,a). If the data page is full (which is optimal for minimizing IOs and space usage), you have to split it: allocate a new page, move half the rows to it, and then insert the row.

Now imagine how often this happens if you're regularly inserting rows in the middle of your clustered index. You're going to get lots of page splits, which slow down inserts, and significantly fragmented indices, which slow down reads. There ARE situations in which a clustered index on this sort of data is warranted, such as when inserts are rare, but for an OLTP database with a balanced workload the best solution is often a clustered index on the PK and non-clustered indices on the other columns you're interested in.

Of course, as with everything DB-related, you'll need to apply these concepts to your specific implementation, but they should be considered.</description>
		<content:encoded><![CDATA[<p>Using clustered indices raises a few other issues as well. For starters, you can only have one clustered index per table. This may seem self-evident, but you&#8217;d be surprised how many developers don&#8217;t realize it.</p>
<p>Besides the memory issue, clustered indices can cause another performance problem: page splits. I&#8217;ll try to explain as briefly as possible:</p>
<p>The typical clustered index on a table is on the primary key, which is usually an INT IDENTITY column. That key increases monotonically, so every new row goes after all existing rows. As a result, each data page is filled before a new one is created, so the fewest possible new pages are created and the fewest IOs are performed.</p>
<p>If you put a clustered index on something else, the physical data needs to be kept in that order.  So say you&#8217;ve put a clustered index on UserId.</p>
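<table>
<tr><th>UserId</th><th>Data</th></tr>
<tr><td>1</td><td>a</td></tr>
<tr><td>1</td><td>b</td></tr>
<tr><td>1</td><td>c</td></tr>
<tr><td>2</td><td>a</td></tr>
<tr><td>2</td><td>d</td></tr>
</table>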
<p>If you insert (1,d), that row has to go between (1,c) and (2,a). If the data page is full (which is optimal for minimizing IOs and space usage), you have to split it: allocate a new page, move half the rows to it, and then insert the row.</p>
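<p>To make the page-split effect concrete, here is a toy Python model. It is only a sketch under stated assumptions: each page holds a hypothetical fixed number of rows (<code>PAGE_CAPACITY = 4</code>), a full page that receives a row inside its key range is split in half, and a row that sorts after everything on the last page simply starts a new page. Real engines use byte-sized pages and their own split heuristics, so only the trend matters here, not the numbers. All function names are made up for illustration.</p>

```python
from bisect import bisect_right, insort

# Hypothetical page size in rows, kept tiny so splits are easy to provoke.
PAGE_CAPACITY = 4


def insert_clustered(pages, key):
    """Insert key into a list of sorted pages (a toy clustered index).

    Returns True if a full page had to be split to make room.
    """
    if not pages:
        pages.append([key])
        return False
    # Route the row to the last page whose first key is <= key
    # (page 0 if the key sorts before everything).
    firsts = [page[0] for page in pages]
    idx = max(bisect_right(firsts, key) - 1, 0)
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        insort(page, key)  # room left: insert in key order
        return False
    if idx == len(pages) - 1 and key >= page[-1]:
        # Appending past the end of the index: start a fresh page, no split.
        pages.append([key])
        return False
    # Full page and the row belongs inside it: move half the rows to a new page.
    half = len(page) // 2
    new_page = page[half:]
    del page[half:]
    pages.insert(idx + 1, new_page)
    insort(page if key < new_page[0] else new_page, key)
    return True


def simulate(keys):
    """Insert keys in the given order; return (pages, number_of_splits)."""
    pages, splits = [], 0
    for key in keys:
        if insert_clustered(pages, key):
            splits += 1
    return pages, splits


def fill_factor(pages):
    """Fraction of allocated row slots that actually hold rows."""
    return sum(len(page) for page in pages) / (len(pages) * PAGE_CAPACITY)


# Monotonically increasing key, like an INT IDENTITY primary key.
seq_pages, seq_splits = simulate(range(1000))

# Clustered on UserId: 10 users each add 100 rows, interleaved over time.
# Keys are (user_id, row_number) so rows for the same user stay ordered.
rr_keys = [(user, n) for n in range(100) for user in range(10)]
rr_pages, rr_splits = simulate(rr_keys)

print("sequential:", len(seq_pages), "pages,", seq_splits, "splits,", fill_factor(seq_pages))
print("by UserId:", len(rr_pages), "pages,", rr_splits, "splits,", round(fill_factor(rr_pages), 2))
```

<p>With the monotonically increasing key, every page ends up full and no split ever happens. With the interleaved, UserId-style keys, splits do happen, and many of the half-empty pages they leave behind never fill up again, so the same number of rows occupies more pages.</p>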
<p>Now imagine how often this happens if you&#8217;re regularly inserting rows in the middle of your clustered index. You&#8217;re going to get lots of page splits, which slow down inserts, and significantly fragmented indices, which slow down reads. There ARE situations in which a clustered index on this sort of data is warranted, such as when inserts are rare, but for an OLTP database with a balanced workload the best solution is often a clustered index on the PK and non-clustered indices on the other columns you&#8217;re interested in.</p>
<p>Of course, as with everything DB-related, you&#8217;ll need to apply these concepts to your specific implementation, but they should be considered.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: m0j0</title>
		<link>http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/#comment-1073</link>
		<dc:creator>m0j0</dc:creator>
		<pubDate>Tue, 13 May 2008 14:15:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/#comment-1073</guid>
		<description>Heh. Having answered that question is a prerequisite. 

Also, people have differing opinions about when it's ok to use a database. I'm assuming that the reader has an interest in tuning performance, and is probably dealing with large(r) amounts of data, in which case you probably aren't going to get better performance with the same flexibility out of some non-database-backed solution. I'd be interested to hear examples of, say, HDF5 or XML or some other file-based mechanism outperforming a database and still being able to do complex queries. I tend to find more situations where people *should* use a database and don't than the reverse.</description>
		<content:encoded><![CDATA[<p>Heh. Having answered that question is a prerequisite. </p>
<p>Also, people have differing opinions about when it&#8217;s ok to use a database. I&#8217;m assuming that the reader has an interest in tuning performance, and is probably dealing with large(r) amounts of data, in which case you probably aren&#8217;t going to get better performance with the same flexibility out of some non-database-backed solution. I&#8217;d be interested to hear examples of, say, HDF5 or XML or some other file-based mechanism outperforming a database and still being able to do complex queries. I tend to find more situations where people *should* use a database and don&#8217;t than the reverse.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jack Diederich</title>
		<link>http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/#comment-1072</link>
		<dc:creator>Jack Diederich</dc:creator>
		<pubDate>Tue, 13 May 2008 12:59:22 +0000</pubDate>
		<guid isPermaLink="false">http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/#comment-1072</guid>
		<description>You missed the first question: "Should I be using a database for this?" Most applications shouldn't, but use one anyway.</description>
		<content:encoded><![CDATA[<p>You missed the first question: &#8220;Should I be using a database for this?&#8221; Most applications shouldn&#8217;t, but use one anyway.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steve Lee</title>
		<link>http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/#comment-1068</link>
		<dc:creator>Steve Lee</dc:creator>
		<pubDate>Tue, 13 May 2008 11:38:10 +0000</pubDate>
		<guid isPermaLink="false">http://www.protocolostomy.com/2008/05/12/a-couple-of-mysql-performance-tips/#comment-1068</guid>
		<description>Interesting read; it raised some issues I haven't dealt with.
A strategy I used in MS SQL Server, which also worked well on one MySQL system, is to index all columns involved in joins (as well as ensuring all PKs are indexed) and the important WHERE-clause columns. I assume MySQL clusters the data the way SQL Server does.

Re your denormalising comments, it seems you may be getting close to data warehousing, with its star and snowflake layouts optimised for queries (not inserts). However, that won't work if you need to search a live DB.</description>
		<content:encoded><![CDATA[<p>Interesting read; it raised some issues I haven&#8217;t dealt with.<br />
A strategy I used in MS SQL Server, which also worked well on one MySQL system, is to index all columns involved in joins (as well as ensuring all PKs are indexed) and the important WHERE-clause columns. I assume MySQL clusters the data the way SQL Server does.</p>
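<p>One way to check whether an index on a join column is actually used is to ask the optimizer for its plan. The sketch below uses Python&#8217;s built-in <code>sqlite3</code> module as a stand-in because it needs no server. The <code>customers</code>/<code>orders</code> schema and the index name are hypothetical, and SQLite&#8217;s planner is not MySQL&#8217;s, so treat this as an illustration of the technique rather than of MySQL behavior. The exact wording of <code>EXPLAIN QUERY PLAN</code> output varies between SQLite versions, so only the presence of the index name in the plan is worth checking.</p>

```python
import sqlite3

# Hypothetical schema: orders.customer_id refers to customers.id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    """
)

QUERY = """
    SELECT c.name, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = 42
"""


def plan(sql):
    """Return the 'detail' column (the last one) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before = plan(QUERY)
# Index the join column on the "many" side of the relationship.
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
after = plan(QUERY)

print("before:", before)
print("after:", after)
```

<p>In MySQL the equivalent check is to run <code>EXPLAIN</code> on the query and look at the <code>key</code> column of the output.</p>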
<p>Re your denormalising comments, it seems you may be getting close to data warehousing, with its star and snowflake layouts optimised for queries (not inserts). However, that won&#8217;t work if you need to search a live DB.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
