> To give you some perspective on the size of the dataset and the
> performance level we are hitting, here are some "good" results based on
> some explains:
Before Tom jumps in taking all the fun out of trying to solve it...
The estimates in the slow queries seem perfectly reasonable. In fact,
the cost estimates of the slow and fast queries are the same, which
is what you would expect if the data were distributed evenly
throughout the table.
Given that the key is a date, I would guess the data is generally
inserted into the table in date order, but for some reason the rows
with those 'high' dates are scattered more evenly throughout the table.
Clustered data requires fewer disk seeks and touches fewer pages in
general, which makes for a much faster query. Scattered data forces you
to pull significantly more information off the disk, only to throw
most of it away.
I would guess that sometime on 2002-05-25 someone did a bit of data
cleaning (deleting records). The next day, the free space map had
entries available in various locations within the table, and new rows
went into those holes rather than being appended at the end. With 89
million records keyed by date, I'm guessing there aren't very many
updates or deletes on it.
So... how to solve the problem? If this is the type of query that
occurs most often, you do primarily inserts, and the inserts generally
arrive in date order, cluster the table on index "some_table_ix_0".
The clustering won't degrade very much, since that is the order in
which you naturally insert the data anyway.