Postgres version 8.0.9 on Solaris 2.8. I know it's old but...
I have a table with a million rows.
I need to select data from this table based on an indexed column; I need
to select 600 possible values from the column, returning around 24,000
rows of data.
In Perl I have a hash with 600 keys in it.
So I did:

my $sql = "select stuff from table where index_key in ("
        . join(",", keys %hash)
        . ") AND non_index_row in ('xyz','abc','def')";
and in the Perl:

while (fetch())
{
    do_stuff;
}
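As an aside, interpolating the keys straight into the SQL only works safely here because they happen to be plain integers; with a driver that supports bind parameters the same query could be built with placeholders instead. A minimal sketch in Python (Python and the psycopg-style %s placeholders are my assumption, not the original DBI code):

```python
# Build the same IN-list query with one bind placeholder per key.
# Sketch only: assumes a psycopg-style driver where "%s" marks a
# bind parameter and the key values are passed separately.
keys = list(range(1, 601))           # stand-in for the 600 hash keys

placeholders = ",".join(["%s"] * len(keys))
sql = ("select stuff from table where index_key in (" + placeholders +
       ") and non_index_row in ('xyz','abc','def')")

# cursor.execute(sql, keys)         # hypothetical execution call
```

The generated SQL is identical in shape to the string-joined version, but the key values travel as parameters rather than as literal text.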
This resulted in a query string which executed in 12 minutes. If I
ran an EXPLAIN on the query, I could see the IN list being expanded
into 600 OR conditions:

where (index_key = 1) OR (index_key = 2) OR .....
As an alternative I did

select stuff from table where non_index_row in ('xyz','abc','def')

and in the Perl:

while (fetch())
{
    next unless $hash{ $_->{index_key} };
    do_stuff;
}
To me this should have been slower, since we're selecting more rows,
shipping them all back to Perl, and then discarding the ones I didn't
want. Imagine my surprise when this version took 3 minutes.
Have I broken something, somewhere? Or are IN comparisons really that
slow?
For what it's worth, a simple

explain select count(*) from table where index_key in (1,2,3,4,....)

uses the index for up to 156 values, but switches to a sequential scan
when there are 157 or more values in the query.
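Given that cutoff, one possible middle ground (untested against 8.0, and sketched in Python rather than Perl for illustration) would be to split the 600 keys into batches below the 157-value threshold and run one indexed query per batch:

```python
def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

keys = list(range(1, 601))           # stand-in for the 600 hash keys

# 150 < 157, so each batch's IN list should still use the index
batches = list(chunked(keys, 150))

# for batch in batches:              # hypothetical per-batch query
#     run "select stuff from table where index_key in (<batch>)
#          and non_index_row in ('xyz','abc','def')"
```

Whether four indexed queries beat one sequential scan would still need measuring on the real data.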
Any thoughts? I fear my poor tuning attempts may have caused other
slowdowns!
--
rgds
Stephen