Tom Lane wrote:
> Jack Orenstein <jack.orenstein@hds.com> writes:
>> Limit (cost=0.00..324.99 rows=100 width=451)
>> -> Index Scan using t_pkey on t (cost=0.00..296027.98 rows=91088 width=451)
>> Index Cond: (pk > 1000000000)
>
>> Adding the value restriction at the top of this query plan wouldn't increase the
>> cost very much.
>
> You're missing the point: with the value restriction in place, it's
> estimating that it will probably have to scan all 91000 rows (because
> there are fewer than 100 satisfying the value restriction). And that
> is going to cost somewhere north of 296027 cost units --- the cost
> shown, plus 91000 invocations of the value-restriction check.
> Which is considerably more than the cost of the other plan.
I see -- the optimizer is calculating that it will have to examine a very large
fraction of the rows.
Actually, pk and value are highly correlated. If a row gets past the index scan,
odds are very high that the value predicate will evaluate to true. So I am sure
that the index scan is the right way to go. I'm just trying to convince the
optimizer of this.
One thing I am considering is introducing a function with high execution cost.
E.g., if I do this:
create function return_input(bytea) returns bytea
as '
begin
return $1;
end;
' language 'plpgsql' cost 10000;
explain
select *
from t
where pk > 1000000
and return_input(value = 'asdf'::bytea)
order by pk
limit 100;
then I get the plan I want.
Limit (cost=0.00..563490.32 rows=100 width=451)
-> Index Scan using t_pkey on t (cost=0.00..34226402.07 rows=6074 width=451)
Index Cond: (pk > 1000000)
Filter: (return_input(value) = 'asdf'::bytea)
Is there a more elegant way of forcing the plan that I want?
Jack