Discussion: Slight change in query leads to unexpected change in query plan
I have a table that looks like this:

    create table T(pk int not null, value bytea, ..., primary key(pk))

I want to scan the table in batches of 100. I'll do this by issuing a
sequence of queries like this:

    select * from T where pk > ? and value = ? order by pk limit 100

After each query, I'll record the last pk seen and use that to drive the
next query. The obvious (to me) execution plan is to do an index scan on
the primary key and then filter using the restriction on value. I have
some realistic data (loaded into postgres 8.3.7) and I've run ANALYZE.
I'm not getting a very good execution plan:

    ris=# explain
    ris-# select *
    ris-# from T
    ris-# where pk > 1000000000
    ris-# and value = 'asdf'::bytea
    ris-# order by pk
    ris-# limit 100;
                                           QUERY PLAN
    ---------------------------------------------------------------------------------------------
     Limit  (cost=78352.20..78352.24 rows=16 width=451)
       ->  Sort  (cost=78352.20..78352.24 rows=16 width=451)
             Sort Key: pk
             ->  Bitmap Heap Scan on t  (cost=2091.60..78351.88 rows=16 width=451)
                   Recheck Cond: (pk > 1000000000)
                   Filter: (value = 'asdf'::bytea)
                   ->  Bitmap Index Scan on t_pkey  (cost=0.00..2091.60 rows=91088 width=0)
                         Index Cond: (pk > 1000000000)

But if I remove the value restriction, I get the plan I was hoping for:

    ris=# explain
    ris-# select *
    ris-# from T
    ris-# where pk > 1000000000
    ris-# order by pk
    ris-# limit 100;
                                            QUERY PLAN
    --------------------------------------------------------------------------------------------------
     Limit  (cost=0.00..324.99 rows=100 width=451)
       ->  Index Scan using t_pkey on t  (cost=0.00..296027.98 rows=91088 width=451)
             Index Cond: (pk > 1000000000)
    (3 rows)

Why is this? There is an obvious rewrite, e.g.

    select * from (select * from T where pk > ? order by pk limit 100) x where value = ?

and this produces a good query plan. But it means that fewer than 100
rows may be returned per batch, which, for reasons too boring to go
into, would be very inconvenient for my application.
Why does adding the value restriction so radically change the execution plan?

Jack Orenstein
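The batching scheme described above would be driven like this (a sketch;
the literals stand in for the `?` parameters):

```sql
-- first batch: start below any real pk
select * from T where pk > -2147483648 and value = 'asdf'::bytea
order by pk limit 100;

-- the client records the largest pk returned, then fetches the next
-- batch by substituting it as the lower bound, repeating until fewer
-- than 100 rows come back
select * from T where pk > 1000000000 and value = 'asdf'::bytea
order by pk limit 100;
```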
On Mon, Jun 22, 2009 at 05:55:28PM -0400, Jack Orenstein wrote:
> ris-# select *
> ris-# from T
> ris-# where pk > 1000000000
> ris-# and value = 'asdf'::bytea
> ris-# order by pk
> ris-# limit 100;

PG thinks that you're going to get 16 rows back matching those
conditions; bitmap heap scans are faster in some cases, and this is
likely to be one of those cases, so PG is optimizing things correctly.

> Limit  (cost=78352.20..78352.24 rows=16 width=451)

> ris-# select *
> ris-# from T
> ris-# where pk > 1000000000
> ris-# order by pk
> ris-# limit 100;

With this query, PG thinks that you may get 91088 rows back, but
because you've got a LIMIT in there you only need the first 100 of
them. It will therefore prefer a plan that can stop short, and thus is
preferring an index scan.

> Limit  (cost=0.00..324.99 rows=100 width=451)
>   ->  Index Scan using t_pkey on t  (cost=0.00..296027.98 rows=91088 width=451)

> Why does adding the value restriction so radically change the execution
> plan?

PG doesn't have any cross-column statistics and hence it assumes that pk
and value are uncorrelated. You may get better results by increasing the
statistics target[1] for those columns, as that will give PG more
information; but if the columns are indeed correlated then that's not
going to help.

-- 
  Sam  http://samason.me.uk/

 [1] http://www.postgresql.org/docs/current/static/sql-altertable.html
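Sam's statistics-target suggestion looks like this in SQL (a sketch:
1000 is an arbitrary target, chosen here only for illustration; the 8.3
default is 10):

```sql
-- gather more detailed per-column statistics, then re-analyze
alter table t alter column pk set statistics 1000;
alter table t alter column value set statistics 1000;
analyze t;
```

As Sam notes, this only sharpens the per-column estimates; it cannot
express any correlation between pk and value.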
Sam Mason <sam@samason.me.uk> writes:
> On Mon, Jun 22, 2009 at 05:55:28PM -0400, Jack Orenstein wrote:
>> Why does adding the value restriction so radically change the execution
>> plan?

> PG doesn't have any cross column statistics and hence it assumes that pk
> and value are uncorrelated.

Even if they are correlated, they aren't necessarily correlated in a
good way; the query plan Jack is hoping for could easily be pessimal.
We routinely see complaints where the planner does what he's hoping for
and gets burnt ...

If the speed of this particular type of query is critical, it might be
worth maintaining a two-column index on (value, pk). That would provide
guaranteed good performance, since the desired rows form a contiguous
run in the index.

			regards, tom lane
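Tom's two-column index, sketched against the schema from the first
message (the index name here is made up):

```sql
-- equality column first, ordering column second: all rows for a given
-- value appear in the index in pk order, so the query below can stop
-- after reading exactly the 100 rows it needs
create index t_value_pk on t (value, pk);

select * from t
where value = 'asdf'::bytea
  and pk > 1000000000
order by pk
limit 100;
```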
Sam Mason wrote:
> On Mon, Jun 22, 2009 at 05:55:28PM -0400, Jack Orenstein wrote:
>> ris-# select *
>> ris-# from T
>> ris-# where pk > 1000000000
>> ris-# and value = 'asdf'::bytea
>> ris-# order by pk
>> ris-# limit 100;
>
> PG thinks that you're going to get 16 rows back matching those
> conditions, bitmap heap scans are faster in some cases and this is
> likely to be one of those cases so PG is optimizing things correctly.

But that's 16 rows at the end. I'm concerned about the earlier
processing. Here's the plan:

     Limit  (cost=78352.20..78352.24 rows=16 width=451)
       ->  Sort  (cost=78352.20..78352.24 rows=16 width=451)
             Sort Key: pk
             ->  Bitmap Heap Scan on t  (cost=2091.60..78351.88 rows=16 width=451)
                   Recheck Cond: (pk > 1000000000)
                   Filter: (value = 'asdf'::bytea)
                   ->  Bitmap Index Scan on t_pkey  (cost=0.00..2091.60 rows=91088 width=0)
                         Index Cond: (pk > 1000000000)

If I'm reading this right, the optimizer estimates that the bitmap
index scan will identify 91088 rows, all of which have to be retrieved
from the heap to evaluate the restriction on value. The row width is
451, so even if the rows are packed into 8k pages perfectly, that's
~5000 pages that have to be read. Since an index scan (non-bitmap) with
the value restriction checked at the end was not chosen, the optimizer
must think the bitmap plan is cheaper. Here is the plan without the
value restriction:

     Limit  (cost=0.00..324.99 rows=100 width=451)
       ->  Index Scan using t_pkey on t  (cost=0.00..296027.98 rows=91088 width=451)
             Index Cond: (pk > 1000000000)

Adding the value restriction at the top of this query plan wouldn't
increase the cost very much. So given that estimate of 91088 rows coming
out of the bitmap index scan, I still don't understand the optimizer's
reasoning. Am I reading the plan incorrectly?
Jack
Jack Orenstein <jack.orenstein@hds.com> writes:
> Limit  (cost=0.00..324.99 rows=100 width=451)
>   ->  Index Scan using t_pkey on t  (cost=0.00..296027.98 rows=91088 width=451)
>         Index Cond: (pk > 1000000000)

> Adding the value restriction at the top of this query plan wouldn't
> increase the cost very much.

You're missing the point: with the value restriction in place, it's
estimating that it will probably have to scan all 91000 rows (because
there are fewer than 100 rows satisfying the value restriction). And
that is going to cost somewhere north of 296027 cost units --- the cost
shown, plus 91000 invocations of the value-restriction check. Which is
considerably more than the cost of the other plan.

			regards, tom lane
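A rough sketch of the comparison Tom describes, using the numbers from
the plans above (the per-row filter cost is an assumption based on
8.3's default cpu_operator_cost of 0.0025):

```sql
-- estimated selectivity of value = 'asdf':  16 / 91088 ≈ 0.00018
-- rows to visit to find 100 matches:        100 / 0.00018 ≈ 570000,
--   far more than the 91088 rows in the pk range, so the planner
--   assumes the index scan reads the whole range
-- index scan + filter:  ≈ 296027.98 + 91088 * 0.0025 ≈ 296256 units
-- bitmap plan:          ≈ 78352.24 units  -> the bitmap plan wins
```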
Tom Lane wrote:
> You're missing the point: with the value restriction in place, it's
> estimating that it will probably have to scan all 91000 rows (because
> there are fewer than 100 satisfying the value restriction). And that
> is going to cost somewhere north of 296027 cost units --- the cost
> shown, plus 91000 invocations of the value-restriction check.
> Which is considerably more than the cost of the other plan.

I see -- the optimizer is calculating that it will have to examine a
very large fraction of the rows.

Actually, pk and value are highly correlated. If a row gets past the
index scan, odds are very high that the value predicate will evaluate
to true. So I am sure that the index scan is the right way to go. I'm
just trying to convince the optimizer of this.

One thing I am considering is introducing a function with a high
execution cost. E.g., if I do this:

    create function return_input(bytea) returns bytea as '
    begin
        return $1;
    end;
    ' language 'plpgsql' cost 10000;

    explain
    select *
    from t
    where pk > 1000000
    and return_input(value) = 'asdf'::bytea
    order by pk
    limit 100;

then I get the plan I want:

     Limit  (cost=0.00..563490.32 rows=100 width=451)
       ->  Index Scan using t_pkey on t  (cost=0.00..34226402.07 rows=6074 width=451)
             Index Cond: (pk > 1000000)
             Filter: (return_input(value) = 'asdf'::bytea)

Is there a more elegant way of forcing the plan that I want?

Jack
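A blunter alternative, not raised in the thread, is a transaction-local
planner setting. It is a sledgehammer -- it disables bitmap scans for
every query in the transaction, and it only helps if the ordered index
scan is then the cheapest remaining plan:

```sql
begin;
set local enable_bitmapscan = off;   -- planner must pick another scan type
select * from t
where pk > 1000000000 and value = 'asdf'::bytea
order by pk
limit 100;
commit;                              -- the setting reverts here
```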