Re: Slight change in query leads to unexpected change in query plan
От | Jack Orenstein |
---|---|
Тема | Re: Slight change in query leads to unexpected change in query plan |
Дата | |
Msg-id | 4A40DE98.2040902@hds.com обсуждение исходный текст |
Ответ на | Re: Slight change in query leads to unexpected change in query plan (Sam Mason <sam@samason.me.uk>) |
Ответы |
Re: Slight change in query leads to unexpected change in query plan
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-general |
Sam Mason wrote: > On Mon, Jun 22, 2009 at 05:55:28PM -0400, Jack Orenstein wrote: >> ris-# select * >> ris-# from T >> ris-# where pk > 1000000000 >> ris-# and value = 'asdf'::bytea >> ris-# order by pk >> ris-# limit 100; > > PG thinks that you're going to get 16 rows back matching those > conditions, bitmap heap scans are faster in some cases and this is > likely to be one of those cases so PG is optimizing things correctly. But that's 16 rows at the end. I'm concerned about the earlier processing. Here's the plan: Limit (cost=78352.20..78352.24 rows=16 width=451) -> Sort (cost=78352.20..78352.24 rows=16 width=451) Sort Key: pk -> Bitmap Heap Scan on t (cost=2091.60..78351.88 rows=16 width=451) Recheck Cond: (pk > 1000000000) Filter: (value = 'asdf'::bytea) -> Bitmap Index Scan on t_pkey (cost=0.00..2091.60 rows=91088 width=0) Index Cond: (pk > 1000000000) If I'm reading this right, the optimizer estimates that the bitmap index scan will identify 91088 rows, which all have to be retrieved from the heap for evaluation of the restriction on value. The row width is 451, so even if the rows are packed into 8k pages perfectly, that's ~5000 pages that have to be read. If there is an index scan (non-bitmap) with the value restriction checked at the end, we know the optimizer thinks that is cheaper. Here is the plan without the value restriction: Limit (cost=0.00..324.99 rows=100 width=451) -> Index Scan using t_pkey on t (cost=0.00..296027.98 rows=91088 width=451) Index Cond: (pk > 1000000000) Adding the value restriction at the top of this query plan wouldn't increase the cost very much. So given that estimate of 91088 rows coming out of the bitmap index scan, I still don't understand the optimizer's reasoning. Am I reading the plan incorrectly? Jack > >> Limit (cost=78352.20..78352.24 rows=16 width=451) > >> ris-# select * >> ris-# from T >> ris-# where pk > 1000000000 >> ris-# order by pk >> ris-# limit 100; > > With this query, PG thinks that you may get 91088 rows back but because > you've got a LIMIT in there you only needs the first 100 of them. It > will therefore prefer a plan that will stop short and thus is preferring > an index scan. > >> Limit (cost=0.00..324.99 rows=100 width=451) >> -> Index Scan using t_pkey on t (cost=0.00..296027.98 rows=91088 width=451) > > >> Why does adding the value restriction so radically change the execution >> plan? > > PG doesn't have any cross column statistics and hence it assumes that pk > and value are uncorrelated. You may get better results with increasing > the statistics target[1] for those columns as that will give PG more > information, but if the columns are indeed correlated then that's not > going to help. >
В списке pgsql-general по дате отправления: