Re: Bitmap scan is undercosted?
От | Vitaliy Garnashevich |
---|---|
Тема | Re: Bitmap scan is undercosted? |
Дата | |
Msg-id | 84b6f7b5-df1a-94be-9b82-5a2756119ebe@gmail.com обсуждение исходный текст |
Ответ на | Re: Bitmap scan is undercosted? (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-performance |
On 03/12/2017 01:44, Tom Lane wrote: > I think it'd be a serious error to screw around with your cost settings > on the basis of a single case in which the rowcount estimates are so > far off. It's really those estimates that are the problem AFAICS. > > The core issue in this example is that, the way the test data is set up, > the "flag = true" condition actually adds no selectivity at all, because > every row with "num = 1" is certain to have "flag = true". If the planner > realized that, it would certainly not bother with BitmapAnd'ing the flag > index onto the results of the num index. But it doesn't know that those > columns are correlated, so it supposes that adding the extra index will > give a 10x reduction in the number of heap rows that have to be visited > (since it knows that only 1/10th of the rows have "flag = true"). > *That* is what causes the overly optimistic cost estimate for the > two-index bitmapscan, and no amount of fiddling with the cost parameters > will make that better. Here I've tried to make a test which would not have correlation between the two columns. shared_buffers = 512MB effective_cache_size = 512MB work_mem = 100MB set seq_page_cost = 1.0; set random_page_cost = 1.5; set cpu_tuple_cost = 0.01; set cpu_index_tuple_cost = 0.005; set cpu_operator_cost = 0.0025; drop table if exists aaa; create table aaa as select floor(random()*100)::int num, (random()*10 < 1)::bool flag from generate_series(1, 10000000) id; create index i1 on aaa (num); create index i2 on aaa (flag); analyze aaa; select relname, reltuples::bigint, relpages::bigint, (reltuples/relpages)::bigint tpp from pg_class where relname in('aaa','i1','i2') order by relname; "aaa";10000033;44248;226 "i1";10000033;27422;365 "i2";10000033;27422;365 select flag, count(*) from aaa group by flag order by flag; f;9000661 t;999339 select num, count(*) from aaa group by num order by num; 0;99852 1;99631 2;99699 3;100493 ... 96;100345 97;99322 98;100013 99;100030 explain (analyze,verbose,costs,buffers) select * from aaa where num = 1 and flag = true; Bitmap Heap Scan on public.aaa (cost=12829.83..24729.85 rows=10340 width=5) (actual time=104.941..112.649 rows=9944 loops=1) Output: num, flag Recheck Cond: (aaa.num = 1) Filter: aaa.flag Heap Blocks: exact=8922 Buffers: shared hit=11932 -> BitmapAnd (cost=12829.83..12829.83 rows=10340 width=0) (actual time=102.926..102.926 rows=0 loops=1) Buffers: shared hit=3010 -> Bitmap Index Scan on i1 (cost=0.00..1201.44 rows=103334 width=0) (actual time=15.459..15.459 rows=99631 loops=1) Index Cond: (aaa.num = 1) Buffers: shared hit=276 -> Bitmap Index Scan on i2 (cost=0.00..11622.97 rows=1000671 width=0) (actual time=76.906..76.906 rows=999339 loops=1) Index Cond: (aaa.flag = true) Buffers: shared hit=2734 Planning time: 0.110 ms Execution time: 113.272 ms Index Scan using i1 on public.aaa (cost=0.44..66621.56 rows=10340 width=5) (actual time=0.027..47.075 rows=9944 loops=1) Output: num, flag Index Cond: (aaa.num = 1) Filter: aaa.flag Rows Removed by Filter: 89687 Buffers: shared hit=39949 Planning time: 0.104 ms Execution time: 47.351 ms > > I tried creating multiple-column statistics using the v10 facility for > that: > > regression=# create statistics s1 on num, flag from aaa; > CREATE STATISTICS > regression=# analyze aaa; > ANALYZE > > but that changed the estimate not at all, which surprised me because > dependency statistics are supposed to fix exactly this type of problem. > I suspect there may be something in the extended-stats code that causes it > not to work right for boolean columns --- this wouldn't be excessively > surprising because of the way the planner messes around with converting > "flag = true" to just "flag" and sometimes back again. But I've not > looked closer yet. > > regards, tom lane > . >
В списке pgsql-performance по дате отправления: