Re: Bitmap scan is undercosted?

Поиск
Список
Период
Сортировка
От Vitaliy Garnashevich
Тема Re: Bitmap scan is undercosted?
Дата
Msg-id 84b6f7b5-df1a-94be-9b82-5a2756119ebe@gmail.com
обсуждение исходный текст
Ответ на Re: Bitmap scan is undercosted?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
On 03/12/2017 01:44, Tom Lane wrote:
> I think it'd be a serious error to screw around with your cost settings
> on the basis of a single case in which the rowcount estimates are so
> far off.  It's really those estimates that are the problem AFAICS.
>
> The core issue in this example is that, the way the test data is set up,
> the "flag = true" condition actually adds no selectivity at all, because
> every row with "num = 1" is certain to have "flag = true".  If the planner
> realized that, it would certainly not bother with BitmapAnd'ing the flag
> index onto the results of the num index.  But it doesn't know that those
> columns are correlated, so it supposes that adding the extra index will
> give a 10x reduction in the number of heap rows that have to be visited
> (since it knows that only 1/10th of the rows have "flag = true").
> *That* is what causes the overly optimistic cost estimate for the
> two-index bitmapscan, and no amount of fiddling with the cost parameters
> will make that better.
Here I've tried to make a test which would not have correlation between 
the two columns.

shared_buffers = 512MB
effective_cache_size = 512MB
work_mem = 100MB

set seq_page_cost = 1.0;
set random_page_cost = 1.5;
set cpu_tuple_cost = 0.01;
set cpu_index_tuple_cost = 0.005;
set cpu_operator_cost = 0.0025;

drop table if exists aaa;
create table aaa as select floor(random()*100)::int num, (random()*10 < 
1)::bool flag from generate_series(1, 10000000) id;
create index i1 on aaa  (num);
create index i2 on aaa  (flag);
analyze aaa;

select relname, reltuples::bigint, relpages::bigint, 
(reltuples/relpages)::bigint tpp from pg_class where relname 
in('aaa','i1','i2') order by relname;
"aaa";10000033;44248;226
"i1";10000033;27422;365
"i2";10000033;27422;365

select flag, count(*) from aaa group by flag order by flag;
f;9000661
t;999339

select num, count(*) from aaa group by num order by num;
0;99852
1;99631
2;99699
3;100493
...
96;100345
97;99322
98;100013
99;100030

explain (analyze,verbose,costs,buffers)
select * from aaa where num = 1 and flag = true;

Bitmap Heap Scan on public.aaa  (cost=12829.83..24729.85 rows=10340 
width=5) (actual time=104.941..112.649 rows=9944 loops=1)
   Output: num, flag
   Recheck Cond: (aaa.num = 1)
   Filter: aaa.flag
   Heap Blocks: exact=8922
   Buffers: shared hit=11932
   ->  BitmapAnd  (cost=12829.83..12829.83 rows=10340 width=0) (actual 
time=102.926..102.926 rows=0 loops=1)
         Buffers: shared hit=3010
         ->  Bitmap Index Scan on i1  (cost=0.00..1201.44 rows=103334 
width=0) (actual time=15.459..15.459 rows=99631 loops=1)
               Index Cond: (aaa.num = 1)
               Buffers: shared hit=276
         ->  Bitmap Index Scan on i2  (cost=0.00..11622.97 rows=1000671 
width=0) (actual time=76.906..76.906 rows=999339 loops=1)
               Index Cond: (aaa.flag = true)
               Buffers: shared hit=2734
Planning time: 0.110 ms
Execution time: 113.272 ms

Index Scan using i1 on public.aaa  (cost=0.44..66621.56 rows=10340 
width=5) (actual time=0.027..47.075 rows=9944 loops=1)
   Output: num, flag
   Index Cond: (aaa.num = 1)
   Filter: aaa.flag
   Rows Removed by Filter: 89687
   Buffers: shared hit=39949
Planning time: 0.104 ms
Execution time: 47.351 ms

>
> I tried creating multiple-column statistics using the v10 facility for
> that:
>
> regression=# create statistics s1 on num, flag from aaa;
> CREATE STATISTICS
> regression=# analyze aaa;
> ANALYZE
>
> but that changed the estimate not at all, which surprised me because
> dependency statistics are supposed to fix exactly this type of problem.
> I suspect there may be something in the extended-stats code that causes it
> not to work right for boolean columns --- this wouldn't be excessively
> surprising because of the way the planner messes around with converting
> "flag = true" to just "flag" and sometimes back again.  But I've not
> looked closer yet.
>
>             regards, tom lane
> .
>



В списке pgsql-performance по дате отправления:

Предыдущее
От: Vitaliy Garnashevich
Дата:
Сообщение: Re: Bitmap scan is undercosted?
Следующее
От: Vitaliy Garnashevich
Дата:
Сообщение: Re: Bitmap scan is undercosted?