Re: Bitmap scan is undercosted?

Поиск
Список
Период
Сортировка
От Vitaliy Garnashevich
Тема Re: Bitmap scan is undercosted?
Дата
Msg-id d2b6a335-8c62-4983-3b5b-c8a75cf37d6f@gmail.com
обсуждение исходный текст
Ответ на Re: Bitmap scan is undercosted?  (Justin Pryzby <pryzby@telsasoft.com>)
Ответы Re: Bitmap scan is undercosted?
Re: Bitmap scan is undercosted?
Список pgsql-performance
On 02/12/2017 01:11, Justin Pryzby wrote:
> I tried to reproduce this issue and couldn't, under PG95 and 10.1:
>
> On Fri, Dec 01, 2017 at 12:34:27PM -0600, Justin Pryzby wrote:
>> On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
>>> We recently had an issue in production, where a bitmap scan was chosen
>>> instead of an index scan. Despite being 30x slower, the bitmap scan had
>>> about the same cost as the index scan.
>>> drop table if exists aaa;
>>> create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
>>> generate_series(1, 10000000) id;
>>> create index i1 on aaa  (num);
>>> create index i2 on aaa  (flag);
>>> analyze aaa;
> What is:
> effective_io_concurrency
> max_parallel_workers_per_gather (I gather you don't have this)
effective_io_concurrency = 0
max_parallel_workers_per_gather = 0

Did you notice random_page_cost = 1.5 ?

For this test I'm using SSD and Windows (if that matters). On production 
we also use SSD, hence lower random_page_cost. But with the default 
random_page_cost=4.0, the difference in cost between the index scan plan 
and the bitmap scan plan is even bigger.
>
> Note:
> postgres=# SELECT correlation FROM pg_stats WHERE tablename='aaa' AND attname='num';
> correlation | 0.00710112
>
> ..so this is different from the issue corrected by the patch I created while
> testing.
>
>> Note that id%100==1 implies flag='t', so the planner anticipates retrieving
>> fewer rows than it will ultimately read, probably by 2x.  It makes sense that
>> causes the index scan to be more expensive than expected, but that's only
>> somewhat important, since there's no joins involved.
> I changed the query from COUNT(*) TO * for easier to read explain:
>
> CREATE TABLE aaa AS SELECT (id%100)::int num, (id%10=1)::bool flag FROM generate_series(1, 10000000) id;
> CREATE INDEX i1 ON aaa(num);
> CREATE INDEX i2 ON aaa (flag);
> ANALYZE VERBOSE aaa;
> EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
>   Bitmap Heap Scan on public.aaa  (cost=20652.98..45751.75 rows=10754 width=5) (actual time=85.314..185.107
rows=100000loops=1)
 
>     ->  BitmapAnd  (cost=20652.98..20652.98 rows=10754 width=0) (actual time=163.220..163.220 rows=0 loops=1)
>           ->  Bitmap Index Scan on i1  (cost=0.00..1965.93 rows=106333 width=0) (actual time=26.943..26.943
rows=100000loops=1)
 
>           ->  Bitmap Index Scan on i2  (cost=0.00..18681.42 rows=1011332 width=0) (actual time=133.804..133.804
rows=1000000loops=1)
 
>
> ..which is what's wanted with no planner hints (PG10.1 here).
Yes, that's what you get without planner hints, but it's strange to get 
this plan, when there is another one, which runs 2-3 times faster, but 
happens to be estimated to be twice more costly than the one with bitmap 
scans:

# set enable_bitmapscan = off; set enable_indexscan = on;  set 
enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Index Scan using i1 on aaa  (cost=0.44..66369.81 rows=10428 width=5) 
(actual time=0.020..57.765 rows=100000 loops=1)

vs.

# set enable_bitmapscan = on;  set enable_indexscan = off; set 
enable_seqscan = off;
# explain analyze select * from aaa where num = 1 and flag = true;
Bitmap Heap Scan on aaa  (cost=13099.33..25081.40 rows=10428 width=5) 
(actual time=122.137..182.811 rows=100000 loops=1)
   ->  BitmapAnd  (cost=13099.33..13099.33 rows=10428 width=0) (actual 
time=110.168..110.168 rows=0 loops=1)
         ->  Bitmap Index Scan on i1  (cost=0.00..1181.44 rows=101667 
width=0) (actual time=20.845..20.845 rows=100000 loops=1)
         ->  Bitmap Index Scan on i2  (cost=0.00..11912.43 rows=1025666 
width=0) (actual time=80.323..80.323 rows=1000000 loops=1)

>
> Same on PG95:
> postgres=# EXPLAIN (analyze,verbose,costs,buffers) SELECT * FROM aaa WHERE num=1 AND flag=true;
>   Bitmap Heap Scan on public.aaa  (cost=19755.64..43640.32 rows=9979 width=5) (actual time=230.017..336.583
rows=100000loops=1)
 
>     ->  BitmapAnd  (cost=19755.64..19755.64 rows=9979 width=0) (actual time=205.242..205.242 rows=0 loops=1)
>           ->  Bitmap Index Scan on i1  (cost=0.00..1911.44 rows=103334 width=0) (actual time=24.911..24.911
rows=100000loops=1)
 
>           ->  Bitmap Index Scan on i2  (cost=0.00..17838.96 rows=965670 width=0) (actual time=154.237..154.237
rows=1000000loops=1)
 
>
> The rowcount is off, but not a critical issue without a join.
>
> Justin
>



В списке pgsql-performance по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: Bitmap scan is undercosted?
Следующее
От: pryzby@telsasoft.com (Justin Pryzby)
Дата:
Сообщение: Re: Bitmap scan is undercosted?