Re: Bitmap scan is undercosted?
From    | Vitaliy Garnashevich
Subject | Re: Bitmap scan is undercosted?
Date    |
Msg-id  | 2409b7fa-33dc-fa06-87b3-408774201885@gmail.com
In reply to | Re: Bitmap scan is undercosted? (Justin Pryzby <pryzby@telsasoft.com>)
List    | pgsql-performance
On 01/12/2017 20:34, Justin Pryzby wrote:
> On Fri, Dec 01, 2017 at 07:40:08PM +0200, Vitaliy Garnashevich wrote:
>> We recently had an issue in production, where a bitmap scan was chosen
>> instead of an index scan. Despite being 30x slower, the bitmap scan had
>> about the same cost as the index scan.
> Me too, see also:
> https://www.postgresql.org/message-id/flat/CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA%40mail.gmail.com#CAH2-WzkRTggiy_LKQUu-oViyp6y_Hhz-a1yWacPy4tcYWV1HoA@mail.gmail.com
>
>> drop table if exists aaa;
>> create table aaa as select (id%100)::int num, (id%10=1)::bool flag from
>> generate_series(1, 10000000) id;
>> create index i1 on aaa (num);
>> create index i2 on aaa (flag);
>> analyze aaa;
>>
>> select relname, reltuples::bigint, relpages::bigint,
>>   (reltuples/relpages)::bigint tpp from pg_class where relname
>>   in ('aaa','i1','i2') order by relname;
>> "aaa";9999985;44248;226
>> "i1";9999985;27422;365
>> "i2";9999985;27422;365
>>
>> The query was:
>> explain (analyze,verbose,costs,buffers)
>> select count(*) from aaa where num = 1 and flag = true;
>
> Note that id%100==1 implies flag='t', so the planner anticipates retrieving
> fewer rows than it will ultimately read, probably by 2x. It makes sense that
> causes the index scan to be more expensive than expected, but that's only
> somewhat important, since there's no joins involved.

I don't think the planner is smart enough to account for correlation between
values in different columns. When different values are used in the filter
(num=2, num=39, num=74), the query actually runs faster, while still being
about twice as slow as an index scan. But the cost does not change much: it
jumps up and down for different values, yet stays close to the initial value.
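The planner's behavior here follows from its independence assumption: multi-clause selectivity is estimated as the product of per-clause selectivities. A back-of-the-envelope sketch of the numbers in this test case (Python used only for the arithmetic):

```python
# Test-case figures: aaa has 10M rows, num = id % 100, flag = (id % 10 = 1).
total_rows = 10_000_000

rows_num1 = total_rows // 100        # num = 1 matches 1/100 of the rows

# Independence assumption: multiply selectivities, 1/100 * 1/10.
planner_estimate = total_rows // 100 // 10
print(planner_estimate)              # 10000 rows expected

# But id % 100 == 1 implies id % 10 == 1, so "flag = true" filters out
# nothing: the scan really has to read every num = 1 row.
print(rows_num1)                     # 100000 rows actually read
```

So both plans are costed for far fewer heap fetches than they actually perform, which lines up with Justin's point that the planner anticipates retrieving fewer rows than it ultimately reads.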
Aggregate  (cost=24239.02..24239.03 rows=1 width=8) (actual time=105.239..105.239 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=3011
  ->  Bitmap Heap Scan on public.aaa  (cost=12812.05..24214.48 rows=9816 width=0) (actual time=105.236..105.236 rows=0 loops=1)
        Output: num, flag
        Recheck Cond: (aaa.num = 39)
        Filter: aaa.flag
        Buffers: shared hit=3011
        ->  BitmapAnd  (cost=12812.05..12812.05 rows=9816 width=0) (actual time=105.157..105.157 rows=0 loops=1)
              Buffers: shared hit=3011
              ->  Bitmap Index Scan on i1  (cost=0.00..1134.94 rows=97667 width=0) (actual time=15.725..15.725 rows=100000 loops=1)
                    Index Cond: (aaa.num = 39)
                    Buffers: shared hit=276
              ->  Bitmap Index Scan on i2  (cost=0.00..11671.96 rows=1005003 width=0) (actual time=77.920..77.920 rows=1000000 loops=1)
                    Index Cond: (aaa.flag = true)
                    Buffers: shared hit=2735
Planning time: 0.104 ms
Execution time: 105.553 ms

Aggregate  (cost=65785.99..65786.00 rows=1 width=8) (actual time=48.587..48.587 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=44524
  ->  Index Scan using i1 on public.aaa  (cost=0.44..65761.45 rows=9816 width=0) (actual time=48.583..48.583 rows=0 loops=1)
        Output: num, flag
        Index Cond: (aaa.num = 39)
        Filter: aaa.flag
        Rows Removed by Filter: 100000
        Buffers: shared hit=44524
Planning time: 0.097 ms
Execution time: 48.620 ms

> The reason why it's more than a bit slower is due to the "density" [0] of the
> heap pages read. num=1 is more selective than flag=true, so it scans i1,
> reading 1% of the whole table. But it's not reading the first 1% or
> some other 1% of the table, it reads tuples evenly distributed across the
> entire table (226*0.01 = ~2 rows of each page). Since the index was created
> after the INSERT, the repeated keys (logical value: id%100) are read in
> physical order on the heap, so this is basically doing a seq scan, but with the
> additional overhead of reading the index, and maybe doing an fseek() before
> each/some read()s, too.
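Justin's "density" point can be made concrete with the pg_class figures quoted earlier in the thread (a sketch; all numbers come from the test table above):

```python
# pg_class figures for table aaa from the earlier size query.
relpages = 44248        # heap pages in aaa
tuples_per_page = 226   # reltuples / relpages, i.e. "tpp" above

# num = 1 matches 1% of tuples, spread evenly across the heap rather
# than clustered in one region of it:
matches_per_page = tuples_per_page * 0.01
print(matches_per_page)             # ~2.26 matching tuples per page

# With more than one match expected on every page, fetching just 1% of
# the tuples still touches essentially all 44,248 heap pages -- the
# same I/O as a seq scan, plus the overhead of walking the index.
assert matches_per_page > 1
```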
> You could confirm that by connecting strace to the
> backend before starting the query.
>
> Since you did that using % and with indices added after the INSERT, you can't
> improve it by reindexing (as I was able to for my case). That's an elegant
> test case, so thanks.
>
> I think shared_buffers=512MB is just small enough for this test to be painful
> for 1e7 rows. I see the table+index is 559MB.

 table | ~count  | size   | toast   | idx    | size + toast + idx
-------+---------+--------+---------+--------+--------------------
 aaa   | 9999994 | 346 MB | 0 bytes | 428 MB | 774 MB

But the plan says all buffers are "shared hit", and none "read", so that's
probably not an issue.

> I don't know if that's really similar to your production use case, but I would
> recommend trying BRIN indices, which always require a bitmap scan. Note that
> some things (like max()) that can use a btree index cannot use brin. PG10.1
> has WITH autosummarize, which was important for our use, since we rarely do
> UPDATEs or DELETEs so tables are rarely vacuumed (only analyzed).

Yes, BRIN indexes should be beneficial for our case, because there is a date
column which grows over time (not strictly regularly, but still). Unfortunately,
we're still migrating our databases from 9.3 to 9.6. Anyway, thanks for the
advice.

> Justin
>
> [0] I'm borrowing Jeff's language from here:
> https://www.postgresql.org/message-id/CAMkU%3D1xwGn%2BO0jhKsvrUrbW9MQp1YX0iB4Y-6h1mEz0ffBxK-Q%40mail.gmail.com
> "density" wasn't our problem, but it's a perfect description of this issue.
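As a side note on the shared_buffers discussion above, the sizes in the reply's table can be cross-checked directly (a sketch of the arithmetic only):

```python
shared_buffers_mb = 512   # setting mentioned in the thread
table_mb = 346            # heap size of aaa from the size table above
index_mb = 428            # combined index size from the same table

working_set_mb = table_mb + index_mb
print(working_set_mb)     # 774 MB -- larger than shared_buffers

# Yet both EXPLAIN outputs report only "shared hit" buffers and no
# "read", so every page this query needed was already cached; cache
# pressure doesn't explain the bitmap scan's undercosting here.
assert working_set_mb > shared_buffers_mb
```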