Re: Bitmap scan is undercosted? - overestimated correlation andcost_index
От | Justin Pryzby |
---|---|
Тема | Re: Bitmap scan is undercosted? - overestimated correlation andcost_index |
Дата | |
Msg-id | 20171217023701.GQ18184@telsasoft.com обсуждение исходный текст |
Ответ на | Re: Bitmap scan is undercosted? - overestimated correlation andcost_index (Justin Pryzby <pryzby@telsasoft.com>) |
Список | pgsql-performance |
On Fri, Dec 15, 2017 at 02:54:06PM -0600, Justin Pryzby wrote: > SSD: good question. > > Here's an rackspace VM with PG9.6.6, 2GB shared_buffers, 8GB RAM (~4GB of which > is being used as OS page cache), and 32GB SSD (with random_page_cost=1). The > server is in use by our application. > > I believe you could scale up the size of the table to see this behavior with > any cache size. 0.0001 controls the "jitter", with smaller values being more > jittery.. On Sat, Dec 16, 2017 at 01:18:38PM -0600, Justin Pryzby wrote: > I realized I've made a mistake here; the table is on SSD but not its index... > So all this cost is apparently coming from the index and not the heap. > > -> Bitmap Heap Scan on t (cost=855041.91..1901994.06 rows=40298277 width=4) (actual time=14202.624..27754.982 rows=40009853loops=1) > -> Bitmap Index Scan on t_i_idx1 (cost=0.00..844967.34 rows=40298277 width=0) (actual time=14145.877..14145.877rows=40009853 loops=1) I'm rerunning with this: postgres=# CREATE TABLE t(i int,j int) TABLESPACE tmp; CREATE INDEX ON t(i) TABLESPACE tmp; INSERT INTO t SELECT (0.0001*a+9*(random()-0.5))::intFROM generate_series(1,99999999) a; VACUUM ANALYZE t; CREATE INDEX ON t(i) TABLESPACE tmp; That doesn't seem to invalidate my conclusions regarding the test data. The non-fresh index: #1 -> Index Scan using t_i_idx on t (cost=0.57..1103588.59 rows=39536704 width=4) (actual time=2.295..60094.704 rows=40009646loops=1) Rerun: #2 -> Index Scan using t_i_idx on t (cost=0.57..1103588.59 rows=39536704 width=4) (actual time=1.671..54209.037 rows=40009646loops=1) #3 -> Index Scan using t_i_idx on t (cost=0.57..1103588.59 rows=39536704 width=4) (actual time=1.743..46436.538 rows=40009646loops=1) Scan fresh index: -> Index Scan using t_i_idx1 on t (cost=0.57..1074105.46 rows=39536704 width=4) (actual time=1.715..16119.720 rows=40009646loops=1) bitmap scan on non-fresh idx: -> Bitmap Heap Scan on t (cost=543141.78..1578670.34 rows=39536704 width=4) (actual time=4397.767..9137.541 rows=40009646loops=1) Buffers: shared hit=91235 read=225314 -> Bitmap Index Scan on t_i_idx (cost=0.00..533257.61 rows=39536704 width=0) (actual time=4346.556..4346.556 rows=40009646loops=1) Buffers: shared read=139118 seq scan: -> Seq Scan on t (cost=0.00..1942478.00 rows=39536704 width=4) (actual time=6093.269..17880.164 rows=40009646 loops=1) I also tried an idx only scan (note COUNT i vs j / "eye" vs "jay"), which I think should be like an index scan without heap costs: postgres=# SET max_parallel_workers_per_gather=0;SET enable_bitmapscan=off;SET enable_indexscan=on; begin; DROP INDEX t_i_idx1;explain(analyze,buffers) SELECT COUNT(i) FROM t WHERE i BETWEEN 4000 AND 8000; rollback; -> Index Only Scan using t_i_idx on t (cost=0.57..928624.65 rows=39536704 width=4) (actual time=0.515..12646.676 rows=40009646loops=1) Buffers: shared hit=276 read=139118 However, in this test, random reads on the INDEX are still causing a large fraction of the query time. When cached by the OS, this is much faster. Compare: #1 -> Bitmap Heap Scan on t (cost=543141.78..1578670.34 rows=39536704 width=4) (actual time=25498.978..41418.870 rows=40009646loops=1) Buffers: shared read=316549 written=497 -> Bitmap Index Scan on t_i_idx (cost=0.00..533257.61 rows=39536704 width=0) (actual time=25435.865..25435.865rows=40009646 loops=1) Buffers: shared read=139118 written=2 #2 -> Bitmap Heap Scan on t (cost=543141.78..1578670.34 rows=39536704 width=4) (actual time=5863.003..17531.860 rows=40009646loops=1) Buffers: shared read=316549 written=31 -> Bitmap Index Scan on t_i_idx (cost=0.00..533257.61 rows=39536704 width=0) (actual time=5799.400..5799.400 rows=40009646loops=1) Buffers: shared read=139118 written=31 Note that for the test data, the index is a large fraction of the table data (since the only non-indexed column is nullfrac=1): public | t | table | pryzbyj | 3458 MB | public | t_i_idx | index | pryzbyj | t | 2725 MB | public | t_i_idx1 | index | pryzbyj | t | 2142 MB | (that could be 10% smaller with fillfactor=100) I think the test case are reasonably reproducing the original issue. Note that the 2nd invocation of the bitmap scan scanned the index in 5.8sec and the heap in 11sec, but the 2nd invocation of the index scan took 54sec, of which I gather ~6sec was from the index. So there's still 48sec spent accessing the heap randomly, rather than 11sec sequentially. I'm also playing with the tables which were the source of the original problem, for which index reads in bitmap scan do not appear to be a large fraction of the query time, probably because the index are 1-2% of the table size rather than 60-70%. I'll mail about that separately. Justin
В списке pgsql-performance по дате отправления:
Предыдущее
От: Justin PryzbyДата:
Сообщение: Re: Bitmap scan is undercosted? - overestimated correlation andcost_index