bitmaps and correlation

Поиск
Список
Период
Сортировка
От Justin Pryzby
Тема bitmaps and correlation
Дата
Msg-id 20190101225615.GF25379@telsasoft.com
обсуждение исходный текст
Ответы Re: bitmaps and correlation  (Justin Pryzby <pryzby@telsasoft.com>)
Список pgsql-hackers
It's a new year and I'm getting reflective, so resuming a portion of
conversation we had here:
https://www.postgresql.org/message-id/CAMkU%3D1yVbwEAugaCmKWxjaX15ZduWee45%2B_DqCw--d_3N_O_%3DQ%40mail.gmail.com

Find attached patch which implements use of correlation statistic in costing
for bitmap scans.

An opened question in my mind is how to combine the correlation statistic with
existing cost_per_page:

 . sqrt(a^2+b^2)
 . MIN()
 . multiply existing computation by new correlation component

On Wed, Dec 20, 2017 at 09:55:40PM -0800, Jeff Janes wrote:
> On Tue, Dec 19, 2017 at 7:25 PM, Justin Pryzby <pryzby@telsasoft.com> wrote:
> 
> > I started playing with this weeks ago (probably during Vitaliy's problem
> > report).  Is there any reason cost_bitmap_heap_scan shouldn't interpolate
> > based on correlation from seq_page_cost to rand_page_cost, same as
> > cost_index ?
> 
> I think that doing something like that is a good idea in general, but someone
> has to implement the code, and so far no one seems enthused to do so.  You
> seem pretty interested in the topic, so....

I tested patch using CDR data which was causing issues for us years ago:
https://www.postgresql.org/message-id/20160524173914.GA11880%40telsasoft.com

Note: since that original problem report:
 . the SAN is backed by SSD rather than rotational storage;
 . we're using relkind=p partitioned tables;
 . PG12 uses pread() rather than lstat()+read(), so overhead of seek()+read()
  is avoided (but probably wasn't a substantial component of the problem);

Unpatched, I'm unable to get bitmap scan without disabling indexscan and seqscan.
| Bitmap Heap Scan on cdrs_huawei_pgwrecord_2018_12_25  (cost=55764.07..1974230.46 rows=2295379 width=1375)
|   Recheck Cond: ((recordopeningtime >= '2018-12-25 05:00:00-06'::timestamp with time zone) AND (recordopeningtime <=
'2018-12-2510:00:00-06'::timestamp with time zone))
 
|   ->  Bitmap Index Scan on cdrs_huawei_pgwrecord_2018_12_25_recordopeningtime_idx  (cost=0.00..55190.22 rows=2295379
width=0)
|         Index Cond: ((recordopeningtime >= '2018-12-25 05:00:00-06'::timestamp with time zone) AND (recordopeningtime
<='2018-12-25 10:00:00-06'::timestamp with time zone))
 

Patched, I get bitmap scan when random_page_cost is reduced enough that startup
cost (index scan component) is not prohibitive.  But for simplicity, this just
forces bitmap by setting enable_indexscan=off;
| Bitmap Heap Scan on cdrs_huawei_pgwrecord_2018_12_25  (cost=55764.07..527057.45 rows=2295379 width=1375)
|   Recheck Cond: ((recordopeningtime >= '2018-12-25 05:00:00-06'::timestamp with time zone) AND (recordopeningtime <=
'2018-12-2510:00:00-06'::timestamp with time zone))
 
|   ->  Bitmap Index Scan on cdrs_huawei_pgwrecord_2018_12_25_recordopeningtime_idx  (cost=0.00..55190.22 rows=2295379
width=0)
|         Index Cond: ((recordopeningtime >= '2018-12-25 05:00:00-06'::timestamp with time zone) AND (recordopeningtime
<='2018-12-25 10:00:00-06'::timestamp with time zone))
 

That addresses the issue we had in part.  A remaining problem is that
correlation fails to distinguish between "fresh" index and fragmented index,
and so heap access of a correlated index may looks cheaper than it is.  Which
is why I still have to set random_page_cost to get bitmap scan.

Justin


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: Refactoring the checkpointer's fsync request queue
Следующее
От: Justin Pryzby
Дата:
Сообщение: Re: bitmaps and correlation