Bitmap table scan cost per page formula

Поиск
Список
Период
Сортировка
От Haisheng Yuan
Тема Bitmap table scan cost per page formula
Дата
Msg-id CAPW_87H0X2ZAr_oZu6MTS62h=_oyi9NWU=jnPApcxjB8BfcK0w@mail.gmail.com
обсуждение исходный текст
Ответы Re: Bitmap table scan cost per page formula
Re: Bitmap table scan cost per page formula
Re: Bitmap table scan cost per page formula
Список pgsql-hackers
Hi hackers,

This is Haisheng Yuan from Greenplum Database.

We had some query in production showing that planner favors seqscan over bitmapscan, and the execution of seqscan is 5x slower than using bitmapscan, but the cost of bitmapscan is 2x the cost of seqscan. The statistics were updated and quite accurate. 

Bitmap table scan uses a formula to interpolate between random_page_cost and seq_page_cost to determine the cost per page. In Greenplum Database, the default value of random_page_cost is 100, the default value of seq_page_cost is 1. With the original cost formula, random_page_cost dominates in the final cost result, even the formula is declared to be non-linear. However, it is still more like linear, which can't reflect the real cost per page when a majority of pages are fetched. Therefore, the cost formula is updated to real non-linear function to reflect both random_page_cost and seq_page_cost for different percentage of pages fetched. 

Even though PostgreSQL has much smaller default value of random_page_cost (4), the same problem exists there if we change the default value. 

Old cost formula:
cost_per_page = random_page_cost - (random_page_cost - seq_page_cost) * sqrt(pages_fetched / T);
Inline image 1

New cost formula:
cost_per_page = seq_page_cost * pow(random_page_cost / seq_page_cost, 1 - sqrt(pages_fetched / T));
Inline image 2

Below is the graph (credit to Heikki) that plots the total estimated cost of a bitmap heap scan, where table size is 10000 pages, and random_page_cost=10 and seq_page_cost=1. X axis is the number of pages fetche. I.e. on the left, no pages are fetched, and on the right end, at 10000, all pages are fetched. The original formula is in black, the new formula in red, and the horizontal line, in blue, shows the cost of a Seq Scan.
Inline image 3


Thoughts? Any better ideas?

Thanks!
Haisheng Yuan
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: [HACKERS] logical decoding of two-phase transactions
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: [HACKERS] replace GrantObjectType with ObjectType