Proposal for new SET variables for optimizer costs

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Proposal for new SET variables for optimizer costs
Дата	5 февраля 2000 г. 16:30:10
Msg-id	14601.949786166@sss.pgh.pa.us обсуждение
Ответы	Re: [HACKERS] Proposal for new SET variables for optimizer costs Re: [HACKERS] Proposal for new SET variables for optimizer costs
Список	pgsql-hackers

Дерево обсуждения

I am about to implement some changes to the planner/optimizer's cost
model, following up to the thread on pghackers beginning on 20 Jan.
The main conclusion of that thread was that we needed to charge more for
a page fetched nonsequentially than for a page fetched sequentially.
After further investigation I have concluded that it is also appropriate
to include explicit modeling of the cost of evaluation of WHERE clauses.
For example, using the regression database and a query like

select * from tenk1 where
(unique1 = 1 and unique2 = 101) or
(unique1 = 2 and unique2 = 102) or
(unique1 = 3 and unique2 = 103) or
... 100 OR clauses ...
(unique1 = 100 and unique2 = 200);

(which is not too implausible for certain automatic query generators),
I observe that a sequential scan takes about 6 seconds, vs. less than
a second for a similar query with only 10 clauses.  That says that the
cost of evaluating a WHERE clause this large is far from negligible.
The optimizer needs to account for this because different query plans
can have a considerable impact on the number of tuples that the WHERE
clause is evaluated for --- in this example, if we use indexscans to
pull out just the tuples with the right values of 'unique1', then the
WHERE clause need only be checked at 100 tuples, not all 10000.

I believe it would be reasonable to charge a certain amount per operator
or function appearing in the WHERE clause in order to account for this
effect.  (Currently I see no need to model the cost of evaluating the
targetlist expressions.  The same expressions should get evaluated for
the same tuples no matter what query plan the optimizer picks, so we
might as well just leave that cost out of our comparisons.)

Also, as was previously mentioned on pghackers, I would like to add SET
variables to control enabling/disabling of particular query plan types,
so that different plans can be checked with less hassle than restarting
psql with a new PGOPTIONS setting.

This all leads to the following proposal for redoing the optimizer
plan cost SET variables.  The variables proposed below would replace
COST_HEAP and COST_INDEX, which are poorly named IMHO and are definitely
very misleadingly documented at present.

(Note that all costs will still be referenced to the cost of a disk page
fetch.  We will take 1.0 as the cost of a sequential page fetch.)


SET variable name    Internal variable    Proposed default

RANDOM_PAGE_COST    random_page_cost    4.0

Cost of fetching a disk block nonsequentially (as a multiple of the cost
of a sequential block fetch).

CPU_TUPLE_COST        cpu_tuple_cost        0.01

Cost of CPU time per tuple processed within a query (as a fraction of
the cost of a sequential disk block fetch).  This renames the existing
SET variable COST_HEAP (cpu_page_weight); but the default value is
smaller than it used to be, since WHERE clause evaluation will now be
accounted for separately.

CPU_INDEX_TUPLE_COST    cpu_index_tuple_cost    0.001

Cost of CPU time per index tuple processed within a query (as a fraction
of the cost of a sequential disk block fetch).  This renames the
existing SET variable COST_INDEX (cpu_index_page_weight); but the
default value is much smaller than it used to be, since the operator
evaluation cost will account for the bulk of the cost of visiting an
index tuple.

CPU_OPERATOR_COST    cpu_operator_cost    0.0025

Cost of CPU time per operator or function evaluated in a WHERE clause.
Note that this would apply to operators evaluated at index tuples as
well as those evaluated against heap tuples.
(The proposed default corresponds to a ratio of 5 microsec against 2
millisec for a sequential block fetch, which seems to be about right
on my workstation.)

ENABLE_SEQSCAN        enable_seqscan        ON

ENABLE_INDEXSCAN    enable_indexscan    ON

ENABLE_TIDSCAN        enable_tidscan        ON

ENABLE_SORT        enable_sort        ON

ENABLE_NESTLOOP        enable_nestloop        ON

ENABLE_MERGEJOIN    enable_mergejoin    ON

ENABLE_HASHJOIN        enable_hashjoin        ON

Provide access via SET to the already-existing internal optimizer
control flags.

Currently, it is possible to have COST_HEAP and COST_INDEX set
automatically during connection startup; libpq will do that if
the environment variables PGCOSTHEAP and/or PGCOSTINDEX are defined
on the client side.  If we want to continue that behavior, the
environment variables for these variables would be named
PGRANDOMPAGECOST etc (remove underscores and prepend PG).
I'm not sure if we want to continue inventing client-side environment
variables, however.


Comments?  Ideas for better names?  Anyone object to renaming the
existing variables?  (BTW, although it could be argued that this
might break existing scripts that set COST_HEAP or COST_INDEX,
I doubt that there are any ... and given the existing doco,
I doubt even more that anyone is setting appropriate values ...)
        regards, tom lane

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Proposal for new SET variables for optimizer costs