Re: Risk Estimation WAS: Planner hints in Postgresql

From: Josh Berkus
Subject: Re: Risk Estimation WAS: Planner hints in Postgresql
Date:
Msg-id: 53288CC2.8090803@agliodbs.com
In response to: Planner hints in Postgresql  (Rajmohan C <csrajmohan@gmail.com>)
Responses: Re: Risk Estimation WAS: Planner hints in Postgresql  (Atri Sharma <atri.jiit@gmail.com>)
List: pgsql-hackers
> On Mon, Mar 17, 2014 at 8:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah.  I would like to see the planner's cost estimates extended to
>> include some sort of uncertainty estimate, whereupon risk-averse people
>> could ask it to prefer low-uncertainty plans over high-uncertainty ones
>> (the plans we typically choose for ORDER BY ... LIMIT queries being great
>> examples of the latter).  But it's a long way from wishing that to making
>> it so.  Right now it's not even clear (to me anyway) how we'd measure or
>> model such uncertainty.

This is not a model, but here's some starting thoughts:

A "high risk" plan has three components:

a) our statistical data is out-of-date or inadequate

b) the potential execution time is high if our estimates of selectivity
are wrong

c) the cost ratio of certain operations is wrong.

Factor (a) can be modeled two ways:

1. If last_analyze was a long time ago, risk increases.  (Ideally, we'd
have some idea of the change rate on the table vs. the last analyze
time; right now we don't have those stats.)

2. Certain patterns, such as multi-column selectivity and GIN/GiST
selectivity, are known to have poor estimates and are therefore higher
risk.  Selectivity functions that have been programmed with a flat
coefficient (like the default 0.05 selectivity for gist_ops) could also
return a fairly high risk factor.

Factor (b) can be modeled simply by estimating the cost of a plan where
all row estimates are changed by 10X, or better, scaled by the risk
factor calculated in (a).  This would then give us the "failure cost" of
the bad plan.  Note that we need to estimate in both directions, for
both higher and lower estimates; "abort early" plans fail because the
rows returned are lower than expected, for example.

(b) estimation would be expensive if we did every combination of the
entire plan with wrong estimates, so I'm wondering if it would be
adequate to just estimate the node selectivity being off on a per-node
basis.
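The per-node version of (b) could look something like the following
sketch.  `node_cost_fn` is a hypothetical stand-in for the planner's
per-node costing; we simply re-cost the node with the row estimate
scaled up and down by the risk factor and take the worst case:

```python
def failure_cost(node_cost_fn, estimated_rows, risk_factor=10.0):
    """Worst-case cost of a plan node if the row estimate is off
    by `risk_factor` in either direction.

    Illustrative only: a real implementation would reuse the
    planner's costing machinery rather than a Python callback.
    """
    candidates = (estimated_rows / risk_factor,   # estimate too high
                  estimated_rows,                 # estimate correct
                  estimated_rows * risk_factor)   # estimate too low
    return max(node_cost_fn(rows) for rows in candidates)
```

For a node whose cost grows linearly with rows, the 10X-high case
dominates; for an "abort early" style node whose cost blows up when
fewer rows arrive than expected, the 10X-low case dominates, which is
why both directions must be checked.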

(c) we can't realistically estimate at all (i.e. if we knew a cost
factor was wrong, we'd fix it), so I suggest ignoring it for risk
estimation.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


