Re:disfavoring unparameterized nested loops

Поиск
Список
Период
Сортировка
От Finnerty, Jim
Тема Re:disfavoring unparameterized nested loops
Дата
Msg-id 1ECE0028-C2F0-43BF-84F1-E3455CE10ED4@amazon.com
обсуждение исходный текст
Ответ на disfavoring unparameterized nested loops  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
> But making everything slower will be a hard sell, because vast majority of
> workloads already running on Postgres don't have this issue at all, so
> for them it's not worth the expense. Following the insurance analogy,
> selling tornado insurance in Europe is mostly pointless.
>

Agree. I've been surprised about NOT hearing complaints from PostgreSQL
customers about a particular "bad" plan choice that was common in other
rdbms products where large, complex queries were the norm.  The situation
occurs late in a plan with many joins where a hash join can be used and  
where either side is estimated to fit in memory.  On one side is a base table 
with cardinality that we have statistics for, while the other side has an
estimated cardinality that is the result of many estimates each of which
has error that can compound, and that in some cases amounts to a wild guess.
(e.g. what is the selectivity of SUM(x) < 12 ?).  If the planner's point estimate 
of cardinality is such that both sides could fit in memory, then a bad plan can
easily be made.  As Peter said, [ most ] humans have no trouble dealing with 
these kind of situations. They take the risk of being wrong into account.

So in our world, the useful numbers are 0, 1, measured N, and estimated N,
but we don't distinguish between measured N and estimated N.

But that doesn't mean that OLTP customers would be willing to accept
slightly suboptimal plans to mitigate a risk they don't experience.

> Insurance is also about personal preference / risk tolerance. Maybe I'm
> fine with accepting risk that my house burns down, or whatever ...

Right, and that's why the problem mentioned above is still out there
annoying customers who have complex plans.  To them it looks like
an obviously bad plan choice.

Something that might help is to have the planner cost be a structure instead
of a number.  Costs of plans that are deemed "risky" are accumulated 
separately from plans that make no risky choices, and for a given set
of join items you keep the minimum cost plan of both types. It may happen that all
plans eventually make a risky choice, in which case you take the plan with the minimum
cost, but if there exists a plan with no risky choices, then the minimum cost
plan with no risky choices is chosen, with a GUC that enables a customer to ignore
risk when making this choice.  This is not in the spirit of the hoped for simple heuristic,
and it would be heuristic in its classification of plans that are risky, but in the NLJ case 
the cost of an unparameterized NLJ could be deemed risky if the cardinality of the inner 
relation is not 0, 1, or measured N.





В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Drouvot, Bertrand"
Дата:
Сообщение: Re: [UNVERIFIED SENDER] Re: Minimal logical decoding on standbys
Следующее
От: Robert Haas
Дата:
Сообщение: Re: disfavoring unparameterized nested loops