Re: disfavoring unparameterized nested loops

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: disfavoring unparameterized nested loops
Дата
Msg-id CAH2-Wzkb+9B=F4PQyGXEf91QVT_=EcW9ZvdqKGcbeskFj4ra+g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: disfavoring unparameterized nested loops  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: disfavoring unparameterized nested loops  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Mon, Jun 21, 2021 at 7:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jun 21, 2021 at 6:41 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > For example, in an unparameterized Nested Loop that estimates the
> > outer Path to have 1 row will cost an entire additional inner cost if
> > there are 2 rows.  With Hash Join the cost is just an additional
> > hashtable lookup, which is dead cheap.   I don't know exactly how
> > add_path() would weigh all that up, but it seems to me that I wouldn't
> > take the risk unless I was 100% certain that the Nested Loop's outer
> > Path would only return 1 row exactly, if there was any chance at all
> > it could return more, I'd be picking some other join method.
>
> It seems like everyone agrees that it would be good to do something
> about this problem, but the question is whether it's best to do
> something that tries to be general, or whether we should instead do
> something about this specific case. I favor the latter approach.

I agree with your conclusion, but FWIW I am sympathetic to David's
view too. I certainly understand why he'd naturally want to define the
class of problems that are like this one, to understand what the
boundaries are.

The heuristic that has the optimizer flat out avoids an
unparameterized nested loop join is justified by the belief that
that's fundamentally reckless. Even though we all agree on that much,
I don't know when it stops being reckless and starts being "too risky
for me, but not fundamentally reckless". I think that that's worth
living with, but it isn't very satisfying.

> Risk
> and uncertainty exist all over the place, but dealing with that in a
> general way seems difficult, and maybe unnecessary. Addressing the
> case of unparameterized nest loops specifically seems simpler, because
> it's easier to reason about what the alternatives are. Your last
> sentence here seems right on point to me.

Right. Part of why this is a good idea is that the user is exposed to
so many individual risks and uncertainties. We cannot see any one risk
as existing in a vacuum. It is not the only risk the user will ever
take in the planner -- if it was then it might actually be okay to
allow unparameterized nested loop joins.

A bad unparameterized nested loop join plan has, in a sense, unknown
and unbounded cost/downside. But it is only very slightly faster than
a hash join, by a fixed well understood amount. With enough "trials"
and on a long enough timeline, it will inevitably blow up and cause
the application to grind to a halt. It seems like no amount of fixed,
bounded benefit from "fast unparameterized nested loop joins" could
possibly make up for that. The life of Postgres users would be a lot
better if bad plans were at least "survivable events".

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Maintaining a list of pgindent commits for "git blame" to ignore
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Add version macro to libpq-fe.h