Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Дата
Msg-id 20200825155240.pjuej4i2c6hxrkbl@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (Andy Fan <zhihui.fan1213@gmail.com>)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (Andy Fan <zhihui.fan1213@gmail.com>)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
Hi,

On 2020-08-25 20:48:37 +1200, David Rowley wrote:
> On Tue, 25 Aug 2020 at 08:26, Andres Freund <andres@anarazel.de> wrote:
> > While I'm against introducing a separate node for the caching, I'm *not*
> > against displaying a different node type when caching is
> > present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
> > Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
> > probably still want to display the 'Cache Key' similar to your example,
> > but I don't see how it'd be better to display it with one more
> > intermediary node.
> 
> ...Well, this is difficult... For the record, in case anyone missed
> it, I'm pretty set on being against doing any node overloading for
> this.  I think it's a pretty horrid modularity violation regardless of
> what text appears in EXPLAIN. I think if we merge these nodes then we
> may as well go further and merge in other simple nodes like LIMIT.

Huh? That doesn't make any sense. LIMIT is applicable to every single
node type with the exception of hash. The caching you talk about is
applicable only to node types that parametrize their sub-nodes, of which
there are exactly two instances.

Limit doesn't shuttle through huge amounts of tuples normally. What you
talk about does.



> Also, just in case anyone is misunderstanding this Andres' argument.
> It's entirely based on the performance impact of having an additional
> node.

Not entirely, no. It's also just that it doesn't make sense to have two
nodes setting parameters that then half magically picked up by a special
subsidiary node type and used as a cache key. This is pseudo modularity,
not real modularity. And makes it harder to display useful information
in explain etc. And makes it harder to e.g. clear the cache in cases we
know that there's no further use of the current cache. At least without
piercing the abstraction veil.


> However, given the correct planner choice, there will never be
> a gross slowdown due to having the extra node.

There'll be a significant reduction in increase in performance.


> I understand that you've voiced your feelings about this, but what I
> want to know is, how strongly do you feel about overloading the node?
> Will you stand in my way if I want to push ahead with the separate
> node?  Will anyone else?

I feel pretty darn strongly about this. If there's plenty people on your
side I'll not stand in your way, but I think this is a bad design based on
pretty flimsy reasons.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ranier Vilela
Дата:
Сообщение: [PATCH] Dereference null return value (NULL_RETURNS) (src/backend/commands/statscmds.c)
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Continuing instability in insert-conflict-specconflict test