Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Дата
Msg-id CAKU4AWqA-jXLLsNa5CCu8rKLOfe8-WTjUU7U3o3-MvgkH1MdXw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (Andres Freund <andres@anarazel.de>)
Ответы Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers


On Tue, Aug 25, 2020 at 11:53 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2020-08-25 20:48:37 +1200, David Rowley wrote:
> On Tue, 25 Aug 2020 at 08:26, Andres Freund <andres@anarazel.de> wrote:
> > While I'm against introducing a separate node for the caching, I'm *not*
> > against displaying a different node type when caching is
> > present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
> > Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
> > probably still want to display the 'Cache Key' similar to your example,
> > but I don't see how it'd be better to display it with one more
> > intermediary node.
>
> ...Well, this is difficult... For the record, in case anyone missed
> it, I'm pretty set on being against doing any node overloading for
> this.  I think it's a pretty horrid modularity violation regardless of
> what text appears in EXPLAIN. I think if we merge these nodes then we
> may as well go further and merge in other simple nodes like LIMIT.

Huh? That doesn't make any sense. LIMIT is applicable to every single
node type with the exception of hash. The caching you talk about is
applicable only to node types that parametrize their sub-nodes, of which
there are exactly two instances.

Limit doesn't shuttle through huge amounts of tuples normally. What you
talk about does.



> Also, just in case anyone is misunderstanding this Andres' argument.
> It's entirely based on the performance impact of having an additional
> node.

Not entirely, no. It's also just that it doesn't make sense to have two
nodes setting parameters that then half magically picked up by a special
subsidiary node type and used as a cache key. This is pseudo modularity,
not real modularity. And makes it harder to display useful information
in explain etc. And makes it harder to e.g. clear the cache in cases we
know that there's no further use of the current cache. At least without
piercing the abstraction veil.


> However, given the correct planner choice, there will never be
> a gross slowdown due to having the extra node.

There'll be a significant reduction in increase in performance.

If this is a key blocking factor for this topic, I'd like to do a simple hack
to put the cache function into the subplan node, then do some tests to
show the real difference.  But it is better to decide how much difference
can be thought of as a big difference.  And  for education purposes,  
I'd like to understand where these differences come from.  For my
current knowledge,  my basic idea is it saves some function calls? 

 

> I understand that you've voiced your feelings about this, but what I
> want to know is, how strongly do you feel about overloading the node?
> Will you stand in my way if I want to push ahead with the separate
> node?  Will anyone else?

I feel pretty darn strongly about this. If there's plenty people on your
side I'll not stand in your way, but I think this is a bad design based on
pretty flimsy reasons.


Nice to see the different opinions from two great guys and interesting to
see how this can be resolved at last:) 

--
Best Regards
Andy Fan

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Ranier Vilela
Дата:
Сообщение: Out-of-bounds access (ARRAY_VS_SINGLETON) (src/backend/access/nbtree/nbtdedup.c)
Следующее
От: Andy Fan
Дата:
Сообщение: Re: Hybrid Hash/Nested Loop joins and caching results from subplans