Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Дата
Msg-id 20200824202640.zihclnxkmho3f5cd@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Ответы Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
Hi,

On 2020-08-19 18:58:11 -0400, Alvaro Herrera wrote:
> On 2020-Aug-19, David Rowley wrote:
> 
> > Andres' suggestion:
> > 
> > regression=# explain (analyze, costs off, timing off, summary off)
> > select count(*) from tenk1 t1 inner join tenk1 t2 on
> > t1.twenty=t2.unique1;
> >                                       QUERY PLAN
> > ---------------------------------------------------------------------------------------
> >  Aggregate (actual rows=1 loops=1)
> >    ->  Nested Loop (actual rows=10000 loops=1)
> >           Cache Key: t1.twenty  Hits: 9980  Misses: 20  Evictions: 0 Overflows: 0
> >         ->  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
> >         ->  Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
> >               Index Cond: (unique1 = t1.twenty)
> > (6 rows)
> 
> I think it doesn't look terrible in the SubPlan case -- it kinda makes
> sense there -- but for nested loop it appears really strange.

While I'm against introducing a separate node for the caching, I'm *not*
against displaying a different node type when caching is
present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
probably still want to display the 'Cache Key' similar to your example,
but I don't see how it'd be better to display it with one more
intermediary node.


> On the performance aspect, I wonder what the overhead is, particularly
> considering Tom's point of making these nodes more expensive for cases
> with no caching.

I doubt it, due to being a well predictable branch. But it's also easy
enough to just have a different Exec* function for the caching and
non-caching case, should that turn out to be a problem.


> And also, as the JIT saga continues, aren't we going to get plan trees
> recompiled too, at which point it won't matter much?

That's a fair bit out, I think. And even then it'll only help for
queries that run long enough (eventually also often enough, if we get
prepared statement JITing) to be worth JITing.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: Row estimates for empty tables
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Continuing instability in insert-conflict-specconflict test