Hi,
On 2020-08-19 18:58:11 -0400, Alvaro Herrera wrote:
> On 2020-Aug-19, David Rowley wrote:
>
> > Andres' suggestion:
> >
> > regression=# explain (analyze, costs off, timing off, summary off)
> > select count(*) from tenk1 t1 inner join tenk1 t2 on
> > t1.twenty=t2.unique1;
> > QUERY PLAN
> > ---------------------------------------------------------------------------------------
> > Aggregate (actual rows=1 loops=1)
> > -> Nested Loop (actual rows=10000 loops=1)
> > Cache Key: t1.twenty Hits: 9980 Misses: 20 Evictions: 0 Overflows: 0
> > -> Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
> > -> Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
> > Index Cond: (unique1 = t1.twenty)
> > (6 rows)
>
> I think it doesn't look terrible in the SubPlan case -- it kinda makes
> sense there -- but for nested loop it appears really strange.
While I'm against introducing a separate node for the caching, I'm *not*
against displaying a different node type when caching is
present. E.g. it'd be perfectly reasonable from my POV to have a 'Cached
Nested Loop' join and a plain 'Nested Loop' node in the above node. I'd
probably still want to display the 'Cache Key' similar to your example,
but I don't see how it'd be better to display it with one more
intermediary node.
> On the performance aspect, I wonder what the overhead is, particularly
> considering Tom's point of making these nodes more expensive for cases
> with no caching.
I doubt it, due to being a well predictable branch. But it's also easy
enough to just have a different Exec* function for the caching and
non-caching case, should that turn out to be a problem.
> And also, as the JIT saga continues, aren't we going to get plan trees
> recompiled too, at which point it won't matter much?
That's a fair bit out, I think. And even then it'll only help for
queries that run long enough (eventually also often enough, if we get
prepared statement JITing) to be worth JITing.
Greetings,
Andres Freund