Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From David Rowley
Subject Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date
Msg-id CAApHDvrzQbc3CsEgz8iYaFsnW155dSjs6D+7aJDzO8p5xBnKXg@mail.gmail.com
In reply to Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Tue, 18 Aug 2020 at 21:42, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 11 Aug 2020 at 17:44, Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2020-08-11 17:23:42 +1200, David Rowley wrote:
> > > On Tue, 11 Aug 2020 at 12:21, Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > On 2020-07-09 10:25:14 +1200, David Rowley wrote:
> > > > > On Thu, 9 Jul 2020 at 04:53, Andres Freund <andres@anarazel.de> wrote:
> > > > > > I'm not convinced it's a good idea to introduce a separate executor node
> > > > > > for this. There's a fair bit of overhead in them, and they will only be
> > > > > > below certain types of nodes afaict. It seems like it'd be better to
> > > > > > pull the required calls into the nodes that do parametrized scans of
> > > > > > subsidiary nodes. Have you considered that?
> > > > >
> > > > > I see 41 different node types mentioned in ExecReScan().  I don't
> > > > > really think it would be reasonable to change all those.
> > > >
> > > > But that's because we dispatch ExecReScan mechanically down to every
> > > > single executor node. That doesn't determine how many nodes we would
> > > > need to modify to include explicit caching. What am I missing?
> > > >
> > > > Wouldn't we need roughly just nodeNestloop.c and nodeSubplan.c
> > > > integration?
> > >
> > > hmm, I think you're right there about those two node types.  I'm just
> > > not sure you're right about overloading these node types to act as a
> > > cache.
> >
> > I'm not 100% either, to be clear.  I am just acutely aware that adding
> > entire nodes is pretty expensive, and that there's, afaict, no need to
> > have arbitrary (i.e. pointer to function) type callbacks to point to the
> > cache.
>
> Perhaps you're right, but I'm just not convinced of it.  I feel
> there's a certain air of magic in taking a node with a good name and a
> reputation for doing one thing, then suddenly adding new functionality
> that causes it to perform massively differently.
>

[ my long babble removed]

> I'm wondering if anyone else has any thoughts on this?

Just for anyone following along at home, the two variations would
roughly look like this:

Current method:

regression=# explain (analyze, costs off, timing off, summary off)
select count(*) from tenk1 t1 inner join tenk1 t2 on
t1.twenty=t2.unique1;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Aggregate (actual rows=1 loops=1)
   ->  Nested Loop (actual rows=10000 loops=1)
         ->  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
         ->  Result Cache (actual rows=1 loops=10000)
               Cache Key: t1.twenty
               Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
               ->  Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
                     Index Cond: (unique1 = t1.twenty)
(8 rows)

Andres' suggestion:

regression=# explain (analyze, costs off, timing off, summary off)
select count(*) from tenk1 t1 inner join tenk1 t2 on
t1.twenty=t2.unique1;
                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Aggregate (actual rows=1 loops=1)
   ->  Nested Loop (actual rows=10000 loops=1)
         Cache Key: t1.twenty  Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
         ->  Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
         ->  Index Scan using tenk1_unique1 on tenk1 t2 (actual rows=1 loops=20)
               Index Cond: (unique1 = t1.twenty)
(6 rows)
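
To make that concrete, here is a throwaway, standalone C sketch of what
the nested-loop variant boils down to. It is nothing like the real
executor code; the types and function names are invented purely for
illustration. The join keeps a small cache keyed on the outer parameter
value and only "rescans" the inner side on a miss, which reproduces the
Hits/Misses numbers above for 10000 outer rows and 20 distinct values
of twenty.

/*
 * Toy model only: a "nested loop" that consults a cache keyed on the
 * outer parameter before re-executing the parameterized inner scan.
 * None of these names exist in PostgreSQL; they're invented for this
 * sketch.
 */
#include <stdio.h>
#include <stdbool.h>

#define NDISTINCT 20            /* t1.twenty has 20 distinct values */

typedef struct CacheEntry
{
    bool    valid;              /* inner side already scanned for this key? */
    int     inner_match;        /* cached inner row (t2.unique1) for this key */
} CacheEntry;

static long hits = 0;
static long misses = 0;

/* Stand-in for rescanning the parameterized inner index scan. */
static int
scan_inner(int outer_key)
{
    misses++;
    return outer_key;           /* unique1 = t1.twenty */
}

/* Join probe: consult the cache before touching the inner plan. */
static int
probe(CacheEntry *cache, int outer_key)
{
    CacheEntry *entry = &cache[outer_key];

    if (!entry->valid)
    {
        entry->inner_match = scan_inner(outer_key);
        entry->valid = true;
    }
    else
        hits++;

    return entry->inner_match;
}

int
main(void)
{
    CacheEntry  cache[NDISTINCT] = {0};
    long        joined = 0;

    /* 10000 outer rows with twenty = i % 20, as in the tenk1 example */
    for (int i = 0; i < 10000; i++)
        joined += (probe(cache, i % NDISTINCT) >= 0);

    printf("rows=%ld hits=%ld misses=%ld\n", joined, hits, misses);
    return 0;
}

Compiled and run, that prints rows=10000 hits=9980 misses=20, matching
the plan output above.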

and for subplans:

Current method:

regression=# explain (analyze, costs off, timing off, summary off)
select twenty, (select count(*) from tenk1 t2 where t1.twenty =
t2.twenty) from tenk1 t1;
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
   SubPlan 1
     ->  Result Cache (actual rows=1 loops=10000)
           Cache Key: t1.twenty
           Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
           ->  Aggregate (actual rows=1 loops=20)
                 ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
                       Filter: (t1.twenty = twenty)
                       Rows Removed by Filter: 9500
(9 rows)

Andres' suggestion:

regression=# explain (analyze, costs off, timing off, summary off)
select twenty, (select count(*) from tenk1 t2 where t1.twenty =
t2.twenty) from tenk1 t1;
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on tenk1 t1 (actual rows=10000 loops=1)
   SubPlan 1
    Cache Key: t1.twenty  Hits: 9980  Misses: 20  Evictions: 0  Overflows: 0
    ->  Aggregate (actual rows=1 loops=20)
          ->  Seq Scan on tenk1 t2 (actual rows=500 loops=20)
                Filter: (t1.twenty = twenty)
                Rows Removed by Filter: 9500
(7 rows)
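
And the same toy treatment for the SubPlan case, again with made-up
names and no resemblance to the actual patch: the scalar result of the
correlated aggregate is memoized per parameter value, so the expensive
count(*) runs once per distinct t1.twenty rather than once per outer
row.

/*
 * Toy model only: memoize the correlated subquery's scalar result,
 * keyed on the parameter value.  Names are invented for this sketch.
 */
#include <stdio.h>
#include <stdbool.h>

#define NDISTINCT 20            /* distinct values of t1.twenty */

typedef struct SubplanCache
{
    bool    valid;
    long    count;              /* cached result of the correlated count(*) */
} SubplanCache;

static long subplan_executions = 0;

/* Stand-in for running "select count(*) from tenk1 t2 where t2.twenty = $1". */
static long
run_subplan(int param)
{
    (void) param;               /* the real subplan would filter on this */
    subplan_executions++;
    return 500;                 /* 10000 rows / 20 distinct values */
}

/* Fetch the subplan result, executing it only on a cache miss. */
static long
cached_subplan(SubplanCache *cache, int param)
{
    if (!cache[param].valid)
    {
        cache[param].count = run_subplan(param);
        cache[param].valid = true;
    }
    return cache[param].count;
}

int
main(void)
{
    SubplanCache cache[NDISTINCT] = {0};
    long         total = 0;

    for (int i = 0; i < 10000; i++)     /* one probe per t1 row */
        total += cached_subplan(cache, i % NDISTINCT);

    printf("total=%ld subplan executions=%ld\n", total, subplan_executions);
    return 0;
}

Here too the subplan runs 20 times instead of 10000, which is where the
Hits: 9980 / Misses: 20 figures come from.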

I've spoken to one other person off-list about this, and they said they
prefer Andres' suggestion on performance grounds: there's less overhead
to pull tuples through the plan, and executor startup/shutdown is
cheaper due to having fewer nodes.

I don't object to making the change. I just object to making it only to
have to undo it later when someone else speaks up saying they'd prefer
to keep nodes modular and not overload them in obscure ways.

So other input is welcome.  Is it too weird to overload SubPlan and
Nested Loop this way?  Or is it okay to do that if it squeezes out a
dozen or so nanoseconds per tuple?

I did some analysis into the overhead of pulling tuples through an
additional executor node in [1].

David

[1] https://www.postgresql.org/message-id/CAKJS1f9UXdk6ZYyqbJnjFO9a9hyHKGW7B%3DZRh-rxy9qxfPA5Gw%40mail.gmail.com


