A performance issue with Memoize

From: Richard Guo
Date:
I noticed $subject with the query below.

set enable_memoize to off;

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
    (select t1.two as t1two, * from tenk1 t2 offset 0) s
on t1.two = s.two;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.050..59578.053 rows=50000000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.027..2.703 rows=10000 loops=1)
   ->  Subquery Scan on s (actual time=0.004..4.819 rows=5000 loops=10000)
         Filter: (t1.two = s.two)
         Rows Removed by Filter: 5000
         ->  Seq Scan on tenk1 t2 (actual time=0.002..3.834 rows=10000 loops=10000)
 Planning Time: 0.666 ms
 Execution Time: 60937.899 ms
(8 rows)

set enable_memoize to on;

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
    (select t1.two as t1two, * from tenk1 t2 offset 0) s
on t1.two = s.two;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.061..122684.607 rows=50000000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.026..3.367 rows=10000 loops=1)
   ->  Memoize (actual time=0.011..9.821 rows=5000 loops=10000)
         Cache Key: t1.two, t1.two
         Cache Mode: binary
         Hits: 0  Misses: 10000  Evictions: 9999  Overflows: 0  Memory Usage: 1368kB
         ->  Subquery Scan on s (actual time=0.008..5.188 rows=5000 loops=10000)
               Filter: (t1.two = s.two)
               Rows Removed by Filter: 5000
               ->  Seq Scan on tenk1 t2 (actual time=0.004..4.081 rows=10000 loops=10000)
 Planning Time: 0.607 ms
 Execution Time: 124431.388 ms
(12 rows)

The execution time (best of three runs) is 124431.388 ms with Memoize
vs. 60937.899 ms without.

The Memoize runtime stats 'Hits: 0  Misses: 10000  Evictions: 9999'
seems suspicious to me, so I've looked into it a little bit, and found
that the MemoizeState's keyparamids and its outerPlan's chgParam are
always different, and that makes us have to purge the entire cache each
time we rescan the memoize node.
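
(For reference, the purge comes from the rescan path in nodeMemoize.c,
which behaves roughly like the sketch below -- paraphrased from memory,
not copied verbatim from the source.)

    void
    ExecReScanMemoize(MemoizeState *node)
    {
        PlanState  *outerPlan = outerPlanState(node);

        /* The next fetch must start with a fresh cache lookup */
        node->mstatus = MEMO_CACHE_LOOKUP;
        node->entry = NULL;
        node->last_tuple = NULL;

        if (outerPlan->chgParam == NULL)
            ExecReScan(outerPlan);

        /*
         * If some parameter changed that is not part of the cache key,
         * none of the cached results can be trusted, so throw them all
         * away.  With keyparamids = {1} but chgParam = {0, 1}, this
         * fires on every single rescan.
         */
        if (bms_nonempty_difference(outerPlan->chgParam, node->keyparamids))
            cache_purge_all(node);
    }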

But why are they always different?  Well, for the query above, we have
two NestLoopParam nodes: one (with paramno 1) is created when we
replace outer-relation Vars in the scan qual 't1.two = s.two', and the
other (with paramno 0) is added from the subquery's subplan_params,
which were created when we replaced uplevel Vars with Param nodes for
the subquery.  That is to say, the chgParam would be {0, 1}.

When it comes to replacing outer-relation Vars in the memoize keys, the
two 't1.two' Vars are both replaced with the NestLoopParam with paramno
1, because that is the first NLP in root->curOuterParams that is equal
to the Vars in the memoize keys.  That is to say, the memoize node's
keyparamids is {1}.

I haven't thought thoroughly about the fix yet.  But one way I'm
thinking is that in create_subqueryscan_plan() we can first add the
subquery's subplan_params to root->curOuterParams, and then replace
outer-relation Vars in scan_clauses afterwards.  That way, the same
Var that is both a lateral reference of the subquery and an
outer-relation Var of the NestLoop can share a single PARAM_EXEC slot.
To be concrete, something like the attached.
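
(In createplan.c terms, the idea is just to flip the order of the two
calls in create_subqueryscan_plan(), roughly as below; this is only a
sketch of the idea, the attached patch is what actually matters.)

    /* Replace any outer-relation variables with nestloop params */
    if (best_path->path.param_info)
    {
        /*
         * Process the subquery's lateral references first, so that any
         * PARAM_EXEC slot assigned for an uplevel Var can then be reused
         * by replace_nestloop_params() for the same Var in scan_clauses.
         */
        process_subquery_nestloop_params(root, rel->subplan_params);
        scan_clauses = (List *)
            replace_nestloop_params(root, (Node *) scan_clauses);
    }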

With this change the same query runs much faster and the Memoize runtime
stats look more normal.

explain (analyze, costs off)
select * from tenk1 t1 left join lateral
    (select t1.two as t1two, * from tenk1 t2 offset 0) s
on t1.two = s.two;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Nested Loop Left Join (actual time=0.063..21177.476 rows=50000000 loops=1)
   ->  Seq Scan on tenk1 t1 (actual time=0.025..5.415 rows=10000 loops=1)
   ->  Memoize (actual time=0.001..0.234 rows=5000 loops=10000)
         Cache Key: t1.two, t1.two
         Cache Mode: binary
         Hits: 9998  Misses: 2  Evictions: 0  Overflows: 0  Memory Usage: 2735kB
         ->  Subquery Scan on s (actual time=0.009..5.169 rows=5000 loops=2)
               Filter: (t1.two = s.two)
               Rows Removed by Filter: 5000
               ->  Seq Scan on tenk1 t2 (actual time=0.006..4.050 rows=10000 loops=2)
 Planning Time: 0.593 ms
 Execution Time: 22486.621 ms
(12 rows)

Any thoughts?

Thanks
Richard
Attachments

Re: A performance issue with Memoize

From: Pavel Stehule
Date:


On Fri, Oct 20, 2023 at 13:01, Richard Guo <guofenglinux@gmail.com> wrote:
> I noticed $subject with the query below.
> ...
> Any thoughts?

+1

it would be great to fix this problem - I've seen this issue a few times.

Regards

Pavel




Re: A performance issue with Memoize

From: Richard Guo
Date:

On Fri, Oct 20, 2023 at 6:40 PM Richard Guo <guofenglinux@gmail.com> wrote:
> I haven't thought thoroughly about the fix yet.  But one way I'm
> thinking is that in create_subqueryscan_plan() we can first add the
> subquery's subplan_params to root->curOuterParams, and then replace
> outer-relation Vars in scan_clauses afterwards.  That way, the same
> Var that is both a lateral reference of the subquery and an
> outer-relation Var of the NestLoop can share a single PARAM_EXEC slot.
> To be concrete, something like the attached.

After some more thought, I think this is the right way to fix this
issue.  The idea here is to make sure that the same NLP Var shares the
same PARAM_EXEC slot.  This change can also help to save PARAM_EXEC
slots (which is trivial though since slots are very cheap).

Thanks
Richard

Re: A performance issue with Memoize

From: Richard Guo
Date:

On Fri, Oct 20, 2023 at 7:43 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
+1

it would be great to fix this problem - I've seen this issue a few times.

Thanks for the input.  I guess this is not rare in the real world.  If
the subquery contains a lateral reference to a Var that also appears in
the subquery's join clauses, we'd probably suffer from this issue.

Thanks
Richard

Re: A performance issue with Memoize

From: Andrei Lepikhov
Date:
On 20/10/2023 17:40, Richard Guo wrote:
> I noticed $subject with the query below.
>
> ...
>
> The Memoize runtime stats 'Hits: 0  Misses: 10000  Evictions: 9999'
> seems suspicious to me, so I've looked into it a little bit, and found
> that the MemoizeState's keyparamids and its outerPlan's chgParam are
> always different, and that makes us have to purge the entire cache each
> time we rescan the memoize node.
> 
> But why are they always different?  Well, for the query above, we have
> two NestLoopParam nodes, one (with paramno 1) is created when we replace
> outer-relation Vars in the scan qual 't1.two = s.two', the other one
> (with paramno 0) is added from the subquery's subplan_params, which was
> created when we replaced uplevel vars with Param nodes for the subquery.
> That is to say, the chgParam would be {0, 1}.
> 
> When it comes to replace outer-relation Vars in the memoize keys, the
> two 't1.two' Vars are both replaced with the NestLoopParam with paramno
> 1, because it is the first NLP we see in root->curOuterParams that is
> equal to the Vars in memoize keys.  That is to say, the memoize node's
> keyparamids is {1}.
> ...
> Any thoughts?

Have you thought about the case fixed by commit 1db5667? As I
see, that bugfix still isn't covered by regression tests. Could your
approach of reusing a PARAM_EXEC slot break that case?

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: A performance issue with Memoize

From: Richard Guo
Date:

On Thu, Oct 26, 2023 at 12:07 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> Have you thought about the case fixed by commit 1db5667? As I
> see, that bugfix still isn't covered by regression tests. Could your
> approach of reusing a PARAM_EXEC slot break that case?

Hm, I don't think so.  The issue fixed by commit 1db5667 was caused by
sharing PARAM_EXEC slots between different levels of NestLoop.  AFAICS
it's safe to share PARAM_EXEC slots within the same level of NestLoop.

The change here is about sharing PARAM_EXEC slots between subquery's
subplan_params and outer-relation variables, which happens within the
same level of NestLoop.

Actually, even without this change, we'd still share PARAM_EXEC slots
between subquery's subplan_params and outer-relation variables in some
cases.  As an example, consider

explain (costs off)
select * from t t1 left join
        (t t2 left join
                lateral (select t1.a as t1a, t2.a as t2a, * from t t3) s
        on t2.b = s.b)
on t1.b = s.b and t1.a = t2.a;
                      QUERY PLAN
-------------------------------------------------------
 Nested Loop Left Join
   ->  Seq Scan on t t1
   ->  Nested Loop
         Join Filter: (t1.a = t2.a)
         ->  Seq Scan on t t2
         ->  Subquery Scan on s
               Filter: ((t1.b = s.b) AND (t2.b = s.b))
               ->  Seq Scan on t t3
(8 rows)

For outer-relation Var 't1.a' from qual 't1.a = t2.a', it shares
PARAM_EXEC slot 0 with the PlannerParamItem for 't1.a' within the
subquery (from its targetlist).

Did you notice a case that the change here breaks?

Hi Tom, could you share your insights on this issue and the proposed
fix?

Thanks
Richard

Re: A performance issue with Memoize

From: Andrei Lepikhov
Date:
On 30/10/2023 14:55, Richard Guo wrote:
> 
> On Thu, Oct 26, 2023 at 12:07 PM Andrei Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
> 
>     Have you thought about the case fixed by commit 1db5667? As I
>     see, that bugfix still isn't covered by regression tests. Could your
>     approach of reusing a PARAM_EXEC slot break that case?
> 
> 
> Hm, I don't think so.  The issue fixed by commit 1db5667 was caused by
> sharing PARAM_EXEC slots between different levels of NestLoop.  AFAICS
> it's safe to share PARAM_EXEC slots within the same level of NestLoop.
> 
> The change here is about sharing PARAM_EXEC slots between subquery's
> subplan_params and outer-relation variables, which happens within the
> same level of NestLoop.
> ...
> Did you notice a case that the change here breaks?
> 
> Hi Tom, could you share your insights on this issue and the proposed
> fix?

I think your patch works correctly so far. I mentioned the commit 
1db5667 because, as I see, the origin of the problem was parallel 
workers. I have thought about pushing Memoize down to a parallel worker 
and couldn't imagine whether such a solution would be correct.
Sorry if I disturbed you in vain.

-- 
regards,
Andrei Lepikhov
Postgres Professional




Re: A performance issue with Memoize

From: Richard Guo
Date:

On Tue, Oct 31, 2023 at 1:36 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> I think your patch works correctly so far. I mentioned the commit
> 1db5667 because, as I see, the origin of the problem was parallel
> workers. I have thought about pushing Memoize down to a parallel worker
> and couldn't imagine whether such a solution would be correct.
> Sorry if I disturbed you in vain.

Thanks for mentioning commit 1db5667, which brought more aspects of the
PARAM_EXEC mechanism to my attention.  I don't think the discussion is
in vain.  It helps a lot.

Thanks for looking into this patch.

Thanks
Richard

Re: A performance issue with Memoize

From: Arne Roland
Date:
Hi Richard,

I can tell this is a real-world problem. I have seen it multiple times
in production.

The fix seems surprisingly simple.

I hope my questions here aren't completely off. I still struggle to 
think about the implications.

I wonder if there is anything we are breaking by bluntly forgetting
about the subplan params. Maybe some table-valued function scan within
a subquery scan? Or something about casts on a join condition that
could be performed differently?

I wasn't able to construct a problem case. I might be just missing 
context here. But I am not yet fully convinced whether this is safe to 
do in all cases.

Regards
Arne





Re: A performance issue with Memoize

From: David Rowley
Date:
On Fri, 20 Oct 2023 at 23:40, Richard Guo <guofenglinux@gmail.com> wrote:
> The Memoize runtime stats 'Hits: 0  Misses: 10000  Evictions: 9999'
> seems suspicious to me, so I've looked into it a little bit, and found
> that the MemoizeState's keyparamids and its outerPlan's chgParam are
> always different, and that makes us have to purge the entire cache each
> time we rescan the memoize node.
>
> But why are they always different?  Well, for the query above, we have
> two NestLoopParam nodes, one (with paramno 1) is created when we replace
> outer-relation Vars in the scan qual 't1.two = s.two', the other one
> (with paramno 0) is added from the subquery's subplan_params, which was
> created when we replaced uplevel vars with Param nodes for the subquery.
> That is to say, the chgParam would be {0, 1}.
>
> When it comes to replace outer-relation Vars in the memoize keys, the
> two 't1.two' Vars are both replaced with the NestLoopParam with paramno
> 1, because it is the first NLP we see in root->curOuterParams that is
> equal to the Vars in memoize keys.  That is to say, the memoize node's
> keyparamids is {1}.

I see the function calls were put this way around in 5ebaaa494
(Implement SQL-standard LATERAL subqueries.), per:

@@ -1640,6 +1641,7 @@ create_subqueryscan_plan(PlannerInfo *root, Path *best_path,
    {
        scan_clauses = (List *)
            replace_nestloop_params(root, (Node *) scan_clauses);
+       identify_nestloop_extparams(root, best_path->parent->subplan);
    }

(identify_nestloop_extparams was later renamed to
process_subquery_nestloop_params in 46c508fbc.)

I think fixing it your way makes sense.  I don't really see any reason
why we should have two. However...

Another way it *could* be fixed would be to get rid of pull_paramids()
and change create_memoize_plan() to set keyparamids to all the param
IDs that are equal() to each param_exprs.  That way nodeMemoize
wouldn't purge the cache as we'd know the changing param is accounted
for in the cache.  For the record, I don't think that's better, but it
scares me a bit less as I don't know what other repercussions there
are of applying your patch to get rid of the duplicate
NestLoopParam.paramval.
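
(Roughly, that alternative would amount to a helper along these lines --
purely a hypothetical sketch to illustrate the idea, assuming the
Memoize key expressions are still available in Var form at that point;
none of this is code from the tree.)

    /*
     * Hypothetical: collect the paramno of every NestLoopParam whose Var
     * matches one of the Memoize key expressions, so that duplicate
     * params created for the same Var all end up in keyparamids and
     * nodeMemoize needn't purge the cache when any of them changes.
     */
    static Bitmapset *
    keyparamids_for_memoize_keys(PlannerInfo *root, List *key_exprs)
    {
        Bitmapset  *keyparamids = NULL;
        ListCell   *lc;

        foreach(lc, root->curOuterParams)
        {
            NestLoopParam *nlp = (NestLoopParam *) lfirst(lc);
            ListCell   *lc2;

            foreach(lc2, key_exprs)
            {
                if (equal(nlp->paramval, lfirst(lc2)))
                {
                    keyparamids = bms_add_member(keyparamids, nlp->paramno);
                    break;
                }
            }
        }
        return keyparamids;
    }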

I'd feel better about doing it your way if Tom could comment on if
there was a reason he put the function calls that way around in
5ebaaa494.

I also feel like we might be getting a bit close to the minor version
releases to be adjusting this stuff in back branches.

David



Re: A performance issue with Memoize

From: Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I'd feel better about doing it your way if Tom could comment on if
> there was a reason he put the function calls that way around in
> 5ebaaa494.

Apologies for not having noticed this thread before.  I'm taking
a look at it now.  However, while sniffing around this I found
what seems like an oversight in paramassign.c's
assign_param_for_var(): it says it should compare all the same
fields as _equalVar except for varlevelsup, but it's failing to
compare varnullingrels.  Is that a bug?  It's conceivable that
it's not possible to get here with varnullingrels different and
all else the same, but I don't feel good about that proposition.

I tried adding

@@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var)
                 pvar->vartype == var->vartype &&
                 pvar->vartypmod == var->vartypmod &&
                 pvar->varcollid == var->varcollid)
+            {
+                Assert(bms_equal(pvar->varnullingrels, var->varnullingrels));
                 return pitem->paramId;
+            }
         }
     }

This triggers no failures in the regression tests, but we know
how little that proves.

Anyway, that's just a side observation unrelated to the problem
at hand.  More later.

            regards, tom lane



Re: A performance issue with Memoize

From: Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I think fixing it your way makes sense.  I don't really see any reason
> why we should have two. However...

> Another way it *could* be fixed would be to get rid of pull_paramids()
> and change create_memoize_plan() to set keyparamids to all the param
> IDs that are equal() to each param_exprs.  That way nodeMemoize
> wouldn't purge the cache as we'd know the changing param is accounted
> for in the cache.  For the record, I don't think that's better, but it
> scares me a bit less as I don't know what other repercussions there
> are of applying your patch to get rid of the duplicate
> NestLoopParam.paramval.

> I'd feel better about doing it your way if Tom could comment on if
> there was a reason he put the function calls that way around in
> 5ebaaa494.

I'm fairly sure I thought it wouldn't matter because of the Param
de-duplication done in paramassign.c.  However, Richard's example
shows that's not so, because process_subquery_nestloop_params is
picky about the param ID assigned to a particular Var while
replace_nestloop_params is not.  So flipping the order makes sense.
I'd change the comment though, maybe to

    /*
     * Replace any outer-relation variables with nestloop params.
     *
     * We must provide nestloop params for both lateral references of
     * the subquery and outer vars in the scan_clauses.  It's better
     * to assign the former first, because that code path requires
     * specific param IDs, while replace_nestloop_params can adapt
     * to the IDs assigned by process_subquery_nestloop_params.
     * This avoids possibly duplicating nestloop params when the
     * same Var is needed for both reasons.
     */

However ... it seems like we're not out of the woods yet.  Why
is Richard's proposed test case still showing

+         ->  Memoize (actual rows=5000 loops=N)
+               Cache Key: t1.two, t1.two

Seems like there is missing de-duplication logic, or something.

> I also feel like we might be getting a bit close to the minor version
> releases to be adjusting this stuff in back branches.

Yeah, I'm not sure I would change this in the back branches.

            regards, tom lane



Re: A performance issue with Memoize

From: Richard Guo
Date:

On Fri, Jan 26, 2024 at 2:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm fairly sure I thought it wouldn't matter because of the Param
> de-duplication done in paramassign.c.  However, Richard's example
> shows that's not so, because process_subquery_nestloop_params is
> picky about the param ID assigned to a particular Var while
> replace_nestloop_params is not.  So flipping the order makes sense.
> I'd change the comment though, maybe to
>
>     /*
>      * Replace any outer-relation variables with nestloop params.
>      *
>      * We must provide nestloop params for both lateral references of
>      * the subquery and outer vars in the scan_clauses.  It's better
>      * to assign the former first, because that code path requires
>      * specific param IDs, while replace_nestloop_params can adapt
>      * to the IDs assigned by process_subquery_nestloop_params.
>      * This avoids possibly duplicating nestloop params when the
>      * same Var is needed for both reasons.
>      */

+1.  It's much better.
 
> However ... it seems like we're not out of the woods yet.  Why
> is Richard's proposed test case still showing
>
> +         ->  Memoize (actual rows=5000 loops=N)
> +               Cache Key: t1.two, t1.two
>
> Seems like there is missing de-duplication logic, or something.

When we collect the cache keys in paraminfo_get_equal_hashops() we
search param_info's ppi_clauses as well as innerrel's lateral_vars for
outer expressions.  We do not perform de-duplication on the collected
outer expressions there.  In my proposed test case, the same Var
't1.two' appears both in the param_info->ppi_clauses and in the
innerrel->lateral_vars, so we see two identical cache keys in the plan.
I noticed this before and wondered if we should do de-duplication on the
cache keys, but somehow I did not chase this to the ground.
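
(Conceptually the de-duplication would be as simple as guarding the
lappend calls in paraminfo_get_equal_hashops() with a membership test,
something like the fragment below -- just a sketch, not the actual
patch.)

    /*
     * Sketch only: 'expr' is the outer expression being collected and
     * 'hasheqoperator' stands for whichever hash equality operator
     * applies at that point in the function.  Don't add the same outer
     * expression as a cache key twice.
     */
    if (!list_member(*param_exprs, expr))
    {
        *operators = lappend_oid(*operators, hasheqoperator);
        *param_exprs = lappend(*param_exprs, expr);
    }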

Thanks
Richard

Re: A performance issue with Memoize

From: David Rowley
Date:
On Fri, 26 Jan 2024 at 07:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > I'd feel better about doing it your way if Tom could comment on if
> > there was a reason he put the function calls that way around in
> > 5ebaaa494.
>
> I'm fairly sure I thought it wouldn't matter because of the Param
> de-duplication done in paramassign.c.  However, Richard's example
> shows that's not so, because process_subquery_nestloop_params is
> picky about the param ID assigned to a particular Var while
> replace_nestloop_params is not.  So flipping the order makes sense.

Makes sense.

I've adjusted the comments to what you mentioned and also leaned out
the pretty expensive test case to something that'll run much faster
and pushed the result.

> However ... it seems like we're not out of the woods yet.  Why
> is Richard's proposed test case still showing
>
> +         ->  Memoize (actual rows=5000 loops=N)
> +               Cache Key: t1.two, t1.two
>
> Seems like there is missing de-duplication logic, or something.

This seems separate and isn't quite causing the same problems as what
Richard wants to fix so I didn't touch this for now.

David



Re: A performance issue with Memoize

From: Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I've adjusted the comments to what you mentioned and also leaned out
> the pretty expensive test case to something that'll run much faster
> and pushed the result.

+1, I was wondering if the test could be cheaper.  It wasn't horrid
as Richard had it, but core regression tests add up over time.

>> However ... it seems like we're not out of the woods yet.  Why
>> is Richard's proposed test case still showing
>> +         ->  Memoize (actual rows=5000 loops=N)
>> +               Cache Key: t1.two, t1.two
>> Seems like there is missing de-duplication logic, or something.

> This seems separate and isn't quite causing the same problems as what
> Richard wants to fix so I didn't touch this for now.

Fair enough, but I think it might be worth pursuing later.

            regards, tom lane



Re: A performance issue with Memoize

From: David Rowley
Date:
On Fri, 26 Jan 2024 at 16:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> However ... it seems like we're not out of the woods yet.  Why
> >> is Richard's proposed test case still showing
> >> +         ->  Memoize (actual rows=5000 loops=N)
> >> +               Cache Key: t1.two, t1.two
> >> Seems like there is missing de-duplication logic, or something.
>
> > This seems separate and isn't quite causing the same problems as what
> > Richard wants to fix so I didn't touch this for now.
>
> Fair enough, but I think it might be worth pursuing later.

Here's a patch for that.

David

Attachments

Re: A performance issue with Memoize

From: Richard Guo
Date:

On Fri, Jan 26, 2024 at 1:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Apologies for not having noticed this thread before.  I'm taking
> a look at it now.  However, while sniffing around this I found
> what seems like an oversight in paramassign.c's
> assign_param_for_var(): it says it should compare all the same
> fields as _equalVar except for varlevelsup, but it's failing to
> compare varnullingrels.  Is that a bug?  It's conceivable that
> it's not possible to get here with varnullingrels different and
> all else the same, but I don't feel good about that proposition.
>
> I tried adding
>
> @@ -91,7 +91,10 @@ assign_param_for_var(PlannerInfo *root, Var *var)
>                  pvar->vartype == var->vartype &&
>                  pvar->vartypmod == var->vartypmod &&
>                  pvar->varcollid == var->varcollid)
> +            {
> +                Assert(bms_equal(pvar->varnullingrels, var->varnullingrels));
>                  return pitem->paramId;
> +            }
>          }
>      }

Yeah, I think it should be safe to assert that the varnullingrels are
equal here.  The Var is supposed to be an upper-level Var, and two
otherwise-identical such Vars should not have different varnullingrels
at this point, although the varnullingrels might be adjusted later in
identify_current_nestloop_params according to which form of identity 3
we end up applying.

Thanks
Richard

Re: A performance issue with Memoize

From: Richard Guo
Date:

On Fri, Jan 26, 2024 at 12:18 PM David Rowley <dgrowleyml@gmail.com> wrote:
> On Fri, 26 Jan 2024 at 16:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> However ... it seems like we're not out of the woods yet.  Why
> > >> is Richard's proposed test case still showing
> > >> +         ->  Memoize (actual rows=5000 loops=N)
> > >> +               Cache Key: t1.two, t1.two
> > >> Seems like there is missing de-duplication logic, or something.
> >
> > > This seems separate and isn't quite causing the same problems as what
> > > Richard wants to fix so I didn't touch this for now.
> >
> > Fair enough, but I think it might be worth pursuing later.
>
> Here's a patch for that.

At first I wondered if we should assume that the same param expr must
have the same equality operator. If not, we should also check the
operator to tell if the cache key is a duplicate, like

-           if (!list_member(*param_exprs, expr))
+           if (!list_member(*param_exprs, expr) ||
+               !list_member_oid(*operators, hasheqoperator))

But after looking at how rinfo->left_hasheqoperator/right_hasheqoperator
is set, it seems we can assume that: the operator is from the type cache
entry which is fetched according to the expr datatype.

So I think the patch makes sense.  +1.

Thanks
Richard

Re: A performance issue with Memoize

From: David Rowley
Date:
On Fri, 26 Jan 2024 at 19:03, Richard Guo <guofenglinux@gmail.com> wrote:
> At first I wondered if we should assume that the same param expr must
> have the same equality operator. If not, we should also check the
> operator to tell if the cache key is a duplicate, like
>
> -           if (!list_member(*param_exprs, expr))
> +           if (!list_member(*param_exprs, expr) ||
> +               !list_member_oid(*operators, hasheqoperator))

hmm, if that were the case you wouldn't do it that way. You'd need to
forboth() and look for an item in both lists matching the search.

> But after looking at how rinfo->left_hasheqoperator/right_hasheqoperator
> is set, it seems we can assume that: the operator is from the type cache
> entry which is fetched according to the expr datatype.

Yip.

> So I think the patch makes sense.  +1.

Thanks for reviewing. I've pushed the patch.

David



Re: A performance issue with Memoize

From: Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> I've adjusted the comments to what you mentioned and also leaned out
> the pretty expensive test case to something that'll run much faster
> and pushed the result.

drongo and fairywren are consistently failing the test case added
by this commit.  I'm not quite sure why the behavior of Memoize
would be platform-specific when we're dealing with integers,
but ...

            regards, tom lane



Re: A performance issue with Memoize

From: David Rowley
Date:
On Sat, 27 Jan 2024 at 09:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > I've adjusted the comments to what you mentioned and also leaned out
> > the pretty expensive test case to something that'll run much faster
> > and pushed the result.
>
> drongo and fairywren are consistently failing the test case added
> by this commit.  I'm not quite sure why the behavior of Memoize
> would be platform-specific when we're dealing with integers,
> but ...

Maybe snprintf(buf, "%.*f", 0, 5.0 / 2.0); results in "3" on those
rather than "2"?

Looking at the code in fmtfloat(), we fallback on the built-in snprintf.

I can try changing the unique1 < 5 to unique1 < 4 to see if that's more stable.

David



Re: A performance issue with Memoize

From: Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Sat, 27 Jan 2024 at 09:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> drongo and fairywren are consistently failing the test case added
>> by this commit.  I'm not quite sure why the behavior of Memoize
>> would be platform-specific when we're dealing with integers,
>> but ...

> Maybe snprintf(buf, "%.*f", 0, 5.0 / 2.0); results in "3" on those
> rather than "2"?
> Looking at the code in fmtfloat(), we fallback on the built-in snprintf.

Maybe ... I don't have a better theory.

> I can try changing the unique1 < 5 to unique1 < 4 to see if that's more stable.

Worth a try.

            regards, tom lane



Re: A performance issue with Memoize

From: Alexander Lakhin
Date:
Hello,

27.01.2024 00:09, Tom Lane wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> On Sat, 27 Jan 2024 at 09:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> drongo and fairywren are consistently failing the test case added
>>> by this commit.  I'm not quite sure why the behavior of Memoize
>>> would be platform-specific when we're dealing with integers,
>>> but ...
>> Maybe snprintf(buf, "%.*f", 0, 5.0 / 2.0); results in "3" on those
>> rather than "2"?
>> Looking at the code in fmtfloat(), we fallback on the built-in snprintf.
> Maybe ... I don't have a better theory.

FWIW, I've found where this behaviour is documented:

https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/sprintf-sprintf-l-swprintf-swprintf-l-swprintf-l?view=msvc-170

(I've remembered a case with test/sql/partition_prune from 2020, where
sprintf on Windows worked the other way.)


Best regards,
Alexander



Re: A performance issue with Memoize

From: Anthonin Bonnefoy
Date:
Hi,

I've seen a similar issue with the following query (tested on the current head):

EXPLAIN ANALYZE SELECT * FROM tenk1 t1
LEFT JOIN LATERAL (SELECT t1.two, tenk2.hundred, tenk2.two FROM tenk2) t2
ON t1.hundred = t2.hundred WHERE t1.hundred < 5;
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=8.46..1495.10 rows=50000 width=256) (actual time=0.860..111.013 rows=50000 loops=1)
   ->  Bitmap Heap Scan on tenk1 t1  (cost=8.16..376.77 rows=500 width=244) (actual time=0.798..1.418 rows=500 loops=1)
         Recheck Cond: (hundred < 5)
         Heap Blocks: exact=263
         ->  Bitmap Index Scan on tenk1_hundred  (cost=0.00..8.04 rows=500 width=0) (actual time=0.230..0.230 rows=500 loops=1)
               Index Cond: (hundred < 5)
   ->  Memoize  (cost=0.30..4.89 rows=100 width=12) (actual time=0.009..0.180 rows=100 loops=500)
         Cache Key: t1.hundred
         Cache Mode: logical
         Hits: 0  Misses: 500  Evictions: 499  Overflows: 0  Memory Usage: 5kB
         ->  Index Scan using tenk2_hundred on tenk2  (cost=0.29..4.88 rows=100 width=12) (actual time=0.007..0.124 rows=100 loops=500)
               Index Cond: (hundred = t1.hundred)
 Planning Time: 0.661 ms
 Execution Time: 113.076 ms
(14 rows)

The memoize's cache key only uses t1.hundred while the nested loop has
two changed parameters: the lateral var t1.two and t1.hundred. This
leads to a chgParam that is always different and the cache is purged
on each rescan.

Regards,
Anthonin




Re: A performance issue with Memoize

From: Richard Guo
Date:

On Thu, Feb 1, 2024 at 3:43 PM Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> wrote:
> I've seen a similar issue with the following query (tested on the current head):
> ...
> The memoize's cache key only uses t1.hundred while the nested loop has
> two changed parameters: the lateral var t1.two and t1.hundred. This
> leads to a chgParam that is always different and the cache is purged
> on each rescan.

Thanks for the report!  This issue is caused by the fact that we fail
to check PHVs for lateral references, and hence we cannot know that
t1.two should also be included in the cache keys.

I reported exactly the same issue in [1], and verified that it can be
fixed by the patch in [2] (which seems to require a rebase).

[1] https://www.postgresql.org/message-id/CAMbWs4_imG5C8rXt7xdU7zf6whUDc2rdDun%2BVtrowcmxb41CzA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAMbWs49%2BCjoy0S0xkCRDcHXGHvsYLOdvr9jq9OTONOBnsgzXOg%40mail.gmail.com

Thanks
Richard