Re: A performance issue with Memoize

Поиск
Список
Период
Сортировка
От Andrei Lepikhov
Тема Re: A performance issue with Memoize
Дата
Msg-id 1d55df6d-92cb-4bbd-9a4c-0d05bb6bc91d@postgrespro.ru
обсуждение исходный текст
Ответ на A performance issue with Memoize  (Richard Guo <guofenglinux@gmail.com>)
Ответы Re: A performance issue with Memoize  (Richard Guo <guofenglinux@gmail.com>)
Список pgsql-hackers
On 20/10/2023 17:40, Richard Guo wrote:
> I noticed $subject with the query below.
> 
> set enable_memoize to off;
> 
> explain (analyze, costs off)
> select * from tenk1 t1 left join lateral
>      (select t1.two as t1two, * from tenk1 t2 offset 0) s
> on t1.two = s.two;
>                                       QUERY PLAN
> ------------------------------------------------------------------------------------
>   Nested Loop Left Join (actual time=0.050..59578.053 rows=50000000 loops=1)
>     ->  Seq Scan on tenk1 t1 (actual time=0.027..2.703 rows=10000 loops=1)
>     ->  Subquery Scan on s (actual time=0.004..4.819 rows=5000 loops=10000)
>           Filter: (t1.two = s.two)
>           Rows Removed by Filter: 5000
>           ->  Seq Scan on tenk1 t2 (actual time=0.002..3.834 rows=10000 
> loops=10000)
>   Planning Time: 0.666 ms
>   Execution Time: 60937.899 ms
> (8 rows)
> 
> set enable_memoize to on;
> 
> explain (analyze, costs off)
> select * from tenk1 t1 left join lateral
>      (select t1.two as t1two, * from tenk1 t2 offset 0) s
> on t1.two = s.two;
>                                          QUERY PLAN
> ------------------------------------------------------------------------------------------
>   Nested Loop Left Join (actual time=0.061..122684.607 rows=50000000 
> loops=1)
>     ->  Seq Scan on tenk1 t1 (actual time=0.026..3.367 rows=10000 loops=1)
>     ->  Memoize (actual time=0.011..9.821 rows=5000 loops=10000)
>           Cache Key: t1.two, t1.two
>           Cache Mode: binary
>           Hits: 0  Misses: 10000  Evictions: 9999  Overflows: 0  Memory 
> Usage: 1368kB
>           ->  Subquery Scan on s (actual time=0.008..5.188 rows=5000 
> loops=10000)
>                 Filter: (t1.two = s.two)
>                 Rows Removed by Filter: 5000
>                 ->  Seq Scan on tenk1 t2 (actual time=0.004..4.081 
> rows=10000 loops=10000)
>   Planning Time: 0.607 ms
>   Execution Time: 124431.388 ms
> (12 rows)
> 
> The execution time (best of 3) is 124431.388 VS 60937.899 with and
> without memoize.
> 
> The Memoize runtime stats 'Hits: 0  Misses: 10000  Evictions: 9999'
> seems suspicious to me, so I've looked into it a little bit, and found
> that the MemoizeState's keyparamids and its outerPlan's chgParam are
> always different, and that makes us have to purge the entire cache each
> time we rescan the memoize node.
> 
> But why are they always different?  Well, for the query above, we have
> two NestLoopParam nodes, one (with paramno 1) is created when we replace
> outer-relation Vars in the scan qual 't1.two = s.two', the other one
> (with paramno 0) is added from the subquery's subplan_params, which was
> created when we replaced uplevel vars with Param nodes for the subquery.
> That is to say, the chgParam would be {0, 1}.
> 
> When it comes to replace outer-relation Vars in the memoize keys, the
> two 't1.two' Vars are both replaced with the NestLoopParam with paramno
> 1, because it is the first NLP we see in root->curOuterParams that is
> equal to the Vars in memoize keys.  That is to say, the memoize node's
> keyparamids is {1}.
> ...
> Any thoughts?

Do you've thought about the case, fixed with the commit 1db5667? As I 
see, that bugfix still isn't covered by regression tests. Could your 
approach of a PARAM_EXEC slot reusing break that case?

-- 
regards,
Andrei Lepikhov
Postgres Professional




В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Making aggregate deserialization (and WAL receive) functions slightly faster
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Open a streamed block for transactional messages during decoding