Re: Check lateral references within PHVs for memoize cache keys

Поиск

Список

Период

Сортировка

От	Richard Guo
Тема	Re: Check lateral references within PHVs for memoize cache keys
Дата	25 декабря 2023 г. 10:01:51
Msg-id	CAMbWs4_imG5C8rXt7xdU7zf6whUDc2rdDun+Vtrowcmxb41CzA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Check lateral references within PHVs for memoize cache keys (Richard Guo <guofenglinux@gmail.com>)
Ответы	Re: Check lateral references within PHVs for memoize cache keys (Richard Guo <guofenglinux@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Thu, Jul 13, 2023 at 3:12 PM Richard Guo <guofenglinux@gmail.com> wrote:

So I'm wondering if it'd be better that we move all this logic of
computing additional lateral references within PHVs to get_memoize_path,
where we can examine only PHVs that are evaluated at innerrel. And
considering that these lateral refs are only used by Memoize, it seems
more sensible to compute them there. But I'm a little worried that
doing this would make get_memoize_path too expensive.

Please see v4 patch for this change.

I'd like to add that not checking PHVs for lateral references can lead
to performance regressions with Memoize node. For instance,

-- by default, enable_memoize is on
regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
-------------------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.028..105245.547 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.011..3.760 rows=10000 loops=1)
-> Memoize (actual time=0.010..8.051 rows=5000 loops=10000)
Cache Key: t1.two
Cache Mode: logical
Hits: 0 Misses: 10000 Evictions: 9999 Overflows: 0 Memory Usage: 1368kB
-> Seq Scan on tenk1 t2 (actual time=0.004..3.594 rows=5000 loops=10000)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 1.943 ms
Execution Time: 106806.043 ms
(11 rows)

-- turn enable_memoize off
regression=# set enable_memoize to off;
SET
regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
-----------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.048..44831.707 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.026..2.340 rows=10000 loops=1)
-> Seq Scan on tenk1 t2 (actual time=0.002..3.282 rows=5000 loops=10000)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 0.641 ms
Execution Time: 46472.609 ms
(7 rows)

As we can see, when Memoize enabled (which is the default setting), the
execution time increases by around 129.83%, indicating a significant
performance regression.

This is caused by that we fail to realize that 't1.four', which is from
the PHV, should be included in the cache keys. And that makes us have
to purge the entire cache every time we get a new outer tuple. This is
also implied by the abnormal Memoize runtime stats:

Hits: 0 Misses: 10000 Evictions: 9999 Overflows: 0

This regression can be fixed by the patch here. After applying the v4
patch, 't1.four' is added into the cache keys, and the same query runs
much faster.

regression=# explain (analyze, costs off) select * from tenk1 t1 left join lateral (select *, t1.four as x from tenk1 t2) s on t1.two = s.two;
QUERY PLAN
---------------------------------------------------------------------------------
Nested Loop Left Join (actual time=0.060..20446.004 rows=50000000 loops=1)
-> Seq Scan on tenk1 t1 (actual time=0.027..5.845 rows=10000 loops=1)
-> Memoize (actual time=0.001..0.209 rows=5000 loops=10000)
Cache Key: t1.two, t1.four
Cache Mode: binary
Hits: 9996 Misses: 4 Evictions: 0 Overflows: 0 Memory Usage: 5470kB
-> Seq Scan on tenk1 t2 (actual time=0.005..3.659 rows=5000 loops=4)
Filter: (t1.two = two)
Rows Removed by Filter: 5000
Planning Time: 0.579 ms
Execution Time: 21756.598 ms
(11 rows)

Comparing the first plan and the third plan, this query runs ~5 times
faster.

Thanks
Richard

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Michael Paquier
Дата: 25 декабря 2023 г., 09:42:41
Сообщение: Re: pg_basebackup has an accidentaly separated help message

Следующее

От: Kyotaro Horiguchi
Дата: 25 декабря 2023 г., 11:07:28
Сообщение: Re: pg_basebackup has an accidentaly separated help message

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Check lateral references within PHVs for memoize cache keys

Предыдущее

Следующее