BUG #17844: Memory consumption for memoize node

From: PG Bug reporting form
Subject: BUG #17844: Memory consumption for memoize node
Date:
Msg-id: 17844-d2f6f9e75a622bed@postgresql.org
Responses: Re: BUG #17844: Memory consumption for memoize node  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17844
Logged by:          Alexey Ermakov
Email address:      alexey.ermakov@dataegret.com
PostgreSQL version: 14.7
Operating system:   Ubuntu 20.04
Description:

Hello, recently I found an interesting query which consumes a lot of memory
(much more than expected given the current work_mem and hash_mem_multiplier
settings) and triggers the OOM killer.

After simplifying the query, it looks like there is a problem in the part where
we join a table on a condition like:
id = (((test_json.big_json_column)::json ->> 'id'::text))::integer
and there is a memoize node in the plan. Without the memoize node the query
works fine (a quick check is sketched just below).
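
To compare the two plans, the memoize node can be disabled per session via the
enable_memoize setting (available since PostgreSQL 14). Roughly, using the same
join as in the test case further down:

set enable_memoize = off;
explain (analyze, buffers)
select test_json.id, test_json2.id
from test_json
left join test_json2 on test_json2.id = ((test_json.big_json_column::json)->>'id')::int;
reset enable_memoize;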

I was able to create a reproducible test case (on a machine with the default
config and PostgreSQL 14.7):

------------------------------------------------------------------------------
create extension pgcrypto;
--create function to generate dummy data
CREATE OR REPLACE FUNCTION generate_random_string(size INT) RETURNS TEXT AS
$$
DECLARE
  characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  bytes BYTEA := gen_random_bytes(size);
  l INT := length(characters);
  i INT := 0;
  output TEXT := '';
BEGIN
  WHILE i < size LOOP
    output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
    i := i + 1;
  END LOOP;
  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;


--create table with 200k rows and text column with content like {"random_string":"....", "id":1}
create table test_json as
select id, ('{"random_string":"' || generate_random_string(500) || '", "id":' || id || '}')::text as big_json_column
from generate_series(1,200000) gs(id);

--duplicate rows and update statistics (to trigger memoize node)
insert into test_json select * from test_json;
analyze test_json;

--another table with similar structure and 100k rows
create table test_json2 as
select id, ('{"random_string":"' || generate_random_string(500) || '", "id":' || id || '}')::json as big_json_column
from generate_series(1,100000) gs(id);
alter table test_json2 add primary key(id);

--table sizes: test_json - 223MB, test_json2 - 56MB

--change settings to trigger plan with memoize node
set work_mem = '128MB';
set enable_hashjoin = off;
set enable_mergejoin = off;
set jit = off;

explain (analyze, buffers)
select test_json.id, test_json2.id
from test_json
left join test_json2 on test_json2.id = ((test_json.big_json_column::json)->>'id')::int;
 Nested Loop Left Join  (cost=0.32..108047.22 rows=400000 width=8) (actual time=0.031..3155.261 rows=400000 loops=1)
   Buffers: shared hit=415739 read=12834
   I/O Timings: read=18.254
   ->  Seq Scan on test_json  (cost=0.00..32572.00 rows=400000 width=540) (actual time=0.005..57.248 rows=400000 loops=1)
         Buffers: shared hit=15738 read=12834
         I/O Timings: read=18.254
   ->  Memoize  (cost=0.32..0.34 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=400000)
         Cache Key: (((test_json.big_json_column)::json ->> 'id'::text))::integer
         Cache Mode: logical
         Hits: 200000  Misses: 200000  Evictions: 0  Overflows: 0  Memory Usage: 16797kB
         Buffers: shared hit=400001
         ->  Index Only Scan using test_json2_pkey on test_json2  (cost=0.30..0.33 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=200000)
               Index Cond: (id = (((test_json.big_json_column)::json ->> 'id'::text))::integer)
               Heap Fetches: 0
               Buffers: shared hit=400001
 Planning Time: 0.069 ms
 Execution Time: 3227.078 ms
------------------------------------------------------------------------------

During execution I watched "avail Mem" in the top output on the test machine to
check how much memory the process consumes. It looked different each time,
usually hundreds of MB, sometimes around 1.5GB (which is even bigger than the
table size).
I was able to trigger the OOM killer with this query and a bigger test_json
table with similar data.
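
For a closer look than top gives, the backend's memory contexts can be dumped
into the server log from a second session while the query is running (a sketch,
assuming PostgreSQL 14 where pg_log_backend_memory_contexts() exists and
requires superuser):

-- session 1: note the backend pid, then start the explain (analyze, buffers)
select pg_backend_pid();

-- session 2 (superuser), while session 1 is still executing;
-- replace 12345 with the pid reported above
select pg_log_backend_memory_contexts(12345);
-- the per-context allocation sizes then appear in the server log, which shows
-- which memory context is actually growing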

I'm wondering:
1) Is it a known bug? Does it relate to json parsing somehow?
2) Is it possible to show such memory consumption in the explain (analyze,
buffers) output, for easier troubleshooting?

--
Thanks,
Alexey Ermakov

