Re: Postgresql 16.3 Out Of Memory

From: Radu Radutiu
Subject: Re: Postgresql 16.3 Out Of Memory
Date:
Msg-id: CAG4TxrjnFxwWLEYrvt90RJ3EC8_kqbcHt87ei6w70W0003RaOw@mail.gmail.com
In reply to: Postgresql 16.3 Out Of Memory  (Radu Radutiu <rradutiu@gmail.com>)
List: pgsql-general
It looks like I did not copy the list. 

I did run VACUUM ANALYZE after the upgrade, and I've even run it again just now before capturing a new explain plan; the new plan is very similar to the old one (only the costs differ):
 Gather Merge  (cost=12336145.92..16111570.23 rows=31531596 width=66)
   Workers Planned: 4
   ->  Sort  (cost=12335145.86..12354853.11 rows=7882899 width=66)
         Sort Key: t.msg_status DESC, t.input_sequence
         ->  Parallel Hash Right Join  (cost=9376528.66..11242773.26 rows=7882899 width=66)
               Hash Cond: (snd_tro.reply_input_sequence = t.input_sequence)
               ->  Parallel Seq Scan on tbl_outputrequest snd_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
               ->  Parallel Hash  (cost=9254599.76..9254599.76 rows=5245992 width=58)
                     ->  Parallel Hash Right Join  (cost=7458636.44..9254599.76 rows=5245992 width=58)
                           Hash Cond: (rec_tro.input_sequence = t.input_sequence)
                           ->  Parallel Seq Scan on tbl_outputrequest rec_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
                           ->  Parallel Hash  (cost=7380902.99..7380902.99 rows=3491156 width=50)
                                 ->  Parallel Hash Right Join  (cost=5592677.17..7380902.99 rows=3491156 width=50)
                                       Hash Cond: (rpl_snd_tro.reply_input_sequence = r.input_sequence)
                                       ->  Parallel Seq Scan on tbl_outputrequest rpl_snd_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
                                       ->  Parallel Hash  (cost=5518353.72..5518353.72 rows=3491156 width=42)
                                             ->  Parallel Hash Right Join  (cost=3729209.40..5518353.72 rows=3491156 width=42)
                                                   Hash Cond: (rpl_rec_tro.input_sequence = r.input_sequence)
                                                   ->  Parallel Seq Scan on tbl_outputrequest rpl_rec_tro  (cost=0.00..1431919.45 rows=20057645 width=16)
                                                   ->  Parallel Hash  (cost=3658294.95..3658294.95 rows=3491156 width=34)
                                                         ->  Parallel Hash Right Join  (cost=1883503.35..3658294.95 rows=3491156 width=34)
                                                               Hash Cond: (r.originalrequest_id = t.input_sequence)
                                                               ->  Parallel Seq Scan on tbl_inputrequest r  (cost=0.00..1739752.66 rows=13348166 width=16)
                                                               ->  Parallel Hash  (cost=1839863.91..1839863.91 rows=3491156 width=26)
                                                                     ->  Parallel Seq Scan on tbl_inputrequest t  (cost=0.00..1839863.91 rows=3491156 width=26)
                                                                           Filter: ((receive_time < '2024-05-17 00:00:00'::timestamp without time zone) AND (input_sequence < '202406020168279904'::bigint) AND ((msg_status)::text = ANY ('{COMPLETED,REJECTED}'::text[])))

The query cost is high and it returns a significant number of rows. However, it should not consume 64+ GB of RAM with the default enable_parallel_hash = 'on' when my shared_buffers is only 8GB. The temporary fix I've implemented to get the system working is a change in the application configuration so that the timestamp filter selects fewer rows (3x fewer), together with setting enable_parallel_hash = 'off'. Even with this setting, the PostgreSQL service's memory usage grows to and stays above 20GB. I'd like to find out exactly what causes the high memory usage, as we have other projects using PostgreSQL that are scheduled to be upgraded from v12.
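
For reference, this is roughly how the server-side part of that mitigation can be applied (just a sketch; editing postgresql.conf directly has the same effect):

-- turn off parallel hash joins until the memory behaviour is understood
ALTER SYSTEM SET enable_parallel_hash = off;
SELECT pg_reload_conf();
-- hash_mem_multiplier (the multiplier on work_mem allowed for hash tables)
-- may also be worth checking, since it bounds how large each hash node may
-- grow before spilling to disk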

My test.sql looks like this (the application uses prepared statements; the two SET operations are there to revert, for the current session only, the fixes that are already in place):

prepare my_query (timestamp, bigint) as
SELECT t.input_sequence, rec_tro.output_sequence, r.input_sequence,
       rpl_rec_tro.output_sequence, rpl_snd_tro.output_sequence,
       snd_tro.output_sequence, t.msg_type
FROM inputrequest t
LEFT JOIN outputrequest rec_tro     ON rec_tro.input_sequence = t.input_sequence
LEFT JOIN inputrequest  r           ON r.originalRequest_id = t.input_sequence
LEFT JOIN outputrequest rpl_rec_tro ON rpl_rec_tro.input_sequence = r.input_sequence
LEFT JOIN outputrequest rpl_snd_tro ON rpl_snd_tro.reply_input_sequence = r.input_sequence
LEFT JOIN outputrequest snd_tro     ON snd_tro.reply_input_sequence = t.input_sequence
WHERE t.receive_time < $1
  AND t.input_sequence < $2
  AND t.msg_status IN ('COMPLETED', 'REJECTED')
ORDER BY t.msg_status DESC, t.input_sequence;

EXPLAIN  EXECUTE my_query('2024-05-17 00:00:00', 202406020168279904);
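
If the statement can be allowed to run to completion on a test system, the same prepared statement can also be run with ANALYZE so that the plan reports actual row counts and the memory used by each Hash node (only a sketch of the invocation, not output I have collected):

EXPLAIN (ANALYZE, BUFFERS)
EXECUTE my_query('2024-05-17 00:00:00', 202406020168279904);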

I have an explanation for what I suspected was a memory leak: it seems that systemd also reports cached memory; the relevant figure is probably rss from /sys/fs/cgroup/memory/system.slice/postgresql-16.service/memory.stat. That still leaves the out-of-memory situation itself unexplained.
I've managed to get a dump of the memory contexts for the 5 processes (the process running the explain plus the 4 parallel workers) while the query was starting to use significant memory (>10GB). I've attached it in case anyone familiar with PostgreSQL internals can spot something obviously wrong in the reported numbers.
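
For anyone who wants to look at the same data on their own system, this is roughly how such a dump can be captured on PostgreSQL 14+ (the pg_stat_activity filter below is only an example and needs to be adjusted to match the session running the query):

-- write the memory contexts of the leader and its parallel workers to the
-- server log (requires superuser); one block of output per process
SELECT pid, backend_type, pg_log_backend_memory_contexts(pid)
FROM pg_stat_activity
WHERE backend_type IN ('client backend', 'parallel worker')
  AND state = 'active';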

Radu

Attachments
