BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset
От | PG Bug reporting form |
---|---|
Тема | BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset |
Дата | |
Msg-id | 19040-c9b6073ef814f48c@postgresql.org обсуждение исходный текст |
Ответы |
回复:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset
Re:BUG #19040: Memory leak in hashed subplan node due to missing hashtempcxt reset |
Список | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 19040 Logged by: haiyang li Email address: mohen.lhy@alibaba-inc.com PostgreSQL version: 18beta3 Operating system: centos7 5.10.84 x86_64 Description: Hello, all! I found a query which consumes a lot of memory and triggers OOM killer. Memory leak occurs in hashed subplan node. I was able to create reproducible test case on machine with default config and postgresql 18beta3: CREATE TABLE test1( a numeric, b int); INSERT INTO test1 SELECT i, i FROM generate_series(1, 30000000) i; -- Make the running time longer EXPLAIN ANALYZE SELECT * FROM test1 WHERE a NOT IN( SELECT i FROM generate_series(1, 10000) i ); plan: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Seq Scan on test1 (cost=125.00..612432.24 rows=15000108 width=10) (actual time=135.191..25832.808 rows=29990000 loops=1) Filter: (NOT (hashed SubPlan 1)) Rows Removed by Filter: 10000 SubPlan 1 -> Function Scan on generate_series i (cost=0.00..100.00 rows=10000 width=4) (actual time=36.999..38.296 rows=10000 loops=1) Planning Time: 0.280 ms JIT: Functions: 15 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 1.155 ms, Inlining 25.929 ms, Optimization 60.700 ms, Emission 23.018 ms, Total 110.802 ms Execution Time: 28217.026 ms (11 rows) I observed that the process's RES (resident memory) was increasing rapidly during SQL execution by using 'top -p <pid>' command. Furthermore, during SQL execution, I ran 'select pg_log_backend_memory_contexts(<pid>)' to print memory context statistics. The context with abnormally high memory usage was "Subplan HashTable Temp Context." The key part of the log is as follows: ... LOG: level: 5; Subplan HashTable Temp Context: 514834432 total in 62849 blocks; 973712 free (60695 chunks); 513860720 used LOG: level: 5; Subplan HashTable Context: 524288 total in 7 blocks; 123808 free (5 chunks); 400480 used ... Grand total: 518275344 bytes in 63116 blocks; 2025560 free (60976 chunks); 516249784 used ... If I change the SQL from "a NOT IN" to "b NOT IN" and do the same action, I can not observe abnormally high memory usage. Likewise, the key part of the log is as follows: ... LOG: level: 5; Subplan HashTable Temp Context: 1024 total in 1 blocks; 784 free (0 chunks); 240 used LOG: level: 5; Subplan HashTable Context: 524288 total in 7 blocks; 123808 free (5 chunks); 400480 used ... Grand total: 3441936 bytes in 268 blocks; 1050520 free (281 chunks); 2391416 used ... While analyzing the source code, I found that the hashed subplan node fails to reset the 'hashtempcxt' context after probing the hash table for each slot. When variable-length datatypes (e.g., numeric) are processed, this can trigger calls to 'detoast_attr', which allocate memory in hashtempcxt. Without a reset, this memory is not reclaimed until the context itself is destroyed, resulting in a memory leak when processing large numbers of slots. A patch implementing this fix will be included in the follow-up email. -- Thanks, Haiyang Li
В списке pgsql-bugs по дате отправления: