BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
From | PG Bug reporting form |
---|---|
Subject | BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash |
Date | |
Msg-id | 16104-dc11ed911f1ab9df@postgresql.org |
Replies |
Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash
(Tomas Vondra <tomas.vondra@2ndquadrant.com>)
|
List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference: 16104
Logged by: James Coleman
Email address: jtc331@gmail.com
PostgreSQL version: 11.5
Operating system: Debian

Description:

We have a query that, after a recent logical migration to 11.5, ends up with a parallel hash join (I don't think the query plan/query itself is important here, but if needed after the rest of the explanation, I can try to redact it for posting). The query results in this error:

ERROR: invalid DSA memory alloc request size 1375731712

(the size sometimes changes significantly, but is always over a GB)

At first glance it sounded eerily similar to this report, which preceded the final release of 11.0:
https://www.postgresql.org/message-id/flat/CAEepm%3D1x48j0P5gwDUXyo6c9xRx0t_57UjVaz6X98fEyN-mQ4A%40mail.gmail.com#465f3a61bea2719bc4a7102541326dde
but I confirmed that the patch for that bug was applied and is in 11.5 (and earlier).

We managed to reproduce this on a replica, and so were able to attach gdb in production to capture a backtrace:

#0  errfinish (dummy=dummy@entry=0) at ./build/../src/backend/utils/error/elog.c:423
#1  0x000055a7c0a00f79 in elog_finish (elevel=elevel@entry=20, fmt=fmt@entry=0x55a7c0babc18 "invalid DSA memory alloc request size %zu") at ./build/../src/backend/utils/error/elog.c:1385
#2  0x000055a7c0a2308b in dsa_allocate_extended (area=0x55a7c1d6aa38, size=1140850688, flags=flags@entry=4) at ./build/../src/backend/utils/mmgr/dsa.c:677
#3  0x000055a7c079bd17 in ExecParallelHashJoinSetUpBatches (hashtable=hashtable@entry=0x55a7c1db2740, nbatch=nbatch@entry=2097152) at ./build/../src/backend/executor/nodeHash.c:2889
#4  0x000055a7c079e5f9 in ExecParallelHashIncreaseNumBatches (hashtable=0x55a7c1db2740) at ./build/../src/backend/executor/nodeHash.c:1122
#5  0x000055a7c079ef6e in ExecParallelHashTuplePrealloc (size=56, batchno=<optimized out>, hashtable=0x55a7c1db2740) at ./build/../src/backend/executor/nodeHash.c:3283
#6  ExecParallelHashTableInsert (hashtable=hashtable@entry=0x55a7c1db2740, slot=slot@entry=0x55a7c1dadc90, hashvalue=<optimized out>) at ./build/../src/backend/executor/nodeHash.c:1716
#7  0x000055a7c079f17f in MultiExecParallelHash (node=0x55a7c1dacb78) at ./build/../src/backend/executor/nodeHash.c:288
#8  MultiExecHash (node=node@entry=0x55a7c1dacb78) at ./build/../src/backend/executor/nodeHash.c:112
#9  0x000055a7c078c40c in MultiExecProcNode (node=node@entry=0x55a7c1dacb78) at ./build/../src/backend/executor/execProcnode.c:501
#10 0x000055a7c07a07d5 in ExecHashJoinImpl (parallel=true, pstate=<optimized out>) at ./build/../src/backend/executor/nodeHashjoin.c:290
#11 ExecParallelHashJoin (pstate=<optimized out>) at ./build/../src/backend/executor/nodeHashjoin.c:581
#12 0x000055a7c078bdd9 in ExecProcNodeInstr (node=0x55a7c1d7b018) at ./build/../src/backend/executor/execProcnode.c:461
#13 0x000055a7c079f142 in ExecProcNode (node=0x55a7c1d7b018) at ./build/../src/include/executor/executor.h:251
#14 MultiExecParallelHash (node=0x55a7c1d759d0) at ./build/../src/backend/executor/nodeHash.c:281
#15 MultiExecHash (node=node@entry=0x55a7c1d759d0) at ./build/../src/backend/executor/nodeHash.c:112
#16 0x000055a7c078c40c in MultiExecProcNode (node=node@entry=0x55a7c1d759d0) at ./build/../src/backend/executor/execProcnode.c:501
#17 0x000055a7c07a07d5 in ExecHashJoinImpl (parallel=true, pstate=<optimized out>) at ./build/../src/backend/executor/nodeHashjoin.c:290
#18 ExecParallelHashJoin (pstate=<optimized out>) at ./build/../src/backend/executor/nodeHashjoin.c:581
#19 0x000055a7c078bdd9 in ExecProcNodeInstr (node=0x55a7c1d74e60) at ./build/../src/backend/executor/execProcnode.c:461
#20 0x000055a7c0784303 in ExecProcNode (node=0x55a7c1d74e60) at ./build/../src/include/executor/executor.h:251
#21 ExecutePlan (execute_once=<optimized out>, dest=0x55a7c1d0be00, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55a7c1d74e60, estate=0x55a7c1d74b70) at ./build/../src/backend/executor/execMain.c:1640
#22 standard_ExecutorRun (queryDesc=0x55a7c1d5dcd0, direction=<optimized out>, count=0, execute_once=<optimized out>) at ./build/../src/backend/executor/execMain.c:369
#23 0x00007f4b8b9ace85 in pgss_ExecutorRun (queryDesc=0x55a7c1d5dcd0, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:892
#24 0x000055a7c07893d1 in ParallelQueryMain (seg=0x55a7c1caa648, toc=<optimized out>) at ./build/../src/backend/executor/execParallel.c:1401
#25 0x000055a7c064ee64 in ParallelWorkerMain (main_arg=<optimized out>) at ./build/../src/backend/access/transam/parallel.c:1409
#26 0x000055a7c08568ed in StartBackgroundWorker () at ./build/../src/backend/postmaster/bgworker.c:834
#27 0x000055a7c08637b5 in do_start_bgworker (rw=0x55a7c1c98200) at ./build/../src/backend/postmaster/postmaster.c:5722
#28 maybe_start_bgworkers () at ./build/../src/backend/postmaster/postmaster.c:5935
#29 0x000055a7c0864355 in sigusr1_handler (postgres_signal_arg=<optimized out>) at ./build/../src/backend/postmaster/postmaster.c:5096
#30 <signal handler called>
#31 0x00007f4b915895e3 in select () from /lib/x86_64-linux-gnu/libc.so.6
#32 0x000055a7c05d8b5d in ServerLoop () at ./build/../src/backend/postmaster/postmaster.c:1671
#33 0x000055a7c08654f1 in PostmasterMain (argc=5, argv=0x55a7c1c73e50) at ./build/../src/backend/postmaster/postmaster.c:1380
#34 0x000055a7c05dac34 in main (argc=5, argv=0x55a7c1c73e50) at ./build/../src/backend/main/main.c:228

From what I can tell, src/backend/executor/nodeHash.c:2888 (looking at the 11.5 release tag) is another entry point into the same kind of problem that was guarded against in the patch I mentioned earlier, and thus another way parallel hash nodes can end up attempting to allocate more memory than is allowed.

Thanks,
James Coleman
In the pgsql-bugs list, by date:

Previous
From: Alessandro Ferraresi
Date:
Message: Re: BUG #16098: unexplained autovacuum to prevent wraparound
Next
From: Tomas Vondra
Date:
Message: Re: BUG #16104: Invalid DSA Memory Alloc Request in Parallel Hash