Re: ERROR: too many dynamic shared memory segments

From: Jakub Glapa
Subject: Re: ERROR: too many dynamic shared memory segments
Date:
Msg-id: CAJk1zg01hqzWdtiXzUEmGkZM0Cgh8dUnSYf-SJY8juKarj-UWA@mail.gmail.com
In reply to: Re: ERROR: too many dynamic shared memory segments (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: ERROR: too many dynamic shared memory segments (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-general
I see that the segfault is under active discussion, but I just wanted to ask: is increasing max_connections the way to go to mitigate the DSM slot shortage?

--
regards,
Jakub Glapa

On Mon, Nov 27, 2017 at 11:48 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Tue, Nov 28, 2017 at 10:05 AM, Jakub Glapa wrote:
> > As for the crash. I dug up the initial log and it looks like a segmentation fault...
> >
> > 2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR: too many dynamic shared memory segments
>
> Hmm.  Well, this error can only occur in dsm_create() called without
> DSM_CREATE_NULL_IF_MAXSEGMENTS.  parallel.c calls it with that flag
> and dsa.c doesn't (perhaps it should, not sure, but that'd just change
> the error message), so that means the error arose from dsa.c trying to
> get more segments.  That would be when Parallel Bitmap Heap Scan tried
> to allocate memory.
>
> I hacked my copy of PostgreSQL so that it allows only 5 DSM slots and
> managed to reproduce a segv crash by trying to run concurrent Parallel
> Bitmap Heap Scans.  The stack looks like this:
>
>   * frame #0: 0x00000001083ace29 postgres`alloc_object(area=0x0000000000000000, size_class=10) + 25 at dsa.c:1433
>     frame #1: 0x00000001083acd14 postgres`dsa_allocate_extended(area=0x0000000000000000, size=72, flags=4) + 1076 at dsa.c:785
>     frame #2: 0x0000000108059c33 postgres`tbm_prepare_shared_iterate(tbm=0x00007f9743027660) + 67 at tidbitmap.c:780
>     frame #3: 0x0000000108000d57 postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at nodeBitmapHeapscan.c:156
>     frame #4: 0x0000000107fefc5b postgres`ExecScanFetch(node=0x00007f9743019c88, accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77), recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) + 459 at execScan.c:95
>     frame #5: 0x0000000107fef983 postgres`ExecScan(node=0x00007f9743019c88, accessMtd=(postgres`BitmapHeapNext at nodeBitmapHeapscan.c:77), recheckMtd=(postgres`BitmapHeapRecheck at nodeBitmapHeapscan.c:710)) + 147 at execScan.c:162
>     frame #6: 0x00000001080008d1 postgres`ExecBitmapHeapScan(pstate=0x00007f9743019c88) + 49 at nodeBitmapHeapscan.c:735
>
> (lldb) f 3
> frame #3: 0x0000000108000d57 postgres`BitmapHeapNext(node=0x00007f9743019c88) + 503 at nodeBitmapHeapscan.c:156
>    153       *  dsa_pointer of the iterator state which will be used by
>    154       *  multiple processes to iterate jointly.
>    155       */
> -> 156      pstate->tbmiterator = tbm_prepare_shared_iterate(tbm);
>    157  #ifdef USE_PREFETCH
>    158      if (node->prefetch_maximum > 0)
>    159
> (lldb) print tbm->dsa
> (dsa_area *) $3 = 0x0000000000000000
> (lldb) print node->ss.ps.state->es_query_dsa
> (dsa_area *) $5 = 0x0000000000000000
> (lldb) f 17
> frame #17: 0x000000010800363b postgres`ExecGather(pstate=0x00007f9743019320) + 635 at nodeGather.c:220
>    217       * Get next tuple, either from one of our workers, or by running the plan
>    218       * ourselves.
>    219       */
> -> 220      slot = gather_getnext(node);
>    221      if (TupIsNull(slot))
>    222          return NULL;
>    223
> (lldb) print *node->pei
> (ParallelExecutorInfo) $8 = {
>   planstate = 0x00007f9743019640
>   pcxt = 0x00007f97450001b8
>   buffer_usage = 0x0000000108b7e218
>   instrumentation = 0x0000000108b7da38
>   area = 0x0000000000000000
>   param_exec = 0
>   finished = '\0'
>   tqueue = 0x0000000000000000
>   reader = 0x0000000000000000
> }
> (lldb) print *node->pei->pcxt
> warning: could not load any Objective-C class information.  This will significantly reduce the quality of type information available.
> (ParallelContext) $9 = {
>   node = {
>     prev = 0x000000010855fb60
>     next = 0x000000010855fb60
>   }
>   subid = 1
>   nworkers = 0
>   nworkers_launched = 0
>   library_name = 0x00007f9745000248 "postgres"
>   function_name = 0x00007f9745000268 "ParallelQueryMain"
>   error_context_stack = 0x0000000000000000
>   estimator = (space_for_chunks = 180352, number_of_keys = 19)
>   seg = 0x0000000000000000
>   private_memory = 0x0000000108b53038
>   toc = 0x0000000108b53038
>   worker = 0x0000000000000000
> }
>
> I think there are two failure modes: one of your sessions showed the
> "too many ..." error (that's good: it ran out of slots and said so, and
> our error machinery worked as it should), and another crashed with a
> segfault because it tried to use a NULL "area" pointer (bad).  I think
> this is a degenerate case where we completely failed to launch parallel
> query, but we ran the parallel query plan anyway and this code thinks
> that the DSA is available.  Oops.
>
> --
> Thomas Munro
> http://www.enterprisedb.com

In the pgsql-general list, by date:

Previous
From: rob stone
Date:
Message: Re: ISO8601 vs POSIX offset clarification
Next
From: Michael Paquier
Date:
Message: Re: Replication causing publisher node to use excessive cpu over time