Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start
| From | Noah Misch |
|---|---|
| Subject | Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start |
| Date | |
| Msg-id | 20241108233649.01.nmisch@google.com |
| In reply to | Re: Leader backend hang on IPC/ParallelFinish when LWLock held at parallel query start (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-bugs |
On Fri, Nov 08, 2024 at 12:56:55PM -0500, Tom Lane wrote:
> I wrote:
> > Here's a proposed patch along that line. I left the test case from
> > ac04aa84a alone, since it works perfectly well to test this way too.
>
> I'd modeled that on the existing recovery code for DSM segment creation
> failure, just below. But a look at the code coverage report shows
> (unsurprisingly) that that path is never exercised in our regression
> tests, so I wondered if it actually works ... and it doesn't work
> very well. To test, I lobotomized InitializeParallelDSM to always
> force pcxt->nworkers = 0. That results in a bunch of unsurprising
> regression test diffs, plus a couple of
>
> +ERROR: could not find key 4 in shm TOC at 0x229f138
>
> which turns out to be the fault of ExecHashJoinReInitializeDSM:
> it's not accounting for the possibility that we didn't really
> start a parallel hash join.
>
> I'm also not happy about ReinitializeParallelWorkers'
>
> Assert(pcxt->nworkers >= nworkers_to_launch);
>
> The one existing caller manages not to trigger that because it's
> careful to reduce its request based on pcxt->nworkers, but it
> doesn't seem to me that callers should be expected to have to.
>
> So I end with the attached. There might still be some more issues
> that the regression tests don't reach, but I think this is the
> best we can do for today.

Looks good.
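[Editorial illustration, not part of the attached patch: a minimal sketch of the ExecHashJoinReInitializeDSM guard described above, assuming the repair is a no-error shm_toc_lookup that returns early when the parallel hash state was never published. The actual patch may take a different shape.]

```c
#include "postgres.h"

#include "access/parallel.h"
#include "executor/nodeHashjoin.h"
#include "storage/shm_toc.h"

void
ExecHashJoinReInitializeDSM(HashJoinState *state, ParallelContext *pcxt)
{
    int         plan_node_id = state->js.ps.plan->plan_node_id;
    ParallelHashJoinState *pstate;

    /*
     * If we never really started a parallel hash join (e.g., because
     * InitializeParallelDSM fell back to pcxt->nworkers == 0), there is
     * no entry for this plan node in the shm TOC.  Look it up with
     * noError = true and just return if it isn't there, rather than
     * failing with "could not find key N in shm TOC".
     */
    pstate = shm_toc_lookup(pcxt->toc, plan_node_id, true);
    if (pstate == NULL)
        return;

    /* ... existing reinitialization logic continues here ... */
}
```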
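[Likewise, a sketch of the ReinitializeParallelWorkers change Tom suggests, assuming the Assert is replaced by clamping the request to pcxt->nworkers so callers need not pre-shrink their request themselves; again an illustration, not the attached patch.]

```c
#include "postgres.h"

#include "access/parallel.h"

void
ReinitializeParallelWorkers(ParallelContext *pcxt, int nworkers_to_launch)
{
    /*
     * Clamp the request instead of Assert(pcxt->nworkers >=
     * nworkers_to_launch), so that a caller asking for more workers
     * than the context was created with simply gets the smaller number.
     */
    pcxt->nworkers_to_launch = Min(pcxt->nworkers, nworkers_to_launch);
}
```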