Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query

From: Thomas Munro
Subject: Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query
Msg-id: CAEepm=3ynb5nBhKQRts0bNETA1HzNxz6-3RTPOzCbM8oQ9yPdg@mail.gmail.com
In reply to: Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query (Sergei Kornilov <sk@zsrv.org>)
Responses: Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query (Sergei Kornilov <sk@zsrv.org>)
List: pgsql-bugs
On Thu, Jan 24, 2019 at 11:56 PM Sergei Kornilov <sk@zsrv.org> wrote:
> We should not call dsm_backend_shutdown twice in the same process, right? So we tried to call dsm_detach on the same segment
> 0x5624578710c8 twice, but this is unexpected behavior and refcnt would be incorrect. And it seems we cannot LWLockAcquire
> a lock and then LWLockAcquire the same lock again without releasing it. And here we have infinite waiting.
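The self-deadlock described in the quoted analysis can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not PostgreSQL code: a plain flag stands in for the non-recursive DynamicSharedMemoryControlLock, setjmp()/longjmp() stand in for the non-local exit performed by elog(ERROR), and `unpin_segment()`, `raise_error()` and `cleanup_can_take_lock_after_unpin()` are hypothetical names. It models only the sequence from the report: if the error is raised while the lock is still held, a cleanup path needing the same lock would wait on its own backend, whereas releasing the lock first avoids that.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stdio.h>

#define NSLOTS 3

/*
 * Toy stand-in for the non-recursive DynamicSharedMemoryControlLock.
 * A real LWLock requested again by the backend that already holds it
 * would wait forever; here the cleanup path only inspects the flag so
 * the demonstration always terminates.
 */
static bool control_lock_held = false;

/* Stand-in for the error-handling context that elog(ERROR) jumps to. */
static jmp_buf error_handler;

/* Hypothetical pin state: slot 1 has already been unpinned. */
static bool pinned[NSLOTS] = {true, false, true};

/* Hypothetical stand-in for elog(ERROR): report, then exit non-locally. */
static void raise_error(const char *msg)
{
    fprintf(stderr, "ERROR: %s\n", msg);
    longjmp(error_handler, 1);
}

/* Simplified unpin; release_first selects the behaviour of the patch. */
static void unpin_segment(int slot, bool release_first)
{
    assert(slot >= 0 && slot < NSLOTS);

    control_lock_held = true;               /* like LWLockAcquire() */
    if (!pinned[slot])
    {
        if (release_first)
            control_lock_held = false;      /* like LWLockRelease() before elog() */
        raise_error("cannot unpin a segment that is not pinned");
    }
    pinned[slot] = false;
    control_lock_held = false;              /* like LWLockRelease() */
}

/*
 * Attempt one unpin and report whether a cleanup path that needs the
 * same lock (think dsm_detach() during shutdown) could acquire it
 * without waiting on its own backend.
 */
static bool cleanup_can_take_lock_after_unpin(int slot, bool release_first)
{
    if (setjmp(error_handler) == 0)
    {
        unpin_segment(slot, release_first);
        return true;                        /* no error; lock already released */
    }

    /* Error path: this is where cleanup would try to take the lock again. */
    bool can_take = !control_lock_held;
    control_lock_held = false;              /* reset toy state between calls */
    return can_take;
}
```

For example, `cleanup_can_take_lock_after_unpin(1, false)` returns false because the toy lock is still held when the error handler runs, while `cleanup_can_take_lock_after_unpin(1, true)` returns true.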

Yeah, I think your analysis is right.  It shouldn't raise an error
while holding the lock.  dsm_unpin_segment() should perhaps release
it before it raises an error, something like:

diff --git a/src/backend/storage/ipc/dsm.c b/src/backend/storage/ipc/dsm.c
index 36904d2676..b989c0b94a 100644
--- a/src/backend/storage/ipc/dsm.c
+++ b/src/backend/storage/ipc/dsm.c
@@ -924,9 +924,15 @@ dsm_unpin_segment(dsm_handle handle)
         * called on a segment which is pinned.
         */
        if (control_slot == INVALID_CONTROL_SLOT)
+       {
+               LWLockRelease(DynamicSharedMemoryControlLock);
                elog(ERROR, "cannot unpin unknown segment handle");
+       }
        if (!dsm_control->item[control_slot].pinned)
+       {
+               LWLockRelease(DynamicSharedMemoryControlLock);
                elog(ERROR, "cannot unpin a segment that is not pinned");
+       }
        Assert(dsm_control->item[control_slot].refcnt > 1);

        /*

I have contemplated that before, but haven't done it because I'm not
sure about the state of the system afterwards.  We just shouldn't be
in this situation: if we are, erroring out partway through means that
the later segments (in the array that dsa_release_in_place() loops
through) remain pinned forever, so we'll leak memory and eventually
run out of DSM slots.  Segment pinning opts out of resource owner
control, which means the client code is responsible for not screwing
it up.  Perhaps that suggests we should PANIC, or perhaps just LOG and
continue, but I'm not sure.
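The leak follows from how a release loop behaves when one iteration raises an error. A minimal sketch, assuming a simplified loop only loosely modelled on dsa_release_in_place() (the real function works on DSA control structures and DSM handles, not a bool array, and `release_all()` is a hypothetical name): stopping at the first bad segment, as elog(ERROR) would, leaves every later segment pinned, whereas the "LOG and continue" option still unpins the rest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of a release loop: unpin each segment in turn.
 * stop_on_error = true mimics elog(ERROR) ending the loop at the first
 * segment that is not pinned; false mimics "LOG and continue".
 * Returns the number of segments left pinned afterwards (i.e. leaked).
 */
static int release_all(bool pinned[], int n, bool stop_on_error)
{
    assert(n >= 0);

    for (int i = 0; i < n; i++)
    {
        if (!pinned[i])
        {
            fprintf(stderr, "%s: cannot unpin segment %d: not pinned\n",
                    stop_on_error ? "ERROR" : "LOG", i);
            if (stop_on_error)
                break;          /* later segments are never visited */
            continue;           /* skip the bad one, keep releasing */
        }
        pinned[i] = false;      /* unpin */
    }

    int still_pinned = 0;
    for (int i = 0; i < n; i++)
        if (pinned[i])
            still_pinned++;
    return still_pinned;
}
```

With pins `{true, false, true, true}`, the error-out variant leaves 2 segments pinned and the log-and-continue variant leaves 0. Neither variant addresses why the bad segment is there in the first place, which is the open question in this thread.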

I think the root cause is earlier and in a different process (see
ProcessInterrupts() in the stack).  Presumably the process that
reported "dsa_area could not attach to segment" is closer to the point
where things go wrong.  If you are in a position to reproduce this on
a modified source tree, it'd be good to see the backtrace for that, to
figure out which of a couple of possible code paths reaches it.
Perhaps you could do that by enabling core files and changing this:

-                       elog(ERROR, "dsa_area could not attach to segment");
+                       elog(PANIC, "dsa_area could not attach to segment");

I have so far not succeeded in reaching that condition.
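For the "enabling core files" step, a common obstacle is a soft RLIMIT_CORE of zero. pg_ctl's -c (--core-files) option addresses this by lifting the soft core-file limit for the server it starts; the sketch below shows the same idea with the POSIX getrlimit()/setrlimit() calls. `enable_core_files()` is a hypothetical helper, not PostgreSQL code, and whether a core file actually appears also depends on OS settings (on Linux, for example, kernel.core_pattern), which it does not touch.

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Raise this process's soft core-file size limit to its hard limit.
 * Resource limits are inherited across fork()/exec(), so doing this
 * before starting a server also covers the processes it spawns.
 * Returns 0 on success, -1 on failure (reported via perror).
 */
static int enable_core_files(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
    {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    lim.rlim_cur = lim.rlim_max;

    if (setrlimit(RLIMIT_CORE, &lim) != 0)
    {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    return 0;
}
```

Combined with the PANIC change above, this should make the failing backend leave a core file from which the backtrace can be taken.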

--
Thomas Munro
http://www.enterprisedb.com

