Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM
Дата
Msg-id 22735.1507048202@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiverafter OOM  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Ответы Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>>> I think the latch is only used locally.  Seems that it was only put in
>>> shmem to avoid a separate variable ...

>> Hm, I'm strongly tempted to move it to a separate static variable then.

Oh, wait, look at

/** Wake up the walreceiver main loop.** This is called by the startup process whenever interesting xlog records* are
applied,so that walreceiver can check if it needs to send an apply* notification back to the master which may be
waitingin a COMMIT with* synchronous_commit = remote_apply.*/
 
void
WalRcvForceReply(void)
{WalRcv->force_reply = true;if (WalRcv->latch)    SetLatch(WalRcv->latch);
}

So that's trouble waiting to happen, for sure.  At the very least we
need to do a single fetch of WalRcv->latch, not two.  I wonder whether
even that is sufficient, though: this coding requires an atomic fetch of
a pointer, which is something we generally don't assume to be safe.

I'm inclined to think that it'd be a good idea to move the set and
clear of the latch field into the nearby spinlock critical sections,
and then change WalRcvForceReply to look like

void
WalRcvForceReply(void)
{Latch       *latch;
WalRcv->force_reply = true;/* fetching the latch pointer might not be atomic, so use spinlock
*/SpinLockAcquire(&WalRcv->mutex);latch= WalRcv->latch;SpinLockRelease(&WalRcv->mutex);if (latch)    SetLatch(latch);
 
}

This still leaves us with a hazard of applying SetLatch to the latch
of a PGPROC that is no longer the walreceiver's, but I think that's
okay (and we have the same problem elsewhere).  At worst it leads to
an extra wakeup of the next process using that PGPROC.

I'm also thinking that the force_reply flag needs to be sig_atomic_t,
not bool, because bool store/fetch isn't necessarily atomic on all
platforms.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] SendRowDescriptionMessage() is slow for queries with alot of columns
Следующее
От: Dang Minh Huong
Дата:
Сообщение: Re: [HACKERS] list of credits for release notes