Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiverafter OOM

Поиск
Список
Период
Сортировка
От Petr Jelinek
Тема Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiverafter OOM
Дата
Msg-id 477434c1-b52a-3595-b54e-13cdf53c0f82@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiverafter OOM  (Thomas Munro <thomas.munro@enterprisedb.com>)
Список pgsql-hackers
On 03/10/17 22:01, Thomas Munro wrote:
> On Wed, Oct 4, 2017 at 7:32 AM, Petr Jelinek
> <petr.jelinek@2ndquadrant.com> wrote:
>> On 03/10/17 19:44, Tom Lane wrote:
>>> I wrote:
>>>>> So that's trouble waiting to happen, for sure.  At the very least we
>>>>> need to do a single fetch of WalRcv->latch, not two.  I wonder whether
>>>>> even that is sufficient, though: this coding requires an atomic fetch of
>>>>> a pointer, which is something we generally don't assume to be safe.
>>>
>>> BTW, I had supposed that this bug was of long standing, but actually it's
>>> new in v10, dating to 597a87ccc9a6fa8af7f3cf280b1e24e41807d555.  Before
>>> that walreceiver start/stop just changed the owner of a long-lived shared
>>> latch, and there was no question of stale pointers.
>>>
>>> I considered reverting that decision, but the reason for it seems to have
>>> been to let libpqwalreceiver.c manipulate MyProc->procLatch rather than
>>> having to know about a custom latch.  That's probably a sufficient reason
>>> to justify some squishiness in the wakeup logic.  Still, we might want to
>>> revisit it if we find any other problems here.
>> That's correct, and because the other users of that library don't have
>> special latch it seemed feasible to standardize on the procLatch. If we
>> indeed wanted to change the decision about this I think we can simply
>> give latch as a parameter to libpqrcv_connect and store it in the
>> WalReceiverConn struct. The connection does not need to live past the
>> latch (although it currently does, but that's just a matter of
>> reordering the code in WalRcvDie() a little AFAICS).
> 
> I wonder if the new ConditionVariable mechanism would be worth
> considering in future cases like this, where the signalling process
> doesn't know the identity of the process to be woken up (or even how
> many waiting processes there are), but instead any interested waiters
> block on a particular ConditionVariable that models a specific
> interesting condition.  In the end it's just latches anyway, but it
> may be a better abstraction.  On the other hand I'm not sure how waits
> on a ConditionVariable would be multiplexed with IO (a distinct wait
> event, or leaky abstraction where the caller relies on the fact that
> it's built on MyProc->latch, something else?).
> 

In this specific case the problem is that what we are waiting for is
indeed the Latch because we want the libpqwalreceiver calls to be
interruptible when waiting for I/O.

Otherwise, yes ConditionVariable is useful for generic signaling, we use
it in slot and origin for "lock" waits (they don't have normal catalog
so normal locking does not work).

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Maksim Milyutin
Дата:
Сообщение: Re: [HACKERS][BUG] Cache invalidation for queries that contains constof temporary composite type
Следующее
От: Robert Haas
Дата:
Сообщение: Re: [HACKERS] parallelize queries containing initplans