Re: "stuck spinlock"

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: "stuck spinlock"
Дата
Msg-id 24563.1386955196@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: "stuck spinlock"  (Andres Freund <andres@2ndquadrant.com>)
Ответы Re: "stuck spinlock"
Список pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
> On 2013-12-13 11:26:44 -0500, Tom Lane wrote:
>> On closer inspection, I'm thinking that actually it'd be a good idea if
>> handle_sig_alarm did what we do in, for example, HandleCatchupInterrupt:
>> it should save, clear, and restore ImmediateInterruptOK, so as to make
>> the world safe for timeout handlers to do things that might include a
>> CHECK_FOR_INTERRUPTS.

> Shouldn't the HOLD_INTERRUPTS() in handle_sig_alarm() prevent any
> eventual ProcessInterrupts() in the timeout handlers from doing anything
> harmful?

Sorry, I misspoke there.  The case I'm worried about is doing something
like a wait for lock, which would unconditionally set and then reset
ImmediateInterruptOK.  That's not very plausible perhaps, but on the other
hand we are calling DeadLockCheck() in there, and who knows what future
timeout handlers might try to do?

BTW, I'm about to go put a HOLD_INTERRUPTS/RESUME_INTERRUPTS into
HandleCatchupInterrupt and HandleNotifyInterrupt too, for essentially the
same reason.  At least the first of these *does* include semaphore ops,
so I think it's theoretically vulnerable to losing control if a timeout
occurs while it's waiting for a semaphore.  There's probably no real bug
today because I don't think we enable catchup interrupts at any point
where a timeout would be active, but that doesn't sound terribly
future-proof.  If a timeout did happen, holding off interrupts would have
the effect of postponing the query cancel till we're done with the catchup
interrupt, which seems reasonable.

> One thing I randomly noticed just now is the following in
> RecoveryConflictInterrupt():
>     elog(FATAL, "unrecognized conflict mode: %d",
>          (int) reason);
> obviously that's not really ever going to hit, but it should either be a
> PANIC or an Assert() for the reasons you cite.

Yeah, PANIC there seems good.  I also thought about using
START_CRIT_SECTION/END_CRIT_SECTION instead of
HOLD_INTERRUPTS/RESUME_INTERRUPTS in these signal handlers.  That would
both hold off interrupts and cause any elog(ERROR/FATAL) within the
handler to be promoted to PANIC.  But I'm not sure that'd be a net
stability improvement...
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Christophe Pettus
Дата:
Сообщение: Re: "stuck spinlock"
Следующее
От: Tom Lane
Дата:
Сообщение: Re: "stuck spinlock"