Re: the s_lock_stuck on perform_spin_delay

From Tom Lane
Subject Re: the s_lock_stuck on perform_spin_delay
Date
Msg-id 498328.1704386013@sss.pgh.pa.us
In reply to Re: the s_lock_stuck on perform_spin_delay  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: the s_lock_stuck on perform_spin_delay  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 4, 2024 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We should be making an effort to ban coding patterns like
>> "return with spinlock still held", because they're just too prone
>> to errors similar to this one.

> I agree that we don't want to add overhead, and also about how
> spinlocks should be used, but I dispute "easily detectable
> statically." I mean, if you or I look at some code that uses a
> spinlock, we'll know whether the pattern that you mention is being
> followed or not, modulo differences of opinion in debatable cases. But
> you and I cannot be there to look at all the code all the time. If we
> had a static checking tool that was run as part of every build, or in
> the buildfarm, or by cfbot, or somewhere else that raised the alarm if
> this rule was violated, then we could claim to be effectively
> enforcing this rule.

I was indeed suggesting that maybe we could find a way to detect
such things automatically.  While I've not been paying close
attention, I recall there have been some discussions of using LLVM/clang
infrastructure for customized static analysis, so maybe it'd be
possible without an undue amount of effort.
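
To make the pattern in question concrete, here is a minimal sketch of "return with spinlock still held" next to the straight-line form a checker would want to see. Everything in it is an assumption for illustration: toy_spinlock, toy_acquire, toy_release, and the bump_if_positive_* functions are hypothetical names built on C11 atomics, standing in for PostgreSQL's real spinlock primitives rather than reproducing them.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy spinlock; a stand-in, not PostgreSQL's slock_t. */
typedef struct { atomic_bool held; } toy_spinlock;

typedef struct {
    toy_spinlock mutex;
    int          value;
} shared_counter;

static void toy_acquire(toy_spinlock *l)
{
    /* Spin until the previous value was "not held". */
    while (atomic_exchange_explicit(&l->held, true, memory_order_acquire))
        ;
}

static void toy_release(toy_spinlock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}

static bool toy_is_held(toy_spinlock *l)
{
    return atomic_load_explicit(&l->held, memory_order_relaxed);
}

/* Discouraged: the early-return path leaves the spinlock held. */
static bool bump_if_positive_bad(shared_counter *c)
{
    toy_acquire(&c->mutex);
    if (c->value <= 0)
        return false;           /* BUG: returns with spinlock still held */
    c->value++;
    toy_release(&c->mutex);
    return true;
}

/* Preferred: one short critical section with a single release point. */
static bool bump_if_positive_good(shared_counter *c)
{
    bool bumped = false;

    toy_acquire(&c->mutex);
    if (c->value > 0)
    {
        c->value++;
        bumped = true;
    }
    toy_release(&c->mutex);
    return bumped;
}
```

In the first function, the next caller to take the lock would spin forever. The second keeps acquire and release in the same block with no exit in between, which is the kind of shape a static check could plausibly verify.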

> I think the question we should be asking here is what the purpose of
> the PANIC is. I can think of two possible purposes. It could be either
> (a) an attempt to prevent real-world harm by turning database hangs
> into database panics, so that at least the system will restart and get
> moving again instead of sitting there stuck for all eternity or (b) an
> attempt to punish people for writing bad code by turning coding rule
> violations into panics on production systems.

I believe it's (a).  No matter what the reason for a stuck spinlock
is, the only reliable method of getting out of the problem is to
blow things up and start over.  The patch proposed at the top of this
thread would leave the system unable to recover on its own, with the
only recourse being for the DBA to manually force a crash/restart ...
once she figured out that that was necessary, which might take a long
while if the only external evidence is an occasional WARNING that
might not even be making it to the postmaster log.  How's that better?
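
As a rough sketch of purpose (a), the idea is a spin loop that gives up after a bounded number of attempts and then fails loudly instead of waiting forever. This is not PostgreSQL's s_lock.c code: demo_lock, DEMO_MAX_SPINS, and the demo_* functions are hypothetical, the bound is arbitrary, and the sleeps a real implementation would insert between attempts are omitted.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define DEMO_MAX_SPINS 1000     /* hypothetical bound, chosen arbitrarily */

/* Hypothetical toy lock; a stand-in, not PostgreSQL's slock_t. */
typedef struct { atomic_bool held; } demo_lock;

/*
 * Try to take the lock, giving up after DEMO_MAX_SPINS attempts.
 * Returns true if acquired, false if the lock still looked stuck.
 */
static bool demo_acquire_bounded(demo_lock *l)
{
    for (int spins = 0; spins < DEMO_MAX_SPINS; spins++)
    {
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return true;
    }
    return false;
}

static void demo_release(demo_lock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}

/*
 * Policy (a): if the lock appears stuck, report where and abort, so that
 * a supervisor can restart the system and a core dump or stack trace is
 * available, rather than hanging indefinitely.
 */
static void demo_acquire_or_die(demo_lock *l, const char *file, int line)
{
    if (!demo_acquire_bounded(l))
    {
        fprintf(stderr, "stuck spinlock detected at %s:%d\n", file, line);
        abort();
    }
}
```

A caller would write demo_acquire_or_die(&lock, __FILE__, __LINE__). Replacing the abort() with a warning and continuing to spin corresponds to the behavior objected to above: the process never makes progress and nothing forces a restart.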

> ... (b3) if the PANIC does fire, it
> gives you basically zero help in figuring out where the actual problem
> is. The PostgreSQL code base is way too big for "ERROR: you screwed
> up" to be an acceptable diagnostic.

Ideally I agree with the latter, but that doesn't mean that doing
better is easy or even possible.  (The proposed patch certainly does
nothing to help diagnose such issues.)  As for the former point,
panicking here at least offers the chance of getting a stack trace,
which might help a developer find the problem.

            regards, tom lane


