Re: the s_lock_stuck on perform_spin_delay
From | Tom Lane |
---|---|
Subject | Re: the s_lock_stuck on perform_spin_delay |
Date | |
Msg-id | 498328.1704386013@sss.pgh.pa.us |
In reply to | Re: the s_lock_stuck on perform_spin_delay (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: the s_lock_stuck on perform_spin_delay (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 4, 2024 at 10:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We should be making an effort to ban coding patterns like
>> "return with spinlock still held", because they're just too prone
>> to errors similar to this one.

> I agree that we don't want to add overhead, and also about how
> spinlocks should be used, but I dispute "easily detectable
> statically." I mean, if you or I look at some code that uses a
> spinlock, we'll know whether the pattern that you mention is being
> followed or not, modulo differences of opinion in debatable cases. But
> you and I cannot be there to look at all the code all the time. If we
> had a static checking tool that was run as part of every build, or in
> the buildfarm, or by cfbot, or somewhere else that raised the alarm if
> this rule was violated, then we could claim to be effectively
> enforcing this rule.

I was indeed suggesting that maybe we could find a way to detect such things automatically. While I've not been paying close attention, I recall there have been some discussions of using LLVM/clang infrastructure for customized static analysis, so maybe it'd be possible without an undue amount of effort.

> I think the question we should be asking here is what the purpose of
> the PANIC is. I can think of two possible purposes. It could be either
> (a) an attempt to prevent real-world harm by turning database hangs
> into database panics, so that at least the system will restart and get
> moving again instead of sitting there stuck for all eternity or (b) an
> attempt to punish people for writing bad code by turning coding rule
> violations into panics on production systems.

I believe it's (a). No matter what the reason for a stuck spinlock is, the only reliable method of getting out of the problem is to blow things up and start over.
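The escalation being defended here can be pictured with a simplified, hypothetical C sketch. All names are toy names, and this is not PostgreSQL's `perform_spin_delay`/`s_lock_stuck` code: it spins on a C11 `atomic_flag`, sleeps with a growing delay once a spin budget is used up, and after too many sleeps treats the lock as stuck. `toy_acquire_bounded` only reports failure; `toy_acquire_or_die` shows the "blow things up" policy, killing the process with `abort()` instead of waiting forever. The spin and delay budgets are arbitrary assumptions.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical toy spinlock on a C11 atomic_flag (not PostgreSQL's slock_t). */
typedef struct
{
    atomic_flag locked;
} toy_spinlock;

#define TOY_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Arbitrary assumption: busy-spin this many times between sleeps. */
#define SPINS_PER_DELAY 100

static void sleep_usec(long usec)
{
    struct timespec ts = { usec / 1000000L, (usec % 1000000L) * 1000L };
    nanosleep(&ts, NULL);
}

/* Returns true once the lock is acquired, or false if it still looks stuck
 * after max_delays sleeps (each sleep doubles, capped at max_delay_usec). */
static bool toy_acquire_bounded(toy_spinlock *lock, int max_delays, long max_delay_usec)
{
    long delay_usec = 1;
    int  spins = 0;
    int  delays = 0;

    assert(max_delays > 0 && max_delay_usec > 0);

    while (atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire))
    {
        if (++spins < SPINS_PER_DELAY)
            continue;           /* keep busy-spinning for a while */
        spins = 0;
        if (++delays > max_delays)
            return false;       /* budget exhausted: lock looks stuck */
        sleep_usec(delay_usec);
        delay_usec = (delay_usec * 2 > max_delay_usec) ? max_delay_usec : delay_usec * 2;
    }
    return true;
}

/* Escalation policy: report where we got stuck, then abort(), which can leave
 * a core file for a post-mortem stack trace if core dumps are enabled. */
static void toy_acquire_or_die(toy_spinlock *lock, const char *file, int line)
{
    if (!toy_acquire_bounded(lock, 1000, 1000000L))
    {
        fprintf(stderr, "stuck spinlock detected at %s:%d\n", file, line);
        abort();
    }
}

static void toy_release(toy_spinlock *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

Going by the description in this message, the patch under discussion would roughly correspond to replacing the `abort()` branch with an occasional warning and continuing to wait, which is exactly the "no way to recover on its own" outcome objected to next.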
The patch proposed at the top of this thread would leave the system unable to recover on its own, with the only recourse being for the DBA to manually force a crash/restart ... once she figured out that that was necessary, which might take a long while if the only external evidence is an occasional WARNING that might not even be making it to the postmaster log. How's that better?

> ... (b3) if the PANIC does fire, it
> gives you basically zero help in figuring out where the actual problem
> is. The PostgreSQL code base is way too big for "ERROR: you screwed
> up" to be an acceptable diagnostic.

Ideally I agree with the latter, but that doesn't mean that doing better is easy or even possible. (The proposed patch certainly does nothing to help diagnose such issues.) As for the former point, panicking here at least offers the chance of getting a stack trace, which might help a developer find the problem.

regards, tom lane