Re: the s_lock_stuck on perform_spin_delay
From        | Robert Haas
Subject     | Re: the s_lock_stuck on perform_spin_delay
Date        |
Msg-id      | CA+TgmoaA2uqsozTj9UDNrXTKWiUz1kW9eQsKiCr9ZvPwaFGFDA@mail.gmail.com
In reply to | Re: the s_lock_stuck on perform_spin_delay (Tom Lane <tgl@sss.pgh.pa.us>)
Responses   | Re: the s_lock_stuck on perform_spin_delay (Andres Freund <andres@anarazel.de>)
List        | pgsql-hackers
On Thu, Jan 4, 2024 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I believe it's (a). No matter what the reason for a stuck spinlock
> is, the only reliable method of getting out of the problem is to
> blow things up and start over. The patch proposed at the top of this
> thread would leave the system unable to recover on its own, with the
> only recourse being for the DBA to manually force a crash/restart ...
> once she figured out that that was necessary, which might take a long
> while if the only external evidence is an occasional WARNING that
> might not even be making it to the postmaster log. How's that better?

It's a fair question. I think you're correct if we assume that everyone's
following the coding rule ... at least assuming that the target system
isn't too insanely slow, and I've seen some pretty crazily overloaded
machines. But if the coding rule is not being followed, then "the only
reliable method of getting out of the problem is to blow things up and
start over" becomes a dubious conclusion.

Also, I wonder if many or even all uses of spinlocks ought to be replaced
with either LWLocks or atomics. An LWLock might be slightly slower when
contention is low, but it scales better when contention is high, displays
a wait event so that you can see that you have contention if you do, and
doesn't PANIC the system if the contention gets too bad. And an atomic
will just be faster, in the cases where it's adequate.

The trouble with trying to do a replacement is that some of the
spinlock-using code is ancient and quite hairy. info_lck in particular
looks like a hot mess -- it's used in complex ways and in
performance-critical paths, with terrifying comments like this:

 * To read XLogCtl->LogwrtResult, you must hold either info_lck or
 * WALWriteLock.  To update it, you need to hold both locks.  The point of
 * this arrangement is that the value can be examined by code that already
 * holds WALWriteLock without needing to grab info_lck as well.
 * In addition
 * to the shared variable, each backend has a private copy of LogwrtResult,
 * which is updated when convenient.
 *
 * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
 * (protected by info_lck), but we don't need to cache any copies of it.
 *
 * info_lck is only held long enough to read/update the protected variables,
 * so it's a plain spinlock.  The other locks are held longer (potentially
 * over I/O operations), so we use LWLocks for them.  These locks are:

But info_lck was introduced in 1999 and this scheme was introduced in
2012, and a lot has changed since then. Whatever benchmarking was done to
validate this locking regime is probably obsolete at this point. Back
then, LWLocks were built on top of spinlocks, and were, I believe, a lot
slower than they are now. Plus CPU performance characteristics have
changed a lot. So who really knows if the way we're doing things here
makes any sense at all these days? But one doesn't want to do a naive
replacement and pessimize things, either.

--
Robert Haas
EDB: http://www.enterprisedb.com