Discussion: IpcSemaphoreLock/Unlock and proc_exit on 7.2.6
I have an underpowered server running 7.2.6 that backs a website which occasionally gets hit by a bunch of traffic and starts firing off "FATAL 1: Sorry, too many clients already" messages. This is all as expected, but sometimes it just crashes. I had no clue what was going on until I checked the stderr log (because I had set it up to use syslog). In there I find a whole bunch of these:

IpcSemaphoreLock: semop(id=-1) failed: Invalid argument
IpcSemaphoreLock: semop(id=-1) failed: Invalid argument
IpcSemaphoreLock: semop(id=-1) failed: Invalid argument
IpcSemaphoreLock: semop(id=-1) failed: Invalid argument
IpcSemaphoreUnlock: semop(id=-1) failed: Invalid argument
IpcSemaphoreLock: semop(id=-1) failed: Invalid argument
IpcSemaphoreUnlock: semop(id=-1) failed: Invalid argument
IpcSemaphoreLock: semop(id=-1) failed: Invalid argument

Looking at the source I see proc_exit as the failure path for these two functions (IpcSemaphoreLock, IpcSemaphoreUnlock). I've read the comments around the code, but must admit that I can't really follow what's going on. Could anyone shed some light on what is going on? Certainly the semId of -1 looks a little suspicious.

This is on FreeBSD 4.5.

Kris Jurka
Kris Jurka <books@ejurka.com> writes:
> I have an underpowered server running 7.2.6 that backs a website which
> occasionally gets hit by a bunch of traffic and starts firing off "FATAL
> 1: Sorry, too many clients already" messages. This is all as expected,
> but sometimes it just crashes. I had no clue what was going on until I
> checked the stderr log (because I had set it up to use syslog). In there
> I find a whole bunch of these:
> IpcSemaphoreLock: semop(id=-1) failed: Invalid argument

[ eyeballs code... ]

It looks like this could happen in 7.2 during exit from a backend that failed to acquire a semaphore --- ProcKill does things like LockReleaseAll, which needs to acquire the lockmanager LWLock, which could try to block using the process semaphore if there's contention for the LWLock.

The problem should be gone in 7.3 and later due to reorganization of the semaphore management code. I'm not sure it's worth trying to fix in 7.2.* --- the odds of introducing new problems seem too high, and we're not really maintaining 7.2 anymore anyway.

The comment in ProcGetNewSemIdAndNum suggests that you might be able to suppress the problem in 7.2 by using a different max_connections value. Is your current value one less than a multiple of 16, by any chance?

regards, tom lane
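For reference, the error text in those log lines can be reproduced outside PostgreSQL. The following is only a minimal sketch, not PostgreSQL code: it assumes a POSIX system with System V semaphores, and semop_errno is a hypothetical helper named here for illustration. It shows that semop() on an identifier of -1 is rejected with EINVAL, which strerror() typically renders as "Invalid argument".

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/*
 * Hypothetical helper (not part of PostgreSQL): attempt a single
 * "lock"-style decrement on the given semaphore id and return the
 * resulting errno, or 0 if semop() succeeded.
 */
int
semop_errno(int semid)
{
    struct sembuf op;

    op.sem_num = 0;          /* first semaphore in the set */
    op.sem_op = -1;          /* decrement, i.e. try to acquire */
    op.sem_flg = IPC_NOWAIT; /* never block in this demonstration */

    errno = 0;
    if (semop(semid, &op, 1) < 0)
        return errno;
    return 0;
}
```

Calling semop_errno(-1) and passing the result to strerror() should give the same "Invalid argument" text as in the log, which is consistent with a backend that never obtained a real semaphore id before trying to lock.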
On Sun, 14 Nov 2004, Tom Lane wrote:
> The comment in ProcGetNewSemIdAndNum suggests that you might be able to
> suppress the problem in 7.2 by using a different max_connections value.
> Is your current value one less than a multiple of 16, by any chance?
>

Currently 32. It is unclear whether you think 31 is the failure case you're thinking of or whether 31 might help.

Kris Jurka
Kris Jurka <books@ejurka.com> writes:
> On Sun, 14 Nov 2004, Tom Lane wrote:
>> Is your current value one less than a multiple of 16, by any chance?
> Currently 32. It is unclear whether you think 31 is the failure case you're
> thinking of or whether 31 might help.

No, 32 is actually the best case (most slop) if I'm reading the code correctly. I'd suggest an update to 7.3 or later ...

regards, tom lane