Michael Simms <grim@ewtoo.org> writes:
>>>> I have noticed a deadlock happening on 7.0.1 on updates.
>>>> The backends just lock, and take up as much CPU as they can. I kill
>>>> the postmaster, and the backends stay alive, using CPU at the highest
>>>> rate possible. The operations aren't that expensive, just a single line
>>>> of update.
>>>> Anyone else seen this? Anyone dealing with this?
>>
>> News to me. What sort of hardware are you running on? It sort of
>> sounds like the spinlock code not working as it should --- and since
>> spinlocks are done with platform-dependent assembler, it matters...
> The hardware/software is:
> Linux kernel 2.2.15 (SMP kernel)
> Glibc 2.1.1
> Dual Intel PIII/500

Dual CPUs, huh? I have heard of motherboards with misdesigned memory
caching, such that the two CPUs don't reliably see each other's updates
to a shared memory location. Naturally that plays hell with the
spinlock code :-(. It might be necessary to insert some kind of
cache-flushing instruction into the spinlock wait loop to ensure that
the CPUs see each other's changes to the lock.

This is all theory at this point, and a hole in the theory is that the
backends ought to give up with a "stuck spinlock" error after a minute
or two of not being able to grab the lock. I assume you have let them
go at it for longer than that without seeing such an error?

Anyway, the next step is to "kill -ABORT" some of the stuck processes
and get backtraces from their core dumps to see where they are stuck.
If you find they are inside s_lock() then it's definitely some kind of
spinlock problem. If not...

regards, tom lane