On 02/22/2011 07:55 AM, Robert Haas wrote:
> On Sun, Feb 20, 2011 at 12:39 PM, Greg Stark <gsstark@mit.edu> wrote:
>> On Fri, Feb 18, 2011 at 6:59 PM, Andrew Dunstan <amdunstan@nc.rr.com> wrote:
>>> The server is running as a warm standby, and the client's application tries
>>> to connect to both the master and the slave, accepting whichever lets it
>>> connect (hence hot standby is not turned on).
>>> ...
>>> #2 0x00000000005de645 in LockBufferForCleanup () at bufmgr.c:2432
>>> #3 0x0000000000463733 in heap_xlog_clean (lsn=<value optimized out>,
>> Hm, if hot standby isn't turned on why are we bothering with locks at
>> all? But that said is there *anything* else happening in the database
>> aside from recovery? Are there any errors in the database log?
>>
>> But this still shouldn't block. It's either blocked locking the buffer
>> or blocked waiting for the buffer to become unpinned. It would be nice
>> to get a backtrace from a debugging build which wouldn't have some of
>> the functions inlined. It would be really nice to see the pin count on
>> the buffer in question -- perhaps it has gotten out of sync or
>> underflowed?
> I realize now that it would also be useful to see the state of the
> LWLock on the buffer.
This has happened again. This time we have some debug info available,
and can possibly get more, if people tell me what would be helpful:
(gdb) bt
#0  0x0000003493ed5337 in semop () from /lib64/libc.so.6
#1  0x00000000005bd1e3 in PGSemaphoreLock (sema=0x2b1a6014ac38, interruptOK=1 '\001') at pg_sema.c:420
#2  0x00000000005de735 in LockBufferForCleanup (buffer=310163) at bufmgr.c:2432
#3  0x000000000048b896 in RestoreBkpBlocks (lsn=<value optimized out>, record=0x2b1a6191b010, cleanup=1 '\001') at xlog.c:3540
#4  0x0000000000463788 in heap_xlog_clean (lsn=<value optimized out>, record=0x2b1a6191b010) at heapam.c:4160
#5  heap2_redo (lsn=<value optimized out>, record=0x2b1a6191b010) at heapam.c:4858
(gdb) f 2
#2  0x00000000005de735 in LockBufferForCleanup (buffer=310163) at bufmgr.c:2432
2432            ProcWaitForSignal();
(gdb) p *bufHdr
$2 = {tag = {rnode = {spcNode = 16393, dbNode = 40475, relNode = 41880},
  forkNum = MAIN_FORKNUM, blockNum = 18913}, flags = 6, usage_count = 1,
  refcount = 1, wait_backend_pid = 9111, buf_hdr_lock = 0 '\000',
  buf_id = 310162, freeNext = -2, io_in_progress_lock = 620448,
  content_lock = 620449}
Strangely, I don't even see ProcWaitForSignal() in the frame list - I
shouldn't have thought it was a candidate to be optimised away.
cheers
andrew