Re: SR standby hangs

Поиск
Список
Период
Сортировка
От Andrew Dunstan
Тема Re: SR standby hangs
Дата
Msg-id 4DB724A2.9040809@dunslane.net
обсуждение исходный текст
Ответ на Re: SR standby hangs  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: SR standby hangs  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers

On 02/22/2011 07:55 AM, Robert Haas wrote:
> On Sun, Feb 20, 2011 at 12:39 PM, Greg Stark<gsstark@mit.edu>  wrote:
>> On Fri, Feb 18, 2011 at 6:59 PM, Andrew Dunstan<amdunstan@nc.rr.com>  wrote:
>>> The server is running as a warm standby, and the client's application tries
>>> to connect to both the master and the slave, accepting whichever lets it
>>> connect (hence hot standby is not turned on).
>>> ...
>>>    #2  0x00000000005de645 in LockBufferForCleanup () at bufmgr.c:2432
>>>    #3  0x0000000000463733 in heap_xlog_clean (lsn=<value optimized out>,
>> Hm, if hot standby isn't turned on why are we bothering with locks at
>> all? But that said is there *anything* else happening in the database
>> aside from recovery? Are there any errors in the database log?
>>
>> But this still shouldn't block. It's either blocked locking the buffer
>> or blocked waiting for the buffer to become unpinned. It would be nice
>> to get a backtrace from a debugging build which wouldn't have some of
>> the functions inlined. It would be really nice to see the pin count on
>> the buffer in question -- perhaps it has gotten out of sync or
>> underflowed?
> I realize now that it would also be useful to see the state of the
> LWLock on the buffer.


This has happened again. This time we have some debug info available, 
and can possible get more, if people tell me what will be helpful:
   (gdb) bt   #0  0x0000003493ed5337 in semop () from /lib64/libc.so.6   #1  0x00000000005bd1e3 in PGSemaphoreLock
(sema=0x2b1a6014ac38,  interruptOK=1 '\001') at pg_sema.c:420   #2  0x00000000005de735 in LockBufferForCleanup
(buffer=310163)at   bufmgr.c:2432   #3  0x000000000048b896 in RestoreBkpBlocks (lsn=<value optimized   out>,
record=0x2b1a6191b010,cleanup=1 '\001') at xlog.c:3540   #4  0x0000000000463788 in heap_xlog_clean (lsn=<value
optimized  out>, record=0x2b1a6191b010) at heapam.c:4160   #5  heap2_redo (lsn=<value optimized out>,
record=0x2b1a6191b010)at   heapam.c:4858   (gdb) f 2   #2  0x00000000005de735 in LockBufferForCleanup (buffer=310163)
at  bufmgr.c:2432   2432                ProcWaitForSignal();   (gdb) p *bufHdr   $2 = {tag = {rnode = {spcNode = 16393,
dbNode= 40475, relNode =   41880}, forkNum = MAIN_FORKNUM, blockNum = 18913}, flags = 6,   usage_count = 1, refcount =
1,wait_backend_pid = 9111,      buf_hdr_lock = 0 '\000', buf_id = 310162, freeNext = -2,   io_in_progress_lock =
620448,content_lock = 620449}
 

Strangely, I don't even see ProcWaitForSignal() in the frame list - I 
shouldn't have thought it was a candidate to be optimised away.

cheers

andrew



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: pg_ctl restart - behaviour based on wrong instance
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: Bad COMPACT_ALLOC_CHUNK size in tsearch/spell.c?