Re: Bug in new buffer freelist code

From: Jan Wieck
Subject: Re: Bug in new buffer freelist code
Msg-id: 3FE8A88D.10309@Yahoo.com
In reply to: Bug in new buffer freelist code (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Bug in new buffer freelist code (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Bug in new buffer freelist code (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Tom Lane wrote:
> I just had the parallel regression tests hang up due to what appears to
> be a bug in the new ARC code.  The CLUSTER test gets into an infinite
> loop trying to do "CLUSTER clstr_1;".  The loop is in
> StrategyInvalidateBuffer's check that the buffer is already in the
> freelist; it isn't, and the freelist is circular.

It seems to me that buffers that are thrown away via 
StrategyInvalidateBuffer() do not get their relNode and blockNum cleared 
out. As a result, FlushRelationBuffers(), which does a full scan of the 
whole buffer pool, can find buffers that once contained the block but no 
longer do.

If buffer 839 once contained that block and was given up that way, and 
later on buffer 850 contains it, there is a CDB for it. If 
FlushRelationBuffers() now scans the buffer pool, it will find buffer 839 
first and call StrategyInvalidateBuffer() for it. That finds the CDB for 
buffer 850 and adds buffer 839 to the freelist again. Later on, 
FlushRelationBuffers() calls StrategyInvalidateBuffer() for buffer 850, 
and we have the situation at hand.


Does that make sense?

Jan

> 
> (gdb) bt
> #0  0x1fe8a8 in StrategyInvalidateBuffer (buf=0xc3a56f60) at freelist.c:733
> #1  0x1fbf08 in FlushRelationBuffers (rel=0x400fa298, firstDelBlock=0)
>     at bufmgr.c:1596
> #2  0x1479fc in swap_relfilenodes (r1=143786, r2=143915) at cluster.c:736
> #3  0x147458 in rebuild_relation (OldHeap=0x2322b, indexOid=143788)
>     at cluster.c:455
> #4  0x1473b0 in cluster_rel (rvtc=0x7b03bed8, recheck=0 '\000')
>     at cluster.c:395
> #5  0x146ff4 in cluster (stmt=0x400b88a8) at cluster.c:232
> #6  0x21c60c in ProcessUtility (parsetree=0x400b88a8, dest=0x400b88e8,
>     completionTag=0x7b03bbe8 "") at utility.c:1033
> ... etc ...
> 
> (gdb) p *buf
> $5 = {bufNext = -1, data = 7211904, tag = {rnode = {tblNode = 17142,
>       relNode = 143906}, blockNum = 0}, buf_id = 850, flags = 14,
>   refcount = 0, io_in_progress_lock = 1721, cntx_lock = 1722,
>   cntxDirty = 0 '\000', wait_backend_id = 0}
> (gdb) p *StrategyControl
> $1 = {target_T1_size = 423, listUnusedCDB = 249, listHead = {464, 967, 1692,
>     1227}, listTail = {968, 645, 1528, 1694}, listSize = {364, 413, 584, 636},
>   listFreeBuffers = 839, num_lookup = 546939, num_hit = {1378, 246896, 282639,
>     3935}, stat_report = 0, cdb = {{prev = 386, next = 23, list = 3,
>       buf_tag = {rnode = {tblNode = 17142, relNode = 19080}, blockNum = 30},
>       buf_id = -1, t1_xid = 3402}}}
> (gdb) p BufferDescriptors[839]
> $2 = {bufNext = 839, data = 7121792, tag = {rnode = {tblNode = 17142,
>       relNode = 143906}, blockNum = 0}, buf_id = 839, flags = 14,
>   refcount = 0, io_in_progress_lock = 1699, cntx_lock = 1700,
>   cntxDirty = 0 '\000', wait_backend_id = 0}
> 
> So we've got a couple of problems here: buffers 839 and 850 both claim
> to contain block 0 of rel 143906 (which is clstr_1), and the freelist
> is circular.
> 
> This doesn't seem to be super reproducible, but there's definitely a
> problem in there somewhere.
> 
>             regards, tom lane


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


