Thread: WARNING: buffer refcount leak

WARNING: buffer refcount leak

From
Brian Hirt
Date:
I'm working on a new machine, and I think it may have bad
hardware, since that seems more likely than a bug in PostgreSQL.  I'm
wondering if anyone has an idea what kind of hardware failure might
cause this message:

WARNING:  buffer refcount leak: [424] (freeNext=425, freePrev=423, 
rel=0/0, blockNum=4294967295, flags=0x1c, refcount=-631 30464)

On one occasion the postmaster displayed the refcount leak warning;
other times it segfaulted or crashed with messages like this one:
(free(): invalid pointer 0xa06ffc0!).  Usually it just works fine, so
this appears to be a very intermittent problem.  We've already replaced
the SCA backplane, the SCSI cables, the RAID controller, and the
motherboard.  The only components not replaced are the memory and the
CPUs.

I've run Memtest86 on the box for several days without it finding any
bad memory.  It's the first test I run on any new machine.  Can anyone
recommend good (free) diagnostic programs like Memtest86 that check
the CPUs, PCI bus, and so on?

The machine is a dual Xeon 2.8 GHz with 4 GB of ECC RAM and fourteen
15k 36 GB U320 drives on a MegaRAID 320-2X controller, running Fedora
Core 1 and PostgreSQL 7.3.4.

Thanks for any advice, and I hope this isn't too off topic.

--brian



Re: WARNING: buffer refcount leak

From
Tom Lane
Date:
Brian Hirt <bhirt@mobygames.com> writes:
> I'm working on a new machine, and i think it's got possible bad 
> hardware, since that seems more likely than a bug in postgresql.  I'm  
> wondering if someone has any idea what kind of hardware failure might 
> cause this message:

> WARNING:  buffer refcount leak: [424] (freeNext=425, freePrev=423, 
> rel=0/0, blockNum=4294967295, flags=0x1c, refcount=-631 30464)

My money is on bad RAM.  That refcount is ridiculous, and I can't see
any way for a disk problem to cause that.  (Unless this shared-buffer
header got swapped out and back in, which seems unlikely considering we
use the shared buffer headers a lot.)  Seems like it's got to be bad
RAM, bad CPU, or some part directly between them --- and you already
replaced all those parts.
        regards, tom lane
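
For readers who want to check a log for similar warnings, here is a minimal sketch of why the quoted values look implausible. It is not part of PostgreSQL: the function names and the regular expression are hypothetical and assume the message layout exactly as quoted in this thread. The block number 4294967295 is 2^32 - 1 (0xFFFFFFFF); together with rel=0/0 it suggests the buffer was not holding any real page, so a nonzero (let alone negative) refcount makes no sense. The second number after `refcount=` is captured as-is, since the thread does not say what it means.

```python
import re

# Hypothetical pattern, written against the message as quoted in this thread.
LEAK_RE = re.compile(
    r"buffer refcount leak: \[(?P<buf>\d+)\] "
    r"\(freeNext=(?P<free_next>-?\d+), freePrev=(?P<free_prev>-?\d+), "
    r"rel=(?P<rel>\d+/\d+), blockNum=(?P<block_num>\d+), "
    r"flags=(?P<flags>0x[0-9a-fA-F]+), "
    r"refcount=(?P<refcount>-?\d+) (?P<second_count>-?\d+)\)"
)

ALL_ONES_32 = 0xFFFFFFFF  # 4294967295, i.e. 2**32 - 1


def parse_leak_warning(text):
    """Return the warning's fields as a dict, or None if text does not match."""
    # The message may be wrapped across lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    m = LEAK_RE.search(flat)
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("buf", "free_next", "free_prev", "block_num",
                "refcount", "second_count"):
        fields[key] = int(fields[key])
    fields["flags"] = int(fields["flags"], 16)
    return fields


def suspicious_fields(fields):
    """List reasons the parsed values look like corruption rather than a leak."""
    reasons = []
    if fields["refcount"] < 0:
        reasons.append("refcount is negative")
    unused = fields["rel"] == "0/0" and fields["block_num"] == ALL_ONES_32
    if unused and fields["refcount"] != 0:
        reasons.append("buffer holds no real page but refcount is nonzero")
    return reasons


if __name__ == "__main__":
    msg = """WARNING:  buffer refcount leak: [424] (freeNext=425, freePrev=423,
    rel=0/0, blockNum=4294967295, flags=0x1c, refcount=-631 30464)"""
    parsed = parse_leak_warning(msg)
    for reason in suspicious_fields(parsed):
        print(reason)
```

A check like this only flags values that cannot be right; it says nothing about which component corrupted them.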


Re: WARNING: buffer refcount leak

From
Gavin Sherry
Date:
On Mon, 26 Jul 2004, Brian Hirt wrote:

> I'm working on a new machine, and i think it's got possible bad
> hardware, since that seems more likely than a bug in postgresql.  I'm
> wondering if someone has any idea what kind of hardware failure might
> cause this message:
>
> WARNING:  buffer refcount leak: [424] (freeNext=425, freePrev=423,
> rel=0/0, blockNum=4294967295, flags=0x1c, refcount=-631 30464)

The refcount number strongly suggests hardware. Since the memory is ECC,
I'd say it might be CPU (cache) related. I cannot think of any tools to
help you diagnose this, but try disabling or pulling out one CPU, then
swap. If all is fine, it doesn't necessarily mean it's the CPU, but it
narrows the field.

Gavin