Thread: Anyone understand shared buffer refcount mechanism?

Anyone understand shared buffer refcount mechanism?

From
Tom Lane
Date:
You may recall I was complaining a while back of "out of free buffers:
time to abort !" errors when running the regression tests with
nonstandard optimizer flags.  Those are still there, but still hard to
reproduce.  Also, I have been trying to fix ALTER TABLE RENAME so that
it flushes buffers for the target table before renaming the underlying
files (otherwise subsequent mdblindwrt will fail), and have been seeing
that code fail because of buffers being left pinned (refcount > 0) when
the only running backend claims that it does not have them pinned
(PrivateRefCount & LastRefCount are 0).  So I am pretty sure that
something is rotten in the buffer refcount accounting.

In trying to understand what the code is doing, I am confused by the
buffer refcount save/restore mechanism.  Why does the executor want
to save/restore buffer refcounts?  I can sort of see that that might
be a way to clean up buffers that have been pinned and need to be
unpinned, but it seems like it's a kluge around failure to unpin in
the code that did the pinning, if so.  If it *is* a way to do that,
shouldn't BufferRefCountRestore unpin the buffer completely if it
restores PrivateRefCount & LastRefCount to 0?  I am not sure that this
is where the refcount is getting leaked, but it looks like a possibility.
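
A minimal sketch of the question above, as a toy model and not the actual bufmgr code. It assumes a backend touches the shared refcount only on its first pin of a buffer, that the reset step moves the private counts into a second array and zeroes them, and that the restore step puts the saved counts back. All `toy_` names and the `unpin_on_zero` flag are hypothetical. The scenario has an inner executor level pin a buffer and never unpin it:

```c
#include <assert.h>
#include <string.h>

#define NBUFFERS 4

/* Toy stand-ins (assumptions, not the real bufmgr data structures). */
static int shared_refcount[NBUFFERS];   /* pin count visible to all backends */
static int private_refcount[NBUFFERS];  /* this backend's pins, current level */
static int last_refcount[NBUFFERS];     /* this backend's pins, outer levels */

/* Touch the shared count only on this backend's first pin of the buffer. */
static void toy_pin(int b)
{
    assert(b >= 0 && b < NBUFFERS);
    if (private_refcount[b] == 0 && last_refcount[b] == 0)
        shared_refcount[b]++;
    private_refcount[b]++;
}

/* Entering a nested executor: stash the current counts, then zero them. */
static void toy_refcount_reset(int *save)
{
    for (int b = 0; b < NBUFFERS; b++)
    {
        save[b] = private_refcount[b];
        last_refcount[b] += private_refcount[b];
        private_refcount[b] = 0;
    }
}

/*
 * Leaving it: put the saved counts back.  With unpin_on_zero set, also
 * drop the shared pin when the backend held the buffer before the restore
 * and holds nothing afterwards.
 */
static void toy_refcount_restore(const int *save, int unpin_on_zero)
{
    for (int b = 0; b < NBUFFERS; b++)
    {
        int held = private_refcount[b] > 0 || last_refcount[b] > 0;

        private_refcount[b] = save[b];
        last_refcount[b] -= save[b];
        if (unpin_on_zero && held &&
            private_refcount[b] == 0 && last_refcount[b] == 0)
            shared_refcount[b]--;
    }
}

/*
 * Outer level pins buffer 0; inner level pins buffer 1 and never unpins it.
 * Returns the shared refcount of buffer 1 after the restore.
 */
int toy_leak_scenario(int unpin_on_zero)
{
    int save[NBUFFERS];

    memset(shared_refcount, 0, sizeof shared_refcount);
    memset(private_refcount, 0, sizeof private_refcount);
    memset(last_refcount, 0, sizeof last_refcount);

    toy_pin(0);                                 /* outer level */
    toy_refcount_reset(save);
    toy_pin(1);                                 /* inner level, never unpinned */
    toy_refcount_restore(save, unpin_on_zero);
    return shared_refcount[1];
}
```

In this model, `toy_leak_scenario(0)` leaves buffer 1 with a shared refcount of 1 while both private counts read 0, which is the symptom described above; `toy_leak_scenario(1)` returns it to 0.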

Also, it bothers me that there is a separation between PrivateRefCount
and LastRefCount.  Why not just have PrivateRefCount and let the
save/restore mechanisms save/restore those values, without zeroing out
PrivateRefCount during BufferRefCountReset?  The zeroing seems to have
the effect of having BufferValid claim in the inner executor context
that buffers pinned in the outer executor context aren't pinned ---
which is weird at best.
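
To make the complaint concrete, here is a rough sketch under stated assumptions, not the real code: the pin check is assumed to consult only the current level's private count, and the `toy_` functions are hypothetical names. It compares the split scheme (zero the private counts on entry to the inner level) with a single-counter alternative that only snapshots them:

```c
#include <assert.h>
#include <string.h>

#define NBUFFERS 4

static int private_refcount[NBUFFERS];  /* toy stand-in for PrivateRefCount */
static int last_refcount[NBUFFERS];     /* toy stand-in for LastRefCount */

/* Assumed pin check: looks only at the current level's private count. */
static int toy_looks_pinned(int b)
{
    assert(b >= 0 && b < NBUFFERS);
    return private_refcount[b] > 0;
}

/* Split scheme: move the current pins aside and zero the private counts. */
static void toy_split_reset(int *save)
{
    for (int b = 0; b < NBUFFERS; b++)
    {
        save[b] = private_refcount[b];
        last_refcount[b] += private_refcount[b];
        private_refcount[b] = 0;
    }
}

/* Single-counter alternative: snapshot the counts and leave them in place. */
static void toy_single_save(int *save)
{
    memcpy(save, private_refcount, sizeof private_refcount);
}

/* Returns 1 if the inner level still sees the outer level's pin on buffer 0. */
int toy_inner_sees_outer_pin(int use_split)
{
    int save[NBUFFERS];

    memset(private_refcount, 0, sizeof private_refcount);
    memset(last_refcount, 0, sizeof last_refcount);
    private_refcount[0] = 1;            /* outer level holds one pin */

    if (use_split)
        toy_split_reset(save);
    else
        toy_single_save(save);
    return toy_looks_pinned(0);
}
```

With the split scheme the inner level reports buffer 0 as unpinned even though this backend still holds it; with the snapshot-only variant the pin stays visible.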

If anyone understands why this mechanism is designed this way,
please tell me about it.
        regards, tom lane


Re: [HACKERS] Anyone understand shared buffer refcount mechanism?

From
Vadim Mikheev
Date:
Tom Lane wrote:
> 
> In trying to understand what the code is doing, I am confused by the
> buffer refcount save/restore mechanism.  Why does the executor want
> to save/restore buffer refcounts?  I can sort of see that that might

...

> If anyone understands why this mechanism is designed this way,
> please tell me about it.

This has bothered me for a long time too.
The only explanation I see is in execMain.c:

/*
 * reset buffer refcount.  the current refcounts are saved and will be
 * restored when ExecutorEnd is called
 *
 * this makes sure that when ExecutorRun's are called recursively as for
 * postquel functions, the buffers pinned by one ExecutorRun will not
 * be unpinned by another ExecutorRun.
 */

But buffers pinned by one Executor invocation SHOULDN'T
be unpinned by another one (provided there are no bugs in the code,
but that is another story).

So, try to remove this save/restore mechanism and let's see...

Vadim


Re: [HACKERS] Anyone understand shared buffer refcount mechanism?

From
Tom Lane
Date:
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> In trying to understand what the code is doing, I am confused by the
>> buffer refcount save/restore mechanism.  Why does the executor want
>> to save/restore buffer refcounts?  I can sort of see that that might

> This has bothered me for a long time too.
> The only explanation I see is in execMain.c:

>  * this makes sure that when ExecutorRun's are called recursively as for
>  * postquel functions, the buffers pinned by one ExecutorRun will not
>  * be unpinned by another ExecutorRun.

The case that is currently failing for me is postquel function calls
(the "misc" regress test contains some, and it's spewing Buffer Leak
notices like crazy, now that I fixed BufferLeakCheck to notice nonzero
LastRefCount as well as nonzero PrivateRefCount).  So there's something
rotten here.  I will keep looking at it.
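
For illustration only, a toy version of the kind of check described above. It is not the real BufferLeakCheck: the function names, the array arguments and the `check_last` flag are all hypothetical. It counts the buffers this backend still holds, optionally looking at the outer-level counts as well as the current-level ones:

```c
#include <assert.h>

#define NBUFFERS 4

/*
 * Count buffers this backend still appears to hold.  When check_last is 0
 * only the current-level counts are examined; when it is nonzero the
 * outer-level counts are examined too.
 */
int toy_leak_check(const int *private_refcount, const int *last_refcount,
                   int check_last)
{
    int leaks = 0;

    assert(private_refcount != 0 && last_refcount != 0);
    for (int b = 0; b < NBUFFERS; b++)
    {
        if (private_refcount[b] != 0 ||
            (check_last && last_refcount[b] != 0))
            leaks++;
    }
    return leaks;
}

/* One buffer is held only through the outer-level count. */
int toy_leak_check_demo(int check_last)
{
    int priv[NBUFFERS] = {0, 0, 0, 0};
    int last[NBUFFERS] = {0, 1, 0, 0};

    return toy_leak_check(priv, last, check_last);
}
```

In this toy case a check that looks only at the current-level counts reports nothing, while one that also looks at the outer-level counts reports one leaked buffer.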

> So, try to remove this save/restore mechanism and let's see...

It does seem that BufferRefCountRestore is actually unpinning some
things (things got much better after I fixed it to really do the
unpin when restoring a nonzero refcount to zero).  So I don't
think I want to try to take out the save/restore entirely.  What
it looks like right now is that a few specific paths through the
executor restore the wrong counts...
        regards, tom lane