Discussion: Anyone understand shared buffer refcount mechanism?
You may recall I was complaining a while back of "out of free buffers: time to abort !" errors when running the regression tests with nonstandard optimizer flags. Those are still there, but still hard to reproduce.

Also, I have been trying to fix ALTER TABLE RENAME so that it flushes buffers for the target table before renaming the underlying files (otherwise subsequent mdblindwrt will fail), and have been seeing that code fail because of buffers being left pinned (refcount > 0) when the only running backend claims that it does not have them pinned (PrivateRefCount & LastRefCount are 0).

So I am pretty sure that something is rotten in the buffer refcount accounting.

In trying to understand what the code is doing, I am confused by the buffer refcount save/restore mechanism. Why does the executor want to save/restore buffer refcounts? I can sort of see that that might be a way to clean up buffers that have been pinned and need to be unpinned, but it seems like it's a kluge around failure to unpin in the code that did the pinning, if so. If it *is* a way to do that, shouldn't BufferRefCountRestore unpin the buffer completely if it restores PrivateRefCount & LastRefCount to 0? I am not sure that this is where the refcount is getting leaked, but it looks like a possibility.

Also, it bothers me that there is a separation between PrivateRefCount and LastRefCount. Why not just have PrivateRefCount and let the save/restore mechanisms save/restore those values, without zeroing out PrivateRefCount during BufferRefCountReset? The zeroing seems to have the effect of having BufferValid claim in the inner executor context that buffers pinned in the outer executor context aren't pinned --- which is weird at best.

If anyone understands why this mechanism is designed this way, please tell me about it.

			regards, tom lane
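The symptom described above (shared refcount > 0 while PrivateRefCount and LastRefCount are both 0) can be reproduced in a toy model. The sketch below is not the bufmgr code: every `toy_` name is hypothetical, and both the pin/unpin rule (touch the shared count only when this backend's two local counts are zero) and the bodies of reset and restore are assumptions pieced together from the message above.

```c
#include <assert.h>

#define TOY_NBUFFERS 4

/* Hypothetical stand-ins for the counters named in the thread. */
int toy_shared_refcount[TOY_NBUFFERS];   /* shared: one pin per backend using the buffer */
int toy_private_refcount[TOY_NBUFFERS];  /* this backend, current executor level */
int toy_last_refcount[TOY_NBUFFERS];     /* this backend, suspended outer levels */

/* Assumed rule: only the backend's first pin touches the shared count. */
void toy_pin(int buf)
{
    assert(buf >= 0 && buf < TOY_NBUFFERS);
    if (toy_private_refcount[buf] == 0 && toy_last_refcount[buf] == 0)
        toy_shared_refcount[buf]++;
    toy_private_refcount[buf]++;
}

/* Assumed rule: only the backend's last unpin touches the shared count. */
void toy_unpin(int buf)
{
    assert(buf >= 0 && buf < TOY_NBUFFERS);
    assert(toy_private_refcount[buf] > 0);
    toy_private_refcount[buf]--;
    if (toy_private_refcount[buf] == 0 && toy_last_refcount[buf] == 0)
        toy_shared_refcount[buf]--;
}

/* Entering a nested executor level: stash and zero the private counts. */
void toy_refcount_reset(int *save)
{
    for (int i = 0; i < TOY_NBUFFERS; i++)
    {
        save[i] = toy_private_refcount[i];
        toy_last_refcount[i] += toy_private_refcount[i];
        toy_private_refcount[i] = 0;
    }
}

/* Leaving the nested level: put the saved counts back, nothing else. */
void toy_refcount_restore(const int *save)
{
    for (int i = 0; i < TOY_NBUFFERS; i++)
    {
        toy_private_refcount[i] = save[i];
        toy_last_refcount[i] -= save[i];
    }
}

/* Outer level pins buffer 0; returns the private count the nested level sees. */
int toy_nested_view(void)
{
    int save[TOY_NBUFFERS];

    toy_pin(0);
    toy_refcount_reset(save);
    int seen_inside = toy_private_refcount[0];  /* zeroed by the reset */
    toy_refcount_restore(save);
    toy_unpin(0);
    return seen_inside;
}

/* Nested level pins buffer 1 and never unpins; returns the shared count left. */
int toy_leak_scenario(void)
{
    int save[TOY_NBUFFERS];

    toy_refcount_reset(save);
    toy_pin(1);
    toy_refcount_restore(save);   /* overwrites the private count with 0 */
    return toy_shared_refcount[1];
}
```

In this model `toy_nested_view()` returns 0, which is the "inner context thinks the buffer isn't pinned" oddity, and `toy_leak_scenario()` leaves the shared count for buffer 1 at 1 while both local counts are back to 0. Whether the real leak follows this path is exactly what the message leaves open.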
Tom Lane wrote:
>
> In trying to understand what the code is doing, I am confused by the
> buffer refcount save/restore mechanism.  Why does the executor want
> to save/restore buffer refcounts?  I can sort of see that that might
...
> If anyone understands why this mechanism is designed this way,
> please tell me about it.

This bothered me for long time too.
The only explanation I see in execMain.c:

    /*
     * reset buffer refcount.  the current refcounts are saved and will be
     * restored when ExecutorEnd is called
     *
     * this makes sure that when ExecutorRun's are called recursively as for
     * postquel functions, the buffers pinned by one ExecutorRun will not
     * be unpinned by another ExecutorRun.
     */

But buffers pinned by one Executor invocation SHOULDN'T be unpinned by another one (if there are no bugs in code, but this is another story).

So, try to remove this save/restore mechanism and let's see...

Vadim
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> In trying to understand what the code is doing, I am confused by the
>> buffer refcount save/restore mechanism.  Why does the executor want
>> to save/restore buffer refcounts?  I can sort of see that that might

> This bothered me for long time too.
> The only explanation I see in execMain.c:
>  * this makes sure that when ExecutorRun's are called recursively as for
>  * postquel functions, the buffers pinned by one ExecutorRun will not
>  * be unpinned by another ExecutorRun.

The case that is currently failing for me is postquel function calls (the "misc" regress test contains some, and it's spewing Buffer Leak notices like crazy, now that I fixed BufferLeakCheck to notice nonzero LastRefCount as well as nonzero PrivateRefCount). So there's something rotten here. I will keep looking at it.

> So, try to remove this save/restore mechanism and let's see...

It does seem that BufferRefCountRestore is actually unpinning some things (things got much better after I fixed it to really do the unpin when restoring a nonzero refcount to zero). So I don't think I want to try to take out the save/restore entirely. What it looks like right now is that a few specific paths through the executor restore the wrong counts...

			regards, tom lane
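The fix mentioned in the last message (really doing the unpin when a nonzero refcount is restored to zero) might look roughly like the sketch below. This is a toy model, not the actual patch: all `toy_` names are hypothetical, and the rule that the shared count changes only when both of the backend's local counts are zero is an assumption.

```c
#include <assert.h>

#define TOY_NBUFFERS 4

/* Hypothetical stand-ins for the counters named in the thread. */
int toy_shared_refcount[TOY_NBUFFERS];   /* shared: one pin per backend using the buffer */
int toy_private_refcount[TOY_NBUFFERS];  /* this backend, current executor level */
int toy_last_refcount[TOY_NBUFFERS];     /* this backend, suspended outer levels */

/* Assumed rule: only the backend's first pin touches the shared count. */
void toy_pin(int buf)
{
    assert(buf >= 0 && buf < TOY_NBUFFERS);
    if (toy_private_refcount[buf] == 0 && toy_last_refcount[buf] == 0)
        toy_shared_refcount[buf]++;
    toy_private_refcount[buf]++;
}

/* Assumed rule: only the backend's last unpin touches the shared count. */
void toy_unpin(int buf)
{
    assert(buf >= 0 && buf < TOY_NBUFFERS);
    assert(toy_private_refcount[buf] > 0);
    toy_private_refcount[buf]--;
    if (toy_private_refcount[buf] == 0 && toy_last_refcount[buf] == 0)
        toy_shared_refcount[buf]--;
}

/* Entering a nested executor level: stash and zero the private counts. */
void toy_refcount_reset(int *save)
{
    for (int i = 0; i < TOY_NBUFFERS; i++)
    {
        save[i] = toy_private_refcount[i];
        toy_last_refcount[i] += toy_private_refcount[i];
        toy_private_refcount[i] = 0;
    }
}

/*
 * Leaving the nested level.  If that level left pins behind and, after the
 * saved counts are put back, this backend holds none at all, release the
 * shared pin as well instead of silently forgetting it.
 */
void toy_refcount_restore_unpinning(const int *save)
{
    for (int i = 0; i < TOY_NBUFFERS; i++)
    {
        int leftover = toy_private_refcount[i];  /* pins never released */

        toy_private_refcount[i] = save[i];
        toy_last_refcount[i] -= save[i];
        if (leftover > 0 &&
            toy_private_refcount[i] == 0 && toy_last_refcount[i] == 0)
            toy_shared_refcount[i]--;
    }
}

/*
 * Outer level holds buffer 0; the nested level re-pins 0, pins 1, and
 * releases neither.  Returns the shared count of buffer 1 afterwards.
 */
int toy_leak_then_restore(void)
{
    int save[TOY_NBUFFERS];

    toy_pin(0);
    toy_refcount_reset(save);
    toy_pin(0);
    toy_pin(1);
    toy_refcount_restore_unpinning(save);
    return toy_shared_refcount[1];
}
```

In this model the buffer pinned only by the nested level ends up with a shared count of 0, while buffer 0 keeps the single shared pin that the outer level still owns. It does not address the remaining problem raised above, namely executor paths that restore the wrong counts in the first place.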