Обсуждение: PrivateRefCount patch has got issues
I just happened to look into bufmgr.c for the first time in awhile, and noticed the privaterefcount-is-no-longer-a-simple-array stuff. It doesn't look too well thought out to me. In particular, PinBuffer_Locked calls GetPrivateRefCountEntry while holding a buffer-header spinlock. That seems completely unacceptable. It's certainly a huge violation of our design principle that spinlocks should be held for only a few instructions; and I rather suspect that a palloc failure down inside the hashtable entry-allocation code would leave things in a bad state. It's also depressing that the very common code path ReleaseBuffer->UnpinBuffer results in a double search of the array/hashtable; that should be refactored to avoid that. regards, tom lane
Hi, On 2014-12-16 18:25:13 -0500, Tom Lane wrote: > I just happened to look into bufmgr.c for the first time in awhile, and > noticed the privaterefcount-is-no-longer-a-simple-array stuff. It doesn't > look too well thought out to me. In particular, PinBuffer_Locked calls > GetPrivateRefCountEntry while holding a buffer-header spinlock. That > seems completely unacceptable. Argh, yes. That certainly isn't ok. The easiest way to fix that seems to be to declare that PinBuffer_Locked can only be used when we're guaranteed to not have pinned the buffer. That happens to be true for all the existing users. In fact all of them even seem to require the refcount to be zero across all backends. That prerequisite then allows to increase the buffer header refcount before releasing the spinlock *and* before increasing the private refcount. > It's also depressing that the very common code path > ReleaseBuffer->UnpinBuffer results in a double search of the > array/hashtable; that should be refactored to avoid that. Sounds like a good idea, will see how that works out to look. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-12-16 18:25:13 -0500, Tom Lane wrote: >> I just happened to look into bufmgr.c for the first time in awhile, and >> noticed the privaterefcount-is-no-longer-a-simple-array stuff. It doesn't >> look too well thought out to me. In particular, PinBuffer_Locked calls >> GetPrivateRefCountEntry while holding a buffer-header spinlock. That >> seems completely unacceptable. > Argh, yes. That certainly isn't ok. > The easiest way to fix that seems to be to declare that PinBuffer_Locked > can only be used when we're guaranteed to not have pinned the > buffer. That happens to be true for all the existing users. In fact all > of them even seem to require the refcount to be zero across all > backends. That prerequisite then allows to increase the buffer header > refcount before releasing the spinlock *and* before increasing the > private refcount. Hm, if you do it like that, what happens if we get a palloc failure while trying to record the private refcount? I think you must not bump the pin count in shared memory unless you're certain you can record the fact that you've done so. The idea I'd been wondering about hinged on the same observation that we know the buffer is not pinned (by our process) already, but the mechanics would be closer to what we do in resource managers: reserve space first, do the thing that needs to be remembered, bump the count using the reserved space. Given the way you've set this up, the idea boils down to having a precheck call that forces there to be an empty slot in the local fastpath array (by pushing something out to the hash table if necessary) before we start trying to pin the buffer. Then it's guaranteed that the "record" step will succeed. You could possibly even arrange it so that it's known which array entry needs to be used and then the "record" part is just a couple of inline instructions, so that it'd be OK to do that while still holding the spinlock. Otherwise it would still be a good idea to do the "record" after releasing the spinlock IMO; but either way this avoids the issue of possible state inconsistency due to a midflight palloc failure. regards, tom lane
Hi, Sorry for taking long to get back to this... On 2014-12-21 13:21:56 -0500, Tom Lane wrote: > The idea I'd been wondering about hinged on the same observation that we > know the buffer is not pinned (by our process) already, but the mechanics > would be closer to what we do in resource managers: reserve space first, > do the thing that needs to be remembered, bump the count using the > reserved space. Given the way you've set this up, the idea boils down to > having a precheck call that forces there to be an empty slot in the local > fastpath array (by pushing something out to the hash table if necessary) > before we start trying to pin the buffer. Then it's guaranteed that the > "record" step will succeed. After pondering the problem for a while I agree that this is the best approach. I've implemented it and it imo actually looks cleaner than before because GetPrivateRefCountEntry() is much simpler. > You could possibly even arrange it so that > it's known which array entry needs to be used and then the "record" part > is just a couple of inline instructions, so that it'd be OK to do that > while still holding the spinlock. Otherwise it would still be a good idea > to do the "record" after releasing the spinlock IMO; but either way this > avoids the issue of possible state inconsistency due to a midflight > palloc failure. I've indeed done it in a way that the reserved entry is remembered. Primarily because that will, in many situations, allow us to avoid searching the array for a new entry because we can setup a reserved slot when forgetting a private refcount entry. While the previous reservation would make it safe to do the private tracking with the spinlock held I've moved it to after the release; based on our observation that it's never used for buffers already locally pinned. Also added an assert checking that that actually still is the case. That observation also allows us to entirely forgo searching the private refcount array in PinBuffer_locked(), we can just directly enter the new entry. That is actually visible in profiles that are much larger than s_b. I've just written this patch down in one go, so I'll let the topic rest for a couple days and commit it then after looking through it again. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services