Re: Prepared transaction releasing locks before deregistering its GID
От | Masahiko Sawada |
---|---|
Тема | Re: Prepared transaction releasing locks before deregistering its GID |
Дата | |
Msg-id | CAD21AoAYLxWaH0v=8Xve53WUbhiQ+WWZ1GJd-5AkidxOshOiUQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Prepared transaction releasing locks before deregistering its GID (Michael Paquier <michael@paquier.xyz>) |
Ответы |
Re: Prepared transaction releasing locks before deregistering its GID
(Michael Paquier <michael@paquier.xyz>)
|
Список | pgsql-hackers |
On Tue, Feb 19, 2019 at 12:27 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Feb 18, 2019 at 05:05:13PM +0100, Oleksii Kliukin wrote: > > That looks like a race condition to me. What happens is that another > > transaction with the name identical to the running one can start and proceed > > to the prepare phase while the original one commits, failing at last instead > > of waiting for the original one to finish. > > It took me 50 clients and a bit more than 20 seconds, but I have been > able to reproduce the problem with one error. Thanks for the > reproducer. This is indeed a race condition with 2PC. > > > By looking at the source code of FinishPreparedTransaction() I can see the > > RemoveGXact() call, which removes the prepared transaction from > > TwoPhaseState->prepXacts. It is placed at the very end of the function, > > after the post-commit callbacks that clear out the locks held by the > > transaction. Those callbacks are not guarded by the TwoPhaseStateLock, > > resulting in a possibility for a concurrent session to proceed will > > MarkAsPreparing after acquiring the locks released by them. > > Hm, I see. Taking a breakpoint just after ProcessRecords() or putting > a sleep there makes the issue plain. The same issue can happen with > predicate locks. > > > I couldn’t find any documentation on the expected outcome in cases like > > this, so I assume it might not be a bug, but an undocumented behavior. > > If you run two transactions in parallel using your script, the second > transaction would wait at LOCK time until the first transaction > releases its locks with the COMMIT PREPARED. > > > Should I go about and produce a patch to put a note in the description of > > commit prepared, or is there any interest in changing the behavior to avoid > > such conflicts altogether (perhaps by holding the lock throughout the > > cleanup phase)? > > That's a bug, let's fix it. I agree with your suggestion that we had > better not release the 2PC lock using the callbacks for COMMIT > PREPARED or ROLLBACK PREPARED until the shared memory state is > cleared. At the same time, I think that we should be smarter in the > ordering of the actions: we care about predicate locks here, but not > about the on-disk file removal and the stat counters. One trick is > that the post-commit callback calls TwoPhaseGetDummyProc() which would > try to take TwoPhaseStateLock which needs to be applied so we need > some extra logic to not take a lock in this case. From what I can see > this is older than 9.4 as the removal of the GXACT entry in shared > memory and the post-commit hooks are out of sync for a long time :( > > Attached is a patch that fixes the problem for me. Please note the > safety net in TwoPhaseGetGXact(). Thank you for working on this. @@ -811,6 +811,9 @@ TwoPhaseGetGXact(TransactionId xid) static TransactionId cached_xid = InvalidTransactionId; static GlobalTransaction cached_gxact = NULL; + Assert(!lock_held || + LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE)); + /* * During a recovery, COMMIT PREPARED, or ABORT PREPARED, we'll be called * repeatedly for the same XID. We can save work with a simple cache. @@ -818,7 +821,8 @@ TwoPhaseGetGXact(TransactionId xid) if (xid == cached_xid) return cached_gxact; - LWLockAcquire(TwoPhaseStateLock, LW_SHARED); + if (!lock_held) + LWLockAcquire(TwoPhaseStateLock, LW_SHARED); for (i = 0; i < TwoPhaseState->numPrepXacts; i++) { It seems strange to me, why do we always require an exclusive lock here in spite of acquiring a shared lock? ----- @@ -854,7 +859,7 @@ TwoPhaseGetGXact(TransactionId xid) BackendId TwoPhaseGetDummyBackendId(TransactionId xid) { - GlobalTransaction gxact = TwoPhaseGetGXact(xid); + GlobalTransaction gxact = TwoPhaseGetGXact(xid, false); return gxact->dummyBackendId; } TwoPhaseGetDummyBackendId() is called by multixact_twophase_postcommit() which is called while holding TwoPhaseStateLock in exclusive mode in FinishPreparedTransaction(). Since we cache the global transaction when we call lock_twophase_commit() I couldn't produce issue but it seems to have a potential of locking problem. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Kato, Sho"Дата:
Сообщение: RE: Speeding up creating UPDATE/DELETE generic plan for partitionedtable into a lot
Следующее
От: "Tsunakawa, Takayuki"Дата:
Сообщение: RE: Speed up transaction completion faster after many relations areaccessed in a transaction