Re: Prepared transaction releasing locks before deregistering its GID

Поиск
Список
Период
Сортировка
От Masahiko Sawada
Тема Re: Prepared transaction releasing locks before deregistering its GID
Дата
Msg-id CAD21AoAYLxWaH0v=8Xve53WUbhiQ+WWZ1GJd-5AkidxOshOiUQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Prepared transaction releasing locks before deregistering its GID  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Prepared transaction releasing locks before deregistering its GID  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
On Tue, Feb 19, 2019 at 12:27 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Feb 18, 2019 at 05:05:13PM +0100, Oleksii Kliukin wrote:
> > That looks like a race condition to me. What happens is that another
> > transaction with the name identical to the running one can start and proceed
> > to the prepare phase while the original one commits, failing at last instead
> > of waiting for the original one to finish.
>
> It took me 50 clients and a bit more than 20 seconds, but I have been
> able to reproduce the problem with one error.  Thanks for the
> reproducer.  This is indeed a race condition with 2PC.
>
> > By looking at the source code of FinishPreparedTransaction() I can see the
> > RemoveGXact() call, which removes the prepared transaction from
> > TwoPhaseState->prepXacts. It is placed at the very end of the function,
> > after the post-commit callbacks that clear out the locks held by the
> > transaction. Those callbacks are not guarded by the TwoPhaseStateLock,
> > resulting in a possibility for a concurrent session to proceed will
> > MarkAsPreparing after acquiring the locks released by them.
>
> Hm, I see.  Taking a breakpoint just after ProcessRecords() or putting
> a sleep there makes the issue plain.  The same issue can happen with
> predicate locks.
>
> > I couldn’t find any documentation on the expected outcome in cases like
> > this, so I assume it might not be a bug, but an undocumented behavior.
>
> If you run two transactions in parallel using your script, the second
> transaction would wait at LOCK time until the first transaction
> releases its locks with the COMMIT PREPARED.
>
> > Should I go about and produce a patch to put a note in the description of
> > commit prepared, or is there any interest in changing the behavior to avoid
> > such conflicts altogether (perhaps by holding the lock throughout the
> > cleanup phase)?
>
> That's a bug, let's fix it.  I agree with your suggestion that we had
> better not release the 2PC lock using the callbacks for COMMIT
> PREPARED or ROLLBACK PREPARED until the shared memory state is
> cleared.  At the same time, I think that we should be smarter in the
> ordering of the actions: we care about predicate locks here, but not
> about the on-disk file removal and the stat counters.  One trick is
> that the post-commit callback calls TwoPhaseGetDummyProc() which would
> try to take TwoPhaseStateLock which needs to be applied so we need
> some extra logic to not take a lock in this case.  From what I can see
> this is older than 9.4 as the removal of the GXACT entry in shared
> memory and the post-commit hooks are out of sync for a long time :(
>
> Attached is a patch that fixes the problem for me.  Please note the
> safety net in TwoPhaseGetGXact().

Thank you for working on this.

@@ -811,6 +811,9 @@ TwoPhaseGetGXact(TransactionId xid)
        static TransactionId cached_xid = InvalidTransactionId;
        static GlobalTransaction cached_gxact = NULL;

+       Assert(!lock_held ||
+                  LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE));
+
        /*
         * During a recovery, COMMIT PREPARED, or ABORT PREPARED,
we'll be called
         * repeatedly for the same XID.  We can save work with a simple cache.
@@ -818,7 +821,8 @@ TwoPhaseGetGXact(TransactionId xid)
        if (xid == cached_xid)
                return cached_gxact;

-       LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+       if (!lock_held)
+               LWLockAcquire(TwoPhaseStateLock, LW_SHARED);

        for (i = 0; i < TwoPhaseState->numPrepXacts; i++)
        {

It seems strange to me, why do we always require an exclusive lock
here in spite of acquiring a shared lock?

-----
@@ -854,7 +859,7 @@ TwoPhaseGetGXact(TransactionId xid)
 BackendId
 TwoPhaseGetDummyBackendId(TransactionId xid)
 {
-       GlobalTransaction gxact = TwoPhaseGetGXact(xid);
+       GlobalTransaction gxact = TwoPhaseGetGXact(xid, false);

        return gxact->dummyBackendId;
 }

TwoPhaseGetDummyBackendId() is called by
multixact_twophase_postcommit() which is called while holding
TwoPhaseStateLock in exclusive mode in FinishPreparedTransaction().
Since we cache the global transaction when we call
lock_twophase_commit() I couldn't produce issue but it seems to have a
potential of locking problem.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Kato, Sho"
Дата:
Сообщение: RE: Speeding up creating UPDATE/DELETE generic plan for partitionedtable into a lot
Следующее
От: "Tsunakawa, Takayuki"
Дата:
Сообщение: RE: Speed up transaction completion faster after many relations areaccessed in a transaction