Re: Prepared transaction releasing locks before deregistering its GID
От | Oleksii Kliukin |
---|---|
Тема | Re: Prepared transaction releasing locks before deregistering its GID |
Дата | |
Msg-id | 1ECFC537-A8B5-4087-9469-7190630A3693@hintbits.com обсуждение исходный текст |
Ответ на | Re: Prepared transaction releasing locks before deregistering its GID (Michael Paquier <michael@paquier.xyz>) |
Ответы |
Re: Prepared transaction releasing locks before deregistering its GID
Re: Prepared transaction releasing locks before deregistering its GID |
Список | pgsql-hackers |
Hi, Michael Paquier <michael@paquier.xyz> wrote: > On Mon, Feb 18, 2019 at 05:05:13PM +0100, Oleksii Kliukin wrote: >> That looks like a race condition to me. What happens is that another >> transaction with the name identical to the running one can start and proceed >> to the prepare phase while the original one commits, failing at last instead >> of waiting for the original one to finish. > > It took me 50 clients and a bit more than 20 seconds, but I have been > able to reproduce the problem with one error. Thanks for the > reproducer. This is indeed a race condition with 2PC. Yes, as a race condition I could consistently reproduce it on one system, but not on another. On my macbook it never manifested in an error; however, on one occasion I saw sessions waiting on a lock and an orphaned record in pg_prepared_xacts. I couldn’t find any evidences in the logs or elsewhere what has happened to that session, so I just assumed it was, perhaps, killed by the test cleanup. >> By looking at the source code of FinishPreparedTransaction() I can see the >> RemoveGXact() call, which removes the prepared transaction from >> TwoPhaseState->prepXacts. It is placed at the very end of the function, >> after the post-commit callbacks that clear out the locks held by the >> transaction. Those callbacks are not guarded by the TwoPhaseStateLock, >> resulting in a possibility for a concurrent session to proceed will >> MarkAsPreparing after acquiring the locks released by them. > > Hm, I see. Taking a breakpoint just after ProcessRecords() or putting > a sleep there makes the issue plain. The same issue can happen with > predicate locks. I did verify this on my part by putting a TwoPhaseStateLock right before the line saying "We assume it's safe to do this without taking TwoPhaseStateLock” and observing that the errors has stopped over multiple runs. >> I couldn’t find any documentation on the expected outcome in cases like >> this, so I assume it might not be a bug, but an undocumented behavior. > > If you run two transactions in parallel using your script, the second > transaction would wait at LOCK time until the first transaction > releases its locks with the COMMIT PREPARED. That is the desired outcome, right? > >> Should I go about and produce a patch to put a note in the description of >> commit prepared, or is there any interest in changing the behavior to avoid >> such conflicts altogether (perhaps by holding the lock throughout the >> cleanup phase)? > > That's a bug, let's fix it. I agree with your suggestion that we had > better not release the 2PC lock using the callbacks for COMMIT > PREPARED or ROLLBACK PREPARED until the shared memory state is > cleared. At the same time, I think that we should be smarter in the > ordering of the actions: we care about predicate locks here, but not > about the on-disk file removal and the stat counters. One trick is > that the post-commit callback calls TwoPhaseGetDummyProc() which would > try to take TwoPhaseStateLock which needs to be applied so we need > some extra logic to not take a lock in this case. From what I can see > this is older than 9.4 as the removal of the GXACT entry in shared > memory and the post-commit hooks are out of sync for a long time :( > > Attached is a patch that fixes the problem for me. Please note the > safety net in TwoPhaseGetGXact(). The approach looks good to me. Surprisingly, I saw no stalled backends because of the double acquisition of lock at TwoPhaseGetGXact once I put a simple TwoPhaseStateLock right before the "gxact->valid = false” line; I will test your patch and post the outcome. Regards, Oleksii Kliukin
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Michael PaquierДата:
Сообщение: Re: Prepared transaction releasing locks before deregistering its GID