Hi,
Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Feb 20, 2019 at 11:50:42AM +0100, Oleksii Kliukin wrote:
>> RecoverPreparedTransactions calls ProcessRecords with
>> twophase_recover_callbacks right after releasing the TwoPhaseStateLock, so I
>> think lock_held should be false here (matching the similar call of
>> TwoPhaseGetDummyProc at lock_twophase_recover).
>
> Indeed. That would be a bad idea. We don't actually stress this
> routine in 009_twophase.pl or the assertion I added would have blown
> up immediately. So I propose to close the gap, and the updated patch
> attached adds dedicated tests causing the problem you spotted to be
> triggered (soft and hard restarts with 2PC transactions including
> DDLs). The previous version of the patch and those tests cause the
> assertion to blow up, failing the tests.
Thank you for updating the patch and sorry for the delay, it looks good to
me, the tests pass on my system.
>> Since the patch touches TwoPhaseGetDummyBackendId, let’s fix the comment
>> header to mention it instead of TwoPhaseGetDummyProc
>
> Not sure I understand the issue you are pointing out here. The patch
> adds an extra argument to both TwoPhaseGetDummyProc() and
> TwoPhaseGetDummyBackendId(), and the headers of both functions
> document the new argument.
Just this:
@@ -844,17 +851,18 @@ TwoPhaseGetGXact(TransactionId xid)
}
/*
- * TwoPhaseGetDummyProc
+ * TwoPhaseGetDummyBackendId
* Get the dummy backend ID for prepared transaction specified by XID
>
> One extra trick I have been using for testing this patch is the
> following:
> --- a/src/backend/access/transam/twophase.c
> +++ b/src/backend/access/transam/twophase.c
> @@ -816,13 +816,6 @@ TwoPhaseGetGXact(TransactionId xid, bool lock_held)
>
> Assert(!lock_held || LWLockHeldByMe(TwoPhaseStateLock));
>
> - /*
> - * During a recovery, COMMIT PREPARED, or ABORT PREPARED, we'll be called
> - * repeatedly for the same XID. We can save work with a simple cache.
> - */
> - if (xid == cached_xid)
> - return cached_gxact;
>
> This has the advantage to check more aggressively for lock conflicts,
> causing the tests to deadlock if the flag is not correctly set in the
> worst case scenario.
Nice, thank you. I ran all my tests with it and found no further issues.
Regards,
Oleksii Kliukin