Re: Incorrect snapshots while promoting hot standby node when 2PC is used

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Incorrect snapshots while promoting hot standby node when 2PC is used
Дата
Msg-id YK3/ayaKnpLAMYoZ@paquier.xyz
обсуждение исходный текст
Ответ на Incorrect snapshots while promoting hot standby node when 2PC is used  (Andres Freund <andres@anarazel.de>)
Ответы Re: Incorrect snapshots while promoting hot standby node when 2PC is used  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Thu, Apr 22, 2021 at 01:36:03PM -0700, Andres Freund wrote:
> The sequence in StartupXLOG() leading to the issue is the following:
>
> 1) RecoverPreparedTransactions();
> 2) ShutdownRecoveryTransactionEnvironment();
> 3) XLogCtl->SharedRecoveryState = RECOVERY_STATE_DONE;
>
> Because 2) resets the KnownAssignedXids machinery, snapshots that happen
> between 2) and 3) will not actually look at the procarray to compute
> snapshots, as that only happens within
>
>     snapshot->takenDuringRecovery = RecoveryInProgress();
>     if (!snapshot->takenDuringRecovery)
>
> and RecoveryInProgress() checks XLogCtl->SharedRecoveryState !=
> RECOVERY_STATE_DONE, which is set in 3).

Oh, indeed.  It is easy to see RecentXmin jumping back-and-worth while
running 023_pitr_prepared_xact.pl with a small sleep added just after
ShutdownRecoveryTransactionEnvironment().

> Without prepared transaction there probably isn't an issue, because there
> shouldn't be any other in-progress xids at that point?

Yes, there should not be any as far as I recall.  2PC is kind of
special with its fake ProcArray entries.

> I think to fix the issue we'd have to move
> ShutdownRecoveryTransactionEnvironment() to after XLogCtl->SharedRecoveryState
> = RECOVERY_STATE_DONE.
>
> The acquisition of ProcArrayLock() in
> ShutdownRecoveryTransactionEnvironment()->ExpireAllKnownAssignedTransactionIds()
> should prevent the data from being removed between the RecoveryInProgress()
> and the KnownAssignedXidsGetAndSetXmin() calls in GetSnapshotData().
>
> I haven't yet figured out whether there would be a problem with deferring the
> other tasks in ShutdownRecoveryTransactionEnvironment() until after
> RECOVERY_STATE_DONE.

Hmm.  This would mean releasing all the exclusive locks tracked by a
standby, as of StandbyReleaseAllLocks(), after opening the instance
for writes after a promotion.  I don't think that's unsafe, but it
would be intrusive.

Anyway, isn't the issue ExpireAllKnownAssignedTransactionIds() itself,
where we should try to not wipe out the 2PC entries to make sure that
all those snapshots still see the 2PC transactions as something to
count on?  I am attaching a crude patch to show the idea.

Thoughts?
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "houzj.fnst@fujitsu.com"
Дата:
Сообщение: Fix typo: multiple tuple => tuples
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay