Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

From Andres Freund
Subject Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot
Date
Msg-id 20181014174240.byktkskdymae7kmy@alap3.anarazel.de
In reply to pgsql: Avoid duplicate XIDs at recovery when building initial snapshot  (Michael Paquier <michael@paquier.xyz>)
List pgsql-committers
On 2018-10-14 13:26:24 +0000, Michael Paquier wrote:
> Avoid duplicate XIDs at recovery when building initial snapshot
> 
> On a primary, sets of XLOG_RUNNING_XACTS records are generated on a
> periodic basis to allow recovery to build the initial state of
> transactions for a hot standby.  The set of transaction IDs is created
> by scanning all the entries in ProcArray.  However, this logic never
> accounted for the fact that a two-phase transaction finishing its
> prepare can put ProcArray in a state where there are two entries with
> the same transaction ID: one for the initial transaction, which gets
> cleared when prepare finishes, and a second, dummy entry to track that
> the transaction is still running after prepare finishes.  This
> ensures a continuous presence of the transaction, so that callers of,
> for example, TransactionIdIsInProgress() always see it as alive.
> 
> So, if an XLOG_RUNNING_XACTS record takes a standby snapshot while a
> two-phase transaction finishes its prepare, the record can end up with
> duplicated XIDs, which is a state expected by design.  If this record
> gets applied on a standby to initialize its recovery state, then
> recovery would simply fail, though the odds of facing this failure are
> very low in practice.  It would be tempting to change the generation of
> XLOG_RUNNING_XACTS so that duplicates are removed on the source, but
> this requires holding ProcArrayLock for longer, which would impact all
> workloads, particularly those making heavy use of two-phase transactions.
> 
> XLOG_RUNNING_XACTS is also actually used only to initialize the standby
> state at recovery, so the solution taken instead is to discard
> duplicates when applying the initial snapshot.
> 
> Diagnosed-by: Konstantin Knizhnik
> Author: Michael Paquier
> Discussion: https://postgr.es/m/0c96b653-4696-d4b4-6b5d-78143175d113@postgrespro.ru
> Backpatch-through: 9.3
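
For illustration, the "discard duplicates when applying the initial snapshot" step described in the quoted commit message can be sketched roughly as follows. This is only a minimal standalone sketch, not the committed code: `xid_t`, `xid_cmp()` and `dedup_xids()` are hypothetical names, and it assumes that sorting the XID array numerically and skipping equal neighbours is sufficient.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a 32-bit transaction ID. */
typedef uint32_t xid_t;

/* Plain numeric ordering; its only job is to make equal XIDs adjacent. */
static int
xid_cmp(const void *a, const void *b)
{
    xid_t xa = *(const xid_t *) a;
    xid_t xb = *(const xid_t *) b;

    if (xa < xb)
        return -1;
    if (xa > xb)
        return 1;
    return 0;
}

/*
 * Hypothetical helper: sort the XIDs in place, keep one copy of each,
 * and return the number of distinct XIDs left at the front of the array.
 */
size_t
dedup_xids(xid_t *xids, size_t n)
{
    size_t kept = 0;

    assert(xids != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(xids, n, sizeof(xid_t), xid_cmp);

    for (size_t i = 0; i < n; i++)
    {
        /* Skip an XID that repeats the previous (already kept) one. */
        if (i > 0 && xids[i] == xids[i - 1])
            continue;
        xids[kept++] = xids[i];
    }
    return kept;
}
```

Given the array {7, 3, 7, 5, 3}, this leaves 3, 5, 7 at the front and returns 3. The point of the sketch is that the cost is paid once by whoever consumes the record, rather than under ProcArrayLock on the primary.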

I'm unhappy this approach was taken over objections, and without any
real warning. Even leaving the crumminess aside, did you check other users
of XLOG_RUNNING_XACTS, e.g. logical decoding?

Greetings,

Andres Freund

