Re: hot_standby_feedback vs excludeVacuum and snapshots

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: hot_standby_feedback vs excludeVacuum and snapshots
Дата
Msg-id CANP8+jK__bi2AtU=FkBn78Xo_phd_Mkgyf1q9xm0iMEWS7bNSA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: hot_standby_feedback vs excludeVacuum and snapshots  (Andres Freund <andres@anarazel.de>)
Ответы Re: hot_standby_feedback vs excludeVacuum and snapshots  (Amit Kapila <amit.kapila16@gmail.com>)
Re: hot_standby_feedback vs excludeVacuum and snapshots  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On 7 June 2018 at 22:25, Andres Freund <andres@anarazel.de> wrote:
> On 2018-06-07 14:19:18 -0700, Andres Freund wrote:

> Look at:
>
> void
> ProcArrayApplyRecoveryInfo(RunningTransactions running)
> ...
>         /*
>          * Remove stale locks, if any.
>          *
>          * Locks are always assigned to the toplevel xid so we don't need to care
>          * about subxcnt/subxids (and by extension not about ->suboverflowed).
>          */
>         StandbyReleaseOldLocks(running->xcnt, running->xids);
>
> by excluding running transactions you have, as far as I can tell,
> effectively removed the vacuum truncation AEL from the standby.

I agree, that is correct, there is a bug in my recent commit that
causes a small race window that could potentially lead to someone
reading the size of a relation just before it is truncated and then
falling off the end of the scan, resulting in a block access ERROR,
potentially much later than the truncate.

I have also found another bug which affects what we do next.

For context, AEL locks are normally removed by COMMIT or ABORT.
StandbyReleaseOldLocks() is just a sweeper to catch anything that
didn't send an abort before it died, so it hardly ever activates. The
coding of StandbyReleaseOldLocks() is backwards... if it ain't in the
running list, then we remove it.

But that wasn't working correctly either, since as of 49bff5300d527 we
assigned AELs to subxids. Subxids weren't in the running list and so
AELs held by them would have been removed at the wrong time, an extant
bug in PG10. It looks to me like they would have been removed either
early or late, up to the next runningxact info record. They would be
removed, so no leakage, but the late timing wouldn't be noticeable for
tests or most usage, since it would look just like lock contention.
Early release might give same issue of block access to non-existent
block/file.

So the attached patch fixes both the bug in the recent commit and the
one I just found by observation of 49bff5300d527, since they are
related.

StandbyReleaseOldLocks() can sweep in the same way as
ExpireOldKnownAssignedTransactionIds().

> I also don't understand why this change would be backpatched in the
> first place. It's a relatively minor efficiency thing, no?

As for everything, that is open to discussion. Yes, it seems minor to
me.... until it affects you, then its not. It seems to have affected
Greg.

The attached patch, or a later revision, needs to be backpatched to
PG10 independently of the recent committed patch.

I have yet to test this manually, but will do so tomorrow morning.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: why partition pruning doesn't work?
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: Concurrency bug in UPDATE of partition-key