Re: lastOverflowedXid does not handle transaction ID wraparound

Поиск
Список
Период
Сортировка
От Simon Riggs
Тема Re: lastOverflowedXid does not handle transaction ID wraparound
Дата
Msg-id CANbhV-F-EAPh+HwYEpRo3v6m6_O9AOmthM=+tUsXY==MG+nV4A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: lastOverflowedXid does not handle transaction ID wraparound  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: lastOverflowedXid does not handle transaction ID wraparound  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
On Thu, 21 Oct 2021 at 05:01, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> At Wed, 20 Oct 2021 08:55:12 -0700, Stan Hu <stanhu@gmail.com> wrote in
> > On Wed, Oct 20, 2021 at 4:00 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> >
> > >
> > >
> > > > 17 окт. 2021 г., в 21:55, Dmitry Dolgov <9erthalion6@gmail.com>
> > > написал(а):
> > > > I wonder what would be side
> > > > effects of clearing it when the snapshot is not suboverfloved anymore?
> > >
> > > I think we should just invalidate lastOverflowedXid on every
> > > XLOG_RUNNING_XACTS if subxid_overflow == false. I can't find a reason not
> > > to do so.

I believe that to be an incorrect fix, but so very nearly correct.
There is a documented race condition in the generation of a
XLOG_RUNNING_XACTS that means there could be a new overflow event
after the snapshot was taken but before it was logged.

> > On a replica, I think it's possible for lastOverflowedXid to be set even if
> > subxid_overflow is false on the primary and secondary (
> >
https://github.com/postgres/postgres/blob/dc899146dbf0e1d23fb24155a5155826ddce34c9/src/backend/storage/ipc/procarray.c#L1327).
> > I thought subxid_overflow only gets set if there are more than
> > PGPROC_MAX_CACHED_SUBXIDS (64) used in a given transaction.
> >
> > Should the replica be invalidating lastOverflowedXid if subxcnt goes to
> > zero in XLOG_RUNNING_XACTS? But if there's an outstanding snapshot with an
> > xmin that precedes lastOverflowedXid we might violate MVCC if we invalidate
> > this, so I wonder if we also need to check the snapshot with the lowest
> > xmin?
>
> lastOverflowedXid is the smallest subxid that possibly exists but
> possiblly not known to the standby. So if all top-level transactions
> older than lastOverflowedXid end, that means that all the
> subtransactions in doubt are known to have been ended.

Agreed

> XLOG_RUNNING_XACTS reports oldestRunningXid, which is the oldest
> running top-transaction.  Standby expires xids in KnownAssignedXids
> array that precede to the oldestRunningXid.  We are sure that all
> possiblly-overflown subtransactions are gone as well if the oldest xid
> is newer than the first overflowed subtransaction.

Agreed

> As a cross check, the following existing code in GetSnapshotData means
> that no overflow is not happening if the smallest xid in the known
> assigned list is larger than lastOverflowedXid, which agrees to the
> consideration above.
>
> procaray.c:2428
> >               subcount = KnownAssignedXidsGetAndSetXmin(snapshot->subxip, &xmin,
> >                                                                                                 xmax);
> >
> >               if (TransactionIdPrecedesOrEquals(xmin, procArray->lastOverflowedXid))
> >                       suboverflowed = true;
>
>
> If the discussion so far is correct, the following diff will fix the
> issue.
>
> diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
> index bd3c7a47fe..19682b73ec 100644
> --- a/src/backend/storage/ipc/procarray.c
> +++ b/src/backend/storage/ipc/procarray.c
> @@ -4463,6 +4463,12 @@ ExpireOldKnownAssignedTransactionIds(TransactionId xid)
>  {
>         LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
>         KnownAssignedXidsRemovePreceding(xid);
> +       /*
> +        * reset lastOverflowedXid if we know transactions that have been possiblly
> +        * running are being gone.
> +        */
> +       if (TransactionIdPrecedes(procArray->lastOverflowedXid, xid))
> +               procArray->lastOverflowedXid = InvalidTransactionId;
>         LWLockRelease(ProcArrayLock);
>  }

So I agree with this fix.

It is however, an undocumented modularity violation. I think that is
acceptable because of the ProcArrayLock traffic, but needs to have a
comment to explain this at the call to
ExpireOldKnownAssignedTransactionIds() i.e. " and potentially reset
lastOverflowedXid", as well as a comment on the
ExpireOldKnownAssignedTransactionIds() function.

--
Simon Riggs                http://www.EnterpriseDB.com/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jan Wieck
Дата:
Сообщение: Re: should we enable log_checkpoints out of the box?
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: XTS cipher mode for cluster file encryption