Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Дата
Msg-id 20230109220751.7quf2kfzfdvd76hw@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE  (Mark Dilger <mark.dilger@enterprisedb.com>)
Ответы Re: BUG: Postgres 14 + vacuum_defer_cleanup_age + FOR UPDATE + UPDATE
Список pgsql-hackers
Hi,

On 2023-01-09 13:55:02 -0800, Mark Dilger wrote:
> > On Jan 9, 2023, at 11:34 AM, Andres Freund <andres@anarazel.de> wrote:
> > 
> > 1) Because ctx->next_xid is set after the XidFromFullTransactionId() call in
> > update_cached_xid_range(), we end up using the xid 0 (or an outdated value in
> > subsequent calls) to determine whether epoch needs to be reduced.
> 
> Can you say a bit more about your analysis here, preferably with pointers to
> the lines of code you are analyzing?  Does the problem exist in amcheck as
> currently committed, or are you thinking about a problem that arises only
> after applying your patch?  I'm a bit fuzzy on where xid 0 gets used.

The problems exist in the code as currently committed. I'm not sure what
exactly the consequences are, the result is that oldest_fxid will be, at least
temporarily, bogus.

Consider the first call to update_cached_xid_range():

/*
 * Update our cached range of valid transaction IDs.
 */
static void
update_cached_xid_range(HeapCheckContext *ctx)
{
    /* Make cached copies */
    LWLockAcquire(XidGenLock, LW_SHARED);
    ctx->next_fxid = ShmemVariableCache->nextXid;
    ctx->oldest_xid = ShmemVariableCache->oldestXid;
    LWLockRelease(XidGenLock);

    /* And compute alternate versions of the same */
    ctx->oldest_fxid = FullTransactionIdFromXidAndCtx(ctx->oldest_xid, ctx);
    ctx->next_xid = XidFromFullTransactionId(ctx->next_fxid);
}

The problem is that the call to FullTransactionIdFromXidAndCtx() happens
before ctx->next_xid is assigned, even though FullTransactionIdFromXidAndCtx()
uses ctx->next_xid.

static FullTransactionId
FullTransactionIdFromXidAndCtx(TransactionId xid, const HeapCheckContext *ctx)
{
    uint32        epoch;

    if (!TransactionIdIsNormal(xid))
        return FullTransactionIdFromEpochAndXid(0, xid);
    epoch = EpochFromFullTransactionId(ctx->next_fxid);
    if (xid > ctx->next_xid)
        epoch--;
    return FullTransactionIdFromEpochAndXid(epoch, xid);
}

Because ctx->next_xid is 0, due to not having been set yet, "xid > ctx->next_xid"
will always be true, leading to epoch being reduced by one.

In the common case of there never having been an xid wraparound, we'll thus
underflow epoch, generating an xid far into the future.


The tests encounter the issue today. If you add
    Assert(TransactionIdIsValid(ctx->next_xid));
    Assert(FullTransactionIdIsValid(ctx->next_fxid));
early in FullTransactionIdFromXidAndCtx() it'll be hit in the
amcheck/pg_amcheck tests.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Show various offset arrays for heap WAL records
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans