Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()

From: Melanie Plageman
Subject: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Msg-id: CAAKRu_ai8PMW5cqCFhu-U46CWLmgP2d_FnpLOqCSvMxY-UQ9xw@mail.gmail.com
In reply to: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()  (Andres Freund <andres@anarazel.de>)
Responses: Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-bugs
On Mon, Apr 15, 2024 at 1:39 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> I've tried a couple times to catch up with this thread. But always kinda felt
> I must be missing something. It might be that this is one part of the
> confusion:
>
> On 2024-01-06 12:24:13 -0800, Noah Misch wrote:
> > Fair enough.  While I agree there's a decent chance back-patching would be
> > okay, I think there's also a decent chance that 1ccc1e05ae creates the problem
> > Matthias theorized.  Something like: we update relfrozenxid based on
> > OldestXmin, even though GlobalVisState caused us to retain a tuple older than
> > OldestXmin.  Then relfrozenxid disagrees with table contents.
>
> Looking at the state as of 1ccc1e05ae, I don't see how - in lazy_scan_prune(),
> if heap_page_prune() spuriously didn't prune a tuple, because the horizon went
> backwards, we'd encounter the tuple in the loop below and call
> heap_prepare_freeze_tuple(), which would error out with one of
>
>     /*
>      * Process xmin, while keeping track of whether it's already frozen, or
>      * will become frozen iff our freeze plan is executed by caller (could be
>      * neither).
>      */
>     xid = HeapTupleHeaderGetXmin(tuple);
>     if (!TransactionIdIsNormal(xid))
>         xmin_already_frozen = true;
>     else
>     {
>         if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
>             ereport(ERROR,
>                     (errcode(ERRCODE_DATA_CORRUPTED),
>                      errmsg_internal("found xmin %u from before relfrozenxid %u",
>                                      xid, cutoffs->relfrozenxid)));
>
> or
>     if (TransactionIdPrecedes(update_xact, cutoffs->relfrozenxid))
>         ereport(ERROR,
>                 (errcode(ERRCODE_DATA_CORRUPTED),
>                  errmsg_internal("multixact %u contains update XID %u from before relfrozenxid %u",
>                                  multi, update_xact,
>                                  cutoffs->relfrozenxid)));
> or
>     /* Raw xmax is normal XID */
>     if (TransactionIdPrecedes(xid, cutoffs->relfrozenxid))
>         ereport(ERROR,
>                 (errcode(ERRCODE_DATA_CORRUPTED),
>                  errmsg_internal("found xmax %u from before relfrozenxid %u",
>                                  xid, cutoffs->relfrozenxid)));
>
>
> I'm not saying that spuriously erroring out would be ok. But I guess I just
> don't understand the data corruption theory in this subthread, because we'd
> error out if we encountered a tuple that should have been frozen but wasn't?

I have a more basic question. How could GlobalVisState->maybe_needed
going backwards cause a problem with relfrozenxid? Yes, if
maybe_needed goes backwards, we may not remove a tuple whose xmin/xmax
are older than VacuumCutoffs->OldestXmin. But, if that tuple's
xmin/xmax are older than OldestXmin, then wouldn't we freeze it? If we
freeze it, there isn't an issue. And if the tuple's xids are not newer
than OldestXmin, then how could we end up advancing relfrozenxid to a
value greater than the tuple's xids?

- Melanie


