Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae

From: Melanie Plageman
Subject: Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Date:
Msg-id: CAAKRu_Z50WSPWLYg-2NC4TDBSyTLMRL_jG=K+txByTAeu5nNXA@mail.gmail.com
In reply to: Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae  (Melanie Plageman <melanieplageman@gmail.com>)
List: pgsql-bugs
On Thu, Jun 20, 2024 at 11:49 AM Melanie Plageman
<melanieplageman@gmail.com> wrote:
>
> On Tue, Jun 18, 2024 at 6:51 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> >
> > Finally, upthread there is discussion of how we could end up doing a
> > catalog lookup after vacuum_get_cutoffs() and before the tuple
> > visibility check on 16. Assuming this is true, we would want to
> > backport the fix to 16 as well. I could use some help getting a repro
> > (using btree index deletion for example) of the infinite loop on 16.
>
> So, I ended up working on a new repro that works by forcing a round of
> index vacuuming after the standby reconnects and before pruning a dead
> tuple whose xmax is older than OldestXmin.
>
> At the end of the round of index vacuuming, _bt_pendingfsm_finalize()
> calls GetOldestNonRemovableTransactionId(), thereby updating the
> backend's GlobalVisState and moving maybe_needed backwards.
>
> Then vacuum's first pass will continue with pruning and find our later
> inserted and updated tuple HEAPTUPLE_RECENTLY_DEAD when compared to
> maybe_needed but HEAPTUPLE_DEAD when compared to OldestXmin.
>
> I make sure that the standby reconnects between vacuum_get_cutoffs()
> (vacuum_set_xid_limits() on 14/15) and pruning because I have a cursor
> on the page keeping VACUUM FREEZE from getting a cleanup lock.
>
> See the repros for step-by-step explanations of how it works.
>
> With this, I can repro the infinite loop on 14-16.
>
> Backporting 1ccc1e05ae fixes 16 but, with the new repro, 14 and 15
> error out with "cannot freeze committed xmax". I'm going to
> investigate further why this is happening. It definitely makes me
> wonder about the fix.

It turns out it was also erroring out on 16 (i.e., backporting
1ccc1e05ae did not fix anything), but I didn't notice because the
Perl TAP test passed. I also discovered we can hit this error on
master, so I started a thread about that here [1].

- Melanie

[1] https://www.postgresql.org/message-id/CAAKRu_bDD7oq9ZwB2OJqub5BovMG6UjEYsoK2LVttadjEqyRGg%40mail.gmail.com
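
For readers following the quoted explanation, here is a toy model of the horizon disagreement. It is not PostgreSQL code. It assumes plain integer XIDs with no wraparound, and the XID values, classify() and first_pass() are hypothetical names made up for illustration. It also assumes, as a simplification of vacuum's first pass on 14-16, that a tuple which survives pruning but looks dead against OldestXmin causes the page to be retried.

```python
from enum import Enum


class Verdict(Enum):
    DEAD = "HEAPTUPLE_DEAD"
    RECENTLY_DEAD = "HEAPTUPLE_RECENTLY_DEAD"


def classify(xmax: int, horizon: int) -> Verdict:
    """Toy rule: a committed deleter older than the horizon is removable."""
    return Verdict.DEAD if xmax < horizon else Verdict.RECENTLY_DEAD


def first_pass(xmax: int, oldest_xmin: int, maybe_needed: int, max_retries: int = 3):
    """Model prune-then-recheck on one tuple.

    Returns the attempt number on which the two checks agreed, or None
    if they still disagreed after max_retries (the real loop has no cap).
    """
    for attempt in range(max_retries):
        # Pruning decides removability against maybe_needed.
        if classify(xmax, maybe_needed) is Verdict.DEAD:
            return attempt  # tuple pruned, nothing left to recheck
        # The recheck uses OldestXmin, fixed when vacuum computed its cutoffs.
        if classify(xmax, oldest_xmin) is not Verdict.DEAD:
            return attempt  # both horizons agree the tuple must stay
        # Survived pruning yet DEAD against OldestXmin: retry the page.
    return None


# Hypothetical values: maybe_needed has moved backwards below OldestXmin.
OLDEST_XMIN, MAYBE_NEEDED, XMAX = 750, 740, 745
print(classify(XMAX, MAYBE_NEEDED).value)
print(classify(XMAX, OLDEST_XMIN).value)
print(first_pass(XMAX, OLDEST_XMIN, MAYBE_NEEDED))
```

In this model the checks can only disagree forever when maybe_needed is below OldestXmin and the tuple's xmax falls between the two, which is the window the repro is trying to hit.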


