Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae

Поиск
Список
Период
Сортировка
От Bowen Shi
Тема Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Дата
Msg-id CAM_vCuc297m-ZroQM_yT561T6_uFYHLdE=b7+5PA6QQqjB-8UQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae  (Melanie Plageman <melanieplageman@gmail.com>)
Ответы Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae
Список pgsql-bugs
Hi,

I only had time to check the fix_hang_15.patch until today. 

On Thu, May 23, 2024 at 12:57 AM Melanie Plageman <melanieplageman@gmail.com> wrote:
On Mon, May 20, 2024 at 4:48 PM Noah Misch <noah@leadboat.com> wrote:
>
> On Mon, May 20, 2024 at 11:58:23AM -0400, Melanie Plageman wrote:
> > On Sat, May 18, 2024 at 6:23 PM Noah Misch <noah@leadboat.com> wrote:
> > > Are there obstacles to fixing the hang by back-patching 1ccc1e05ae instead of
> > > this?  We'll need to get confident about 1ccc1e05ae before v17, and that
> > > sounds potentially easier than getting confident about both 1ccc1e05ae and
> > > this other approach.
> >
> > I haven't tried back-patching 1ccc1e05ae yet, but I don't understand
> > why we would want to use stable back branches to get comfortable with
> > an approach committed to an unreleased version of Postgres.
>
> I wouldn't say we want to use stable back branches to get comfortable with an
> approach.  I wanted to say that it's less work to be confident about one new
> way of doing things than two new ways of doing things.

I think I understand your point better now.

> > The small fix proposed in this thread is extremely minimal and
> > straightforward. It seems much less risky as a backpatch.
>
> Here's how I model the current and proposed code:
>
> 1. v15 VACUUM removes tuples that are HEAPTUPLE_DEAD according to VisTest.
>    OldestXmin doesn't cause tuple removal, but there's a hang when OldestXmin
>    rules DEAD after VisTest ruled RECENTLY_DEAD.
>
> 2. With 1ccc1e05ae, v17 VACUUM still removes tuples that are HEAPTUPLE_DEAD
>    according to VisTest only.  OldestXmin doesn't come into play.
>
> 3. fix_hang_15.patch would make v15 VACUUM remove tuples that are
>    HEAPTUPLE_DEAD according to _either_ VisTest or OldestXmin.
>
> Since (3) is the only list entry that removes tuples that VisTest ruled
> RECENTLY_DEAD, I find it higher risk.  For all three, the core task of
> certifying the behavior is confirming that its outcome is sound when VisTest
> and OldestXmin disagree.  How do you model it?

Okay, I see your point. In 1 and 2, tuples that would have been
considered HEAPTUPLE_DEAD at the beginning of vacuum but are
considered HEAPTUPLE_RECENTLY_DEAD when pruning them are not removed.
In 3, tuples that would have been considered HEAPTUPLE_DEAD at the
beginning of vacuum are always removed, regardless of whether or not
they would be considered HEAPTUPLE_RECENTLY_DEAD when pruning them.

One option is to add the logic in fix_hang_15.patch to master as well
(always remove tuples older than OldestXmin). This addresses your
concern about gaining confidence in a single solution.

However, I can see how removing more tuples could be concerning. In
the case that the horizon moves backwards because of a standby
reconnecting, I think the worst case is that removing that tuple
causes a recovery conflict on the standby (depending on the value of
max_standby_streaming_delay et al).

What would happen if we simply skipped the current page when we found the vacuum process had entered the infinite loop (use a counter)?

--
Regards
Bowen Shi

В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #18484: "Cannot enlarge string buffer" during parallel execution of prepared statement/partitioning
Следующее
От: Bowen Shi
Дата:
Сообщение: Re: relfrozenxid may disagree with row XIDs after 1ccc1e05ae