Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Дата
Msg-id CAAKRu_agCLAQ9OjmrTdJe-X=Xr7QnU4d=cfxdQGwc9jNx9w31w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Thanks for the review! All the small changes you suggested I made in
attached v23 unless otherwise noted below.

On Mon, Nov 24, 2025 at 5:24 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2025-11-20 12:19:58 -0500, Melanie Plageman wrote:
> > Subject: [PATCH v22 1/9] Split heap_page_prune_and_freeze() into helpers
>
> One minor thing: It's slightly odd that prune_freeze_plan() gets an oid
> argument, prune_freeze_setup() gets the entire prstate,
> heap_page_will_freeze() gets the Relation. It's what they need, but still a
> bit odd.

They all get the PruneState actually.

I've committed this patch (but actually have to do a follow-on commit
to silence coverity. Will do that next.)

> > Subject: [PATCH v22 2/9] Eliminate XLOG_HEAP2_VISIBLE from vacuum phase I
> >  prune/freeze
>
>
> Hm. This change makes sense, but unfortunately I find it somewhat hard to
> review. There are a lot of changes that don't obviously work towards one
> goal in this commit.

I've split up the first commit into 4 patches in attached v23
(0002-0005). They are not meant to be committed separately but are
separate only for ease of review. They comprise the logical steps for
getting to the final code state. I originally had it split up but got
feedback it was more work to review them each. So, let's see how this
goes.

> >@@ -238,6 +239,16 @@ typedef struct PruneFreezeParams
>
> >+     * vmbuffer is the buffer that must already contain contain the required
> >+     * block of the visibility map if we are to update it. blk_known_av is the
> >+     * visibility status of the heap block as of the last call to
> >+     * find_next_unskippable_block().
> >+     */
> >+    Buffer      vmbuffer;
> >+    bool        blk_known_av;
>
> What is blk_known_av set to if the block is known to not be all visible?
> Compared to the case where we did not yet determine the visibility status of
> the block?

blk_known_av should always be set to false if the caller doesn't know.
It is used as an optimization. I've added to the comment in this
struct to clarify that. More on this further down in my mail.

> > + * Decide whether to set the visibility map bits for heap_blk, using
> > + * information from PruneState and blk_known_av. Some callers may already
> > + * have examined this page’s VM bits (e.g., VACUUM in the previous
> > + * heap_vac_scan_next_block() call) and can pass that along.
>
> That's not entirely trivial to follow, tbh. As mentioned above, it's not clear
> to me how the state of a block where did determine that the block is *not*
> all-visible is represented.

There is no need to distinguish between knowing it is not all-visible
and not knowing if it is all-visible. That is, "not known" and "known
not" are the same for our purposes. This is only an optimization and
not needed for correctness. I've tried to add comments to this effect
in various places where blk_known_av is used.

> > +     else if (blk_known_av && !PageIsAllVisible(heap_page) &&
> > +                      visibilitymap_get_status(relation, heap_blk, &vmbuffer) != 0)
> > +     {
> > +             ereport(WARNING,
> > +                             (errcode(ERRCODE_DATA_CORRUPTED),
> > +                              errmsg("page is not marked all-visible but visibility map bit is set in relation
\"%s\"page %u", 
> > +                                             RelationGetRelationName(relation), heap_blk)));
> > +
> > +             visibilitymap_clear(relation, heap_blk, vmbuffer,
> > +                                                     VISIBILITYMAP_VALID_BITS);
>
> Wait, why is it ok to perform this check iff blk_known_av is set?

This is existing logic in vacuum. It would be okay to perform the
check even if blk_known_av is false but might be too expensive for the
common case where the page is not all-visible (especially on-access).
The next vacuum should be able to enter this code path and fix it. Or
do you think it will be cheap enough because the caller will have read
in and pinned the VM page?

> > +                     old_vmbits = visibilitymap_set_vmbits(blockno,
> > +                                                                                               vmbuffer,
new_vmbits,
> > +
params->relation->rd_locator);
> > +                     if (old_vmbits == new_vmbits)
> > +                     {
> > +                             LockBuffer(vmbuffer, BUFFER_LOCK_UNLOCK);
> > +                             /* Unset so we don't emit WAL since no change occurred */
> > +                             do_set_vm = false;
> > +                     }
> > +             }
>
> What can lead to this path being reached? Doesn't this mean that something
> changed the state of the VM while we were holding an exclusive lock on the
> heap buffer?

This shouldn't be in this commit (I've fixed that). However, it is
needed once we have on-access VM setting because we could have set the
page all-visible in the VM on-access in between when
find_next_unskippable_block() first checks the VM and sets
all_visible_according_to_vm/blk_known_av and when we take the lock and
prune/freeze the page.

> >                       log_heap_prune_and_freeze(params->relation, buffer,
> > -                                                                       InvalidBuffer,        /* vmbuffer */
> > -                                                                       0,    /* vmflags */
> > +                                                                       do_set_vm ? vmbuffer : InvalidBuffer,
> > +                                                                       do_set_vm ? new_vmbits : 0,
> >                                                                         conflict_xid,
> > -                                                                       true, params->reason,
> > +                                                                       true, /* cleanup lock */
> > +                                                                       do_set_pd_vis,
> > +                                                                       params->reason,
> >                                                                         prstate.frozen, prstate.nfrozen,
> >                                                                         prstate.redirected, prstate.nredirected,
> >                                                                         prstate.nowdead, prstate.ndead,
>
> This function is now taking 16 parameters :/

Is this complaint about readability or performance of parameter
passing? Because if it's the latter, I can't imagine that will be
noticeable when compared to the overhead of emitting a WAL record.

I could add a struct just for passing the parameters to the
log_heap_prune_and_freeze(). Something like:

typedef struct PruneFreezeChanges
{
    int            nredirected;
    int            ndead;
    int            nunused;
    int            nfrozen;
    OffsetNumber *redirected;
    OffsetNumber *nowdead;
    OffsetNumber *nowunused;
    HeapTupleFreeze *frozen;
} PruneFreezeChanges;

PruneFreezeChanges c = {
        .redirected = prstate.redirected,
        .nredirected = prstate.nredirected,
        .ndead = prstate.ndead,
        .nowdead = prstate.nowdead,
        .nunused = prstate.nunused,
        .nowunused = prstate.nowunused,
        .nfrozen = prstate.nfrozen,
        .frozen = prstate.frozen,
};

log_heap_prune_and_freeze(params->relation, buffer,
                                                        InvalidBuffer,
   /* vmbuffer */
                                                        0,    /* vmflags */
                                                        conflict_xid,
                                                        true, params->reason,
                                                        c);

However, I fear it is a bit confusing to have this struct just to pass
the parameters to the log_heap_prune_and_freeze(). We can't use that
struct inline in the PruneState because then we would need all the
arrays to be inline in the PruneFreezeChanges struct which would cause
4*MaxHeapTuplesPerPage stack allocated OffsetNumbers in vacuum phase
III than it currently has and needs.

The only other related parameters I see that could be stuck into a
struct are vmflags and set_pd_all_vis -- maybe called VisiChanges or
HeapPageVisiChanges. But again, I'm not sure if it is worth adding a
new struct for this.

> > +#ifdef USE_ASSERT_CHECKING
> > +     if (prstate.all_visible)
> > +     {
> > +             TransactionId debug_cutoff;
> > +             bool            debug_all_frozen;
> > +
> > +             Assert(prstate.lpdead_items == 0);
> > +             Assert(prstate.cutoffs);
> > +
> > +             if (!heap_page_is_all_visible(params->relation, buffer,
> > +                                                                       prstate.cutoffs->OldestXmin,
> > +                                                                       &debug_all_frozen,
> > +                                                                       &debug_cutoff, off_loc))
> > +                     Assert(false);
>
> I don't love Assert(false), because the message for the assert failure is
> pretty much meaningless. Sometimes it's hard to avoid, but here you have an if
> () that has no body other than Assert(false)? Just Assert the expression
> directly.

This is existing code. I agree it's weird, but I remember Peter saying
something about why he did it this way that I no longer remember.
Anyway, 0001 changes the assert as you suggest.

> > Subject: [PATCH v22 3/9] Eliminate XLOG_HEAP2_VISIBLE from empty-page vacuum
> >
> > As part of removing XLOG_HEAP2_VISIBLE records, phase I of VACUUM now
> > marks empty pages all-visible in a XLOG_HEAP2_PRUNE_VACUUM_SCAN record.
>
> This whole business of treating empty pages as all-visible continues to not
> make any sense to me. Particularly in combination with a not crashsafe FSM it
> just seems ... unhelpful. It also means that there there's a decent chance of
> extra WAL when bulk extending. But that's not the fault of this change.

Is the argument for setting them av/af that we can skip them more
easily in future vacuums (i.e. not have to read in the page and take a
lock etc)?

> > Subject: [PATCH v22 4/9] Remove XLOG_HEAP2_VISIBLE entirely
> >
> > As no remaining users emit XLOG_HEAP2_VISIBLE records.
> > This includes deleting the xl_heap_visible struct and all functions
> > responsible for emitting or replaying XLOG_HEAP2_VISIBLE records.
>
> Probably worth mentioning that this changes the VM API.

I've added a mention about this in the commit.
Are you imagining I have any comments anywhere about how
XLOG_HEAP2_VISIBLE used to exist?

I realized I need to bump XLOG_PAGE_MAGIC in this commit because the
code to replay XLOG_HEAP2_VISIBLE records is gone now.

What I'm not sure is if I have to bump it in some of the other commits
that change which WAL records are emitted by a particular operation
(e.g. not emitting a separate VM record from phase I of vacuum).

> > - * - Page pruning, in VACUUM's 1st pass or on access: Some items are
> > + * - Page pruning, in vacuum phase I or on-access: Some items are
> >   *   redirected, some marked dead, and some removed altogether.
> >   *
> > - * - Freezing: Items are marked as 'frozen'.
> > + * - Freezing: During vacuum phase I, items are marked as 'frozen'
> >   *
> > - * - Vacuum, 2nd pass: Items that are already LP_DEAD are marked as unused.
> > + * - Reaping: During vacuum phase III, items that are already LP_DEAD are
> > + *   marked as unused.
> >   *
> > - * They have enough commonalities that we use a single WAL record for them
> > + * - VM updates: After vacuum phases I and III, the heap page may be marked
> > + *   all-visible and all-frozen.
> > + *
> > + * These changes all happen together, so we use a single WAL record for them
> >   * all.
> >   *
> >   * If replaying the record requires a cleanup lock, pass cleanup_lock =
> >   true.
>
> How's that related to the commit's subject?

Oops, I moved it to the relevant commit.

> > Subject: [PATCH v22 5/9] Rename GlobalVisTestIsRemovableXid() to
> >  GlobalVisXidVisibleToAll()
> >
> > The function is currently only used to check whether a tuple’s xmax is
> > visible to all transactions (and thus removable). Upcoming changes will
> > also use it to test whether a tuple’s xmin is visible to all to
> > decide if a page can be marked all-visible in the visibility map.
> >
> > The new name, GlobalVisXidVisibleToAll(), better reflects this broader
> > purpose.
>
> If we want this - and I'm not convinced we do - I think it needs to go further
> and change the other uses of removable in
> procarray.c. ComputeXidHorizonsResult has a lot of related fields.
>
> There's also GetOldestNonRemovableTransactionId(),
> GlobalVisCheckRemovableXid(), GlobalVisCheckRemovableFullXid() that weren't
> included in the renaming.

Okay, I see what you are saying. When you say you're not sure if we
want "this" -- do you mean using GlobalVisState for determining if
xmins are visible to all (which is required to set the VM on-access)
or do you mean renaming those functions?

If we're just talking about the renaming, looking at procarray.c, it
is full of the word "removable" because its functions were largely
used to examine and determine if everyone can see an xmax as committed
and thus if that tuple is removable from their perspective. But
nothing about the code that I can see means it has to be an xmax. We
could just as well use the functions to determine if everyone can see
an xmin as committed.

I don't see how we can leave the names as is and use it on xmins
because that tuple is _not_ removable based on testing if everyone can
see the xmin. So the function basically returns an incorrect result.

That being said, the problem with replacing "removable" with "visible
to all" -- which isn't _terrible_ -- means we have to replace
"nonremovable" with "not visible to all" -- which is terrible.

I think getting rid of "removable" from procarray.c would be an
improvement because that file feels tightly coupled to vacuum and
removing tuples because of the names of variables and functions when
actually its functionality isn't. So, the issue is coming up with
something palatable.

One alternative idea (that requires no renaming) is to add a wrapper
function somewhere outside procarray.c which invokes
GlobalVisTestIsRemovableXid() but is called something like
XidVisibleToAll() and is documented for use with xmins/etc. It would
avoid the messy work of coming up with a good name. What do you think?

> > Subject: [PATCH v22 6/9] Use GlobalVisState in vacuum to determine page level
> >  visibility
> >
> > This also benefits vacuum directly: in some cases, GlobalVisState may
> > advance during a vacuum, allowing more pages to become considered
> > all-visible. And, in the future, we could easily add a heuristic to
> > update GlobalVisState more frequently during vacuums of large tables. In
> > the rare case that the GlobalVisState moves backward, vacuum falls back
> > to OldestXmin to ensure we don’t attempt to freeze a dead tuple that
> > wasn’t yet prunable according to the GlobalVisState.
>
> I think it may be better to make sure that the GlobalVisState can't move
> backward.

Do you mean that I shouldn't use the GlobalVisState to determine
visibility until I make sure it can't move backwards?

There is actually no functional difference in my patch set with the
code this commit message refers to (in heap_prune_satisfies_vacuum()).
I only mentioned it to make sure folks knew that even though I was
widening usage of GlobalVisState, we wouldn't encounter
synchronization issues with freezing horizons.

> > Subject: [PATCH v22 8/9] Allow on-access pruning to set pages all-visible
> >
> > Many queries do not modify the underlying relation. For such queries, if
> > on-access pruning occurs during the scan, we can check whether the page
> > has become all-visible and update the visibility map accordingly.
> > Previously, only vacuum and COPY FREEZE marked pages as all-visible or
> > all-frozen.
>
> > Supporting this requires passing information about whether the relation
> > is modified from the executor down to the scan descriptor.
>
> I think it'd be good to split this part into a separate commit. The set of
> folks to review that are distinct (and broader) from the ones looking at
> heapam internals.

Good point. I've split it into 3 commits in this patch set (0011-0013)

- Melanie

Вложения

В списке pgsql-hackers по дате отправления: