Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Peter Geoghegan
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date:
Msg-id: CAH2-Wzkxrwof3g9BYXH22YMS7dAjn+y5S1H=asCZFd_uNCFxbw@mail.gmail.com
In reply to: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Masahiko Sawada <sawada.mshk@gmail.com>)
           Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Thu, Dec 16, 2021 at 10:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > My emphasis here has been on making non-aggressive VACUUMs *always*
> > advance relfrozenxid, outside of certain obvious edge cases. And so
> > with all the patches applied, up to and including the opportunistic
> > freezing patch, every autovacuum of every table manages to advance
> > relfrozenxid during benchmarking -- usually to a fairly recent value.
> > I've focussed on making aggressive VACUUMs (especially anti-wraparound
> > autovacuums) a rare occurrence, for truly exceptional cases (e.g.,
> > user keeps canceling autovacuums, maybe due to automated script that
> > performs DDL). That has taken priority over other goals, for now.
>
> Great!

Maybe this is a good time to revisit basic questions about VACUUM. I
wonder if we can get rid of some of the GUCs for VACUUM now.

Can we fully get rid of vacuum_freeze_table_age? Maybe even get rid of
vacuum_freeze_min_age, too? Freezing tuples is maintenance work on
physical heap blocks, and yet we control it in logical units (XIDs).
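
To make the mismatch concrete: here's a minimal sketch (the function
name is invented, and this is simplified from what vacuum.c actually
does) of how an XID-based GUC like vacuum_freeze_min_age turns into a
freeze cutoff:

/*
 * Sketch only -- simplified from vacuum_set_xid_limits().  The GUC is
 * applied as a logical offset behind OldestXmin, even though freezing
 * is really maintenance work on physical heap blocks.
 */
static TransactionId
freeze_limit_sketch(TransactionId oldestXmin, int freeze_min_age)
{
    TransactionId limit = oldestXmin - (TransactionId) freeze_min_age;

    /* guard against wrapping into the special XID range */
    if (!TransactionIdIsNormal(limit))
        limit = FirstNormalTransactionId;

    return limit;
}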

Arguably we shouldn't be using any units at all, but XIDs in
particular feel like the wrong ones to me. Even with my patch, it is
theoretically possible that we won't be able to advance relfrozenxid
very much, because we cannot get a cleanup lock on a single heap page
containing one old XID. But even in this extreme case, how relevant is
the "age" of that old XID, really? What really matters is whether we
can advance relfrozenxid in time (with time to spare). And so the
system's wraparound risk is not affected all that much by the age of
the single oldest XID. The risk mostly comes from how much total work
we still need to do to advance relfrozenxid. If the single old XID is
very old indeed (~1.5 billion XIDs), but there is only one, then we
only have to freeze one tuple to be able to safely advance
relfrozenxid (maybe advancing it by a huge amount!). How long can it
take to freeze one tuple, with the freeze map, etc.?
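
The "advance it by a huge amount" point rests on tracking the oldest
XID that remains after freezing, rather than working from an age-based
cutoff. Roughly like this (a sketch with an invented field name, not
the actual patch code):

/*
 * While scanning, remember the oldest XID that will remain unfrozen.
 * Once the scan completes, that value can become the new
 * pg_class.relfrozenxid -- so freezing the single old tuple lets
 * relfrozenxid jump all the way forward.
 */
if (TransactionIdIsNormal(xmin) &&
    TransactionIdPrecedes(xmin, vacrel->oldest_remaining_xid))
    vacrel->oldest_remaining_xid = xmin;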

On the other hand, the risk may be far greater if we have *many*
tuples that are still unfrozen, whose XIDs are only "middle-aged"
right now. The idea behind vacuum_freeze_min_age seems to be to put
off the work of tuple freezing in the hope that we'll never have to do
it at all, but that seems obsolete now. (It probably made a little
more sense before the visibility map.)

Using XIDs makes sense for things like autovacuum_freeze_max_age,
because there we really do have to worry about wraparound and
relfrozenxid (whether we like it or not). But with this patch, and
with everything else (the failsafe, insert-driven autovacuums,
everything we've done over the last several years), I think it might
be time to increase the autovacuum_freeze_max_age default. Maybe even
to something as high as 800 million transaction IDs, but certainly to
400 million. What do you think? (Maybe don't answer just yet --
something to think about.)
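
In postgresql.conf terms, the conservative version of that proposal is
just this (the current default being 200 million):

autovacuum_freeze_max_age = 400000000   # default today: 200000000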

> +       vacrel->aggressive = aggressive;
>         vacrel->failsafe_active = false;
>         vacrel->consider_bypass_optimization = true;
>
> How about adding skipwithvm to LVRelState too?

Agreed -- it's slightly better that way. Will change this.
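
Something along these lines, I think (just a sketch of the relevant
fields, not the full struct):

typedef struct LVRelState
{
    /* ... many existing fields omitted ... */
    bool        aggressive;      /* aggressive VACUUM? */
    bool        skipwithvm;      /* skip pages using the visibility map? */
    bool        failsafe_active; /* has the failsafe triggered? */
    bool        consider_bypass_optimization;
} LVRelState;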

>                          */
> -                       if (skipping_blocks && !FORCE_CHECK_PAGE())
> +                       if (skipping_blocks && blkno < nblocks - 1)
>
> Why do we always need to scan the last page even if heap truncation is
> disabled (or in the failsafe mode)?

My goal here was to keep the behavior from commit e8429082, "Avoid
useless truncation attempts during VACUUM", while simplifying how heap
pages are skipped via the visibility map (including removing the
FORCE_CHECK_PAGE() macro). Of course you're right that the change
you've highlighted does alter the behavior a little: now we will
always treat the final page as a "scanned page", except perhaps when
100% of the pages in the relation are skipped using the visibility
map.

This was a deliberate choice (and perhaps even a good one!). I don't
think that avoiding access to the last heap page is worth the
complexity. Note that we may already access heap pages (making them
"scanned pages") even when we know it's unnecessary: the
SKIP_PAGES_THRESHOLD test leads to exactly this behavior (and we don't
even try to avoid wasting CPU cycles on these
not-skipped-but-skippable pages). So I don't expect the performance
cost of reading the last page to be noticeable.

However, now that I think about it, I wonder...what do you think of
SKIP_PAGES_THRESHOLD, in general? Is the optimal value still 32 today?
SKIP_PAGES_THRESHOLD hasn't changed since commit bf136cf6e3, shortly
after the original visibility map implementation was committed in
2009. The idea that it helps us to advance relfrozenxid outside of
aggressive VACUUMs (per commit message from bf136cf6e3) seems like it
might no longer matter with the patch -- because now we won't ever set
a page all-visible but not all-frozen. Plus the idea that we need to
do all this work just to get readahead from the OS
seems...questionable.
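
For reference, the heuristic in question looks roughly like this
(paraphrased from lazy_scan_heap(), not quoted verbatim):

#define SKIP_PAGES_THRESHOLD    ((BlockNumber) 32)

/*
 * Only activate VM-based skipping when the run of skippable pages is
 * long enough; shorter all-visible runs get read anyway, purely in
 * the hope of preserving OS readahead.
 */
if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
    skipping_blocks = true;
else
    skipping_blocks = false;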

-- 
Peter Geoghegan


