Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From Masahiko Sawada
Subject Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Date
Msg-id CAD21AoC_mhPGbytx19cHX86R10fjmC1MHQSYDcU0-5mjd4FHHw@mail.gmail.com
In reply to Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Sat, Dec 18, 2021 at 11:29 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Dec 16, 2021 at 10:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > My emphasis here has been on making non-aggressive VACUUMs *always*
> > > advance relfrozenxid, outside of certain obvious edge cases. And so
> > > with all the patches applied, up to and including the opportunistic
> > > freezing patch, every autovacuum of every table manages to advance
> > > relfrozenxid during benchmarking -- usually to a fairly recent value.
> > > I've focussed on making aggressive VACUUMs (especially anti-wraparound
> > > autovacuums) a rare occurrence, for truly exceptional cases (e.g.,
> > > user keeps canceling autovacuums, maybe due to automated script that
> > > performs DDL). That has taken priority over other goals, for now.
> >
> > Great!
>
> Maybe this is a good time to revisit basic questions about VACUUM. I
> wonder if we can get rid of some of the GUCs for VACUUM now.
>
> Can we fully get rid of vacuum_freeze_table_age?

Does that mean that every vacuum would effectively be an aggressive
vacuum? If opportunistic freezing works well on all tables, we might
no longer need vacuum_freeze_table_age. But I'm not sure that's true,
since the cost of freezing tuples is not zero.
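For context, a rough sketch of how the aggressive-vacuum decision works today, simplified from vacuum_set_xid_limits(): the vacuum becomes aggressive once relfrozenxid falls behind the next XID by vacuum_freeze_table_age or more. Real XIDs also skip the special values below FirstNormalTransactionId, which this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Wraparound-aware comparison: is xid1 logically older than xid2?
 * Mirrors TransactionIdPrecedes(): the XID space is a modulo-2^32 circle,
 * so the signed difference tells us which side of the circle we are on. */
static bool
xid_precedes(TransactionId xid1, TransactionId xid2)
{
    return (int32_t) (xid1 - xid2) < 0;
}

/* Simplified aggressive-vacuum test: aggressive when relfrozenxid is at
 * or beyond (next XID - freeze_table_age). */
static bool
vacuum_is_aggressive(TransactionId relfrozenxid,
                     TransactionId next_xid,
                     uint32_t freeze_table_age)
{
    TransactionId limit = next_xid - freeze_table_age;

    return relfrozenxid == limit || xid_precedes(relfrozenxid, limit);
}
```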

> We probably shouldn't be using any units, but using XIDs "feels wrong"
> to me. Even with my patch, it is theoretically possible that we won't
> be able to advance relfrozenxid very much, because we cannot get a
> cleanup lock on one single heap page with one old XID. But even in
> this extreme case, how relevant is the "age" of this old XID, really?
> What really matters is whether or not we can advance relfrozenxid in
> time (with time to spare). And so the wraparound risk of the system is
> not affected all that much by the age of the single oldest XID. The
> risk mostly comes from how much total work we still need to do to
> advance relfrozenxid. If the single old XID is quite old indeed (~1.5
> billion XIDs), but there is only one, then we just have to freeze one
> tuple to be able to safely advance relfrozenxid (maybe advance it by a
> huge amount!). How long can it take to freeze one tuple, with the
> freeze map, etc?

I think that's true for (mostly) static tables. But constantly-updated
tables are different: since autovacuum is triggered by the number of
garbage tuples (or inserted tuples) and by how old relfrozenxid is, if
an autovacuum fails to advance relfrozenxid because it could not get a
cleanup lock on the page holding the single oldest XID, then by the
time autovacuum runs next it will likely have to process other pages
as well, since enough pages will have been dirtied in the meantime.

It might be a good idea to remember somewhere the pages where we could
not get a cleanup lock, and revisit them after index cleanup. While
revisiting those pages, we would not prune the page but only freeze
tuples.
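A minimal sketch of the collection side of that bookkeeping (the struct and function names here are hypothetical illustrations, not actual PostgreSQL code): the first pass appends the block numbers where the cleanup lock was unavailable, and a freeze-only pass after index cleanup would walk the array.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t BlockNumber;

/* Hypothetical growable list of blocks to revisit (freeze-only)
 * after index cleanup. */
typedef struct DeferredBlocks
{
    BlockNumber *blocks;
    int          nblocks;
    int          capacity;
} DeferredBlocks;

static void
deferred_init(DeferredBlocks *d)
{
    d->capacity = 8;
    d->nblocks = 0;
    d->blocks = malloc(sizeof(BlockNumber) * d->capacity);
}

/* Called when ConditionalLockBufferForCleanup() fails in the first pass. */
static void
deferred_remember(DeferredBlocks *d, BlockNumber blkno)
{
    if (d->nblocks == d->capacity)
    {
        d->capacity *= 2;
        d->blocks = realloc(d->blocks, sizeof(BlockNumber) * d->capacity);
    }
    d->blocks[d->nblocks++] = blkno;
}
```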

>
> On the other hand, the risk may be far greater if we have *many*
> tuples that are still unfrozen, whose XIDs are only "middle aged"
> right now. The idea behind vacuum_freeze_min_age seems to be to be
> lazy about work (tuple freezing) in the hope that we'll never need to
> do it, but that seems obsolete now. (It probably made a little more
> sense before the visibility map.)

Why is it obsolete now? I think it's still valid in some cases, for
example, for heavily-updated tables.
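The lazy policy in question boils down to a cutoff: a tuple's xmin is frozen only once it falls behind OldestXmin by at least vacuum_freeze_min_age (this is vacuum's FreezeLimit). A simplified, wraparound-aware sketch, again ignoring the special XIDs below FirstNormalTransactionId:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Wraparound-aware: is xid1 logically older than xid2?
 * (Signed difference on the modulo-2^32 XID circle.) */
static bool
xid_precedes(TransactionId xid1, TransactionId xid2)
{
    return (int32_t) (xid1 - xid2) < 0;
}

/* Sketch of the lazy freezing policy: freeze a tuple only when its xmin
 * is older than OldestXmin minus vacuum_freeze_min_age. */
static bool
tuple_needs_freeze(TransactionId xmin,
                   TransactionId oldest_xmin,
                   uint32_t vacuum_freeze_min_age)
{
    TransactionId cutoff = oldest_xmin - vacuum_freeze_min_age;

    return xid_precedes(xmin, cutoff);
}
```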

>
> Using XIDs makes sense for things like autovacuum_freeze_max_age,
> because there we have to worry about wraparound and relfrozenxid
> (whether or not we like it). But with this patch, and with everything
> else (the failsafe, insert-driven autovacuums, everything we've done
> over the last several years) I think that it might be time to increase
> the autovacuum_freeze_max_age default. Maybe even to something as high
> as 800 million transaction IDs, but certainly to 400 million. What do
> you think? (Maybe don't answer just yet, something to think about.)

I don’t have an objection to increasing autovacuum_freeze_max_age for
now. One of my concerns with anti-wraparound vacuums is that too many
tables (or several large tables) will reach autovacuum_freeze_max_age
at once, using up autovacuum worker slots and preventing autovacuums
from being launched on heavily-updated tables. Given this work,
widening the gap between vacuum_freeze_table_age and
autovacuum_freeze_max_age would give tables a better chance to advance
relfrozenxid via an ordinary aggressive vacuum instead of an
anti-wraparound aggressive vacuum. 400 million seems like a good
start.
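One detail relevant to that gap: vacuum_set_xid_limits() already clamps the effective vacuum_freeze_table_age to 95% of autovacuum_freeze_max_age, precisely so that a plain aggressive vacuum gets a chance to run before an anti-wraparound one is forced. A sketch of that clamp, using integer arithmetic to avoid floating-point rounding (the real code multiplies by 0.95):

```c
#include <stdint.h>

/* Effective vacuum_freeze_table_age after the clamp in
 * vacuum_set_xid_limits(): never more than 95% of
 * autovacuum_freeze_max_age. */
static uint32_t
effective_freeze_table_age(uint32_t vacuum_freeze_table_age,
                           uint32_t autovacuum_freeze_max_age)
{
    uint32_t cap = (uint32_t) ((uint64_t) autovacuum_freeze_max_age * 95 / 100);

    return vacuum_freeze_table_age > cap ? cap : vacuum_freeze_table_age;
}
```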

>
> > +       vacrel->aggressive = aggressive;
> >         vacrel->failsafe_active = false;
> >         vacrel->consider_bypass_optimization = true;
> >
> > How about adding skipwithvm to LVRelState too?
>
> Agreed -- it's slightly better that way. Will change this.
>
> >                          */
> > -                       if (skipping_blocks && !FORCE_CHECK_PAGE())
> > +                       if (skipping_blocks && blkno < nblocks - 1)
> >
> > Why do we always need to scan the last page even if heap truncation is
> > disabled (or in the failsafe mode)?
>
> My goal here was to keep the behavior from commit e8429082, "Avoid
> useless truncation attempts during VACUUM", while simplifying things
> around skipping heap pages via the visibility map (including removing
> the FORCE_CHECK_PAGE() macro). Of course you're right that this
> particular change that you have highlighted does change the behavior a
> little -- now we will always treat the final page as a "scanned page",
> except perhaps when 100% of all pages in the relation are skipped
> using the visibility map.
>
> This was a deliberate choice (and perhaps even a good choice!). I
> think that avoiding accessing the last heap page like this isn't worth
> the complexity. Note that we may already access heap pages (making
> them "scanned pages") despite the fact that we know it's unnecessary:
> the SKIP_PAGES_THRESHOLD test leads to this behavior (and we don't
> even try to avoid wasting CPU cycles on these
> not-skipped-but-skippable pages). So I think that the performance cost
> for the last page isn't going to be noticeable.

Agreed.

>
> However, now that I think about it, I wonder...what do you think of
> SKIP_PAGES_THRESHOLD, in general? Is the optimal value still 32 today?
> SKIP_PAGES_THRESHOLD hasn't changed since commit bf136cf6e3, shortly
> after the original visibility map implementation was committed in
> 2009. The idea that it helps us to advance relfrozenxid outside of
> aggressive VACUUMs (per commit message from bf136cf6e3) seems like it
> might no longer matter with the patch -- because now we won't ever set
> a page all-visible but not all-frozen. Plus the idea that we need to
> do all this work just to get readahead from the OS
> seems...questionable.

Given opportunistic freezing, that's true, but I'm not sure that
opportunistic freezing will always work well on all tables, since
freezing tuples is not free.
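For reference, the heuristic under discussion reduced to its core: a run of all-visible pages is skipped only if it is at least SKIP_PAGES_THRESHOLD (still 32) pages long; shorter runs are read anyway, in the hope of OS readahead and relfrozenxid advancement. A simplified sketch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Unchanged since commit bf136cf6e3 (2009). */
#define SKIP_PAGES_THRESHOLD ((uint32_t) 32)

typedef uint32_t BlockNumber;

/* Starting at 'next', count consecutive all-visible pages; VACUUM only
 * skips the run if it reaches SKIP_PAGES_THRESHOLD, otherwise it scans
 * those skippable pages anyway. */
static bool
should_skip_run(const bool *all_visible, BlockNumber nblocks, BlockNumber next)
{
    BlockNumber run = 0;

    for (BlockNumber b = next; b < nblocks && all_visible[b]; b++)
        run++;
    return run >= SKIP_PAGES_THRESHOLD;
}
```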

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/


