Re: New strategies for freezing, advancing relfrozenxid early

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: New strategies for freezing, advancing relfrozenxid early
Дата
Msg-id CAH2-WznPY76ugghc2oWSayYUws7wLbZ2BSPCBru5fFmwwXiCiw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: New strategies for freezing, advancing relfrozenxid early  (Dilip Kumar <dilipbalaut@gmail.com>)
Список pgsql-hackers
On Mon, Jan 23, 2023 at 3:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> My final set of comments for 0002

Thanks for the review!

> I do not understand much use of maintaining these two
> 'scanned_pages_lazy' and 'scanned_pages_eager' variables.  I think
> just maintaining 'scanned_pages' should be sufficient.  I do not see
> in patches also they are really used.

I agree that the visibility map snapshot struct could stand to be
cleaned up -- some of that state may not be needed, and it wouldn't be
that hard to use memory a little more economically, particularly with
very small tables. It's on my TODO list already.

> +#define MAX_PAGES_YOUNG_TABLEAGE    0.05    /* 5% of rel_pages */
> +#define MAX_PAGES_OLD_TABLEAGE        0.70    /* 70% of rel_pages */
>
> Why is the logic behind 5% and 70% are those based on some
> experiments?  Should those be tuning parameters so that with real
> world use cases if we realise that it would be good if the eager scan
> is getting selected more frequently or less frequently then we can
> tune those parameters?

The specific multipliers constants chosen (for
MAX_PAGES_YOUNG_TABLEAGE and MAX_PAGES_OLD_TABLEAGE) were based on
both experiments and intuition. The precise values could be somewhat
different without it really mattering, though. For example, with a
table like pgbench_history (which is a really important case for the
patch in general), there won't be any all-visible pages at all (at
least after a short while), so it won't matter what these constants
are -- eager scanning will always be chosen.

I don't think that they should be parameters. The useful parameter for
users remains vacuum_freeze_table_age/autovacuum_freeze_max_age (note
that vacuum_freeze_table_age usually gets its value from
autovacuum_freeze_max_age due to changes in 0002). Like today,
vacuum_freeze_table_age forces VACUUM to scan all not-all-frozen pages
so that relfrozenxid can be advanced. Unlike today, it forces eager
scanning (not aggressive mode). But even long before eager scanning is
*forced*, pressure to use eager scanning gradually builds. That
pressure will usually cause some VACUUM to use eager scanning before
it's strictly necessary. Overall,
vacuum_freeze_table_age/autovacuum_freeze_max_age now provide loose
guidance.

It really has to be loose in this sense in order for
lazy_scan_strategy() to have the freedom to do the right thing based
on the characteristics of the table as a whole, according to its
visibility map snapshot. This allows lazy_scan_strategy() to stumble
upon once-off opportunities to advance relfrozenxid inexpensively,
including cases where it could never happen with the current model.
These opportunities are side-effects of workload characteristics that
can be hard to predict [1][2].

> I think this should be moved as first if case, I mean why to do all
> the calculations based on the 'tableagefrac' and
> 'TABLEAGEFRAC_XXPOINT' if we are forced to scan them all.  I agree the
> extra computation we are doing might not really matter compared to the
> vacuum work we are going to perform but still seems logical to me to
> do the simple check first.

This is only needed for DISABLE_PAGE_SKIPPING, which is an escape
hatch option that is never supposed to be needed. I don't think that
it's worth going to the trouble of indenting the code more just so
this is avoided -- it really is an afterthought. Besides, the compiler
might well be doing this for us.

> 4. Should we move prefetching as a separate patch, instead of merging
> with the scanning strategy?

I don't think that breaking that out would be an improvement. A lot of
the prefetching stuff informs how the visibility map code is
structured.

[1] https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Patch_3
[2]
https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples#Opportunistically_advancing_relfrozenxid_with_bursty.2C_real-world_workloads
--
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: Add LZ4 compression in pg_dump
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: Schema variables - new implementation for Postgres 15 (typo)