Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations

From: Andres Freund
Subject: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations
Msg-id: 20211124013225.d67t32hkcbbbsjjc@alap3.anarazel.de
In reply to: Re: Removing more vacuumlazy.c special cases, relfrozenxid optimizations (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers

Hi,

On 2021-11-23 17:01:20 -0800, Peter Geoghegan wrote:
> > One reason for my doubt is the following:
> >
> > We can set all-visible on a page without a FPW image (well, as long as hint
> > bits aren't logged). There's a significant difference between needing to WAL
> > log FPIs for every heap page or not, and it's not that rare for data to live
> > shorter than autovacuum_freeze_max_age or that limit never being reached.
> 
> This sounds like an objection to one specific heuristic, and not an
> objection to the general idea.

I understood you to propose that we do not have separate frozen and
all-visible states. Which I think will be problematic, because of scenarios
like the above.


> The only essential part is "opportunistic freezing during vacuum, when the
> cost is clearly very low, and the benefit is probably high". And so it now
> seems you were making a far more limited statement than I first believed.

I'm on board with freezing when we already dirty the page, and when doing
so doesn't cause an additional FPI. And I don't think I've argued against that
in the past.
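
To make that condition concrete, here is a minimal sketch of it as a
predicate. The struct and function names are hypothetical and do not exist in
vacuumlazy.c; how the two inputs would actually be determined is left open.

```c
#include <stdbool.h>

/* Hypothetical inputs; none of these names come from vacuumlazy.c. */
typedef struct PageFreezeInputs
{
    /* vacuum is modifying the page anyway (e.g. because it pruned it) */
    bool page_already_dirtied;
    /* freezing would force a full-page image that would not otherwise be written */
    bool freeze_needs_extra_fpi;
} PageFreezeInputs;

/*
 * Freeze opportunistically only when it is close to free: the page is
 * being dirtied regardless, and freezing adds no further FPI.
 */
bool
should_freeze_opportunistically(const PageFreezeInputs *in)
{
    return in->page_already_dirtied && !in->freeze_needs_extra_fpi;
}
```

Under this rule a page that is merely all-visible, and would not otherwise be
touched, is left alone.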


> These all-visible (but not all-frozen) heap pages could be considered
> "tenured", since they have survived at least one full VACUUM cycle
> without being unset. So why not also freeze them based on the
> assumption that they'll probably stay that way forever?

Because it's a potentially massive increase in write volume? E.g. if you have
an insert-only workload, and you discard old data by dropping old partitions,
this will often add yet another rewrite, despite your data likely never
getting old enough to need to be frozen.

Given that we often need to start another vacuum immediately after one
finishes, because the vacuum took long enough for the table to reach the
vacuuming thresholds again, I don't think the (auto-)vacuum count is a good
proxy.

Maybe you meant this as a more limited concept, i.e. only doing so when the
percentage of all-visible but not all-frozen pages is small?


We could perhaps do better if we had information about the system-wide
rate of xid throughput and how often / how long past vacuums of a table took.
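
As a rough illustration of how such information might be combined, the sketch
below projects how old a page's oldest unfrozen xid would be by the time the
next vacuum of the table gets to it, and compares that with a freeze limit.
All names are hypothetical, PostgreSQL does not track statistics under these
names, and the linear projection is an assumption made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical statistics; not existing PostgreSQL fields. */
typedef struct XidForecastInputs
{
    double   xids_per_second;      /* system-wide xid consumption rate */
    double   secs_between_vacuums; /* typical gap between vacuums of the table */
    double   secs_per_vacuum;      /* typical duration of a vacuum of the table */
    uint32_t page_xid_age;         /* current age of the page's oldest unfrozen xid */
    uint32_t freeze_max_age;       /* limit, e.g. autovacuum_freeze_max_age */
} XidForecastInputs;

/* Projected age of the page's oldest xid when the next vacuum has processed it. */
double
projected_xid_age(const XidForecastInputs *in)
{
    double wait = in->secs_between_vacuums + in->secs_per_vacuum;

    return (double) in->page_xid_age + in->xids_per_second * wait;
}

/* True if the page is expected to hit the freeze limit before it is vacuumed again. */
bool
likely_to_need_freezing_soon(const XidForecastInputs *in)
{
    return projected_xid_age(in) >= (double) in->freeze_max_age;
}
```

With a low xid rate the projection stays far below the limit, which matches
the case of data that is dropped before it ever needs freezing.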


Greetings,

Andres Freund


