Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

From Peter Geoghegan
Subject Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Date
Msg-id CAH2-Wz=cJYdFww3FifTrLUYRwMzAVPVFuCZ0RcfMnibR94Rqng@mail.gmail.com
In response to Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Wed, Jan 18, 2023 at 12:44 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I don't know enough about the specifics of how this works to have an
> intelligent opinion about how likely these particular ideas are to
> work out. However, I think it's risky to look at estimates and try to
> infer whether they are reliable. It's too easy to be wrong. What we
> really want to do is anchor our estimates to some data source that we
> know we can trust absolutely. If you trust possibly-bad data less, it
> screws up your estimates more slowly, but it still screws them up.

Some of what I'm proposing arguably amounts to deliberately adding a
bias. But that's not an unreasonable thing in itself. I think of it as
related to the bias-variance tradeoff, which is a concept that comes
up a lot in machine learning and statistical inference.

We can afford to be quite imprecise at times, especially if we choose
a bias that we know has much less potential to do us harm -- some
mistakes hurt much more than others. We cannot afford to ever be
dramatically wrong, though -- especially in the direction of vacuuming
less often.
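
To make the asymmetry concrete, here is a toy sketch in Python. Everything in
it is an assumption made for illustration: the penalty weights, the noise
values, and the function names are hypothetical and do not correspond to
anything in PostgreSQL. It only shows that when underestimating dead tuples
costs more than overestimating them, an estimator with a deliberate upward
bias can have a lower average loss than an unbiased one.

```python
def asymmetric_loss(estimate, actual, under_penalty=10.0, over_penalty=1.0):
    """Penalize underestimates (vacuuming too rarely) more than overestimates.

    The 10:1 penalty ratio is an arbitrary assumption for illustration.
    """
    error = estimate - actual
    if error < 0:
        return under_penalty * -error
    return over_penalty * error


def mean_loss(bias, actual, noise_samples):
    """Average loss of an estimator that adds a fixed bias to a noisy reading."""
    losses = [asymmetric_loss(actual + noise + bias, actual)
              for noise in noise_samples]
    return sum(losses) / len(losses)


# Hypothetical symmetric estimation noise around the true dead tuple count.
noise = [-300, -200, -100, 0, 100, 200, 300]
actual_dead_tuples = 1000

unbiased_loss = mean_loss(0, actual_dead_tuples, noise)
biased_loss = mean_loss(200, actual_dead_tuples, noise)

print(f"unbiased estimator mean loss: {unbiased_loss:.1f}")
print(f"upward-biased estimator mean loss: {biased_loss:.1f}")
```

With these particular numbers the biased estimator comes out ahead, but only
because the loss function was chosen to punish underestimates. The sketch says
nothing about what the right penalties or the right bias would be in practice.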

Besides, there is something that we *can* place a relatively high
degree of trust in that will still be in the loop here: VACUUM itself.
If VACUUM runs then it'll call pgstat_report_vacuum(), which will set
the record straight if dead tuples were overestimated. To some degree
the problem of overestimating dead tuples is self-limiting.
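
The self-limiting behavior can be illustrated with a toy model in Python.
This is a sketch under stated assumptions, not how autovacuum is implemented:
the simulation loop, the overcount factor, and the function names are
hypothetical. The threshold formula (a base threshold plus a scale factor
times reltuples) follows the documented autovacuum_vacuum_threshold and
autovacuum_vacuum_scale_factor settings, and resetting the counter stands in
for what pgstat_report_vacuum() does after a real VACUUM.

```python
def vacuum_threshold(reltuples, base_threshold=50, scale_factor=0.2):
    """Dead tuple count above which autovacuum would be triggered."""
    return base_threshold + scale_factor * reltuples


def simulate(reltuples, true_dead_per_step, overcount_factor, steps):
    """Return (number of vacuums, worst overestimate seen) for a toy workload.

    overcount_factor > 1 models a statistics counter that overestimates
    dead tuples; it is a made-up knob for this sketch.
    """
    threshold = vacuum_threshold(reltuples)
    estimated_dead = 0.0
    true_dead = 0
    worst_overestimate = 0.0
    vacuums = 0
    for _ in range(steps):
        true_dead += true_dead_per_step
        estimated_dead += true_dead_per_step * overcount_factor
        worst_overestimate = max(worst_overestimate,
                                 estimated_dead - true_dead)
        if estimated_dead > threshold:
            # VACUUM runs, removes the dead tuples, and reports an accurate
            # count, which replaces the inflated estimate.
            true_dead = 0
            estimated_dead = 0.0
            vacuums += 1
    return vacuums, worst_overestimate


vacuums_exact, error_exact = simulate(10_000, 100, 1.0, 100)
vacuums_over, error_over = simulate(10_000, 100, 2.0, 100)

print("accurate counter:", vacuums_exact, "vacuums, worst error", error_exact)
print("overcounting counter:", vacuums_over, "vacuums, worst error", error_over)
```

In this model an inflated estimate reaches the threshold sooner, so VACUUM
runs more often and the error never grows much past the threshold before it
is corrected. An underestimate has no such correction, since it delays the
very VACUUM that would fix it.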

> If Andres is correct that what really matters is the number of pages
> we're going to have to dirty, we could abandon counting dead tuples
> altogether and just count not-all-visible pages in the VM map.

That's what matters most from a cost point of view IMV. So it's a big
part of the overall picture, but not everything. It tells us
relatively little about the benefits, except perhaps when most pages
are all-visible.
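
As a rough illustration of the cost side only, here is a minimal Python
sketch that derives a not-all-visible page count from the relpages and
relallvisible columns of pg_class. Both columns are estimates maintained by
VACUUM and ANALYZE, so the result is approximate. The function names and the
example numbers are hypothetical.

```python
def not_all_visible_pages(relpages, relallvisible):
    """Estimated heap pages not marked all-visible in the visibility map."""
    # Clamp at zero because both inputs are estimates and can be stale.
    return max(relpages - relallvisible, 0)


def not_all_visible_fraction(relpages, relallvisible):
    """Same quantity expressed as a fraction of the table's pages."""
    if relpages == 0:
        return 0.0
    return not_all_visible_pages(relpages, relallvisible) / relpages


# Hypothetical table: 10,000 pages, of which 9,200 are marked all-visible.
example_pages = not_all_visible_pages(relpages=10_000, relallvisible=9_200)
example_fraction = not_all_visible_fraction(10_000, 9_200)

print(example_pages, "pages not all-visible,",
      f"{example_fraction:.0%} of the table")
```

A number like this bounds how many pages VACUUM might need to visit, but it
does not say how many dead tuples those pages hold, which is why it is a
weak signal for the benefit of vacuuming.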

-- 
Peter Geoghegan


