Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Дата
Msg-id 20230119000214.xa2wzk5u7wbu3c3f@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
Hi,

On 2023-01-18 14:37:20 -0800, Peter Geoghegan wrote:
> On Wed, Jan 18, 2023 at 2:22 PM Andres Freund <andres@anarazel.de> wrote:
> > The problem with the change is here:
> >
> >         /*
> >          * Okay, we've covered the corner cases.  The normal calculation is to
> >          * convert the old measurement to a density (tuples per page), then
> >          * estimate the number of tuples in the unscanned pages using that figure,
> >          * and finally add on the number of tuples in the scanned pages.
> >          */
> >         old_density = old_rel_tuples / old_rel_pages;
> >         unscanned_pages = (double) total_pages - (double) scanned_pages;
> >         total_tuples = old_density * unscanned_pages + scanned_tuples;
> >         return floor(total_tuples + 0.5);
> 
> My assumption has always been that vac_estimate_reltuples() is prone
> to issues like this because it just doesn't have access to very much
> information each time it runs. It can only see the delta between what
> VACUUM just saw, and what the last VACUUM (or possibly the last
> ANALYZE) saw according to pg_class. You're always going to find
> weaknesses in such a model if you go looking for them. You're always
> going to find a way to salami slice your way from good information to
> total nonsense, if you pick the right/wrong test case, which runs
> VACUUM in a way that allows whatever bias there may be to accumulate.
> It's sort of like the way floating point values can become very
> inaccurate through a process that allows many small inaccuracies to
> accumulate over time.

Sure. To start with, there's always going to be some inaccuracies when you
assume an even distribution across a table. But I think this goes beyond
that.

This problem occurs with a completely even distribution, exactly the same
inputs to the estimation function every time.  My example under-sold the
severity, because I had only 5% non-deletable tuples. Here's it with 50%
non-removable tuples (I've seen way worse than 50% in many real-world cases),
and a bunch of complexity removed (attched).

vacuum-no    reltuples/n_live_tup    n_dead_tup
1        4999976            5000000
2        2500077            5000000
3        1250184            5000000
4         625266            5000000
5         312821            5000000
10          10165            5000000

Each vacuum halves reltuples.  That's going to screw badly with all kinds of
things. Planner costs completely out of whack etc.



I wonder if this is part of the reason for the distortion you addressed with
74388a1a / 3097bde7dd1d. I am somewhat doubtful they're right as is. For a
large relation 2% of blocks is a significant number of rows, and simply never
adjusting reltuples seems quite problematic. At the very least we ought to
account for dead tids we removed or such, instead of just freezing reltuples.

Greetings,

Andres Freund

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Decoupling antiwraparound autovacuum from special rules around auto cancellation