Re: New GUC autovacuum_max_threshold ?

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: New GUC autovacuum_max_threshold ?
Дата
Msg-id CAAKRu_ZoieMN3abH9gjMRmRMf6M3f1n4x-U9LsfE02tJnnbN1Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: New GUC autovacuum_max_threshold ?  (Frédéric Yhuel <frederic.yhuel@dalibo.com>)
Ответы Re: New GUC autovacuum_max_threshold ?  (Frédéric Yhuel <frederic.yhuel@dalibo.com>)
Список pgsql-hackers
On Thu, Apr 25, 2024 at 2:52 AM Frédéric Yhuel
<frederic.yhuel@dalibo.com> wrote:
>
> Le 24/04/2024 à 21:10, Melanie Plageman a écrit :
> > On Wed, Apr 24, 2024 at 8:08 AM Frédéric Yhuel
> > <frederic.yhuel@dalibo.com> wrote:
> >>
> >> Hello,
> >>
> >> I would like to suggest a new parameter, autovacuum_max_threshold, which
> >> would set an upper limit on the number of tuples to delete/update/insert
> >> prior to vacuum/analyze.
> >
> > Hi Frédéric, thanks for the proposal! You are tackling a very tough
> > problem. I would also find it useful to know more about what led you
> > to suggest this particular solution. I am very interested in user
> > stories around difficulties with what tables are autovacuumed and
> > when.
> >
>
> Hi Melanie! I can certainly start compiling user stories about that.

Cool! That would be very useful.

> >> The idea would be to replace the following calculation :
> >>
> >> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
> >>
> >> with this one :
> >>
> >> vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples / (1
> >> + vac_scale_factor * reltuples / autovacuum_max_threshold)
> >>
> >> (and the same for the others, vacinsthresh and anlthresh).
> >
> > My first thought when reviewing the GUC and how it is used is
> > wondering if its description is a bit misleading.
> >
> > autovacuum_vacuum_threshold is the "minimum number of updated or
> > deleted tuples needed to trigger a vacuum". That is, if this many
> > tuples are modified, it *may* trigger a vacuum, but we also may skip
> > vacuuming the table for other reasons or due to other factors.
> > autovacuum_max_threshold's proposed definition is the upper
> > limit/maximum number of tuples to insert/update/delete prior to
> > vacuum/analyze. This implies that if that many tuples have been
> > modified or inserted, the table will definitely be vacuumed -- which
> > isn't true. Maybe that is okay, but I thought I would bring it up.
> >
>
> I'm not too sure I understand. What are the reasons it might by skipped?
> I can think of a concurrent index creation on the same table, or
> anything holding a SHARE UPDATE EXCLUSIVE lock or above. Is this the
> sort of thing you are talking about?

No, I was thinking more literally that, if reltuples (assuming
reltuples is modified/inserted tuples) > autovacuum_max_threshold, I
would expect the table to be vacuumed. However, with your formula,
that wouldn't necessarily be true.

I think there are values of reltuples and autovacuum_max_threshold at
which reltuples > autovacuum_max_threshold but reltuples <=
vac_base_thresh + vac_scale_factor * reltuples / (1 + vac_scale_factor
* reltuples / autovacuum_max_threshold)

I tried to reduce the formula to come up with a precise definition of
the range of values for which this is true, however I wasn't able to
reduce it to something nice.

Here is just an example of a case:

vac_base_thresh = 2000
vac_scale_factor = 0.9
reltuples = 3200
autovacuum_max_threshold = 2500

total_thresh = vac_base_thresh + vac_scale_factor * reltuples / (1 +
vac_scale_factor * reltuples / autovacuum_max_threshold)

total_thresh: 3338. dead tuples: 3200. autovacuum_max_threshold: 2500

so there are more dead tuples than the max threshold, so it should
trigger a vacuum, but it doesn't because the total calculated
threshold is higher than the number of dead tuples.

This of course may not be a realistic scenario in practice. It works
best the closer scale factor is to 1 (wish I had derived the formula
successfully) and when autovacuum_max_threshold > 2 * vac_base_thresh.
So, maybe it is not an issue.

> Perhaps a better name for the GUC would be
> autovacuum_asymptotic_limit... or something like that?

If we keep the asymptotic part, that makes sense. I wonder if we have
to add another "vacuum" in there (e.g.
autovacuum_vacuum_max_threshold) to be consistent with the other gucs.
I don't really know why they have that extra "vacuum" in them, though.
Makes the names so long.

- Melanie



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeff Davis
Дата:
Сообщение: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Why don't we support external input/output functions for the composite types