Re: autovacuum: change priority of the vacuumed tables

Поиск
Список
Период
Сортировка
От Marco Garavello
Тема Re: autovacuum: change priority of the vacuumed tables
Дата
Msg-id CAFp+5ESwDJDL=nVrhMz0X-bBoeBVQtUhboT0X8QQ1QW_jGwL2g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: autovacuum: change priority of the vacuumed tables  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
Hello all,

I've just read this thread as I googled for "how to give to autovacuum priority on a table"
I'm looking this because I have large table with lot of inserts/updates, and to improve the system I want to force the first free autovacuum worker to work on that table, of course if condition "such table is in need-to-be-vacuumed-state (depending on autovacuum_scale_factor parameter)" is satisfed.
I found that autovacuum_scale_factor can be specified for each table, so I think this could do the trick, isn't it?
From my understanding specifying it lower on the target table should give an implicit higher priority to that table.


Distinti Saluti / Kind Regards

Marco Garavello | IoT Cloud Operations Specialist



Il giorno ven 14 gen 2022 alle ore 12:55 Tomas Vondra <tomas.vondra@2ndquadrant.com> ha scritto:

On 03/03/2018 10:21 PM, Jim Nasby wrote:
> On 3/3/18 2:53 PM, Tomas Vondra wrote:
>> That largely depends on what knobs would be exposed. I'm against adding
>> some low-level knobs that perhaps 1% of the users will know how to tune,
>> and the rest will set it incorrectly. Some high-level options that would
>> specify the workload type might work, but I have no idea about details.
>
> Not knowing about details is why we've been stuck here for years: it's
> not terribly obvious how to create a scheduler that is going to work in
> all situations. Current autovac is great for 80% of situations, but it
> simply doesn't handle the remaining 20% by itself. Once you're pushing
> your IO limits you *have* to start scheduling manual vacuums for any
> critical tables.
>
> At least if we exposed some low level ability to control autovac
> workers then others could create tools to improve the situation.
> Currently that's not possible because manual vacuum lacks features
> that autovac has.
>

I have my doubts about both points - usefulness of low-level controls
and viability of tools built on them.

Firstly, my hunch is that if we knew what low-level controls to expose,
it would pretty much how to implement the tool internally. Exposing
something just because you home someone will find a use for that seems
like a dead-end to me. So, which controls would you expose?

Second, all the statistics used to decide which tables need vacuuming
are already exposed, and we have things like bgworkers etc. So you could
go and write a custom autovacuum today - copy the autovacuum code, tweak
the scheduling, done. Yet no such tool emerged yet. Why is that?

>>> One fairly simple option would be to simply replace the logic
>>> that currently builds a worker's table list with running a query
>>> via SPI. That would allow for prioritizing important tables. It
>>> could also reduce the problem of workers getting "stuck" on a ton
>>> of large tables by taking into consideration the total number of
>>> pages/tuples a list contains.
>>>
>> I don't see why SPI would be needed to do that, i.e. why couldn't
>> we implement such prioritization with the current approach. Another
>> thing
>
> Sure, it's just a SMOC. But most of the issue here is actually a
> query problem. I suspect that the current code would actually shrink
> if converted to SPI. In any case, I'm not wed to that idea.
>

I disagree this a "query problem" - it certainly is not the case that
simply prioritizing the tables differently will make a difference. Or
more precisely, it certainly does not solve the autovacuum issues I'm
thinking about. I have no idea which issues are you trying to solve,
because you haven't really described those.

>> is I really doubt prioritizing "important tables" is an good solution,
>> as it does not really guarantee anything.
>
> If by "important" you mean small tables with high update rates,
> prioritizing those actually would help as long as you have free workers.
> By itself it doesn't gain all that much though.
>

Which is why I mentioned we could have separate pools of autovacuum
workers - one for regular tables, one for "important" ones.

>>> A more fine-grained approach would be to have workers make a new
>>> selection after every vacuum they complete. That would provide
>>> the ultimate in control, since you'd be able to see exactly what
>>> all the other workers are doing.
>> That was proposed earlier in this thread, and the issue is it may
>> starve all the other tables when the "important" tables need
>> cleanup all the time.
>
> There's plenty of other ways to shoot yourself in the foot in that
> regard already. We can always have safeguards in place if we get too
> close to wrap-around, just like we currently do.

I haven't mentioned wraparound at all.

My point is that if you entirely ignore some tables because "important"
ones need constant cleanup (saturating the total autovacuum capacity),
then you'll end up with extreme bloat in those other tables. And then
you will need to do more work to clean them up, which will likely cause
delays in cleaning up the important ones.


FWIW I find this discussion rather too hand-wavy, because I have no idea
what controls would you like to expose, etc. If you have an idea, please
write a patch or at least a proposal explaining the details.

regards

--
Tomas Vondra                  https://urldefense.com/v3/__http://www.2ndQuadrant.com__;!!Ck4v2Rc!h1TqYTvzyQn1mDOR119nZYsN26gRVhRZ3UIT7sQT60SgRR7833KpQ3CtJ6l0JpMIEt3JBbd09ZZ9-N3U5MHRn7r7RU4NEiRE$
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: [PATCH] Allow multiple recursive self-references
Следующее
От: "tanghy.fnst@fujitsu.com"
Дата:
Сообщение: RE: Column Filtering in Logical Replication