Thread: vacuum visibility relevance


vacuum visibility relevance

From: senor
Date:
Hi All,

Simplified Scenario:
40+ GB table with 20+ indexes receiving log-event-type records in an append-only fashion.
One of the columns is a sequence ID.
PG version 11.4

If I start a vacuum on this table when sequence ID is 1 million, and the table continues to accumulate new events, when vacuum completes, should I be thinking about the state of the table and indexes as being vacuumed up to 1 million? Or is vacuum also operating on records inserted after vacuum started? 

Is there any reason to think about this differently when the vacuum is manual as opposed to auto? 

I attempted to deal with the append-only issue (fixed in PG 13) by setting autovacuum_freeze_max_age low enough to trigger vacuum, but that has been such a wildcard that I abandoned it. I'm now trying to find a formula to trigger a manual vacuum. There are stats-gathering processes pulling from the most recently added data all the time, so my priority is to keep performance high for those. Secondly, there's a ton of memory allotted, so running VACUUM FREEZE before table pages age out of memory seems like a good idea.

I am having trouble with autovacuum keeping up, and I suspect there's a communication problem with the stats collector, but I have put off looking into that because updating to PG 15 was "right around the corner". Meanwhile, I have had to run multi-threaded scripts to keep up with vacuuming. Manual vacuums always work and update stats correctly, but autovacuums hang and occupy workers doing nothing.

I realize these are two completely different issues. I would love to have a better understanding of the initial question, but it feels like I may not need to be very aware of how that works if I knew what was causing the latter issue. Any hints, pointers, and references are appreciated.

- Senor

Re: vacuum visibility relevance

From: Ron Johnson
Date:
On Sun, Dec 3, 2023 at 9:08 PM senor <frio_cervesa@hotmail.com> wrote:
> I am having trouble with autovacuum keeping up, and I suspect there's a
> communication problem with the stats collector [...]

What do you have autovacuum_vacuum_scale_factor set to? The default 0.2 is almost certainly way too high.
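
For reference, one way to check the current value and lower it for just the busy table; the table name "event_log" below is only a placeholder:

    -- Show the global setting:
    SHOW autovacuum_vacuum_scale_factor;

    -- Check for an existing per-table override:
    SELECT relname, reloptions FROM pg_class WHERE relname = 'event_log';

    -- Lower the threshold for the one hot table:
    ALTER TABLE event_log SET (autovacuum_vacuum_scale_factor = 0.01);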

Re: vacuum visibility relevance

From: Laurenz Albe
Date:
On Mon, 2023-12-04 at 02:08 +0000, senor wrote:
> Simplified Scenario:
> 40+ GB table with 20+ indexes receiving log-event-type records in an append-only fashion.

Log tables typically receive tons of INSERTs.  Over twenty indexes is too many.

> One of the columns is a sequence ID.
> PG version 11.4
>
>
> If I start a vacuum on this table when sequence ID is 1 million, and the table continues
> to accumulate new events, when vacuum completes, should I be thinking about the state of
> the table and indexes as being vacuumed up to 1 million? Or is vacuum also operating on
> records inserted after vacuum started?

VACUUM processes the whole table, but it determines an "xmin horizon" when it starts,
that is, it determines which tuples are safe to remove.  Rows added after VACUUM started
won't qualify, so they won't be removed.
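
If you want to watch this from another session while the VACUUM runs, these
standard views are available in v11:

    -- Progress of any currently running VACUUM:
    SELECT pid, relid::regclass, phase, heap_blks_scanned, heap_blks_total
    FROM pg_stat_progress_vacuum;

    -- The xmin horizon each backend is holding back:
    SELECT pid, backend_xmin
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL;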

> Is there any reason to think about this differently when the vacuum is manual as opposed to auto?

No, the behavior is the same.

> I attempted to deal with the append-only issue (fixed in PG 13) by setting
> autovacuum_freeze_max_age low enough to trigger vacuum, but that has been such a wildcard
> that I abandoned it. I'm now trying to find a formula to trigger a manual vacuum. There are
> stats-gathering processes pulling from the most recently added data all the time, so my
> priority is to keep performance high for those. Secondly, there's a ton of memory allotted,
> so running VACUUM FREEZE before table pages age out of memory seems like a good idea.

You can monitor the "n_tup_ins" column in "pg_stat_user_tables" and run VACUUM whenever
it has increased enough.
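
A minimal sketch of that check; "event_log" is a placeholder, and you would
need to store the previous reading somewhere to compute the delta:

    -- Lifetime insert count for the table:
    SELECT n_tup_ins FROM pg_stat_user_tables WHERE relname = 'event_log';

    -- If it has grown past your threshold since the last check:
    VACUUM (FREEZE, VERBOSE) event_log;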

But the simple solution is to upgrade.  Any version older than v13 is due for an upgrade soon.

> I am having trouble with autovacuum keeping up, and I suspect there's a communication
> problem with the stats collector, but I have put off looking into that because updating
> to PG 15 was "right around the corner". Meanwhile, I have had to run multi-threaded
> scripts to keep up with vacuuming. Manual vacuums always work and update stats correctly,
> but autovacuums hang and occupy workers doing nothing.

Autovacuum deliberately runs slower than a manual VACUUM.
If you are running a version older than v12, the default value for
"autovacuum_vacuum_cost_delay" is 20ms, which makes autovacuum really slow.  You should
change that parameter to 2ms, and if that is still too slow, use ALTER TABLE to set it
to 0 on the affected table.  Then autovacuum will run as fast as manual VACUUM.

In addition, see that "maintenance_work_mem" is set high to make autovacuum fast.
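
Concretely, something like this; the 1GB figure is only an example, not a
recommendation for your machine:

    -- Cluster-wide (both settings can be reloaded without a restart):
    ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
    ALTER SYSTEM SET maintenance_work_mem = '1GB';
    SELECT pg_reload_conf();

    -- Per-table, to remove throttling entirely ("event_log" is a placeholder):
    ALTER TABLE event_log SET (autovacuum_vacuum_cost_delay = 0);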

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



Re: vacuum visibility relevance

From: David Rowley
Date:
On Mon, 4 Dec 2023 at 15:08, senor <frio_cervesa@hotmail.com> wrote:
> PG version 11.4

You're missing out on over 3 years of bug fixes running that version.
I see you're planning an upgrade to v15. You should update to 11.22 in
the meantime. That's the latest and final version of v11.

> If I start a vacuum on this table when sequence ID is 1 million, and the table continues to accumulate new events,
> when vacuum completes, should I be thinking about the state of the table and indexes as being vacuumed up to 1 million?
> Or is vacuum also operating on records inserted after vacuum started?

The transaction ID thresholds are decided at the start of vacuum, so
tuples that become dead after vacuum starts won't be touched.

> Is there any reason to think about this differently when the vacuum is manual as opposed to auto?

Not in terms of what will be removed, but there are performance
considerations. vacuum_cost_delay is 0 by default, but
autovacuum_vacuum_cost_delay isn't; I believe it's 20ms for v11.  That
means the default settings allow manual VACUUMs to run quickly, but
autovacuums are throttled to be quite slow.

> I attempted to deal with the append-only issue (fixed in PG 13) by setting autovacuum_freeze_max_age low enough to
> trigger vacuum, but that has been such a wildcard that I abandoned it. I'm now trying to find a formula to trigger a
> manual vacuum. There are stats-gathering processes pulling from the most recently added data all the time, so my
> priority is to keep performance high for those. Secondly, there's a ton of memory allotted, so running VACUUM FREEZE
> before table pages age out of memory seems like a good idea.

Perhaps you could set autovacuum_analyze_scale_factor to something
higher for the tables in question, use
pg_stat_all_tables.n_mod_since_analyze, and do a manual VACUUM ANALYZE.
If autovacuum were ever to ANALYZE the tables in question, it would
reset that counter and interfere with this scheme.
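
For example (again, "event_log" and the threshold are placeholders):

    -- Rows modified since the table was last analyzed:
    SELECT relname, n_mod_since_analyze
    FROM pg_stat_all_tables
    WHERE relname = 'event_log';

    -- If it exceeds your threshold, vacuum and analyze manually:
    VACUUM (ANALYZE, VERBOSE) event_log;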

> I am having trouble with autovacuum keeping up, and I suspect there's a communication problem with the stats collector,
> but I have put off looking into that because updating to PG 15 was "right around the corner". Meanwhile, I have had to
> run multi-threaded scripts to keep up with vacuuming. Manual vacuums always work and update stats correctly, but
> autovacuums hang and occupy workers doing nothing.

If you find all the autovacuum workers busy most of the time, then
you'll likely want to reduce autovacuum_vacuum_cost_delay to increase
the speed of the auto-vacuum workers.
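
You can check whether the workers are genuinely busy or just throttled with
something like:

    -- Current activity of the autovacuum workers (v10+):
    SELECT pid, state, wait_event_type, wait_event, query
    FROM pg_stat_activity
    WHERE backend_type = 'autovacuum worker';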

David