Обсуждение: Add mode column to pg_stat_progress_vacuum

Поиск
Список
Период
Сортировка

Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
Hi hackers,

I would like to propose a patch that enhances the
pg_stat_progress_vacuum view by adding a mode column. The patch is
attached.

Although it is possible to identify an anti-wraparound VACUUM through
the process title (to prevent wraparound) or specific log entries, it
would be significantly more convenient for monitoring purposes to have
this status clearly indicated in the pg_stat_progress_vacuum view.
This would enable DBAs to immediately understand the urgency of the
vacuum process without needing to check separate logs or system
processes.

This patch introduces a mode column to provide this visibility. The
possible values are:
- normal: A standard, user-initiated VACUUM or a regular autovacuum run.
- anti-wraparound: An autovacuum run launched specifically to prevent
transaction ID wraparound.
- failsafe: A vacuum that has entered failsafe mode to prevent
imminent transaction ID wraparound.

This will allow administrators to better understand the context and
urgency of vacuum operations, which is crucial for monitoring and
troubleshooting.

Design Considerations:
When defining the scope of the anti-wraparound mode, I considered
including manual commands like VACUUM (FREEZE) or VACUUM
(DISABLE_PAGE_SKIPPING). However, I decided against this to keep the
meaning of the mode clear and simple. These options can be used for
various purposes, and overloading the anti-wraparound mode with too
many meanings could become confusing. Therefore, the current
implementation limits this mode to autovacuum runs that are explicitly
launched for wraparound prevention.

Regarding Testing:
I was able to manually verify the failsafe mode's behavior by using
the existing test script at
src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl. This script
successfully triggered the failsafe condition and the view reported
the correct mode. However, I found this test to be somewhat flaky in
my environment and decided not to add it to the patch to avoid
introducing a potentially unstable test into the tree.

Thought?

--
Best regards,
Shinya Kato
NTT OSS Center

Вложения

Re: Add mode column to pg_stat_progress_vacuum

От
Kirill Reshke
Дата:
On Thu, 14 Aug 2025 at 16:13, Shinya Kato <shinya11.kato@gmail.com> wrote:

> This patch introduces a mode column to provide this visibility. The
> possible values are:
> - normal: A standard, user-initiated VACUUM or a regular autovacuum run.
> - anti-wraparound: An autovacuum run launched specifically to prevent
> transaction ID wraparound.
> - failsafe: A vacuum that has entered failsafe mode to prevent
> imminent transaction ID wraparound.

> Thought?

Just a small comment:

I am more used to Lazy vs Eager vacuum types. It is how we use to call
them in doc and code. Maybe this wording will be better?



-- 
Best regards,
Kirill Reshke



Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
On Thu, Aug 14, 2025 at 9:20 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> On Thu, 14 Aug 2025 at 16:13, Shinya Kato <shinya11.kato@gmail.com> wrote:
>
> > This patch introduces a mode column to provide this visibility. The
> > possible values are:
> > - normal: A standard, user-initiated VACUUM or a regular autovacuum run.
> > - anti-wraparound: An autovacuum run launched specifically to prevent
> > transaction ID wraparound.
> > - failsafe: A vacuum that has entered failsafe mode to prevent
> > imminent transaction ID wraparound.
>
> > Thought?
>
> Just a small comment:
>
> I am more used to Lazy vs Eager vacuum types. It is how we use to call
> them in doc and code. Maybe this wording will be better?

Thanks for the feedback!

Are you suggesting it would be better to change "normal" to "lazy" and
"anti-wraparound" to "eager"? My hesitation is that "lazy" is a term
used to contrast with VACUUM FULL, and the "lazy" vs. "eager"
distinction also exists for tuple freezing logic within vacuum.
Reusing these terms for a different purpose could be confusing. I also
find "anti-wraparound" to be a much clearer and more descriptive term
than "eager".

Unless there’s a strong preference otherwise, I’d like to keep
“anti-wraparound” and “failsafe” as-is, and keep “normal” (or possibly
“plain”/“regular” if that reads better).

--
Best regards,
Shinya Kato
NTT OSS Center



Re: Add mode column to pg_stat_progress_vacuum

От
Kirill Reshke
Дата:
On Fri, 15 Aug 2025 at 09:19, Shinya Kato <shinya11.kato@gmail.com> wrote:
eager".
>
> Unless there’s a strong preference otherwise, I’d like to keep
> “anti-wraparound” and “failsafe” as-is, and keep “normal” (or possibly
> “plain”/“regular” if that reads better).

OK.


I have tested the patch a bit, and noticed that `mode` column shows
`normal` for both auto-vacuum and user-initiated vacuum (via VACUUM
utility statement).
Is it ok? Maybe for more visibility we can display different values
for these two cases?


--
Best regards,
Kirill Reshke



Re: Add mode column to pg_stat_progress_vacuum

От
Nathan Bossart
Дата:
On Thu, Aug 14, 2025 at 08:12:55PM +0900, Shinya Kato wrote:
> I would like to propose a patch that enhances the
> pg_stat_progress_vacuum view by adding a mode column. The patch is
> attached.
> 
> Although it is possible to identify an anti-wraparound VACUUM through
> the process title (to prevent wraparound) or specific log entries, it
> would be significantly more convenient for monitoring purposes to have
> this status clearly indicated in the pg_stat_progress_vacuum view.
> This would enable DBAs to immediately understand the urgency of the
> vacuum process without needing to check separate logs or system
> processes.

This seems generally reasonable to me.

> This patch introduces a mode column to provide this visibility. The
> possible values are:
> - normal: A standard, user-initiated VACUUM or a regular autovacuum run.
> - anti-wraparound: An autovacuum run launched specifically to prevent
> transaction ID wraparound.
> - failsafe: A vacuum that has entered failsafe mode to prevent
> imminent transaction ID wraparound.

I wonder if we should also add "aggressive".

> I was able to manually verify the failsafe mode's behavior by using
> the existing test script at
> src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl. This script
> successfully triggered the failsafe condition and the view reported
> the correct mode. However, I found this test to be somewhat flaky in
> my environment and decided not to add it to the patch to avoid
> introducing a potentially unstable test into the tree.

Perhaps there's something we can do with injection points to improve the
stability of the test.

-- 
nathan



Re: Add mode column to pg_stat_progress_vacuum

От
Robert Treat
Дата:
On Tue, Oct 7, 2025 at 11:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Thu, Aug 14, 2025 at 08:12:55PM +0900, Shinya Kato wrote:
> > I would like to propose a patch that enhances the
> > pg_stat_progress_vacuum view by adding a mode column. The patch is
> > attached.
> >
> > Although it is possible to identify an anti-wraparound VACUUM through
> > the process title (to prevent wraparound) or specific log entries, it
> > would be significantly more convenient for monitoring purposes to have
> > this status clearly indicated in the pg_stat_progress_vacuum view.
> > This would enable DBAs to immediately understand the urgency of the
> > vacuum process without needing to check separate logs or system
> > processes.
>
> This seems generally reasonable to me.
>

There is a bit of an issue that an anti-wraparound vacuum is not in
and of itself urgent, especially not with our defaults, so I have a
little bit of concern that this patch could be mis-leading, but that
isn't exactly an argument against the merits of it.

> > This patch introduces a mode column to provide this visibility. The
> > possible values are:
> > - normal: A standard, user-initiated VACUUM or a regular autovacuum run.
> > - anti-wraparound: An autovacuum run launched specifically to prevent
> > transaction ID wraparound.
> > - failsafe: A vacuum that has entered failsafe mode to prevent
> > imminent transaction ID wraparound.
>

I think we should probably split out manual vacuums, which can be run
for a whole host of different reasons. I'd suggest a mode of "manual"
for those, and probably "standard" for a regular autovacuum run.

> I wonder if we should also add "aggressive".
>

I don't think so. I feel like the point of the mode is to answer "why
is this vacuum running" not "how is it operating under the hood".


Robert Treat
https://xzilla.net



Re: Add mode column to pg_stat_progress_vacuum

От
Nathan Bossart
Дата:
On Tue, Oct 07, 2025 at 11:50:46AM -0400, Robert Treat wrote:
> On Tue, Oct 7, 2025 at 11:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> I wonder if we should also add "aggressive".
> 
> I don't think so. I feel like the point of the mode is to answer "why
> is this vacuum running" not "how is it operating under the hood".

To some extent, those are tied together.  For example, a failsafe vacuum is
an anti-wraparound vacuum that skips index vacuuming, etc.  And an
anti-wraparound vacuum implies an aggressive scan, but not vice versa.
There's also a separate parameter (vacuum_freeze_table_age) that controls
when vacuum decides to perform an aggressive scan, just like there exists a
parameter for anti-wraparound vacuums (autovacuum_freeze_max_age) and
failsafe vacuums (vacuum_failsafe_age).

-- 
nathan



Re: Add mode column to pg_stat_progress_vacuum

От
Masahiko Sawada
Дата:
On Tue, Oct 7, 2025 at 9:26 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Oct 07, 2025 at 11:50:46AM -0400, Robert Treat wrote:
> > On Tue, Oct 7, 2025 at 11:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> I wonder if we should also add "aggressive".
> >
> > I don't think so. I feel like the point of the mode is to answer "why
> > is this vacuum running" not "how is it operating under the hood".
>
> To some extent, those are tied together.  For example, a failsafe vacuum is
> an anti-wraparound vacuum that skips index vacuuming, etc.  And an
> anti-wraparound vacuum implies an aggressive scan, but not vice versa.
> There's also a separate parameter (vacuum_freeze_table_age) that controls
> when vacuum decides to perform an aggressive scan, just like there exists a
> parameter for anti-wraparound vacuums (autovacuum_freeze_max_age) and
> failsafe vacuums (vacuum_failsafe_age).

Right. I think we cannot display both things in one mode column. Since
both manual vacuums and anti-wraparound autovacuums can enter the
failsafe mode dynamically, if we show "failsafe" in the mode column,
we would lose the information "why is this vacuum running". I guess we
would need separate columns. For example, I guess that the column
showing "how is it operating under the hood" can have three values:
"normal", "aggressive" (disables VM optimization), and "failsafe"
(implies aggressive vacuum and disables many things to prioritize XID
freezing).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add mode column to pg_stat_progress_vacuum

От
Sami Imseih
Дата:
I am bit late to this thread, but I have a comment.

> This patch introduces a mode column to provide this visibility. The
> possible values are:
> - normal: A standard, user-initiated VACUUM or a regular autovacuum run.
> - anti-wraparound: An autovacuum run launched specifically to prevent
> transaction ID wraparound.
> - failsafe: A vacuum that has entered failsafe mode to prevent
> imminent transaction ID wraparound.

The vacuum command detail can now be determined from
pg_stat_activity.query by joining with pg_stat_progress_vacuum, right?
I don't see why this is not sufficient, especially because it already
indicates how the vacuum was triggered, and the autovacuum activity
message also tells you why it was triggered. We could perhaps add "due to
failsafe" to the autovacuum activity message to explicitly show that reason.

--
Sami Imseih
Amazon Web Services (AWS)



Re: Add mode column to pg_stat_progress_vacuum

От
Nathan Bossart
Дата:
On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote:
> Right. I think we cannot display both things in one mode column. Since
> both manual vacuums and anti-wraparound autovacuums can enter the
> failsafe mode dynamically, if we show "failsafe" in the mode column,
> we would lose the information "why is this vacuum running". I guess we
> would need separate columns. For example, I guess that the column
> showing "how is it operating under the hood" can have three values:
> "normal", "aggressive" (disables VM optimization), and "failsafe"
> (implies aggressive vacuum and disables many things to prioritize XID
> freezing).

Am I understanding correctly that your idea is to have a "reason" column
that would have values like "manual", "normal autovacuum", and "autovacuum
for wraparound", and a "mode" column that would have values like "normal",
"agressive", and "failsafe"?  I wonder if we could be even more granular
for the "normal autovacuum" case and point to the reason the table was
chosen.  For example, was it the insert threshold, the update/delete
threshold, etc.?

-- 
nathan



Re: Add mode column to pg_stat_progress_vacuum

От
Nathan Bossart
Дата:
On Tue, Oct 07, 2025 at 12:45:12PM -0500, Sami Imseih wrote:
> The vacuum command detail can now be determined from
> pg_stat_activity.query by joining with pg_stat_progress_vacuum, right?
> I don't see why this is not sufficient, especially because it already
> indicates how the vacuum was triggered, and the autovacuum activity
> message also tells you why it was triggered. We could perhaps add "due to
> failsafe" to the autovacuum activity message to explicitly show that reason.

Eh, IMHO requiring users to look for a certain substring in the query field
doesn't seem especially user-friendly to me.  (I was going to point out
that it's undocumented, too, but it is in fact documented [0].)

[0] https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM

-- 
nathan



Re: Add mode column to pg_stat_progress_vacuum

От
Sami Imseih
Дата:
> On Tue, Oct 07, 2025 at 12:45:12PM -0500, Sami Imseih wrote:
> > The vacuum command detail can now be determined from
> > pg_stat_activity.query by joining with pg_stat_progress_vacuum, right?
> > I don't see why this is not sufficient, especially because it already
> > indicates how the vacuum was triggered, and the autovacuum activity
> > message also tells you why it was triggered. We could perhaps add "due to
> > failsafe" to the autovacuum activity message to explicitly show that reason.
>
> Eh, IMHO requiring users to look for a certain substring in the query field
> doesn't seem especially user-friendly to me.  (I was going to point out
> that it's undocumented, too, but it is in fact documented [0].)

I am not sure if it's a bad user experience. In my experience that string
is quite easy to parse.

Also, It is also common for a DBA to have to reference
pg_stat_activity anyhow to determine how long the vacuum been
running for, wait events, etc. IMO, the progress view is the wrong place
for all information that is static ( does not change from the start of the
command ) and can be derived from the query string.

>> Right. I think we cannot display both things in one mode column. Since
>> both manual vacuums and anti-wraparound autovacuums can enter the
>> failsafe mode dynamically, if we show "failsafe" in the mode column,
>> we would lose the information "why is this vacuum running". I guess we
>> would need separate columns. For example, I guess that the column
>> showing "how is it operating under the hood" can have three values:
>> "normal", "aggressive" (disables VM optimization), and "failsafe"
>> (implies aggressive vacuum and disables many things to prioritize XID
>> freezing).

> Am I understanding correctly that your idea is to have a "reason" column
> that would have values like "manual", "normal autovacuum", and "autovacuum
> for wraparound", and a "mode" column that would have values like "normal",
> "agressive", and "failsafe"?  I wonder if we could be even more granular
> for the "normal autovacuum" case and point to the reason the table was
> chosen.  For example, was it the insert threshold, the update/delete
> threshold, etc.?

ahh, it's true that failsafe can trigger while an (auto)vacuum is in progress,
the check does not happen at the start, but in places like the main loop
of lazy_scan_heap. Since "failsafe" can be flipped on in-flight, I can see
that being a useful (bool?) field in the progress view.

--
Sami



Re: Add mode column to pg_stat_progress_vacuum

От
Masahiko Sawada
Дата:
On Tue, Oct 7, 2025 at 12:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote:
> > Right. I think we cannot display both things in one mode column. Since
> > both manual vacuums and anti-wraparound autovacuums can enter the
> > failsafe mode dynamically, if we show "failsafe" in the mode column,
> > we would lose the information "why is this vacuum running". I guess we
> > would need separate columns. For example, I guess that the column
> > showing "how is it operating under the hood" can have three values:
> > "normal", "aggressive" (disables VM optimization), and "failsafe"
> > (implies aggressive vacuum and disables many things to prioritize XID
> > freezing).
>
> Am I understanding correctly that your idea is to have a "reason" column
> that would have values like "manual", "normal autovacuum", and "autovacuum
> for wraparound", and a "mode" column that would have values like "normal",
> "agressive", and "failsafe"?

Right. The first column provides an insight into whether or not the
running vacuum is cancellable, and the second column provides
information on how vacuums are actually processing tables under the
hood. Users are able to get the former information by checking
pg_stat_activity too but the latter information is available only in
server logs.

>  I wonder if we could be even more granular
> for the "normal autovacuum" case and point to the reason the table was
> chosen.  For example, was it the insert threshold, the update/delete
> threshold, etc.?

Sounds like reasonable information. I guess we might want to have such
information in a cumulative statistics view but do you think it's
better to have it in a dynamic statistics view?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add mode column to pg_stat_progress_vacuum

От
Sami Imseih
Дата:
> >  I wonder if we could be even more granular
> > for the "normal autovacuum" case and point to the reason the table was
> > chosen.  For example, was it the insert threshold, the update/delete
> > threshold, etc.?
>
> Sounds like reasonable information. I guess we might want to have such
> information in a cumulative statistics view but do you think it's
> better to have it in a dynamic statistics view?

+1 for this information in cumulative stats, on a per table level for sure.
I do think however the pg_stat_all_tables views is getting too wide
and moving new relation vacuum stats to a separate stats view will
be very useful.

--
Sami



Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
On Wed, Oct 8, 2025 at 12:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> > I was able to manually verify the failsafe mode's behavior by using
> > the existing test script at
> > src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl. This script
> > successfully triggered the failsafe condition and the view reported
> > the correct mode. However, I found this test to be somewhat flaky in
> > my environment and decided not to add it to the patch to avoid
> > introducing a potentially unstable test into the tree.
>
> Perhaps there's something we can do with injection points to improve the
> stability of the test.

Thank you for the advice. You're right, I can test it with injection
point. However, many other progress reporting views do not have
regression tests, so I do not have to add a regression test of
pg_stat_progress_report.




--
Best regards,
Shinya Kato
NTT OSS Center



Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
On Wed, Oct 8, 2025 at 4:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Oct 7, 2025 at 12:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >
> > On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote:
> > > Right. I think we cannot display both things in one mode column. Since
> > > both manual vacuums and anti-wraparound autovacuums can enter the
> > > failsafe mode dynamically, if we show "failsafe" in the mode column,
> > > we would lose the information "why is this vacuum running". I guess we
> > > would need separate columns. For example, I guess that the column
> > > showing "how is it operating under the hood" can have three values:
> > > "normal", "aggressive" (disables VM optimization), and "failsafe"
> > > (implies aggressive vacuum and disables many things to prioritize XID
> > > freezing).
> >
> > Am I understanding correctly that your idea is to have a "reason" column
> > that would have values like "manual", "normal autovacuum", and "autovacuum
> > for wraparound", and a "mode" column that would have values like "normal",
> > "agressive", and "failsafe"?
>
> Right. The first column provides an insight into whether or not the
> running vacuum is cancellable, and the second column provides
> information on how vacuums are actually processing tables under the
> hood. Users are able to get the former information by checking
> pg_stat_activity too but the latter information is available only in
> server logs.

Thanks for the clarification. I agree with your proposal. Separating
the "reason" from the "mode" into two columns is a great idea that
will provide much clearer insight for DBAs.


--
Best regards,
Shinya Kato
NTT OSS Center



Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
On Wed, Oct 8, 2025 at 4:40 AM Sami Imseih <samimseih@gmail.com> wrote:
>
> > >  I wonder if we could be even more granular
> > > for the "normal autovacuum" case and point to the reason the table was
> > > chosen.  For example, was it the insert threshold, the update/delete
> > > threshold, etc.?
> >
> > Sounds like reasonable information. I guess we might want to have such
> > information in a cumulative statistics view but do you think it's
> > better to have it in a dynamic statistics view?
>
> +1 for this information in cumulative stats, on a per table level for sure.
> I do think however the pg_stat_all_tables views is getting too wide
> and moving new relation vacuum stats to a separate stats view will
> be very useful.

Thanks for the discussion.

IIUC are you suggesting I add such a last_autovacuum_reason column to
pg_stat_all_tables, which would be populated with one of the following
values?
- autovacuum_vacuum_threshold
- autovacuum_vacuum_insert_threshold
- autovacuum_freeze_max_age
- autovacuum_multixact_freeze_max_age

(For consistency, I should probably add  a last_autoanalyze_reason
column to pg_stat_all_tables as well.)

--
Best regards,
Shinya Kato
NTT OSS Center



Re: Add mode column to pg_stat_progress_vacuum

От
Sami Imseih
Дата:
> > > >  I wonder if we could be even more granular
> > > > for the "normal autovacuum" case and point to the reason the table was
> > > > chosen.  For example, was it the insert threshold, the update/delete
> > > > threshold, etc.?
> > >
> > > Sounds like reasonable information. I guess we might want to have such
> > > information in a cumulative statistics view but do you think it's
> > > better to have it in a dynamic statistics view?
> >
> > +1 for this information in cumulative stats, on a per table level for sure.
> > I do think however the pg_stat_all_tables views is getting too wide
> > and moving new relation vacuum stats to a separate stats view will
> > be very useful.
>
> Thanks for the discussion.
>
> IIUC are you suggesting I add such a last_autovacuum_reason column to
> pg_stat_all_tables, which would be populated with one of the following
> values?
> - autovacuum_vacuum_threshold
> - autovacuum_vacuum_insert_threshold
> - autovacuum_freeze_max_age
> - autovacuum_multixact_freeze_max_age

This should be a separate discussion. But, I would think the
counters will be n_aggressive, n_wraparound and n_failsafe.

We need to separate aggressive and wraparound due to
what is mentioned in vacuumlazy.c

/*
* While it's possible for a VACUUM to be both is_wraparound
* and !aggressive, that's just a corner-case -- is_wraparound
* implies aggressive. Produce distinct output for the corner
* case all the same, just in case.
*/

A normal vacuum is the difference of autovacuum_count and the
total of these counters.

--
Sami Imseih
Amazon Web Services (AWS)



Re: Add mode column to pg_stat_progress_vacuum

От
Michael Paquier
Дата:
On Thu, Oct 09, 2025 at 10:07:17AM -0500, Sami Imseih wrote:
>> IIUC are you suggesting I add such a last_autovacuum_reason column to
>> pg_stat_all_tables, which would be populated with one of the following
>> values?
>> - autovacuum_vacuum_threshold
>> - autovacuum_vacuum_insert_threshold
>> - autovacuum_freeze_max_age
>> - autovacuum_multixact_freeze_max_age
>
> This should be a separate discussion. But, I would think the
> counters will be n_aggressive, n_wraparound and n_failsafe.

Depends, I guess (separate discussion it should be, but I count not
resist).  If you had this information available in the cumulative
stats, what should be a "correct" set of numbers, and what could be
tuned to redirect the system so as it gets to a better set of numbers.
Wraparound autovacuums, for one, don't seem really relevant
to know about in an aggregated way.

Coming back to the original proposal.  Knowing about the state we are
kicking an autovacuum worker job for a set of tables in the progress
view would be definitely a nice thing.
--
Michael

Вложения

Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
On Thu, Oct 16, 2025 at 2:17 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Oct 09, 2025 at 10:07:17AM -0500, Sami Imseih wrote:
> >> IIUC are you suggesting I add such a last_autovacuum_reason column to
> >> pg_stat_all_tables, which would be populated with one of the following
> >> values?
> >> - autovacuum_vacuum_threshold
> >> - autovacuum_vacuum_insert_threshold
> >> - autovacuum_freeze_max_age
> >> - autovacuum_multixact_freeze_max_age
> >
> > This should be a separate discussion. But, I would think the
> > counters will be n_aggressive, n_wraparound and n_failsafe.
>
> Depends, I guess (separate discussion it should be, but I count not
> resist).  If you had this information available in the cumulative
> stats, what should be a "correct" set of numbers, and what could be
> tuned to redirect the system so as it gets to a better set of numbers.
> Wraparound autovacuums, for one, don't seem really relevant
> to know about in an aggregated way.

Thank you for the comment. You’re right. Let’s set this discussion
aside for now and continue it in a separate thread.

--
Best regards,
Shinya Kato
NTT OSS Center



Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
Thank you all for the reviews!

On Wed, Oct 8, 2025 at 4:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Oct 7, 2025 at 12:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >
> > On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote:
> > > Right. I think we cannot display both things in one mode column. Since
> > > both manual vacuums and anti-wraparound autovacuums can enter the
> > > failsafe mode dynamically, if we show "failsafe" in the mode column,
> > > we would lose the information "why is this vacuum running". I guess we
> > > would need separate columns. For example, I guess that the column
> > > showing "how is it operating under the hood" can have three values:
> > > "normal", "aggressive" (disables VM optimization), and "failsafe"
> > > (implies aggressive vacuum and disables many things to prioritize XID
> > > freezing).
> >
> > Am I understanding correctly that your idea is to have a "reason" column
> > that would have values like "manual", "normal autovacuum", and "autovacuum
> > for wraparound", and a "mode" column that would have values like "normal",
> > "agressive", and "failsafe"?
>
> Right. The first column provides an insight into whether or not the
> running vacuum is cancellable, and the second column provides
> information on how vacuums are actually processing tables under the
> hood. Users are able to get the former information by checking
> pg_stat_activity too but the latter information is available only in
> server logs.

I have updated the patch according to your comments. I have modified
this to display the behavior mode (normal, aggressive, failsafe) in
the mode column, and the trigger reason (manual, autovacuum,
anti-wraparound) in the reason column

--
Best regards,
Shinya Kato
NTT OSS Center

Вложения

Re: Add mode column to pg_stat_progress_vacuum

От
Masahiko Sawada
Дата:
On Fri, Oct 24, 2025 at 12:15 AM Shinya Kato <shinya11.kato@gmail.com> wrote:
>
> Thank you all for the reviews!
>
> On Wed, Oct 8, 2025 at 4:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Oct 7, 2025 at 12:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> > >
> > > On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote:
> > > > Right. I think we cannot display both things in one mode column. Since
> > > > both manual vacuums and anti-wraparound autovacuums can enter the
> > > > failsafe mode dynamically, if we show "failsafe" in the mode column,
> > > > we would lose the information "why is this vacuum running". I guess we
> > > > would need separate columns. For example, I guess that the column
> > > > showing "how is it operating under the hood" can have three values:
> > > > "normal", "aggressive" (disables VM optimization), and "failsafe"
> > > > (implies aggressive vacuum and disables many things to prioritize XID
> > > > freezing).
> > >
> > > Am I understanding correctly that your idea is to have a "reason" column
> > > that would have values like "manual", "normal autovacuum", and "autovacuum
> > > for wraparound", and a "mode" column that would have values like "normal",
> > > "agressive", and "failsafe"?
> >
> > Right. The first column provides an insight into whether or not the
> > running vacuum is cancellable, and the second column provides
> > information on how vacuums are actually processing tables under the
> > hood. Users are able to get the former information by checking
> > pg_stat_activity too but the latter information is available only in
> > server logs.
>
> I have updated the patch according to your comments. I have modified
> this to display the behavior mode (normal, aggressive, failsafe) in
> the mode column,

The new 'mode' column with the possible three values looks good to me.

>  and the trigger reason (manual, autovacuum,
> anti-wraparound) in the reason column

Showing 'anti-wraparound' value hides the fact that the process is an
autovacuum worker. How about 'ant-wraparound_autovacuum',
'autovacuum_wraparound', or something along those lines? Also, we can
probably find a better column name than 'reason'. How about 'source'
or 'triggered_by'?

I think we need to update the documentation in maintenance.sgml as
well. For instance, we can add the reference to the new columns in the
following description:

      <para>
       Autovacuum workers generally don't block other commands.  If a process
       attempts to acquire a lock that conflicts with the
       <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
       acquisition will interrupt the autovacuum.  For conflicting lock modes,
       see <xref linkend="table-lock-compatibility"/>.  However, if
the autovacuum
       is running to prevent transaction ID wraparound (i.e., the
autovacuum query
       name in the <structname>pg_stat_activity</structname> view ends with
       <literal>(to prevent wraparound)</literal>), the autovacuum is not
       automatically interrupted.
      </para>

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
On Thu, Oct 30, 2025 at 8:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I have updated the patch according to your comments. I have modified
> > this to display the behavior mode (normal, aggressive, failsafe) in
> > the mode column,
>
> The new 'mode' column with the possible three values looks good to me.

Thank you for the review!

> >  and the trigger reason (manual, autovacuum,
> > anti-wraparound) in the reason column
>
> Showing 'anti-wraparound' value hides the fact that the process is an
> autovacuum worker. How about 'ant-wraparound_autovacuum',
> 'autovacuum_wraparound', or something along those lines?

I think 'autovacuum_wraparound' is better because it's shorter and
simpler. I've updated the patch to use it.

> Also, we can
> probably find a better column name than 'reason'. How about 'source'
> or 'triggered_by'?

I've changed it to use 'triggered_by' because 'source' is overloaded
and can be interpreted as data origin, I/O source, or WAL source in
other contexts, making it ambiguous. Also, I've updated the docs from
"The reason why the current vacuum started" to "The trigger of the
current vacuum operation".

> I think we need to update the documentation in maintenance.sgml as
> well. For instance, we can add the reference to the new columns in the
> following description:
>
>       <para>
>        Autovacuum workers generally don't block other commands.  If a process
>        attempts to acquire a lock that conflicts with the
>        <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
>        acquisition will interrupt the autovacuum.  For conflicting lock modes,
>        see <xref linkend="table-lock-compatibility"/>.  However, if
> the autovacuum
>        is running to prevent transaction ID wraparound (i.e., the
> autovacuum query
>        name in the <structname>pg_stat_activity</structname> view ends with
>        <literal>(to prevent wraparound)</literal>), the autovacuum is not
>        automatically interrupted.
>       </para>

I've added a reference to the triggered_by column in the
pg_stat_progress_vacuum view.

--
Best regards,
Shinya Kato
NTT OSS Center

Вложения

Re: Add mode column to pg_stat_progress_vacuum

От
Masahiko Sawada
Дата:
On Thu, Oct 30, 2025 at 12:39 AM Shinya Kato <shinya11.kato@gmail.com> wrote:
>
> On Thu, Oct 30, 2025 at 8:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > I have updated the patch according to your comments. I have modified
> > > this to display the behavior mode (normal, aggressive, failsafe) in
> > > the mode column,
> >
> > The new 'mode' column with the possible three values looks good to me.
>
> Thank you for the review!
>
> > >  and the trigger reason (manual, autovacuum,
> > > anti-wraparound) in the reason column
> >
> > Showing 'anti-wraparound' value hides the fact that the process is an
> > autovacuum worker. How about 'ant-wraparound_autovacuum',
> > 'autovacuum_wraparound', or something along those lines?
>
> I think 'autovacuum_wraparound' is better because it's shorter and
> simpler. I've updated the patch to use it.
>
> > Also, we can
> > probably find a better column name than 'reason'. How about 'source'
> > or 'triggered_by'?
>
> I've changed it to use 'triggered_by' because 'source' is overloaded
> and can be interpreted as data origin, I/O source, or WAL source in
> other contexts, making it ambiguous. Also, I've updated the docs from
> "The reason why the current vacuum started" to "The trigger of the
> current vacuum operation".
>
> > I think we need to update the documentation in maintenance.sgml as
> > well. For instance, we can add the reference to the new columns in the
> > following description:
> >
> >       <para>
> >        Autovacuum workers generally don't block other commands.  If a process
> >        attempts to acquire a lock that conflicts with the
> >        <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
> >        acquisition will interrupt the autovacuum.  For conflicting lock modes,
> >        see <xref linkend="table-lock-compatibility"/>.  However, if
> > the autovacuum
> >        is running to prevent transaction ID wraparound (i.e., the
> > autovacuum query
> >        name in the <structname>pg_stat_activity</structname> view ends with
> >        <literal>(to prevent wraparound)</literal>), the autovacuum is not
> >        automatically interrupted.
> >       </para>
>
> I've added a reference to the triggered_by column in the
> pg_stat_progress_vacuum view.

Thank you for updating the patch!

How about the following changes for the documentation changes?

+      <para>
+       The mode of the current vacuum operation.  Possible values are:

The mode in which the current VACUUM operation is running. See Section
24.1.5 for details of each mode. Possible values are:

+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>normal</literal>: A standard vacuum operation that is not
+          required to be aggressive and has not entered failsafe mode.
+         </para>

normal: The operation is performing a standard vacuum and is neither
required to run in aggressive mode nor operating in failsafe mode.

+        </listitem>
+        <listitem>
+         <para>
+          <literal>aggressive</literal>: A vacuum that must scan the entire
+          table because <xref linkend="guc-vacuum-freeze-table-age"/> or
+          <xref linkend="guc-vacuum-freeze-min-age"/> (or the corresponding
+          per-table storage parameters) required it, or because page skipping
+          was disabled via <command>VACUUM (DISABLE_PAGE_SKIPPING)</command>.

aggressive: The operation is running an aggressive vacuum that must
scan every page that is not marked all-frozen. vacuum_freeze_table_age
and vacuum_multixact_freeze_table_age control when a table is
aggressively vacuumed.

+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>failsafe</literal>: A vacuum operation that entered failsafe
+          mode when the system was at risk of transaction ID or multixact ID
+          wraparound (see <xref linkend="guc-vacuum-failsafe-age"/> and
+          <xref linkend="guc-vacuum-multixact-failsafe-age"/>).

failsafe: The operation has entered failsafe mode, in which vacuum
performs only the minimum work needed to avoid transaction ID
wraparound failure. vacuum_failsafe_age and
vacuum_multixact_failsafe_age controls when the vacuum enters failsafe
mode.

+      </para>
+      <para>
+       The trigger of the current vacuum operation.  Possible values are:

What caused the current VACUUM operation to be initiated. Possible values are:

+       <itemizedlist>
+        <listitem>
+         <para>
+          <literal>manual</literal>: Initiated by an explicit
+          <command>VACUUM</command> command.

manual: The vacuum was initiated by an explicit VACUUM command.

+          <literal>autovacuum</literal>: Launched by autovacuum based on
+          <xref linkend="guc-autovacuum-vacuum-threshold"/> or
+          <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>.

autovacuum: The vacuum was started by an autovacuum worker. Autovacuum
workers launched for this purpose are interrupted due to lock
conflicts.

+          <literal>autovacuum_wraparound</literal>: Launched by autovacuum to
+          avoid transaction ID or multixact ID wraparound (see
+          <xref linkend="vacuum-for-wraparound"/> as well as
+          <xref linkend="guc-autovacuum-freeze-max-age"/> and
+          <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>).

autovacuum_wraparound: The vacuum was started by an autovacuum worker
to prevent transaction ID or multixact ID wraparound. Autovacuum
workers launched for this purpose are not interrupted because of lock
conflicts.

---
+       /* Reset the progress counters and the mode */
+       pgstat_progress_update_multi_param(3, progress_index, progress_val);

This change seems not correct to me since we don't reset the mode. I'd
change it to:

/* Reset the progress counters and set the failsafe mode */

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
On Thu, Nov 13, 2025 at 11:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Thank you for updating the patch!

Thank you for the review!

> How about the following changes for the documentation changes?
>
> +      <para>
> +       The mode of the current vacuum operation.  Possible values are:
>
> The mode in which the current VACUUM operation is running. See Section
> 24.1.5 for details of each mode. Possible values are:
>
> +       <itemizedlist>
> +        <listitem>
> +         <para>
> +          <literal>normal</literal>: A standard vacuum operation that is not
> +          required to be aggressive and has not entered failsafe mode.
> +         </para>
>
> normal: The operation is performing a standard vacuum and is neither
> required to run in aggressive mode nor operating in failsafe mode.
>
> +        </listitem>
> +        <listitem>
> +         <para>
> +          <literal>aggressive</literal>: A vacuum that must scan the entire
> +          table because <xref linkend="guc-vacuum-freeze-table-age"/> or
> +          <xref linkend="guc-vacuum-freeze-min-age"/> (or the corresponding
> +          per-table storage parameters) required it, or because page skipping
> +          was disabled via <command>VACUUM (DISABLE_PAGE_SKIPPING)</command>.
>
> aggressive: The operation is running an aggressive vacuum that must
> scan every page that is not marked all-frozen. vacuum_freeze_table_age
> and vacuum_multixact_freeze_table_age control when a table is
> aggressively vacuumed.
>
> +         </para>
> +        </listitem>
> +        <listitem>
> +         <para>
> +          <literal>failsafe</literal>: A vacuum operation that entered failsafe
> +          mode when the system was at risk of transaction ID or multixact ID
> +          wraparound (see <xref linkend="guc-vacuum-failsafe-age"/> and
> +          <xref linkend="guc-vacuum-multixact-failsafe-age"/>).
>
> failsafe: The operation has entered failsafe mode, in which vacuum
> performs only the minimum work needed to avoid transaction ID
> wraparound failure. vacuum_failsafe_age and
> vacuum_multixact_failsafe_age controls when the vacuum enters failsafe
> mode.

Fixed.

> +      </para>
> +      <para>
> +       The trigger of the current vacuum operation.  Possible values are:
>
> What caused the current VACUUM operation to be initiated. Possible values are:
>
> +       <itemizedlist>
> +        <listitem>
> +         <para>
> +          <literal>manual</literal>: Initiated by an explicit
> +          <command>VACUUM</command> command.
>
> manual: The vacuum was initiated by an explicit VACUUM command.
>
> +          <literal>autovacuum</literal>: Launched by autovacuum based on
> +          <xref linkend="guc-autovacuum-vacuum-threshold"/> or
> +          <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>.
>
> autovacuum: The vacuum was started by an autovacuum worker. Autovacuum
> workers launched for this purpose are interrupted due to lock
> conflicts.
>
> +          <literal>autovacuum_wraparound</literal>: Launched by autovacuum to
> +          avoid transaction ID or multixact ID wraparound (see
> +          <xref linkend="vacuum-for-wraparound"/> as well as
> +          <xref linkend="guc-autovacuum-freeze-max-age"/> and
> +          <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>).
>
> autovacuum_wraparound: The vacuum was started by an autovacuum worker
> to prevent transaction ID or multixact ID wraparound. Autovacuum
> workers launched for this purpose are not interrupted because of lock
> conflicts.

Fixed, but I have a comment. I noticed minor wording inconsistencies,
e.g., 'started' vs. 'initiated' and 'due to' vs. 'because of'. Should
I unify these terms?

> ---
> +       /* Reset the progress counters and the mode */
> +       pgstat_progress_update_multi_param(3, progress_index, progress_val);
>
> This change seems not correct to me since we don't reset the mode. I'd
> change it to:
>
> /* Reset the progress counters and set the failsafe mode */

Fixed.


--
Best regards,
Shinya Kato
NTT OSS Center

Вложения