Обсуждение: Add mode column to pg_stat_progress_vacuum

Поиск
Список
Период
Сортировка

Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
Hi hackers,

I would like to propose a patch that enhances the
pg_stat_progress_vacuum view by adding a mode column. The patch is
attached.

Although it is possible to identify an anti-wraparound VACUUM through
the process title (to prevent wraparound) or specific log entries, it
would be significantly more convenient for monitoring purposes to have
this status clearly indicated in the pg_stat_progress_vacuum view.
This would enable DBAs to immediately understand the urgency of the
vacuum process without needing to check separate logs or system
processes.

This patch introduces a mode column to provide this visibility. The
possible values are:
- normal: A standard, user-initiated VACUUM or a regular autovacuum run.
- anti-wraparound: An autovacuum run launched specifically to prevent
transaction ID wraparound.
- failsafe: A vacuum that has entered failsafe mode to prevent
imminent transaction ID wraparound.

This will allow administrators to better understand the context and
urgency of vacuum operations, which is crucial for monitoring and
troubleshooting.

Design Considerations:
When defining the scope of the anti-wraparound mode, I considered
including manual commands like VACUUM (FREEZE) or VACUUM
(DISABLE_PAGE_SKIPPING). However, I decided against this to keep the
meaning of the mode clear and simple. These options can be used for
various purposes, and overloading the anti-wraparound mode with too
many meanings could become confusing. Therefore, the current
implementation limits this mode to autovacuum runs that are explicitly
launched for wraparound prevention.

Regarding Testing:
I was able to manually verify the failsafe mode's behavior by using
the existing test script at
src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl. This script
successfully triggered the failsafe condition and the view reported
the correct mode. However, I found this test to be somewhat flaky in
my environment and decided not to add it to the patch to avoid
introducing a potentially unstable test into the tree.

Thought?

--
Best regards,
Shinya Kato
NTT OSS Center

Вложения

Re: Add mode column to pg_stat_progress_vacuum

От
Kirill Reshke
Дата:
On Thu, 14 Aug 2025 at 16:13, Shinya Kato <shinya11.kato@gmail.com> wrote:

> This patch introduces a mode column to provide this visibility. The
> possible values are:
> - normal: A standard, user-initiated VACUUM or a regular autovacuum run.
> - anti-wraparound: An autovacuum run launched specifically to prevent
> transaction ID wraparound.
> - failsafe: A vacuum that has entered failsafe mode to prevent
> imminent transaction ID wraparound.

> Thought?

Just a small comment:

I am more used to Lazy vs Eager vacuum types. It is how we use to call
them in doc and code. Maybe this wording will be better?



-- 
Best regards,
Kirill Reshke



Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
On Thu, Aug 14, 2025 at 9:20 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> On Thu, 14 Aug 2025 at 16:13, Shinya Kato <shinya11.kato@gmail.com> wrote:
>
> > This patch introduces a mode column to provide this visibility. The
> > possible values are:
> > - normal: A standard, user-initiated VACUUM or a regular autovacuum run.
> > - anti-wraparound: An autovacuum run launched specifically to prevent
> > transaction ID wraparound.
> > - failsafe: A vacuum that has entered failsafe mode to prevent
> > imminent transaction ID wraparound.
>
> > Thought?
>
> Just a small comment:
>
> I am more used to Lazy vs Eager vacuum types. It is how we use to call
> them in doc and code. Maybe this wording will be better?

Thanks for the feedback!

Are you suggesting it would be better to change "normal" to "lazy" and
"anti-wraparound" to "eager"? My hesitation is that "lazy" is a term
used to contrast with VACUUM FULL, and the "lazy" vs. "eager"
distinction also exists for tuple freezing logic within vacuum.
Reusing these terms for a different purpose could be confusing. I also
find "anti-wraparound" to be a much clearer and more descriptive term
than "eager".

Unless there’s a strong preference otherwise, I’d like to keep
“anti-wraparound” and “failsafe” as-is, and keep “normal” (or possibly
“plain”/“regular” if that reads better).

--
Best regards,
Shinya Kato
NTT OSS Center



Re: Add mode column to pg_stat_progress_vacuum

От
Kirill Reshke
Дата:
On Fri, 15 Aug 2025 at 09:19, Shinya Kato <shinya11.kato@gmail.com> wrote:
eager".
>
> Unless there’s a strong preference otherwise, I’d like to keep
> “anti-wraparound” and “failsafe” as-is, and keep “normal” (or possibly
> “plain”/“regular” if that reads better).

OK.


I have tested the patch a bit, and noticed that `mode` column shows
`normal` for both auto-vacuum and user-initiated vacuum (via VACUUM
utility statement).
Is it ok? Maybe for more visibility we can display different values
for these two cases?


--
Best regards,
Kirill Reshke



Re: Add mode column to pg_stat_progress_vacuum

От
Nathan Bossart
Дата:
On Thu, Aug 14, 2025 at 08:12:55PM +0900, Shinya Kato wrote:
> I would like to propose a patch that enhances the
> pg_stat_progress_vacuum view by adding a mode column. The patch is
> attached.
> 
> Although it is possible to identify an anti-wraparound VACUUM through
> the process title (to prevent wraparound) or specific log entries, it
> would be significantly more convenient for monitoring purposes to have
> this status clearly indicated in the pg_stat_progress_vacuum view.
> This would enable DBAs to immediately understand the urgency of the
> vacuum process without needing to check separate logs or system
> processes.

This seems generally reasonable to me.

> This patch introduces a mode column to provide this visibility. The
> possible values are:
> - normal: A standard, user-initiated VACUUM or a regular autovacuum run.
> - anti-wraparound: An autovacuum run launched specifically to prevent
> transaction ID wraparound.
> - failsafe: A vacuum that has entered failsafe mode to prevent
> imminent transaction ID wraparound.

I wonder if we should also add "aggressive".

> I was able to manually verify the failsafe mode's behavior by using
> the existing test script at
> src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl. This script
> successfully triggered the failsafe condition and the view reported
> the correct mode. However, I found this test to be somewhat flaky in
> my environment and decided not to add it to the patch to avoid
> introducing a potentially unstable test into the tree.

Perhaps there's something we can do with injection points to improve the
stability of the test.

-- 
nathan



Re: Add mode column to pg_stat_progress_vacuum

От
Robert Treat
Дата:
On Tue, Oct 7, 2025 at 11:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Thu, Aug 14, 2025 at 08:12:55PM +0900, Shinya Kato wrote:
> > I would like to propose a patch that enhances the
> > pg_stat_progress_vacuum view by adding a mode column. The patch is
> > attached.
> >
> > Although it is possible to identify an anti-wraparound VACUUM through
> > the process title (to prevent wraparound) or specific log entries, it
> > would be significantly more convenient for monitoring purposes to have
> > this status clearly indicated in the pg_stat_progress_vacuum view.
> > This would enable DBAs to immediately understand the urgency of the
> > vacuum process without needing to check separate logs or system
> > processes.
>
> This seems generally reasonable to me.
>

There is a bit of an issue that an anti-wraparound vacuum is not in
and of itself urgent, especially not with our defaults, so I have a
little bit of concern that this patch could be mis-leading, but that
isn't exactly an argument against the merits of it.

> > This patch introduces a mode column to provide this visibility. The
> > possible values are:
> > - normal: A standard, user-initiated VACUUM or a regular autovacuum run.
> > - anti-wraparound: An autovacuum run launched specifically to prevent
> > transaction ID wraparound.
> > - failsafe: A vacuum that has entered failsafe mode to prevent
> > imminent transaction ID wraparound.
>

I think we should probably split out manual vacuums, which can be run
for a whole host of different reasons. I'd suggest a mode of "manual"
for those, and probably "standard" for a regular autovacuum run.

> I wonder if we should also add "aggressive".
>

I don't think so. I feel like the point of the mode is to answer "why
is this vacuum running" not "how is it operating under the hood".


Robert Treat
https://xzilla.net



Re: Add mode column to pg_stat_progress_vacuum

От
Nathan Bossart
Дата:
On Tue, Oct 07, 2025 at 11:50:46AM -0400, Robert Treat wrote:
> On Tue, Oct 7, 2025 at 11:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> I wonder if we should also add "aggressive".
> 
> I don't think so. I feel like the point of the mode is to answer "why
> is this vacuum running" not "how is it operating under the hood".

To some extent, those are tied together.  For example, a failsafe vacuum is
an anti-wraparound vacuum that skips index vacuuming, etc.  And an
anti-wraparound vacuum implies an aggressive scan, but not vice versa.
There's also a separate parameter (vacuum_freeze_table_age) that controls
when vacuum decides to perform an aggressive scan, just like there exists a
parameter for anti-wraparound vacuums (autovacuum_freeze_max_age) and
failsafe vacuums (vacuum_failsafe_age).

-- 
nathan



Re: Add mode column to pg_stat_progress_vacuum

От
Masahiko Sawada
Дата:
On Tue, Oct 7, 2025 at 9:26 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Oct 07, 2025 at 11:50:46AM -0400, Robert Treat wrote:
> > On Tue, Oct 7, 2025 at 11:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> I wonder if we should also add "aggressive".
> >
> > I don't think so. I feel like the point of the mode is to answer "why
> > is this vacuum running" not "how is it operating under the hood".
>
> To some extent, those are tied together.  For example, a failsafe vacuum is
> an anti-wraparound vacuum that skips index vacuuming, etc.  And an
> anti-wraparound vacuum implies an aggressive scan, but not vice versa.
> There's also a separate parameter (vacuum_freeze_table_age) that controls
> when vacuum decides to perform an aggressive scan, just like there exists a
> parameter for anti-wraparound vacuums (autovacuum_freeze_max_age) and
> failsafe vacuums (vacuum_failsafe_age).

Right. I think we cannot display both things in one mode column. Since
both manual vacuums and anti-wraparound autovacuums can enter the
failsafe mode dynamically, if we show "failsafe" in the mode column,
we would lose the information "why is this vacuum running". I guess we
would need separate columns. For example, I guess that the column
showing "how is it operating under the hood" can have three values:
"normal", "aggressive" (disables VM optimization), and "failsafe"
(implies aggressive vacuum and disables many things to prioritize XID
freezing).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add mode column to pg_stat_progress_vacuum

От
Sami Imseih
Дата:
I am bit late to this thread, but I have a comment.

> This patch introduces a mode column to provide this visibility. The
> possible values are:
> - normal: A standard, user-initiated VACUUM or a regular autovacuum run.
> - anti-wraparound: An autovacuum run launched specifically to prevent
> transaction ID wraparound.
> - failsafe: A vacuum that has entered failsafe mode to prevent
> imminent transaction ID wraparound.

The vacuum command detail can now be determined from
pg_stat_activity.query by joining with pg_stat_progress_vacuum, right?
I don't see why this is not sufficient, especially because it already
indicates how the vacuum was triggered, and the autovacuum activity
message also tells you why it was triggered. We could perhaps add "due to
failsafe" to the autovacuum activity message to explicitly show that reason.

--
Sami Imseih
Amazon Web Services (AWS)



Re: Add mode column to pg_stat_progress_vacuum

От
Nathan Bossart
Дата:
On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote:
> Right. I think we cannot display both things in one mode column. Since
> both manual vacuums and anti-wraparound autovacuums can enter the
> failsafe mode dynamically, if we show "failsafe" in the mode column,
> we would lose the information "why is this vacuum running". I guess we
> would need separate columns. For example, I guess that the column
> showing "how is it operating under the hood" can have three values:
> "normal", "aggressive" (disables VM optimization), and "failsafe"
> (implies aggressive vacuum and disables many things to prioritize XID
> freezing).

Am I understanding correctly that your idea is to have a "reason" column
that would have values like "manual", "normal autovacuum", and "autovacuum
for wraparound", and a "mode" column that would have values like "normal",
"agressive", and "failsafe"?  I wonder if we could be even more granular
for the "normal autovacuum" case and point to the reason the table was
chosen.  For example, was it the insert threshold, the update/delete
threshold, etc.?

-- 
nathan



Re: Add mode column to pg_stat_progress_vacuum

От
Nathan Bossart
Дата:
On Tue, Oct 07, 2025 at 12:45:12PM -0500, Sami Imseih wrote:
> The vacuum command detail can now be determined from
> pg_stat_activity.query by joining with pg_stat_progress_vacuum, right?
> I don't see why this is not sufficient, especially because it already
> indicates how the vacuum was triggered, and the autovacuum activity
> message also tells you why it was triggered. We could perhaps add "due to
> failsafe" to the autovacuum activity message to explicitly show that reason.

Eh, IMHO requiring users to look for a certain substring in the query field
doesn't seem especially user-friendly to me.  (I was going to point out
that it's undocumented, too, but it is in fact documented [0].)

[0] https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM

-- 
nathan



Re: Add mode column to pg_stat_progress_vacuum

От
Sami Imseih
Дата:
> On Tue, Oct 07, 2025 at 12:45:12PM -0500, Sami Imseih wrote:
> > The vacuum command detail can now be determined from
> > pg_stat_activity.query by joining with pg_stat_progress_vacuum, right?
> > I don't see why this is not sufficient, especially because it already
> > indicates how the vacuum was triggered, and the autovacuum activity
> > message also tells you why it was triggered. We could perhaps add "due to
> > failsafe" to the autovacuum activity message to explicitly show that reason.
>
> Eh, IMHO requiring users to look for a certain substring in the query field
> doesn't seem especially user-friendly to me.  (I was going to point out
> that it's undocumented, too, but it is in fact documented [0].)

I am not sure if it's a bad user experience. In my experience that string
is quite easy to parse.

Also, It is also common for a DBA to have to reference
pg_stat_activity anyhow to determine how long the vacuum been
running for, wait events, etc. IMO, the progress view is the wrong place
for all information that is static ( does not change from the start of the
command ) and can be derived from the query string.

>> Right. I think we cannot display both things in one mode column. Since
>> both manual vacuums and anti-wraparound autovacuums can enter the
>> failsafe mode dynamically, if we show "failsafe" in the mode column,
>> we would lose the information "why is this vacuum running". I guess we
>> would need separate columns. For example, I guess that the column
>> showing "how is it operating under the hood" can have three values:
>> "normal", "aggressive" (disables VM optimization), and "failsafe"
>> (implies aggressive vacuum and disables many things to prioritize XID
>> freezing).

> Am I understanding correctly that your idea is to have a "reason" column
> that would have values like "manual", "normal autovacuum", and "autovacuum
> for wraparound", and a "mode" column that would have values like "normal",
> "agressive", and "failsafe"?  I wonder if we could be even more granular
> for the "normal autovacuum" case and point to the reason the table was
> chosen.  For example, was it the insert threshold, the update/delete
> threshold, etc.?

ahh, it's true that failsafe can trigger while an (auto)vacuum is in progress,
the check does not happen at the start, but in places like the main loop
of lazy_scan_heap. Since "failsafe" can be flipped on in-flight, I can see
that being a useful (bool?) field in the progress view.

--
Sami



Re: Add mode column to pg_stat_progress_vacuum

От
Masahiko Sawada
Дата:
On Tue, Oct 7, 2025 at 12:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote:
> > Right. I think we cannot display both things in one mode column. Since
> > both manual vacuums and anti-wraparound autovacuums can enter the
> > failsafe mode dynamically, if we show "failsafe" in the mode column,
> > we would lose the information "why is this vacuum running". I guess we
> > would need separate columns. For example, I guess that the column
> > showing "how is it operating under the hood" can have three values:
> > "normal", "aggressive" (disables VM optimization), and "failsafe"
> > (implies aggressive vacuum and disables many things to prioritize XID
> > freezing).
>
> Am I understanding correctly that your idea is to have a "reason" column
> that would have values like "manual", "normal autovacuum", and "autovacuum
> for wraparound", and a "mode" column that would have values like "normal",
> "agressive", and "failsafe"?

Right. The first column provides an insight into whether or not the
running vacuum is cancellable, and the second column provides
information on how vacuums are actually processing tables under the
hood. Users are able to get the former information by checking
pg_stat_activity too but the latter information is available only in
server logs.

>  I wonder if we could be even more granular
> for the "normal autovacuum" case and point to the reason the table was
> chosen.  For example, was it the insert threshold, the update/delete
> threshold, etc.?

Sounds like reasonable information. I guess we might want to have such
information in a cumulative statistics view but do you think it's
better to have it in a dynamic statistics view?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Re: Add mode column to pg_stat_progress_vacuum

От
Sami Imseih
Дата:
> >  I wonder if we could be even more granular
> > for the "normal autovacuum" case and point to the reason the table was
> > chosen.  For example, was it the insert threshold, the update/delete
> > threshold, etc.?
>
> Sounds like reasonable information. I guess we might want to have such
> information in a cumulative statistics view but do you think it's
> better to have it in a dynamic statistics view?

+1 for this information in cumulative stats, on a per table level for sure.
I do think however the pg_stat_all_tables views is getting too wide
and moving new relation vacuum stats to a separate stats view will
be very useful.

--
Sami



Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
On Wed, Oct 8, 2025 at 12:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> > I was able to manually verify the failsafe mode's behavior by using
> > the existing test script at
> > src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl. This script
> > successfully triggered the failsafe condition and the view reported
> > the correct mode. However, I found this test to be somewhat flaky in
> > my environment and decided not to add it to the patch to avoid
> > introducing a potentially unstable test into the tree.
>
> Perhaps there's something we can do with injection points to improve the
> stability of the test.

Thank you for the advice. You're right, I can test it with injection
point. However, many other progress reporting views do not have
regression tests, so I do not have to add a regression test of
pg_stat_progress_report.




--
Best regards,
Shinya Kato
NTT OSS Center



Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
On Wed, Oct 8, 2025 at 4:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Oct 7, 2025 at 12:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >
> > On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote:
> > > Right. I think we cannot display both things in one mode column. Since
> > > both manual vacuums and anti-wraparound autovacuums can enter the
> > > failsafe mode dynamically, if we show "failsafe" in the mode column,
> > > we would lose the information "why is this vacuum running". I guess we
> > > would need separate columns. For example, I guess that the column
> > > showing "how is it operating under the hood" can have three values:
> > > "normal", "aggressive" (disables VM optimization), and "failsafe"
> > > (implies aggressive vacuum and disables many things to prioritize XID
> > > freezing).
> >
> > Am I understanding correctly that your idea is to have a "reason" column
> > that would have values like "manual", "normal autovacuum", and "autovacuum
> > for wraparound", and a "mode" column that would have values like "normal",
> > "agressive", and "failsafe"?
>
> Right. The first column provides an insight into whether or not the
> running vacuum is cancellable, and the second column provides
> information on how vacuums are actually processing tables under the
> hood. Users are able to get the former information by checking
> pg_stat_activity too but the latter information is available only in
> server logs.

Thanks for the clarification. I agree with your proposal. Separating
the "reason" from the "mode" into two columns is a great idea that
will provide much clearer insight for DBAs.


--
Best regards,
Shinya Kato
NTT OSS Center



Re: Add mode column to pg_stat_progress_vacuum

От
Shinya Kato
Дата:
On Wed, Oct 8, 2025 at 4:40 AM Sami Imseih <samimseih@gmail.com> wrote:
>
> > >  I wonder if we could be even more granular
> > > for the "normal autovacuum" case and point to the reason the table was
> > > chosen.  For example, was it the insert threshold, the update/delete
> > > threshold, etc.?
> >
> > Sounds like reasonable information. I guess we might want to have such
> > information in a cumulative statistics view but do you think it's
> > better to have it in a dynamic statistics view?
>
> +1 for this information in cumulative stats, on a per table level for sure.
> I do think however the pg_stat_all_tables views is getting too wide
> and moving new relation vacuum stats to a separate stats view will
> be very useful.

Thanks for the discussion.

IIUC are you suggesting I add such a last_autovacuum_reason column to
pg_stat_all_tables, which would be populated with one of the following
values?
- autovacuum_vacuum_threshold
- autovacuum_vacuum_insert_threshold
- autovacuum_freeze_max_age
- autovacuum_multixact_freeze_max_age

(For consistency, I should probably add  a last_autoanalyze_reason
column to pg_stat_all_tables as well.)

--
Best regards,
Shinya Kato
NTT OSS Center



Re: Add mode column to pg_stat_progress_vacuum

От
Sami Imseih
Дата:
> > > >  I wonder if we could be even more granular
> > > > for the "normal autovacuum" case and point to the reason the table was
> > > > chosen.  For example, was it the insert threshold, the update/delete
> > > > threshold, etc.?
> > >
> > > Sounds like reasonable information. I guess we might want to have such
> > > information in a cumulative statistics view but do you think it's
> > > better to have it in a dynamic statistics view?
> >
> > +1 for this information in cumulative stats, on a per table level for sure.
> > I do think however the pg_stat_all_tables views is getting too wide
> > and moving new relation vacuum stats to a separate stats view will
> > be very useful.
>
> Thanks for the discussion.
>
> IIUC are you suggesting I add such a last_autovacuum_reason column to
> pg_stat_all_tables, which would be populated with one of the following
> values?
> - autovacuum_vacuum_threshold
> - autovacuum_vacuum_insert_threshold
> - autovacuum_freeze_max_age
> - autovacuum_multixact_freeze_max_age

This should be a separate discussion. But, I would think the
counters will be n_aggressive, n_wraparound and n_failsafe.

We need to separate aggressive and wraparound due to
what is mentioned in vacuumlazy.c

/*
* While it's possible for a VACUUM to be both is_wraparound
* and !aggressive, that's just a corner-case -- is_wraparound
* implies aggressive. Produce distinct output for the corner
* case all the same, just in case.
*/

A normal vacuum is the difference of autovacuum_count and the
total of these counters.

--
Sami Imseih
Amazon Web Services (AWS)