Обсуждение: Add mode column to pg_stat_progress_vacuum
Hi hackers, I would like to propose a patch that enhances the pg_stat_progress_vacuum view by adding a mode column. The patch is attached. Although it is possible to identify an anti-wraparound VACUUM through the process title (to prevent wraparound) or specific log entries, it would be significantly more convenient for monitoring purposes to have this status clearly indicated in the pg_stat_progress_vacuum view. This would enable DBAs to immediately understand the urgency of the vacuum process without needing to check separate logs or system processes. This patch introduces a mode column to provide this visibility. The possible values are: - normal: A standard, user-initiated VACUUM or a regular autovacuum run. - anti-wraparound: An autovacuum run launched specifically to prevent transaction ID wraparound. - failsafe: A vacuum that has entered failsafe mode to prevent imminent transaction ID wraparound. This will allow administrators to better understand the context and urgency of vacuum operations, which is crucial for monitoring and troubleshooting. Design Considerations: When defining the scope of the anti-wraparound mode, I considered including manual commands like VACUUM (FREEZE) or VACUUM (DISABLE_PAGE_SKIPPING). However, I decided against this to keep the meaning of the mode clear and simple. These options can be used for various purposes, and overloading the anti-wraparound mode with too many meanings could become confusing. Therefore, the current implementation limits this mode to autovacuum runs that are explicitly launched for wraparound prevention. Regarding Testing: I was able to manually verify the failsafe mode's behavior by using the existing test script at src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl. This script successfully triggered the failsafe condition and the view reported the correct mode. However, I found this test to be somewhat flaky in my environment and decided not to add it to the patch to avoid introducing a potentially unstable test into the tree. Thought? -- Best regards, Shinya Kato NTT OSS Center
Вложения
On Thu, 14 Aug 2025 at 16:13, Shinya Kato <shinya11.kato@gmail.com> wrote: > This patch introduces a mode column to provide this visibility. The > possible values are: > - normal: A standard, user-initiated VACUUM or a regular autovacuum run. > - anti-wraparound: An autovacuum run launched specifically to prevent > transaction ID wraparound. > - failsafe: A vacuum that has entered failsafe mode to prevent > imminent transaction ID wraparound. > Thought? Just a small comment: I am more used to Lazy vs Eager vacuum types. It is how we use to call them in doc and code. Maybe this wording will be better? -- Best regards, Kirill Reshke
On Thu, Aug 14, 2025 at 9:20 PM Kirill Reshke <reshkekirill@gmail.com> wrote: > > On Thu, 14 Aug 2025 at 16:13, Shinya Kato <shinya11.kato@gmail.com> wrote: > > > This patch introduces a mode column to provide this visibility. The > > possible values are: > > - normal: A standard, user-initiated VACUUM or a regular autovacuum run. > > - anti-wraparound: An autovacuum run launched specifically to prevent > > transaction ID wraparound. > > - failsafe: A vacuum that has entered failsafe mode to prevent > > imminent transaction ID wraparound. > > > Thought? > > Just a small comment: > > I am more used to Lazy vs Eager vacuum types. It is how we use to call > them in doc and code. Maybe this wording will be better? Thanks for the feedback! Are you suggesting it would be better to change "normal" to "lazy" and "anti-wraparound" to "eager"? My hesitation is that "lazy" is a term used to contrast with VACUUM FULL, and the "lazy" vs. "eager" distinction also exists for tuple freezing logic within vacuum. Reusing these terms for a different purpose could be confusing. I also find "anti-wraparound" to be a much clearer and more descriptive term than "eager". Unless there’s a strong preference otherwise, I’d like to keep “anti-wraparound” and “failsafe” as-is, and keep “normal” (or possibly “plain”/“regular” if that reads better). -- Best regards, Shinya Kato NTT OSS Center
On Fri, 15 Aug 2025 at 09:19, Shinya Kato <shinya11.kato@gmail.com> wrote: eager". > > Unless there’s a strong preference otherwise, I’d like to keep > “anti-wraparound” and “failsafe” as-is, and keep “normal” (or possibly > “plain”/“regular” if that reads better). OK. I have tested the patch a bit, and noticed that `mode` column shows `normal` for both auto-vacuum and user-initiated vacuum (via VACUUM utility statement). Is it ok? Maybe for more visibility we can display different values for these two cases? -- Best regards, Kirill Reshke
On Thu, Aug 14, 2025 at 08:12:55PM +0900, Shinya Kato wrote: > I would like to propose a patch that enhances the > pg_stat_progress_vacuum view by adding a mode column. The patch is > attached. > > Although it is possible to identify an anti-wraparound VACUUM through > the process title (to prevent wraparound) or specific log entries, it > would be significantly more convenient for monitoring purposes to have > this status clearly indicated in the pg_stat_progress_vacuum view. > This would enable DBAs to immediately understand the urgency of the > vacuum process without needing to check separate logs or system > processes. This seems generally reasonable to me. > This patch introduces a mode column to provide this visibility. The > possible values are: > - normal: A standard, user-initiated VACUUM or a regular autovacuum run. > - anti-wraparound: An autovacuum run launched specifically to prevent > transaction ID wraparound. > - failsafe: A vacuum that has entered failsafe mode to prevent > imminent transaction ID wraparound. I wonder if we should also add "aggressive". > I was able to manually verify the failsafe mode's behavior by using > the existing test script at > src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl. This script > successfully triggered the failsafe condition and the view reported > the correct mode. However, I found this test to be somewhat flaky in > my environment and decided not to add it to the patch to avoid > introducing a potentially unstable test into the tree. Perhaps there's something we can do with injection points to improve the stability of the test. -- nathan
On Tue, Oct 7, 2025 at 11:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > On Thu, Aug 14, 2025 at 08:12:55PM +0900, Shinya Kato wrote: > > I would like to propose a patch that enhances the > > pg_stat_progress_vacuum view by adding a mode column. The patch is > > attached. > > > > Although it is possible to identify an anti-wraparound VACUUM through > > the process title (to prevent wraparound) or specific log entries, it > > would be significantly more convenient for monitoring purposes to have > > this status clearly indicated in the pg_stat_progress_vacuum view. > > This would enable DBAs to immediately understand the urgency of the > > vacuum process without needing to check separate logs or system > > processes. > > This seems generally reasonable to me. > There is a bit of an issue that an anti-wraparound vacuum is not in and of itself urgent, especially not with our defaults, so I have a little bit of concern that this patch could be mis-leading, but that isn't exactly an argument against the merits of it. > > This patch introduces a mode column to provide this visibility. The > > possible values are: > > - normal: A standard, user-initiated VACUUM or a regular autovacuum run. > > - anti-wraparound: An autovacuum run launched specifically to prevent > > transaction ID wraparound. > > - failsafe: A vacuum that has entered failsafe mode to prevent > > imminent transaction ID wraparound. > I think we should probably split out manual vacuums, which can be run for a whole host of different reasons. I'd suggest a mode of "manual" for those, and probably "standard" for a regular autovacuum run. > I wonder if we should also add "aggressive". > I don't think so. I feel like the point of the mode is to answer "why is this vacuum running" not "how is it operating under the hood". Robert Treat https://xzilla.net
On Tue, Oct 07, 2025 at 11:50:46AM -0400, Robert Treat wrote: > On Tue, Oct 7, 2025 at 11:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote: >> I wonder if we should also add "aggressive". > > I don't think so. I feel like the point of the mode is to answer "why > is this vacuum running" not "how is it operating under the hood". To some extent, those are tied together. For example, a failsafe vacuum is an anti-wraparound vacuum that skips index vacuuming, etc. And an anti-wraparound vacuum implies an aggressive scan, but not vice versa. There's also a separate parameter (vacuum_freeze_table_age) that controls when vacuum decides to perform an aggressive scan, just like there exists a parameter for anti-wraparound vacuums (autovacuum_freeze_max_age) and failsafe vacuums (vacuum_failsafe_age). -- nathan
On Tue, Oct 7, 2025 at 9:26 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Tue, Oct 07, 2025 at 11:50:46AM -0400, Robert Treat wrote: > > On Tue, Oct 7, 2025 at 11:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > >> I wonder if we should also add "aggressive". > > > > I don't think so. I feel like the point of the mode is to answer "why > > is this vacuum running" not "how is it operating under the hood". > > To some extent, those are tied together. For example, a failsafe vacuum is > an anti-wraparound vacuum that skips index vacuuming, etc. And an > anti-wraparound vacuum implies an aggressive scan, but not vice versa. > There's also a separate parameter (vacuum_freeze_table_age) that controls > when vacuum decides to perform an aggressive scan, just like there exists a > parameter for anti-wraparound vacuums (autovacuum_freeze_max_age) and > failsafe vacuums (vacuum_failsafe_age). Right. I think we cannot display both things in one mode column. Since both manual vacuums and anti-wraparound autovacuums can enter the failsafe mode dynamically, if we show "failsafe" in the mode column, we would lose the information "why is this vacuum running". I guess we would need separate columns. For example, I guess that the column showing "how is it operating under the hood" can have three values: "normal", "aggressive" (disables VM optimization), and "failsafe" (implies aggressive vacuum and disables many things to prioritize XID freezing). Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
I am bit late to this thread, but I have a comment. > This patch introduces a mode column to provide this visibility. The > possible values are: > - normal: A standard, user-initiated VACUUM or a regular autovacuum run. > - anti-wraparound: An autovacuum run launched specifically to prevent > transaction ID wraparound. > - failsafe: A vacuum that has entered failsafe mode to prevent > imminent transaction ID wraparound. The vacuum command detail can now be determined from pg_stat_activity.query by joining with pg_stat_progress_vacuum, right? I don't see why this is not sufficient, especially because it already indicates how the vacuum was triggered, and the autovacuum activity message also tells you why it was triggered. We could perhaps add "due to failsafe" to the autovacuum activity message to explicitly show that reason. -- Sami Imseih Amazon Web Services (AWS)
On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote: > Right. I think we cannot display both things in one mode column. Since > both manual vacuums and anti-wraparound autovacuums can enter the > failsafe mode dynamically, if we show "failsafe" in the mode column, > we would lose the information "why is this vacuum running". I guess we > would need separate columns. For example, I guess that the column > showing "how is it operating under the hood" can have three values: > "normal", "aggressive" (disables VM optimization), and "failsafe" > (implies aggressive vacuum and disables many things to prioritize XID > freezing). Am I understanding correctly that your idea is to have a "reason" column that would have values like "manual", "normal autovacuum", and "autovacuum for wraparound", and a "mode" column that would have values like "normal", "agressive", and "failsafe"? I wonder if we could be even more granular for the "normal autovacuum" case and point to the reason the table was chosen. For example, was it the insert threshold, the update/delete threshold, etc.? -- nathan
On Tue, Oct 07, 2025 at 12:45:12PM -0500, Sami Imseih wrote: > The vacuum command detail can now be determined from > pg_stat_activity.query by joining with pg_stat_progress_vacuum, right? > I don't see why this is not sufficient, especially because it already > indicates how the vacuum was triggered, and the autovacuum activity > message also tells you why it was triggered. We could perhaps add "due to > failsafe" to the autovacuum activity message to explicitly show that reason. Eh, IMHO requiring users to look for a certain substring in the query field doesn't seem especially user-friendly to me. (I was going to point out that it's undocumented, too, but it is in fact documented [0].) [0] https://www.postgresql.org/docs/current/routine-vacuuming.html#AUTOVACUUM -- nathan
> On Tue, Oct 07, 2025 at 12:45:12PM -0500, Sami Imseih wrote: > > The vacuum command detail can now be determined from > > pg_stat_activity.query by joining with pg_stat_progress_vacuum, right? > > I don't see why this is not sufficient, especially because it already > > indicates how the vacuum was triggered, and the autovacuum activity > > message also tells you why it was triggered. We could perhaps add "due to > > failsafe" to the autovacuum activity message to explicitly show that reason. > > Eh, IMHO requiring users to look for a certain substring in the query field > doesn't seem especially user-friendly to me. (I was going to point out > that it's undocumented, too, but it is in fact documented [0].) I am not sure if it's a bad user experience. In my experience that string is quite easy to parse. Also, It is also common for a DBA to have to reference pg_stat_activity anyhow to determine how long the vacuum been running for, wait events, etc. IMO, the progress view is the wrong place for all information that is static ( does not change from the start of the command ) and can be derived from the query string. >> Right. I think we cannot display both things in one mode column. Since >> both manual vacuums and anti-wraparound autovacuums can enter the >> failsafe mode dynamically, if we show "failsafe" in the mode column, >> we would lose the information "why is this vacuum running". I guess we >> would need separate columns. For example, I guess that the column >> showing "how is it operating under the hood" can have three values: >> "normal", "aggressive" (disables VM optimization), and "failsafe" >> (implies aggressive vacuum and disables many things to prioritize XID >> freezing). > Am I understanding correctly that your idea is to have a "reason" column > that would have values like "manual", "normal autovacuum", and "autovacuum > for wraparound", and a "mode" column that would have values like "normal", > "agressive", and "failsafe"? I wonder if we could be even more granular > for the "normal autovacuum" case and point to the reason the table was > chosen. For example, was it the insert threshold, the update/delete > threshold, etc.? ahh, it's true that failsafe can trigger while an (auto)vacuum is in progress, the check does not happen at the start, but in places like the main loop of lazy_scan_heap. Since "failsafe" can be flipped on in-flight, I can see that being a useful (bool?) field in the progress view. -- Sami
On Tue, Oct 7, 2025 at 12:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote: > > Right. I think we cannot display both things in one mode column. Since > > both manual vacuums and anti-wraparound autovacuums can enter the > > failsafe mode dynamically, if we show "failsafe" in the mode column, > > we would lose the information "why is this vacuum running". I guess we > > would need separate columns. For example, I guess that the column > > showing "how is it operating under the hood" can have three values: > > "normal", "aggressive" (disables VM optimization), and "failsafe" > > (implies aggressive vacuum and disables many things to prioritize XID > > freezing). > > Am I understanding correctly that your idea is to have a "reason" column > that would have values like "manual", "normal autovacuum", and "autovacuum > for wraparound", and a "mode" column that would have values like "normal", > "agressive", and "failsafe"? Right. The first column provides an insight into whether or not the running vacuum is cancellable, and the second column provides information on how vacuums are actually processing tables under the hood. Users are able to get the former information by checking pg_stat_activity too but the latter information is available only in server logs. > I wonder if we could be even more granular > for the "normal autovacuum" case and point to the reason the table was > chosen. For example, was it the insert threshold, the update/delete > threshold, etc.? Sounds like reasonable information. I guess we might want to have such information in a cumulative statistics view but do you think it's better to have it in a dynamic statistics view? Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
> > I wonder if we could be even more granular > > for the "normal autovacuum" case and point to the reason the table was > > chosen. For example, was it the insert threshold, the update/delete > > threshold, etc.? > > Sounds like reasonable information. I guess we might want to have such > information in a cumulative statistics view but do you think it's > better to have it in a dynamic statistics view? +1 for this information in cumulative stats, on a per table level for sure. I do think however the pg_stat_all_tables views is getting too wide and moving new relation vacuum stats to a separate stats view will be very useful. -- Sami
On Wed, Oct 8, 2025 at 12:04 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > > I was able to manually verify the failsafe mode's behavior by using > > the existing test script at > > src/test/modules/xid_wraparound/t/001_emergency_vacuum.pl. This script > > successfully triggered the failsafe condition and the view reported > > the correct mode. However, I found this test to be somewhat flaky in > > my environment and decided not to add it to the patch to avoid > > introducing a potentially unstable test into the tree. > > Perhaps there's something we can do with injection points to improve the > stability of the test. Thank you for the advice. You're right, I can test it with injection point. However, many other progress reporting views do not have regression tests, so I do not have to add a regression test of pg_stat_progress_report. -- Best regards, Shinya Kato NTT OSS Center
On Wed, Oct 8, 2025 at 4:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Oct 7, 2025 at 12:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > > > On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote: > > > Right. I think we cannot display both things in one mode column. Since > > > both manual vacuums and anti-wraparound autovacuums can enter the > > > failsafe mode dynamically, if we show "failsafe" in the mode column, > > > we would lose the information "why is this vacuum running". I guess we > > > would need separate columns. For example, I guess that the column > > > showing "how is it operating under the hood" can have three values: > > > "normal", "aggressive" (disables VM optimization), and "failsafe" > > > (implies aggressive vacuum and disables many things to prioritize XID > > > freezing). > > > > Am I understanding correctly that your idea is to have a "reason" column > > that would have values like "manual", "normal autovacuum", and "autovacuum > > for wraparound", and a "mode" column that would have values like "normal", > > "agressive", and "failsafe"? > > Right. The first column provides an insight into whether or not the > running vacuum is cancellable, and the second column provides > information on how vacuums are actually processing tables under the > hood. Users are able to get the former information by checking > pg_stat_activity too but the latter information is available only in > server logs. Thanks for the clarification. I agree with your proposal. Separating the "reason" from the "mode" into two columns is a great idea that will provide much clearer insight for DBAs. -- Best regards, Shinya Kato NTT OSS Center
On Wed, Oct 8, 2025 at 4:40 AM Sami Imseih <samimseih@gmail.com> wrote: > > > > I wonder if we could be even more granular > > > for the "normal autovacuum" case and point to the reason the table was > > > chosen. For example, was it the insert threshold, the update/delete > > > threshold, etc.? > > > > Sounds like reasonable information. I guess we might want to have such > > information in a cumulative statistics view but do you think it's > > better to have it in a dynamic statistics view? > > +1 for this information in cumulative stats, on a per table level for sure. > I do think however the pg_stat_all_tables views is getting too wide > and moving new relation vacuum stats to a separate stats view will > be very useful. Thanks for the discussion. IIUC are you suggesting I add such a last_autovacuum_reason column to pg_stat_all_tables, which would be populated with one of the following values? - autovacuum_vacuum_threshold - autovacuum_vacuum_insert_threshold - autovacuum_freeze_max_age - autovacuum_multixact_freeze_max_age (For consistency, I should probably add a last_autoanalyze_reason column to pg_stat_all_tables as well.) -- Best regards, Shinya Kato NTT OSS Center
> > > > I wonder if we could be even more granular > > > > for the "normal autovacuum" case and point to the reason the table was > > > > chosen. For example, was it the insert threshold, the update/delete > > > > threshold, etc.? > > > > > > Sounds like reasonable information. I guess we might want to have such > > > information in a cumulative statistics view but do you think it's > > > better to have it in a dynamic statistics view? > > > > +1 for this information in cumulative stats, on a per table level for sure. > > I do think however the pg_stat_all_tables views is getting too wide > > and moving new relation vacuum stats to a separate stats view will > > be very useful. > > Thanks for the discussion. > > IIUC are you suggesting I add such a last_autovacuum_reason column to > pg_stat_all_tables, which would be populated with one of the following > values? > - autovacuum_vacuum_threshold > - autovacuum_vacuum_insert_threshold > - autovacuum_freeze_max_age > - autovacuum_multixact_freeze_max_age This should be a separate discussion. But, I would think the counters will be n_aggressive, n_wraparound and n_failsafe. We need to separate aggressive and wraparound due to what is mentioned in vacuumlazy.c /* * While it's possible for a VACUUM to be both is_wraparound * and !aggressive, that's just a corner-case -- is_wraparound * implies aggressive. Produce distinct output for the corner * case all the same, just in case. */ A normal vacuum is the difference of autovacuum_count and the total of these counters. -- Sami Imseih Amazon Web Services (AWS)
On Thu, Oct 09, 2025 at 10:07:17AM -0500, Sami Imseih wrote: >> IIUC are you suggesting I add such a last_autovacuum_reason column to >> pg_stat_all_tables, which would be populated with one of the following >> values? >> - autovacuum_vacuum_threshold >> - autovacuum_vacuum_insert_threshold >> - autovacuum_freeze_max_age >> - autovacuum_multixact_freeze_max_age > > This should be a separate discussion. But, I would think the > counters will be n_aggressive, n_wraparound and n_failsafe. Depends, I guess (separate discussion it should be, but I count not resist). If you had this information available in the cumulative stats, what should be a "correct" set of numbers, and what could be tuned to redirect the system so as it gets to a better set of numbers. Wraparound autovacuums, for one, don't seem really relevant to know about in an aggregated way. Coming back to the original proposal. Knowing about the state we are kicking an autovacuum worker job for a set of tables in the progress view would be definitely a nice thing. -- Michael
Вложения
On Thu, Oct 16, 2025 at 2:17 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Oct 09, 2025 at 10:07:17AM -0500, Sami Imseih wrote: > >> IIUC are you suggesting I add such a last_autovacuum_reason column to > >> pg_stat_all_tables, which would be populated with one of the following > >> values? > >> - autovacuum_vacuum_threshold > >> - autovacuum_vacuum_insert_threshold > >> - autovacuum_freeze_max_age > >> - autovacuum_multixact_freeze_max_age > > > > This should be a separate discussion. But, I would think the > > counters will be n_aggressive, n_wraparound and n_failsafe. > > Depends, I guess (separate discussion it should be, but I count not > resist). If you had this information available in the cumulative > stats, what should be a "correct" set of numbers, and what could be > tuned to redirect the system so as it gets to a better set of numbers. > Wraparound autovacuums, for one, don't seem really relevant > to know about in an aggregated way. Thank you for the comment. You’re right. Let’s set this discussion aside for now and continue it in a separate thread. -- Best regards, Shinya Kato NTT OSS Center
Thank you all for the reviews! On Wed, Oct 8, 2025 at 4:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Oct 7, 2025 at 12:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > > > On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote: > > > Right. I think we cannot display both things in one mode column. Since > > > both manual vacuums and anti-wraparound autovacuums can enter the > > > failsafe mode dynamically, if we show "failsafe" in the mode column, > > > we would lose the information "why is this vacuum running". I guess we > > > would need separate columns. For example, I guess that the column > > > showing "how is it operating under the hood" can have three values: > > > "normal", "aggressive" (disables VM optimization), and "failsafe" > > > (implies aggressive vacuum and disables many things to prioritize XID > > > freezing). > > > > Am I understanding correctly that your idea is to have a "reason" column > > that would have values like "manual", "normal autovacuum", and "autovacuum > > for wraparound", and a "mode" column that would have values like "normal", > > "agressive", and "failsafe"? > > Right. The first column provides an insight into whether or not the > running vacuum is cancellable, and the second column provides > information on how vacuums are actually processing tables under the > hood. Users are able to get the former information by checking > pg_stat_activity too but the latter information is available only in > server logs. I have updated the patch according to your comments. I have modified this to display the behavior mode (normal, aggressive, failsafe) in the mode column, and the trigger reason (manual, autovacuum, anti-wraparound) in the reason column -- Best regards, Shinya Kato NTT OSS Center
Вложения
On Fri, Oct 24, 2025 at 12:15 AM Shinya Kato <shinya11.kato@gmail.com> wrote:
>
> Thank you all for the reviews!
>
> On Wed, Oct 8, 2025 at 4:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Oct 7, 2025 at 12:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> > >
> > > On Tue, Oct 07, 2025 at 10:17:38AM -0700, Masahiko Sawada wrote:
> > > > Right. I think we cannot display both things in one mode column. Since
> > > > both manual vacuums and anti-wraparound autovacuums can enter the
> > > > failsafe mode dynamically, if we show "failsafe" in the mode column,
> > > > we would lose the information "why is this vacuum running". I guess we
> > > > would need separate columns. For example, I guess that the column
> > > > showing "how is it operating under the hood" can have three values:
> > > > "normal", "aggressive" (disables VM optimization), and "failsafe"
> > > > (implies aggressive vacuum and disables many things to prioritize XID
> > > > freezing).
> > >
> > > Am I understanding correctly that your idea is to have a "reason" column
> > > that would have values like "manual", "normal autovacuum", and "autovacuum
> > > for wraparound", and a "mode" column that would have values like "normal",
> > > "agressive", and "failsafe"?
> >
> > Right. The first column provides an insight into whether or not the
> > running vacuum is cancellable, and the second column provides
> > information on how vacuums are actually processing tables under the
> > hood. Users are able to get the former information by checking
> > pg_stat_activity too but the latter information is available only in
> > server logs.
>
> I have updated the patch according to your comments. I have modified
> this to display the behavior mode (normal, aggressive, failsafe) in
> the mode column,
The new 'mode' column with the possible three values looks good to me.
> and the trigger reason (manual, autovacuum,
> anti-wraparound) in the reason column
Showing 'anti-wraparound' value hides the fact that the process is an
autovacuum worker. How about 'ant-wraparound_autovacuum',
'autovacuum_wraparound', or something along those lines? Also, we can
probably find a better column name than 'reason'. How about 'source'
or 'triggered_by'?
I think we need to update the documentation in maintenance.sgml as
well. For instance, we can add the reference to the new columns in the
following description:
<para>
Autovacuum workers generally don't block other commands. If a process
attempts to acquire a lock that conflicts with the
<literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock
acquisition will interrupt the autovacuum. For conflicting lock modes,
see <xref linkend="table-lock-compatibility"/>. However, if
the autovacuum
is running to prevent transaction ID wraparound (i.e., the
autovacuum query
name in the <structname>pg_stat_activity</structname> view ends with
<literal>(to prevent wraparound)</literal>), the autovacuum is not
automatically interrupted.
</para>
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
On Thu, Oct 30, 2025 at 8:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > I have updated the patch according to your comments. I have modified > > this to display the behavior mode (normal, aggressive, failsafe) in > > the mode column, > > The new 'mode' column with the possible three values looks good to me. Thank you for the review! > > and the trigger reason (manual, autovacuum, > > anti-wraparound) in the reason column > > Showing 'anti-wraparound' value hides the fact that the process is an > autovacuum worker. How about 'ant-wraparound_autovacuum', > 'autovacuum_wraparound', or something along those lines? I think 'autovacuum_wraparound' is better because it's shorter and simpler. I've updated the patch to use it. > Also, we can > probably find a better column name than 'reason'. How about 'source' > or 'triggered_by'? I've changed it to use 'triggered_by' because 'source' is overloaded and can be interpreted as data origin, I/O source, or WAL source in other contexts, making it ambiguous. Also, I've updated the docs from "The reason why the current vacuum started" to "The trigger of the current vacuum operation". > I think we need to update the documentation in maintenance.sgml as > well. For instance, we can add the reference to the new columns in the > following description: > > <para> > Autovacuum workers generally don't block other commands. If a process > attempts to acquire a lock that conflicts with the > <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock > acquisition will interrupt the autovacuum. For conflicting lock modes, > see <xref linkend="table-lock-compatibility"/>. However, if > the autovacuum > is running to prevent transaction ID wraparound (i.e., the > autovacuum query > name in the <structname>pg_stat_activity</structname> view ends with > <literal>(to prevent wraparound)</literal>), the autovacuum is not > automatically interrupted. > </para> I've added a reference to the triggered_by column in the pg_stat_progress_vacuum view. -- Best regards, Shinya Kato NTT OSS Center
Вложения
On Thu, Oct 30, 2025 at 12:39 AM Shinya Kato <shinya11.kato@gmail.com> wrote: > > On Thu, Oct 30, 2025 at 8:40 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > I have updated the patch according to your comments. I have modified > > > this to display the behavior mode (normal, aggressive, failsafe) in > > > the mode column, > > > > The new 'mode' column with the possible three values looks good to me. > > Thank you for the review! > > > > and the trigger reason (manual, autovacuum, > > > anti-wraparound) in the reason column > > > > Showing 'anti-wraparound' value hides the fact that the process is an > > autovacuum worker. How about 'ant-wraparound_autovacuum', > > 'autovacuum_wraparound', or something along those lines? > > I think 'autovacuum_wraparound' is better because it's shorter and > simpler. I've updated the patch to use it. > > > Also, we can > > probably find a better column name than 'reason'. How about 'source' > > or 'triggered_by'? > > I've changed it to use 'triggered_by' because 'source' is overloaded > and can be interpreted as data origin, I/O source, or WAL source in > other contexts, making it ambiguous. Also, I've updated the docs from > "The reason why the current vacuum started" to "The trigger of the > current vacuum operation". > > > I think we need to update the documentation in maintenance.sgml as > > well. For instance, we can add the reference to the new columns in the > > following description: > > > > <para> > > Autovacuum workers generally don't block other commands. If a process > > attempts to acquire a lock that conflicts with the > > <literal>SHARE UPDATE EXCLUSIVE</literal> lock held by autovacuum, lock > > acquisition will interrupt the autovacuum. For conflicting lock modes, > > see <xref linkend="table-lock-compatibility"/>. However, if > > the autovacuum > > is running to prevent transaction ID wraparound (i.e., the > > autovacuum query > > name in the <structname>pg_stat_activity</structname> view ends with > > <literal>(to prevent wraparound)</literal>), the autovacuum is not > > automatically interrupted. > > </para> > > I've added a reference to the triggered_by column in the > pg_stat_progress_vacuum view. Thank you for updating the patch! How about the following changes for the documentation changes? + <para> + The mode of the current vacuum operation. Possible values are: The mode in which the current VACUUM operation is running. See Section 24.1.5 for details of each mode. Possible values are: + <itemizedlist> + <listitem> + <para> + <literal>normal</literal>: A standard vacuum operation that is not + required to be aggressive and has not entered failsafe mode. + </para> normal: The operation is performing a standard vacuum and is neither required to run in aggressive mode nor operating in failsafe mode. + </listitem> + <listitem> + <para> + <literal>aggressive</literal>: A vacuum that must scan the entire + table because <xref linkend="guc-vacuum-freeze-table-age"/> or + <xref linkend="guc-vacuum-freeze-min-age"/> (or the corresponding + per-table storage parameters) required it, or because page skipping + was disabled via <command>VACUUM (DISABLE_PAGE_SKIPPING)</command>. aggressive: The operation is running an aggressive vacuum that must scan every page that is not marked all-frozen. vacuum_freeze_table_age and vacuum_multixact_freeze_table_age control when a table is aggressively vacuumed. + </para> + </listitem> + <listitem> + <para> + <literal>failsafe</literal>: A vacuum operation that entered failsafe + mode when the system was at risk of transaction ID or multixact ID + wraparound (see <xref linkend="guc-vacuum-failsafe-age"/> and + <xref linkend="guc-vacuum-multixact-failsafe-age"/>). failsafe: The operation has entered failsafe mode, in which vacuum performs only the minimum work needed to avoid transaction ID wraparound failure. vacuum_failsafe_age and vacuum_multixact_failsafe_age controls when the vacuum enters failsafe mode. + </para> + <para> + The trigger of the current vacuum operation. Possible values are: What caused the current VACUUM operation to be initiated. Possible values are: + <itemizedlist> + <listitem> + <para> + <literal>manual</literal>: Initiated by an explicit + <command>VACUUM</command> command. manual: The vacuum was initiated by an explicit VACUUM command. + <literal>autovacuum</literal>: Launched by autovacuum based on + <xref linkend="guc-autovacuum-vacuum-threshold"/> or + <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>. autovacuum: The vacuum was started by an autovacuum worker. Autovacuum workers launched for this purpose are interrupted due to lock conflicts. + <literal>autovacuum_wraparound</literal>: Launched by autovacuum to + avoid transaction ID or multixact ID wraparound (see + <xref linkend="vacuum-for-wraparound"/> as well as + <xref linkend="guc-autovacuum-freeze-max-age"/> and + <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>). autovacuum_wraparound: The vacuum was started by an autovacuum worker to prevent transaction ID or multixact ID wraparound. Autovacuum workers launched for this purpose are not interrupted because of lock conflicts. --- + /* Reset the progress counters and the mode */ + pgstat_progress_update_multi_param(3, progress_index, progress_val); This change seems not correct to me since we don't reset the mode. I'd change it to: /* Reset the progress counters and set the failsafe mode */ Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Thu, Nov 13, 2025 at 11:11 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > Thank you for updating the patch! Thank you for the review! > How about the following changes for the documentation changes? > > + <para> > + The mode of the current vacuum operation. Possible values are: > > The mode in which the current VACUUM operation is running. See Section > 24.1.5 for details of each mode. Possible values are: > > + <itemizedlist> > + <listitem> > + <para> > + <literal>normal</literal>: A standard vacuum operation that is not > + required to be aggressive and has not entered failsafe mode. > + </para> > > normal: The operation is performing a standard vacuum and is neither > required to run in aggressive mode nor operating in failsafe mode. > > + </listitem> > + <listitem> > + <para> > + <literal>aggressive</literal>: A vacuum that must scan the entire > + table because <xref linkend="guc-vacuum-freeze-table-age"/> or > + <xref linkend="guc-vacuum-freeze-min-age"/> (or the corresponding > + per-table storage parameters) required it, or because page skipping > + was disabled via <command>VACUUM (DISABLE_PAGE_SKIPPING)</command>. > > aggressive: The operation is running an aggressive vacuum that must > scan every page that is not marked all-frozen. vacuum_freeze_table_age > and vacuum_multixact_freeze_table_age control when a table is > aggressively vacuumed. > > + </para> > + </listitem> > + <listitem> > + <para> > + <literal>failsafe</literal>: A vacuum operation that entered failsafe > + mode when the system was at risk of transaction ID or multixact ID > + wraparound (see <xref linkend="guc-vacuum-failsafe-age"/> and > + <xref linkend="guc-vacuum-multixact-failsafe-age"/>). > > failsafe: The operation has entered failsafe mode, in which vacuum > performs only the minimum work needed to avoid transaction ID > wraparound failure. vacuum_failsafe_age and > vacuum_multixact_failsafe_age controls when the vacuum enters failsafe > mode. Fixed. > + </para> > + <para> > + The trigger of the current vacuum operation. Possible values are: > > What caused the current VACUUM operation to be initiated. Possible values are: > > + <itemizedlist> > + <listitem> > + <para> > + <literal>manual</literal>: Initiated by an explicit > + <command>VACUUM</command> command. > > manual: The vacuum was initiated by an explicit VACUUM command. > > + <literal>autovacuum</literal>: Launched by autovacuum based on > + <xref linkend="guc-autovacuum-vacuum-threshold"/> or > + <xref linkend="guc-autovacuum-vacuum-insert-threshold"/>. > > autovacuum: The vacuum was started by an autovacuum worker. Autovacuum > workers launched for this purpose are interrupted due to lock > conflicts. > > + <literal>autovacuum_wraparound</literal>: Launched by autovacuum to > + avoid transaction ID or multixact ID wraparound (see > + <xref linkend="vacuum-for-wraparound"/> as well as > + <xref linkend="guc-autovacuum-freeze-max-age"/> and > + <xref linkend="guc-autovacuum-multixact-freeze-max-age"/>). > > autovacuum_wraparound: The vacuum was started by an autovacuum worker > to prevent transaction ID or multixact ID wraparound. Autovacuum > workers launched for this purpose are not interrupted because of lock > conflicts. Fixed, but I have a comment. I noticed minor wording inconsistencies, e.g., 'started' vs. 'initiated' and 'due to' vs. 'because of'. Should I unify these terms? > --- > + /* Reset the progress counters and the mode */ > + pgstat_progress_update_multi_param(3, progress_index, progress_val); > > This change seems not correct to me since we don't reset the mode. I'd > change it to: > > /* Reset the progress counters and set the failsafe mode */ Fixed. -- Best regards, Shinya Kato NTT OSS Center