Обсуждение: Add pg_stat_autovacuum_priority
Hi, This is a quick follow-up to the commit d7965d65f which introduced autovacuum prioritization based on a score that is the Max of several components, such as vacuum thresholds, xid age, etc. It was also discussed in that thread [1] that we will need a view to expose the priority scores, per table in a view. This will allow a user to introspect what the autovacuum launcher will prioritize next as well as verify tuning efforts for autovacuum prioritization; the latter case likely being rare. So after spending time on this today, I am proposing a view that returns a line for each relation with information about if the table needs autovacuum/autoanalyze, as well as scores of each component and the Max score. It looks like the below: ``` postgres=# select * FROM pg_stat_autovacuum_priority; -[ RECORD 1 ]-----+---------------------------- relid | 16410 schemaname | public relname | av_priority_test needs_vacuum | f needs_analyze | f wraparound | f score | 0 xid_score | 0 mxid_score | 0 vacuum_dead_score | 0 vacuum_ins_score | 0 analyze_score | 0 ``` The function essentially calls relation_needs_vacanalyze() with some setup work, such as scanning the catalog with an AccessShareLock, etc. and emits the result of this call. To make this work 0001 introduces a small change to relation_needs_vacanalyze() to take in a boolean to force the calculation of the score (even if autovacuum is disabled for the relation). 0002 introduces the view with documentation and testing in vacuum.c (xid age and mxid age scores are not tested as they require xid_wraparound to consume enough XIDs to trigger a score, which will cost too much time for a regression test). Find the attached taking the first attempt at this view. [1] [https://www.postgresql.org/message-id/CAApHDvqQN-B2sQov8nsfZOmx-VeJMauSf4kLa3A8LsK1tUyBNw%40mail.gmail.com] -- Sami Imseih Amazon Web Services (AWS)
Вложения
Hi Sami,
On Fri, Mar 27, 2026 at 4:14 PM Sami Imseih <samimseih@gmail.com> wrote:
Hi,
This is a quick follow-up to the commit d7965d65f which
introduced autovacuum prioritization based on a score that
is the Max of several components, such as vacuum
thresholds, xid age, etc.
It was also discussed in that thread [1] that we will need
a view to expose the priority scores, per table in a view.
This will allow a user to introspect what the autovacuum
launcher will prioritize next as well as verify tuning
efforts for autovacuum prioritization; the latter case
likely being rare.
So after spending time on this today, I am proposing a view
that returns a line for each relation with information
about if the table needs autovacuum/autoanalyze, as well as
scores of each component and the Max score. It looks like
the below:
```
postgres=# select * FROM pg_stat_autovacuum_priority;
-[ RECORD 1 ]-----+----------------------------
relid | 16410
schemaname | public
relname | av_priority_test
needs_vacuum | f
needs_analyze | f
wraparound | f
score | 0
xid_score | 0
mxid_score | 0
vacuum_dead_score | 0
vacuum_ins_score | 0
analyze_score | 0
```
The function essentially calls relation_needs_vacanalyze()
with some setup work, such as scanning the catalog with an
AccessShareLock, etc. and emits the result of this call.
To make this work 0001 introduces a small change to
relation_needs_vacanalyze() to take in a boolean to force
the calculation of the score (even if autovacuum is
disabled for the relation).
0002 introduces the view with documentation and testing in
vacuum.c (xid age and mxid age scores are not tested as
they require xid_wraparound to consume enough XIDs to
trigger a score, which will cost too much time for a
regression test).
Find the attached taking the first attempt at this view.
[1] [https://www.postgresql.org/message-id/CAApHDvqQN-B2sQov8nsfZOmx-VeJMauSf4kLa3A8LsK1tUyBNw%40mail.gmail.com]
Thanks for adding this. Applied the patch and the tests passed. I haven't fully reviewed the patch but have a few comments below:
1. Please ass CFI in the function pg_stat_get_autovacuum_priority, as the list of tables can be very long
+ while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
+ {
+ {
2. Should we add filtering? The current approach pg_stat_get_autovacuum_priority does a full catalog scan without any filters and can be expensive.
3. Please add tests for tables with autovacuum = off
4. Is the view intended to be exposed to PUBLIC without any ACL restrictions?
5. Catalog version number needs to be increased
-#define CATALOG_VERSION_NO 202603241
+#define CATALOG_VERSION_NO 202603231
Thanks,
Satya
On Fri, Mar 27, 2026 at 9:14 PM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
Hi Sami,On Fri, Mar 27, 2026 at 4:14 PM Sami Imseih <samimseih@gmail.com> wrote:Hi,
This is a quick follow-up to the commit d7965d65f which
introduced autovacuum prioritization based on a score that
is the Max of several components, such as vacuum
thresholds, xid age, etc.
It was also discussed in that thread [1] that we will need
a view to expose the priority scores, per table in a view.
This will allow a user to introspect what the autovacuum
launcher will prioritize next as well as verify tuning
efforts for autovacuum prioritization; the latter case
likely being rare.
So after spending time on this today, I am proposing a view
that returns a line for each relation with information
about if the table needs autovacuum/autoanalyze, as well as
scores of each component and the Max score. It looks like
the below:
```
postgres=# select * FROM pg_stat_autovacuum_priority;
-[ RECORD 1 ]-----+----------------------------
relid | 16410
schemaname | public
relname | av_priority_test
needs_vacuum | f
needs_analyze | f
wraparound | f
score | 0
xid_score | 0
mxid_score | 0
vacuum_dead_score | 0
vacuum_ins_score | 0
analyze_score | 0
```
The function essentially calls relation_needs_vacanalyze()
with some setup work, such as scanning the catalog with an
AccessShareLock, etc. and emits the result of this call.
To make this work 0001 introduces a small change to
relation_needs_vacanalyze() to take in a boolean to force
the calculation of the score (even if autovacuum is
disabled for the relation).
0002 introduces the view with documentation and testing in
vacuum.c (xid age and mxid age scores are not tested as
they require xid_wraparound to consume enough XIDs to
trigger a score, which will cost too much time for a
regression test).
Find the attached taking the first attempt at this view.
[1] [https://www.postgresql.org/message-id/CAApHDvqQN-B2sQov8nsfZOmx-VeJMauSf4kLa3A8LsK1tUyBNw%40mail.gmail.com]
Thanks for adding this. Applied the patch and the tests passed. I haven't fully reviewed the patch but have a few comments below:1. Please ass CFI in the function pg_stat_get_autovacuum_priority, as the list of tables can be very long+ while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
+ {2. Should we add filtering? The current approach pg_stat_get_autovacuum_priority does a full catalog scan without any filters and can be expensive.3. Please add tests for tables with autovacuum = off4. Is the view intended to be exposed to PUBLIC without any ACL restrictions?5. Catalog version number needs to be increased-#define CATALOG_VERSION_NO 202603241+#define CATALOG_VERSION_NO 202603231
Additionally, do you expect this view to be available on the hot_Standby? Because on a hot standby, the view only provides useful wraparound risk data. All activity-based columns are blind. This should either be documented, or the function should check RecoveryInProgress() and raise an error/notice
Thanks,
Satya
Hi, On Fri, Mar 27, 2026 at 4:14 PM Sami Imseih <samimseih@gmail.com> wrote: > > This is a quick follow-up to the commit d7965d65f which > introduced autovacuum prioritization based on a score that > is the Max of several components, such as vacuum > thresholds, xid age, etc. Thanks for sending the patch. +1 for the visibility into the new autovacuum scoring system and its impact on prioritization. It would also be nice to add the computed scores to pg_stat_progress_vacuum to show the current prioritization. IMHO, we can get that change first. It's relatively smaller. I quickly reviewed the v1 patches. Here are some comments: 1/ + * pg_stat_get_autovacuum_priority + * Returns the autovacuum priority score for each relation in the + * current database. + * Can we have the per-relation prioritization computation function in C and provide a per-database computation function as a SQL function over this per-relation function in system_functions.sql? This would simplify things and also address the concern raised by Satya in this thread about databases having a large number of relations. Would that work? 2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority for all and grant them to pg_monitor or similar? Especially since this function loops over all the relations in a database, we may not want everyone to be able to do this. -- Bharath Rupireddy Amazon Web Services: https://aws.amazon.com
Hi, Thank you all for the review and comments! > 1. Please add CFI in the function pg_stat_get_autovacuum_priority, as the list of tables can be very long good catch. Will add. > 3. Please add tests for tables with autovacuum = off Yes, I will add this test as well. > 4. Is the view intended to be exposed to PUBLIC without any ACL restrictions? > 2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority > for all and grant them to pg_monitor or similar? Especially since this > function loops over all the relations in a database, we may not want > everyone to be able to do this. I think you're correct there. While the data is not sensitive, it should have more controlled usage. It's only taking an AccessShareLock, but you would not want anyone to be able to run this since it's doing real computation. I think requiring pg_read_all_stats is a good idea. Will do. > 5. Catalog version number needs to be increased This will be left to the committer at the time. > 2. Should we add filtering? The current approach pg_stat_get_autovacuum_priority > does a full catalog scan without any filters and can be expensive. > Can we have the per-relation prioritization computation function in C > and provide a per-database computation function as a SQL function over > this per-relation function in system_functions.sql? Yes, perhaps we should do this. So we can have a function called pg_stat_get_autovacuum_priority() that either takes a NULL or an OID to either return all the tables or just a single table. This is a similar usage pattern as pg_stat_get_subscription or pg_stat_get_activity. pg_stat_autovacuum_priority will be a view that wraps around the NULL variant of the function. The case where the OID is passed we just do a SearchSysCache1(RELOID,...) whereas the other case will do the full catalog scan. What do you think? -- Sami
Hi, On Sat, Mar 28, 2026 at 10:54 AM Sami Imseih <samimseih@gmail.com> wrote: > > > 4. Is the view intended to be exposed to PUBLIC without any ACL restrictions? > > > 2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority > > for all and grant them to pg_monitor or similar? Especially since this > > function loops over all the relations in a database, we may not want > > everyone to be able to do this. > > I think you're correct there. While the data is not sensitive, it > should have more controlled usage. It's only taking an AccessShareLock, > but you would not want anyone to be able to run this since it's > doing real computation. I think requiring pg_read_all_stats > is a good idea. Will do. +1 for pg_read_all_stats. > > Can we have the per-relation prioritization computation function in C > > and provide a per-database computation function as a SQL function over > > this per-relation function in system_functions.sql? > > Yes, perhaps we should do this. So we can have a function called > pg_stat_get_autovacuum_priority() that either takes a NULL or an OID > to either return all the tables or just a single table. > This is a similar usage pattern as pg_stat_get_subscription or > pg_stat_get_activity. > > pg_stat_autovacuum_priority will be a view that wraps around the NULL > variant of the function. > > The case where the OID is passed we just do a SearchSysCache1(RELOID,...) > whereas the other case will do the full catalog scan. > > What do you think? IMHO, we can have pg_stat_get_relation_autovacuum_priority defined as a C function to give the autovacuum scoring as of the given moment for the given table. It's easy for one to write a function to get scoring for all the relations in a database. This keeps things simple yet useful. -- Bharath Rupireddy Amazon Web Services: https://aws.amazon.com
On Sun, Mar 29, 2026 at 10:09 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > On Sat, Mar 28, 2026 at 10:54 AM Sami Imseih <samimseih@gmail.com> wrote: > > > > > 4. Is the view intended to be exposed to PUBLIC without any ACL restrictions? > > > > > 2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority > > > for all and grant them to pg_monitor or similar? Especially since this > > > function loops over all the relations in a database, we may not want > > > everyone to be able to do this. > > > > I think you're correct there. While the data is not sensitive, it > > should have more controlled usage. It's only taking an AccessShareLock, > > but you would not want anyone to be able to run this since it's > > doing real computation. I think requiring pg_read_all_stats > > is a good idea. Will do. > > +1 for pg_read_all_stats. > Is there a gap here where someone may have been granted MAINTAIN on a relation but they do not have pg_read_all_stats? > > > Can we have the per-relation prioritization computation function in C > > > and provide a per-database computation function as a SQL function over > > > this per-relation function in system_functions.sql? > > > > Yes, perhaps we should do this. So we can have a function called > > pg_stat_get_autovacuum_priority() that either takes a NULL or an OID > > to either return all the tables or just a single table. > > This is a similar usage pattern as pg_stat_get_subscription or > > pg_stat_get_activity. > > > > pg_stat_autovacuum_priority will be a view that wraps around the NULL > > variant of the function. > > > > The case where the OID is passed we just do a SearchSysCache1(RELOID,...) > > whereas the other case will do the full catalog scan. > > > > What do you think? > > IMHO, we can have pg_stat_get_relation_autovacuum_priority defined as > a C function to give the autovacuum scoring as of the given moment for > the given table. It's easy for one to write a function to get scoring > for all the relations in a database. This keeps things simple yet > useful. > I don't have a strong opinion on the above, but I do suspect that the most common way people will interact with this is by querying against the view with a WHERE clause, so optimizing for that case seems important. Robert Treat https://xzilla.net
On 2026-Mar-27, Sami Imseih wrote: > 0002 introduces the view with documentation and testing in > vacuum.c (xid age and mxid age scores are not tested as > they require xid_wraparound to consume enough XIDs to > trigger a score, which will cost too much time for a > regression test). After this patch, there are three copies of the combined call to pgstat_fetch_stat_tabentry_ext() and relation_needs_vacanalyze(). Can we avoid that, ideally reducing to only one such? Or, at least, it looks like recheck_relation_needs_vacanalyze() may no longer need to exist at all. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Ed is the standard text editor." http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3
> On Sun, Mar 29, 2026 at 10:09 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Sat, Mar 28, 2026 at 10:54 AM Sami Imseih <samimseih@gmail.com> wrote: > > > > > > > 4. Is the view intended to be exposed to PUBLIC without any ACL restrictions? > > > > > > > 2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority > > > > for all and grant them to pg_monitor or similar? Especially since this > > > > function loops over all the relations in a database, we may not want > > > > everyone to be able to do this. > > > > > > I think you're correct there. While the data is not sensitive, it > > > should have more controlled usage. It's only taking an AccessShareLock, > > > but you would not want anyone to be able to run this since it's > > > doing real computation. I think requiring pg_read_all_stats > > > is a good idea. Will do. > > > > +1 for pg_read_all_stats. > > > > Is there a gap here where someone may have been granted MAINTAIN on a > relation but they do not have pg_read_all_stats? Yes, that is possible. MAINTAIN is a per-object privilege granted on a relation, whereas pg_read_all_stats is a global role membership. They operate at different levels. I don't think one needs to have MAINTAIN permissions on the table to see the autovacuum score. DBA Monitoring users are usually separate from the DBA operational users. I think pg_read_all_stats is the right permission here and it should be implemented similar to how pg_get_shmem_allocations is done where the default permissions are pg_read_all_stats. pg_monitor inherits pg_read_all_stats so any user with this privilege will be able to access this view. A DBA is free to also add privileges to to other users if they wish. This is unlike other pg_stat_* views that have tuple level permission checks ( i.e. pg_stat_activity), but in those cases the permissions are needed to hide sensitive data. This is not the case here. > > IMHO, we can have pg_stat_get_relation_autovacuum_priority defined as > > a C function to give the autovacuum scoring as of the given moment for > > the given table. It's easy for one to write a function to get scoring > > for all the relations in a database. This keeps things simple yet > > useful. > > > > I don't have a strong opinion on the above, but I do suspect that the > most common way people will interact with this is by querying against > the view with a WHERE clause, so optimizing for that case seems > important. Yeah, after sleeping on it I actually think the most common case will likely be ORDER BY score DESC LIMIT ... because you usually want to see where your table priority is relative to everything else in the database. For the rare case where someone wants to look up an individual table, the caller can just use a WHERE clause. So, we should just always do the full pg_class scan. I don't see why we need to complicate the c-function more than this. Attached v2 implements it as above. A few other things in v2: 1/ I set autovacuum_enabled = OFF in the tests. This will make sure the test is both stable and will also test that the score is returned even in the case where autovacuum is disabled. 2/ Moved pg_stat_autovacuum_priority to the end of the docs in "Monitoring Database Activity". 3/ Also added a mention of the extremely high score values when failsafe is triggered [1] [1] https://www.postgresql.org/message-id/abGP87A3JPIXDG2I%40nathan -- Sami
Вложения
> > 0002 introduces the view with documentation and testing in > > vacuum.c (xid age and mxid age scores are not tested as > > they require xid_wraparound to consume enough XIDs to > > trigger a score, which will cost too much time for a > > regression test). > > After this patch, there are three copies of the combined call to > pgstat_fetch_stat_tabentry_ext() and relation_needs_vacanalyze(). Can > we avoid that, ideally reducing to only one such? Or, at least, it > looks like recheck_relation_needs_vacanalyze() may no longer need to > exist at all. recheck_relation_needs_vacanalyze() can be removed and replace with compute_autovac_score(). I was trying to minimize refactoring, but this seems worthwhile to do. -- Sami
On Mon, Mar 30, 2026 at 11:17 AM Sami Imseih <samimseih@gmail.com> wrote: > > On Sun, Mar 29, 2026 at 10:09 PM Bharath Rupireddy > > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > On Sat, Mar 28, 2026 at 10:54 AM Sami Imseih <samimseih@gmail.com> wrote: > > > > > > > > > 4. Is the view intended to be exposed to PUBLIC without any ACL restrictions? > > > > > > > > > 2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority > > > > > for all and grant them to pg_monitor or similar? Especially since this > > > > > function loops over all the relations in a database, we may not want > > > > > everyone to be able to do this. > > > > > > > > I think you're correct there. While the data is not sensitive, it > > > > should have more controlled usage. It's only taking an AccessShareLock, > > > > but you would not want anyone to be able to run this since it's > > > > doing real computation. I think requiring pg_read_all_stats > > > > is a good idea. Will do. > > > > > > +1 for pg_read_all_stats. > > > > > > > Is there a gap here where someone may have been granted MAINTAIN on a > > relation but they do not have pg_read_all_stats? > > Yes, that is possible. MAINTAIN is a per-object privilege granted on a relation, > whereas pg_read_all_stats is a global role membership. They operate at > different levels. > > I don't think one needs to have MAINTAIN permissions on the table to see the > autovacuum score. DBA Monitoring users are usually separate from the DBA > operational users. > > I think pg_read_all_stats is the right permission here and it should > be implemented > similar to how pg_get_shmem_allocations is done where the default permissions > are pg_read_all_stats. pg_monitor inherits pg_read_all_stats so any > user with this > privilege will be able to access this view. A DBA is free to also add > privileges to > to other users if they wish. > > This is unlike other pg_stat_* views that have tuple level permission > checks ( i.e. > pg_stat_activity), but in those cases the permissions are needed to > hide sensitive data. > This is not the case here. > I don't think we are in disagreement here, I was just thinking about it the other way round; someone might have MAINTAIN privileges on a table and want to see what the relevant "autovacuum score" is before taking action. If the solution for that is to give those roles pg_read_all_stats, I guess that's ok, but there was probably a reason the permissions were limited in the first place. *shrug* > > > IMHO, we can have pg_stat_get_relation_autovacuum_priority defined as > > > a C function to give the autovacuum scoring as of the given moment for > > > the given table. It's easy for one to write a function to get scoring > > > for all the relations in a database. This keeps things simple yet > > > useful. > > > > > > > I don't have a strong opinion on the above, but I do suspect that the > > most common way people will interact with this is by querying against > > the view with a WHERE clause, so optimizing for that case seems > > important. > > Yeah, after sleeping on it I actually think the most common case will likely be > ORDER BY score DESC LIMIT ... because you usually want to see where your > table priority is relative to everything else in the database. > For the rare case where someone wants to look up an individual table, the caller > can just use a WHERE clause. So, we should just always do the full pg_class > scan. I don't see why we need to complicate the c-function more than this. > I think we are also in agreement here, although based on my experience, filtering out things like system and toast tables will be common, but I don't see that changing what you said above. On a similar note, +1 to your changes in v2. Robert Treat https://xzilla.net
> I think we are also in agreement here, although based on my > experience, filtering out things like system and toast tables will be > common, but I don't see that changing what you said above. On a > similar note, +1 to your changes in v2. Thanks! v3 now includes the refactoring [1] suggestion brought up by Alvarro [1] [https://www.postgresql.org/message-id/202603301508.up22nvhgnnoj%40alvherre.pgsql] -- Sami
Вложения
I didn't read any of the thread, but I looked at the patches. 0001: + * force_scores set to true forces the computation of a score. This is useful for + * tools that wish to inspect scores outside of the do_vacuum() path. I'm of two minds about this new function parameter. On one hand, I see the utility of forcing score calculations even when autovacuum is disabled. On the other hand, when autovacuum is disabled, the scores are actually 0.0, and it's probably a good idea to report exactly what autovacuum workers see. I also see that we're not forcing the computation of the (M)XID scores. Is that intentional? I wonder if we can rework this function to always calculate the scores, even if autovacuum is disabled or !force_vacuum. This way, both paths are doing the exact same thing and reporting the same scores. We might still want an extra parameter to avoid DEBUG3 in the system view path, but that seems like a reasonable difference between the two. 0002: Seems okay to me. 0003: + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>vacuum_dead_score</structfield> <type>double precision</type> + </para> + <para> + Score component based on the estimated number of dead tuples + needing removal by vacuum. + </para></entry> + </row> I think we should make sure the column names align with the names given to the new parameters [0] and the new "Autovacuum Prioritization" section in the docs [1]. [0] https://www.postgresql.org/docs/devel/runtime-config-vacuum.html#GUC-AUTOVACUUM-VACUUM-SCORE-WEIGHT [1] https://www.postgresql.org/docs/devel/routine-vacuuming.html#AUTOVACUUM-PRIORITY -- nathan
> + * force_scores set to true forces the computation of a score. This is useful for > + * tools that wish to inspect scores outside of the do_vacuum() path. > > I'm of two minds about this new function parameter. On one hand, I see the > utility of forcing score calculations even when autovacuum is disabled. On > the other hand, when autovacuum is disabled, the scores are actually 0.0, > and it's probably a good idea to report exactly what autovacuum workers > see. I went back and forth on this. Showing 0.0 when autovacuum is disabled would reflect what autovacuum workers actually see, but I think the more useful behavior is to always compute the score based on the table's actual state. This way, a DBA who has disabled autovacuum on a table can still see that its score is climbing and needs attention. The view shows need, not eligibility. This will also make the view more useful for maintenance jobs that wish to supplement autovacuum by looking at high scores and triggering a manual vacuum for those tables. > I also see that we're not forcing the computation of the (M)XID > scores. Is that intentional? hmm, the force_score does not need to be in the force_vacuum path because the score is calculated there naturally when the table is in need of force_vacuum. The force_score is there to ensure that we are not existing early in the autovacuum disabled case. > I wonder if we can rework this function to always calculate the scores, > even if autovacuum is disabled or !force_vacuum. This way, both paths are > doing the exact same thing and reporting the same scores. I prefer that we still calculate the score as if autovacuum is enabled for the reason above. I do think one potential middle ground is to have needs_analyze, needs_vacuum, eligible_analyze, eligible_vacuum fields to differentiate. I just rather not hide a score because a/v is disabled on a table. > + <row> > + <entry role="catalog_table_entry"><para role="column_definition"> > + <structfield>vacuum_dead_score</structfield> <type>double precision</type> > + </para> > + <para> > + Score component based on the estimated number of dead tuples > + needing removal by vacuum. > + </para></entry> > + </row> > > I think we should make sure the column names align with the names given to > the new parameters [0] and the new "Autovacuum Prioritization" section in > the docs [1]. I will look into this in the next rev. -- Sami
On Tue, Mar 31, 2026 at 11:15:35AM -0500, Sami Imseih wrote: >> + * force_scores set to true forces the computation of a score. This is useful for >> + * tools that wish to inspect scores outside of the do_vacuum() path. >> >> I'm of two minds about this new function parameter. On one hand, I see the >> utility of forcing score calculations even when autovacuum is disabled. On >> the other hand, when autovacuum is disabled, the scores are actually 0.0, >> and it's probably a good idea to report exactly what autovacuum workers >> see. > > I went back and forth on this. Showing 0.0 when autovacuum is disabled > would reflect what autovacuum workers actually see, but I think the more > useful behavior is to always compute the score based on the table's actual > state. This way, a DBA who has disabled autovacuum on a table can still > see that its score is climbing and needs attention. The view shows need, > not eligibility. This will also make the view more useful for maintenance > jobs that wish to supplement autovacuum by looking at high scores > and triggering a manual vacuum for those tables. That's a fair point. >> I also see that we're not forcing the computation of the (M)XID >> scores. Is that intentional? > > hmm, the force_score does not need to be in the force_vacuum path > because the score is calculated there naturally when the table is in > need of force_vacuum. The force_score is there to ensure that > we are not existing early in the autovacuum disabled case. So, unless the table is beyond a freeze-max-age parameter, the (M)XID scores will always be 0.0? >> I wonder if we can rework this function to always calculate the scores, >> even if autovacuum is disabled or !force_vacuum. This way, both paths are >> doing the exact same thing and reporting the same scores. > > I prefer that we still calculate the score as if autovacuum is enabled > for the reason above. I do think one potential middle ground is to have > needs_analyze, needs_vacuum, eligible_analyze, eligible_vacuum > fields to differentiate. I just rather not hide a score because a/v > is disabled on a table. My point is that instead of introducing a parameter to force score computations, we could just _always_ do that in this function. IOW maybe we could use this as an opportunity to simplify the function while also preparing it for the system view. -- nathan
> >> I also see that we're not forcing the computation of the (M)XID > >> scores. Is that intentional? > > > > hmm, the force_score does not need to be in the force_vacuum path > > because the score is calculated there naturally when the table is in > > need of force_vacuum. The force_score is there to ensure that > > we are not existing early in the autovacuum disabled case. > > So, unless the table is beyond a freeze-max-age parameter, the (M)XID > scores will always be 0.0? You're right after thinking about this again. There is no reason why we should treat the force_vacuum case differently. The score should still be included so someone monitoring can see the xid|mxid_age climbing well before it becomes an issue. -- Sami
Hi, On Mon, Mar 30, 2026 at 11:16 AM Sami Imseih <samimseih@gmail.com> wrote: > > > I think we are also in agreement here, although based on my > > experience, filtering out things like system and toast tables will be > > common, but I don't see that changing what you said above. On a > > similar note, +1 to your changes in v2. > > Thanks! > > v3 now includes the refactoring [1] suggestion brought up by Alvarro Thank you for sending the latest patches. Here are some comments: 1/ + while ((classTup = heap_getnext(relScan, ForwardScanDirection)) != NULL) Missing check_for_interrupts call while scanning the pg_class system catalog. 2/ + avopts = extract_autovac_opts(classTup, pg_class_desc); + + compute_autovac_score(classTup, pg_class_desc, + effective_multixact_freeze_max_age, avopts, + true, &dovacuum, &doanalyze, + &wraparound, &scores); + + if (avopts) + pfree(avopts); + When a database has a large number of tables (which is quite common in production scenarios), I expect the costs of palloc and pfree being used for fetching autovacuum relopts would make this function slower. Can we invent a new function or pass a caller-allocated AutoVacOpts memory to just copy the relopts and use that in this tight loop when scanning for all the relations? 3/ + values[8] = Float8GetDatum(scores.vac_ins); + values[9] = Float8GetDatum(scores.anl); Nit: It's a matter of taste. How about using something like below instead of hardcoded column numbers? I expect this view to grow in the future, so it helps to keep things simple. values[i++] = Float8GetDatum(scores.anl); Assert(i == NUM_AV_SCORE_COLS); 4/ + The <link linkend="monitoring-pg-stat-autovacuum-priority-view"> + <structname>pg_stat_autovacuum_priority</structname></link> view can be + used to inspect each table's autovacuum need and priority score. How about adding "as of the moment" to convey that it doesn't report what currently running autovacuum or pending autovacuum would consider? 5/ Also, can we add a simple paragraph on how to interpret and take actions based on the scores reported (like prioritizing one table over the other - adjust these parameters in the table's relopts or something like that - no need to cover all the possible cases, but just one example would be sufficient for the user to understand)? 6/ + descr => 'statistics: autovacuum priority scores for all relations', s/"for all relations"/"for all relations in the current database" 7/ Addition of force_scores to relation_needs_vacanalyze makes the code unreadable (IMO) with a lot of if-else branching. Why not make force_vacuum an option and pass it as true from the stats function and leave a note in the function comment on when to use this parameter? Would something like that work? Also, when autovacuum is disabled (either via GUC or via relopts), we don't want to calculate and report any scores. IMHO, this keeps things simple and code readable. -- Bharath Rupireddy Amazon Web Services: https://aws.amazon.com
On Tue, Mar 31, 2026 at 11:09:43AM -0700, Bharath Rupireddy wrote: > 1/ + while ((classTup = heap_getnext(relScan, ForwardScanDirection)) != NULL) > > Missing check_for_interrupts call while scanning the pg_class system catalog. From a glance I don't see one in the scanning code in do_autovacuum(), either. I'm not sure we need to be worried about this. > + avopts = extract_autovac_opts(classTup, pg_class_desc); > + > + compute_autovac_score(classTup, pg_class_desc, > + effective_multixact_freeze_max_age, avopts, > + true, &dovacuum, &doanalyze, > + &wraparound, &scores); > + > + if (avopts) > + pfree(avopts); > > When a database has a large number of tables (which is quite common in > production scenarios), I expect the costs of palloc and pfree being > used for fetching autovacuum relopts would make this function slower. > Can we invent a new function or pass a caller-allocated AutoVacOpts > memory to just copy the relopts and use that in this tight loop when > scanning for all the relations? Before making this code more complicated, I think we ought to demonstrate there's an actual problem or slowness. I am quite dubious we need to do anything here. > + values[8] = Float8GetDatum(scores.vac_ins); > + values[9] = Float8GetDatum(scores.anl); > > Nit: It's a matter of taste. How about using something like below > instead of hardcoded column numbers? I expect this view to grow in the > future, so it helps to keep things simple. > > values[i++] = Float8GetDatum(scores.anl); > Assert(i == NUM_AV_SCORE_COLS); I don't think either way is substantially better. > + The <link linkend="monitoring-pg-stat-autovacuum-priority-view"> > + <structname>pg_stat_autovacuum_priority</structname></link> view can be > + used to inspect each table's autovacuum need and priority score. > > How about adding "as of the moment" to convey that it doesn't report > what currently running autovacuum or pending autovacuum would > consider? I don't think "as of the moment" adds any clarity about that. If we did want to add something along those lines, I'd add a separate sentence that says that it doesn't report the values of current autovacuum workers and is freshly calculated by the current backend (or something like that). -- nathan
> > 1/ + while ((classTup = heap_getnext(relScan, ForwardScanDirection)) != NULL) > > > > Missing check_for_interrupts call while scanning the pg_class system catalog. > > From a glance I don't see one in the scanning code in do_autovacuum(), > either. I'm not sure we need to be worried about this. Yes, I will leave it out. Even for catalogs with thousands of tables, I don't foresee this being an issue. > > + values[8] = Float8GetDatum(scores.vac_ins); > > + values[9] = Float8GetDatum(scores.anl); > > > > Nit: It's a matter of taste. How about using something like below > > instead of hardcoded column numbers? I expect this view to grow in the > > future, so it helps to keep things simple. > > > > values[i++] = Float8GetDatum(scores.anl); > > Assert(i == NUM_AV_SCORE_COLS); > > I don't think either way is substantially better. I agree. using numbers is more readable IMO. > > + <row> > > + <entry role="catalog_table_entry"><para role="column_definition"> > > + <structfield>vacuum_dead_score</structfield> <type>double precision</type> > > + </para> > > + <para> > > + Score component based on the estimated number of dead tuples > > + needing removal by vacuum. > > + </para></entry> > > + </row> > > > > I think we should make sure the column names align with the names given to > > the new parameters [0] and the new "Autovacuum Prioritization" section in > > the docs [1]. > I will look into this in the next rev. The field names now match the GUC names (without the _weight). We might as well also make this name change in the AutoVacuumScore struct. I attached v4 which includes 4 patches in the set to address earlier comments: 0001: This modifies relation_needs_vacanalyze() to always compute the score so the monitoring view can report the score regardless if autovacuum is enabled on the table. AutoVacuumScores is renamed to AutoVacuumPriority as it now also tracks needs_analyze, needs_vacuum and is_wraparound which are different from the dovacuum, doanalyze and wraparound output parameters which are acted on by autovacuum. This is a clean separation of reporting the scores/needs for autovacuum/analyze and the eligibility for autovacuum which is based on the state of autovacuum being enabled and set on the table level. 0002: relation_needs_vacanalyze() takes a new elevel argument to control logging. Callers pass DEBUG3 for autovacuum or 0 to suppress logging. 0003: Alvaro's earlier suggestion to factor out recheck_relation_needs_vacanalyze() and introduce compute_autovac_score() 0004: The view implementation with field names that better match the GUCs as suggested by Nathan in an earlier comment. -- Sami
Вложения
0001: - AutoVacuumScores *scores); + AutoVacuumPriority *priority); IMHO we need to minimize these kinds of extraneous changes in this patch set. AutoVacuumScores still seems accurate enough, even when the struct contains some extra bool members. - * A table whose autovacuum_enabled option is false is - * automatically skipped (unless we have to vacuum it due to freeze_max_age). - * Thus autovacuum can be disabled for specific tables. Also, when the cumulative - * stats system does not have data about a table, it will be skipped. + * A table whose autovacuum_enabled option is false is automatically skipped + * by autovacuum (unless we have to vacuum it due to freeze_max_age), + * but scores are still computed. Also, when the cumulative stats system does + * not have data about a table, threshold-based scores will be zero. I don't think we need to update this comment. - * One exception to the previous paragraph is for tables nearing wraparound, - * i.e., those that have surpassed the effective failsafe ages. In that case, - * the relfrozen/relminmxid-based score is scaled aggressively so that the - * table has a decent chance of sorting to the front of the list. + * Furthermore, for tables nearing wraparound, i.e., those that have surpassed + * the effective failsafe ages, the relfrozen/relminmxid-based score is scaled + * aggressively so that the table has a decent chance of sorting to the front + * of the list. Or this one. + * Priority scores are always computed. dovacuum and doanalyze are only set when + * autovacuum is active and enabled for the relation. I think we should more explicitly state that while scores->needs_vacuum and friends are always set regardless of whether autovacuum is enabled, the return parameters dovacuum, etc., are not. Or perhaps we should return whether autovacuum is enabled in the struct and consolidate the return parameters and the struct members. WDYT? 0002: Seems fine. 0003: Seems fine. 0004: + FROM pg_stat_get_autovacuum_priority() S + JOIN pg_class C ON C.oid = S.relid + LEFT JOIN pg_namespace N ON N.oid = C.relnamespace; What do you think about ordering by score so this view automatically shows the tables most in need of vacuuming/analyzing first? -- nathan
> 0001:
>
> - AutoVacuumScores *scores);
> + AutoVacuumPriority *priority);
>
> IMHO we need to minimize these kinds of extraneous changes in this patch
> set. AutoVacuumScores still seems accurate enough, even when the struct
> contains some extra bool members.
>
> - * A table whose autovacuum_enabled option is false is
> - * automatically skipped (unless we have to vacuum it due to freeze_max_age).
> - * Thus autovacuum can be disabled for specific tables. Also, when the cumulative
> - * stats system does not have data about a table, it will be skipped.
> + * A table whose autovacuum_enabled option is false is automatically skipped
> + * by autovacuum (unless we have to vacuum it due to freeze_max_age),
> + * but scores are still computed. Also, when the cumulative stats system does
> + * not have data about a table, threshold-based scores will be zero.
>
> I don't think we need to update this comment.
>
> - * One exception to the previous paragraph is for tables nearing wraparound,
> - * i.e., those that have surpassed the effective failsafe ages. In that case,
> - * the relfrozen/relminmxid-based score is scaled aggressively so that the
> - * table has a decent chance of sorting to the front of the list.
> + * Furthermore, for tables nearing wraparound, i.e., those that have surpassed
> + * the effective failsafe ages, the relfrozen/relminmxid-based score is scaled
> + * aggressively so that the table has a decent chance of sorting to the front
> + * of the list.
>
> Or this one.
Fixed both.
> + * Priority scores are always computed. dovacuum and doanalyze are only set when
> + * autovacuum is active and enabled for the relation.
>
> I think we should more explicitly state that while scores->needs_vacuum and
> friends are always set regardless of whether autovacuum is enabled, the
> return parameters dovacuum, etc., are not.
I made this more explicit by saying " All fields in AutoVacuumScores
are always computed
regardless of autovacuum settings...." I think that is clear enough.
> Or perhaps we should return
> whether autovacuum is enabled in the struct and consolidate the return
> parameters and the struct members. WDYT?
dovacuum, doanalyze will be unused by callers like the sql function, and putting
them in the struct could make this cleaner, but I don't think it's
worth it to blur
the purpose of the struct. I rather keep it just for score computation purposes.
> 0002:
>
> Seems fine.
I found a bug in my v4 that I fixed.
+ if (elevel > 0 && vac_ins_base_thresh >= 0)
is wrong. It should be:
if (elevel > 0)
{
if (vac_ins_base_thresh >= 0)
> 0004:
>
> + FROM pg_stat_get_autovacuum_priority() S
> + JOIN pg_class C ON C.oid = S.relid
> + LEFT JOIN pg_namespace N ON N.oid = C.relnamespace;
>
> What do you think about ordering by score so this view automatically shows
> the tables most in need of vacuuming/analyzing first?
I thought about that initially, but we don't really have an example where the
data is ordered in a stats view ( or any other catalog view ), and I prefer not
to impose that on the user automatically.
See v5.
--
Sami
Вложения
v6 attached addresses a silly bug with v5-0001 that Nathan pointed out offline. v5 was not computing the scores for vacuum,vacuum insert and analyze thresholds for the !wraparound case. -- Sami
Вложения
+ bool needs_vacuum; /* threshold exceeded for vacuum */
+ bool needs_analyze; /* threshold exceeded for analyze */
+ bool is_wraparound; /* at risk of XID/MXID wraparound */
I've been thinking about whether to also return whether autovacuum is
enabled in the view, i.e., AutoVacuumingActive() && av_enabled. My
instinct was that would help explain why autovacuum isn't picking up tables
that are eligible for autovacuum based on the view. One wrinkle is that
it'd report false even when autovacuum is working on a table for wraparound
prevention. I'm also not especially excited about further complicating
this stuff for folks who disable autovacuum. Furthermore, the value of the
reloptions and autovacuum GUC are discoverable elsewhere. So, I'm
currently leaning towards leaving that information out for now.
+ if (vactuples > vacthresh)
+ {
+ scores->needs_vacuum = true;
+ if (av_enabled && AutoVacuumingActive())
*dovacuum = true;
}
nitpick: We might be able to simplify this a bit by 1) storing "av_enabled
&& AutoVacuumingActive()" in a variable and 2) reworking the code to look
more like this:
scores->needs_vacuum = (vactuples > vacthresh);
*do_vacuum |= (av_enabled && scores->needs_vacuum);
... but others might find your version easier to read.
Otherwise, 0001 looks good.
In 0003, I think you missed renaming the last argument to
compute_autovac_score() in table_recheck_autovac().
I didn't see anything else in this read-through. I'm planning to start
preparing this for commit tomorrow.
--
nathan
> I've been thinking about whether to also return whether autovacuum is > enabled in the view, i.e., AutoVacuumingActive() && av_enabled. I don't think we can rely on AutoVacuumingActive() being stable since a backend that does a SET track_counts = off for whatever reason and then calls the view will get false. av_enabled will likely be the only thing we can report. > So, I'm > currently leaning towards leaving that information out for now. I agree. > > scores->needs_vacuum = (vactuples > vacthresh); > *do_vacuum |= (av_enabled && scores->needs_vacuum); > > ... but others might find your version easier to read. yeah, for readability, I'll stick with the current. > Otherwise, 0001 looks good. > > In 0003, I think you missed renaming the last argument to > compute_autovac_score() in table_recheck_autovac(). Earlier, I did not compile with 0003 only. Fixed. > I didn't see anything else in this read-through. I'm planning to start > preparing this for commit tomorrow. Thanks! here is v7 -- Sami
Вложения
On Wed, Apr 01, 2026 at 06:04:36PM -0500, Sami Imseih wrote: > Thanks! here is v7 Alright, I've been preparing these for commit. Most changes are cosmetic, but there are a couple of bigger ones I should note: * I added a prerequisite patch for relation_needs_vacanalyze() that saves a level of indentation on a chunk of code. This simplifies 0001 (now 0002) a bit. * I noticed that if autovacuum decides to force a vacuum for anti-wraparound purposes, it might also decide to analyze the table even if autovacuum is disabled for it. AFAICT this is accidental, but since it's behaved this way since commit 48188e1621 (2006) [0], I am slightly worried that this bug may have become a feature. In 0002, I separated this edge case in the code and added a comment, and I intend to start a new thread about removing it. * I removed the booleans in the view in favor of just noting that scores >= 1.0 means the table needs processing. IMHO trying to distinguish needs_vacuum from do_vacuum is just going to confuse folks more than anything, and IIUC this information is redundant with "score >= 1.0", anyway. * I renamed the view to pg_autovacuum_scores. While some of the information in the view depends on cumulative statistics, not all of it does, and what does is quite heavily modified from the original stats. So, I didn't think the pg_stat_* prefix was appropriate, although I can see how reasonable people might disagree. * I considered whether to make the backing function per-table and ultimately decided against it. The initialization logic is a bit expensive, and I assume most folks will be interested in the full picture of the current database. Maybe we could add a per-table function down the road, but I don't see any strong need for that for now. I'm planning to commit 0001-0004 this afternoon, assuming cfbot is happy. I'm hoping to commit 0005 on Monday or Tuesday. Please take a look at v8 if you have time. [0] https://postgr.es/m/23710.1162661716%40sss.pgh.pa.us -- nathan
Вложения
> Alright, I've been preparing these for commit. Most changes are cosmetic, > but there are a couple of bigger ones I should note: Thanks! > * I added a prerequisite patch for relation_needs_vacanalyze() that saves a > level of indentation on a chunk of code. This simplifies 0001 (now 0002) a > bit. I like this this. > * I noticed that if autovacuum decides to force a vacuum for > anti-wraparound purposes, it might also decide to analyze the table even if > autovacuum is disabled for it. AFAICT this is accidental, but since it's > behaved this way since commit 48188e1621 (2006) [0], I am slightly worried > that this bug may have become a feature. In 0002, I separated this edge > case in the code and added a comment, and I intend to start a new thread > about removing it. hmm yeah, I think this just needs to be documented clearly. I always thought it was expected for auto-analyze to run in this case, and I don't see why it shouldn't. If this needs to be clarified in docs, we should do that in a separate discussion. > * I removed the booleans in the view in favor of just noting that scores >= > 1.0 means the table needs processing. IMHO trying to distinguish > needs_vacuum from do_vacuum is just going to confuse folks more than > anything, and IIUC this information is redundant with "score >= 1.0", > anyway. That's fine by me. > * I renamed the view to pg_autovacuum_scores. While some of the > information in the view depends on cumulative statistics, not all of it > does, and what does is quite heavily modified from the original stats. So, > I didn't think the pg_stat_* prefix was appropriate, although I can see how > reasonable people might disagree. Initially I thought about moving this away from the cumulative stats section, but this view does need to lookup relation stats and if relation stats are reset, the same rules will apply to this view. Also not all views under "cumulative stats" are necessarily cumulative. Some just show real-time data; pg_stat_activity, pg_stat_progress_*, etc. This view does not have precedent in the type of work it does, but I do really think it belongs under pg_stat_*, and not be too far away conceptually from the vacuum stats in pg_stat_all_tables. > * I considered whether to make the backing function per-table and > ultimately decided against it. The initialization logic is a bit > expensive, and I assume most folks will be interested in the full picture > of the current database. Maybe we could add a per-table function down the > road, but I don't see any strong need for that for now. Yes, I did not proceed with this since the common use-case will be comparing tables in a broader context. I don't see a string single table use-case at this point. Besides the above, I have one comment on 0005: Where it says "indicate the table needs analyzing" or "needs processing" or "needs vacuuming", we should instead say "may need". Since the actually processing depends on the thresholds or force vacuum conditions, but no need to go into that level of detail in the row descriptions. That is all explained in the existing autovacuum prioritization docs. -- Sami Imseih Amazon Web Services (AWS)
I committed 0001-0003. Here is a new version of the last two patches. Some notes: * Instead of renaming recheck_relation_needs_vacanalyze(), I followed Álvaro's suggestion to remove that function and instead fetch the stats within relation_needs_vacanalyze() itself. * Per your feedback, I renamed the view back to pg_stat_autovacuum_scores. * Instead of limiting the view to pg_read_all_stats, I've left it accessible to all users. AFAICT there's nothing sensitive here. * I reworked the column descriptions a bit to make it clear that values >= 1.0 mean autovacuum will process it (except if autovacuum is disabled). WDYT? -- nathan
Вложения
> WDYT? "Scores greater than or equal to <literal>1.0</literal>" in the comments of each field are misleading. This conflates scoring with vacuum/analyze eligibility and it's possible with a autovacuum_*_weight < 1.0 to trigger an autovacuum/analyze. I suggest the attached changes on top of v9. -- Sami
Вложения
On Sat, Apr 04, 2026 at 08:25:26AM -0500, Sami Imseih wrote: > "Scores greater than or equal to <literal>1.0</literal>" in the comments > of each field are misleading. This conflates scoring with vacuum/analyze > eligibility and it's possible with a autovacuum_*_weight < 1.0 to trigger an > autovacuum/analyze. Ah, that's unfortunate. I think it'd be good to give folks some idea of what autovacuum will actually process. I wonder if we could adjust the documentation accordingly. -- nathan
> On Sat, Apr 04, 2026 at 08:25:26AM -0500, Sami Imseih wrote: > > "Scores greater than or equal to <literal>1.0</literal>" in the comments > > of each field are misleading. This conflates scoring with vacuum/analyze > > eligibility and it's possible with a autovacuum_*_weight < 1.0 to trigger an > > autovacuum/analyze. > > Ah, that's unfortunate. I think it'd be good to give folks some idea of > what autovacuum will actually process. I wonder if we could adjust the > documentation accordingly. That's why I thought having the bool fields made sense in the earlier versions of the view. Since autovacuum is dealing with 2 concepts: eligibility: is av enabled and is the table meeting thresholds score: The priority of how the eligible tables will be processed. So, while this could be explained in docs, I think it's better we report these fields. We might as well just call the view pg_stat_autovacuum in that case. What do you think? -- Sami
On Sat, Apr 04, 2026 at 10:32:07AM -0500, Sami Imseih wrote: >> On Sat, Apr 04, 2026 at 08:25:26AM -0500, Sami Imseih wrote: >> > "Scores greater than or equal to <literal>1.0</literal>" in the comments >> > of each field are misleading. This conflates scoring with vacuum/analyze >> > eligibility and it's possible with a autovacuum_*_weight < 1.0 to trigger an >> > autovacuum/analyze. >> >> Ah, that's unfortunate. I think it'd be good to give folks some idea of >> what autovacuum will actually process. I wonder if we could adjust the >> documentation accordingly. > > That's why I thought having the bool fields made sense in the earlier > versions of the view. Since autovacuum is dealing with 2 concepts: > > eligibility: is av enabled and is the table meeting thresholds > score: The priority of how the eligible tables will be processed. > > So, while this could be explained in docs, I think it's better we report > these fields. I understand your position, but I still worry about potential confusion if when dovacuum and needs_vacuum differ. And I don't know how much we really ought to be tailoring this stuff to clusters where autovacuum is disabled or where the scores are being adjusted. Also, I think we ought to go into more detail in the documentation, anyway. So my instinct was to do something more like the attached. If we did report booleans, I would probably argue for just reporting dovacuum and doanalyze and calling out the criteria for why they may be false even when it looks like the table needs processing. -- nathan
Вложения
> > eligibility: is av enabled and is the table meeting thresholds > > score: The priority of how the eligible tables will be processed. > > > > So, while this could be explained in docs, I think it's better we report > > these fields. > > I understand your position, but I still worry about potential confusion if > when dovacuum and needs_vacuum differ. And I don't know how much we really > ought to be tailoring this stuff to clusters where autovacuum is disabled > or where the scores are being adjusted. Also, I think we ought to go into > more detail in the documentation, anyway. After looking at it a bit more, I don't think we can avoid putting the needs_analyze and needs_vacuum fields, because without them there will still be a gap in understanding of if a table is eligible for autovacuum. A score on its own will not draw the full picture. For example, if I tune the _weight settings, I want to know if this table now becomes a candidate for vacuum/analyze, irrespective if autovacuum will actually process it due to some bad configuration. I can see someone seaching this view by just the bool fields to see the eligibile tables also without caring about the score. > If we did report booleans, I would probably argue for just reporting > dovacuum and doanalyze and calling out the criteria for why they may be > false even when it looks like the table needs processing. Yes, we only require a needs_analyze and needs_vacuum. The latter can be set to true due to thresholds or wraparound. But, I don't think we should rely on the dovacuum or doanalyze, instead we can just have a flag in AutoVacuumScores->needs and track what is needed. This will separate the autovacuum processing from the reporting. Here is what I am thinking we should do. v11-0001 is same as v10-0001 v11-0002 introduces the additional bit field for reporting. This sets the need for vacuum regardless of av_enabled. v11-0003 the view with slightly more detailed documentation. -- Sami
Вложения
On Sat, Apr 04, 2026 at 12:48:28PM -0500, Sami Imseih wrote: >> If we did report booleans, I would probably argue for just reporting >> dovacuum and doanalyze and calling out the criteria for why they may be >> false even when it looks like the table needs processing. > > Yes, we only require a needs_analyze and needs_vacuum. The latter > can be set to true due to thresholds or wraparound. But, I don't think we > should rely on the dovacuum or doanalyze, instead we can just have a flag > in AutoVacuumScores->needs and track what is needed. This will separate > the autovacuum processing from the reporting. Sorry for going in circles about this, but I'm not seeing why we wouldn't just return the booleans that relation_needs_vacanalyze() already returns. I think the question people will have is "what will autovacuum process and in what order?", and if we aren't giving them the exact same information that autovacuum is using to make its decisions, then I'm not sure what is the point. It's true that someone might disable autovacuum for a table and that it would otherwise be processed, but so be it. Concretely, like the attached 0003. IMHO this feels much more natural than giving folks booleans that usually represents dovacuum/doanalyze but that don't in certain cases. -- nathan
Вложения
> >> If we did report booleans, I would probably argue for just reporting > >> dovacuum and doanalyze and calling out the criteria for why they may be > >> false even when it looks like the table needs processing. > > > > Yes, we only require a needs_analyze and needs_vacuum. The latter > > can be set to true due to thresholds or wraparound. But, I don't think we > > should rely on the dovacuum or doanalyze, instead we can just have a flag > > in AutoVacuumScores->needs and track what is needed. This will separate > > the autovacuum processing from the reporting. > > Sorry for going in circles about this, but I'm not seeing why we wouldn't > just return the booleans that relation_needs_vacanalyze() already returns. > I think the question people will have is "what will autovacuum process and > in what order?", and if we aren't giving them the exact same information > that autovacuum is using to make its decisions, then I'm not sure what is > the point. It's true that someone might disable autovacuum for a table and > that it would otherwise be processed, but so be it. Maybe there’s no need to worry too much about the autovacuum disabled case or track_counts being disabled when querying the view. Both are edge cases, and it seemed fairly trivial to compensate for this with what I had attached earlier. Anyhow, I will not push this point further. I am ok with proceeding with what you have in v12. The patches overall LGTM. Thanks! -- Sami
Committed after some more editorialization. -- nathan