Обсуждение: Add pg_stat_autovacuum_priority

Поиск
Список
Период
Сортировка

Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
Hi,

This is a quick follow-up to the commit d7965d65f which
introduced autovacuum prioritization based on a score that
is the Max of several components, such as vacuum
thresholds, xid age, etc.

It was also discussed in that thread [1] that we will need
a view to expose the priority scores, per table in a view.
This will allow a user to introspect what the autovacuum
launcher will prioritize next as well as verify tuning
efforts for autovacuum prioritization; the latter case
likely being rare.

So after spending time on this today, I am proposing a view
that returns a line for each relation with information
about if the table needs autovacuum/autoanalyze, as well as
scores of each component and the Max score. It looks like
the below:

```
postgres=# select * FROM pg_stat_autovacuum_priority;
-[ RECORD 1 ]-----+----------------------------
relid             | 16410
schemaname        | public
relname           | av_priority_test
needs_vacuum      | f
needs_analyze     | f
wraparound        | f
score             | 0
xid_score         | 0
mxid_score        | 0
vacuum_dead_score | 0
vacuum_ins_score  | 0
analyze_score     | 0
```

The function essentially calls relation_needs_vacanalyze()
with some setup work, such as scanning the catalog with an
AccessShareLock, etc. and emits the result of this call.

To make this work 0001 introduces a small change to
relation_needs_vacanalyze() to take in a boolean to force
the calculation of the score (even if autovacuum is
disabled for the relation).

0002 introduces the view with documentation and testing in
vacuum.c (xid age and mxid age scores are not tested as
they require xid_wraparound to consume enough XIDs to
trigger a score, which will cost too much time for a
regression test).

Find the attached taking the first attempt at this view.

[1] [https://www.postgresql.org/message-id/CAApHDvqQN-B2sQov8nsfZOmx-VeJMauSf4kLa3A8LsK1tUyBNw%40mail.gmail.com]

--
Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: Add pg_stat_autovacuum_priority

От
SATYANARAYANA NARLAPURAM
Дата:

Hi Sami,


On Fri, Mar 27, 2026 at 4:14 PM Sami Imseih <samimseih@gmail.com> wrote:
Hi,

This is a quick follow-up to the commit d7965d65f which
introduced autovacuum prioritization based on a score that
is the Max of several components, such as vacuum
thresholds, xid age, etc.

It was also discussed in that thread [1] that we will need
a view to expose the priority scores, per table in a view.
This will allow a user to introspect what the autovacuum
launcher will prioritize next as well as verify tuning
efforts for autovacuum prioritization; the latter case
likely being rare.

So after spending time on this today, I am proposing a view
that returns a line for each relation with information
about if the table needs autovacuum/autoanalyze, as well as
scores of each component and the Max score. It looks like
the below:

```
postgres=# select * FROM pg_stat_autovacuum_priority;
-[ RECORD 1 ]-----+----------------------------
relid             | 16410
schemaname        | public
relname           | av_priority_test
needs_vacuum      | f
needs_analyze     | f
wraparound        | f
score             | 0
xid_score         | 0
mxid_score        | 0
vacuum_dead_score | 0
vacuum_ins_score  | 0
analyze_score     | 0
```

The function essentially calls relation_needs_vacanalyze()
with some setup work, such as scanning the catalog with an
AccessShareLock, etc. and emits the result of this call.

To make this work 0001 introduces a small change to
relation_needs_vacanalyze() to take in a boolean to force
the calculation of the score (even if autovacuum is
disabled for the relation).

0002 introduces the view with documentation and testing in
vacuum.c (xid age and mxid age scores are not tested as
they require xid_wraparound to consume enough XIDs to
trigger a score, which will cost too much time for a
regression test).

Find the attached taking the first attempt at this view.

[1] [https://www.postgresql.org/message-id/CAApHDvqQN-B2sQov8nsfZOmx-VeJMauSf4kLa3A8LsK1tUyBNw%40mail.gmail.com]


Thanks for adding this. Applied the patch and the tests passed. I haven't fully reviewed the patch but have a few comments below:

1. Please ass CFI in the function pg_stat_get_autovacuum_priority, as the list of tables can be very long

+ while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
+ {

2. Should we add filtering? The current approach pg_stat_get_autovacuum_priority does a full catalog scan without any filters and can be expensive.

3. Please add tests for tables with autovacuum = off

4. Is the view intended to be exposed to PUBLIC without any ACL restrictions?

5. Catalog version number needs to be increased

-#define CATALOG_VERSION_NO 202603241
+#define CATALOG_VERSION_NO 202603231

Thanks,
Satya 

Re: Add pg_stat_autovacuum_priority

От
SATYANARAYANA NARLAPURAM
Дата:


On Fri, Mar 27, 2026 at 9:14 PM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:

Hi Sami,


On Fri, Mar 27, 2026 at 4:14 PM Sami Imseih <samimseih@gmail.com> wrote:
Hi,

This is a quick follow-up to the commit d7965d65f which
introduced autovacuum prioritization based on a score that
is the Max of several components, such as vacuum
thresholds, xid age, etc.

It was also discussed in that thread [1] that we will need
a view to expose the priority scores, per table in a view.
This will allow a user to introspect what the autovacuum
launcher will prioritize next as well as verify tuning
efforts for autovacuum prioritization; the latter case
likely being rare.

So after spending time on this today, I am proposing a view
that returns a line for each relation with information
about if the table needs autovacuum/autoanalyze, as well as
scores of each component and the Max score. It looks like
the below:

```
postgres=# select * FROM pg_stat_autovacuum_priority;
-[ RECORD 1 ]-----+----------------------------
relid             | 16410
schemaname        | public
relname           | av_priority_test
needs_vacuum      | f
needs_analyze     | f
wraparound        | f
score             | 0
xid_score         | 0
mxid_score        | 0
vacuum_dead_score | 0
vacuum_ins_score  | 0
analyze_score     | 0
```

The function essentially calls relation_needs_vacanalyze()
with some setup work, such as scanning the catalog with an
AccessShareLock, etc. and emits the result of this call.

To make this work 0001 introduces a small change to
relation_needs_vacanalyze() to take in a boolean to force
the calculation of the score (even if autovacuum is
disabled for the relation).

0002 introduces the view with documentation and testing in
vacuum.c (xid age and mxid age scores are not tested as
they require xid_wraparound to consume enough XIDs to
trigger a score, which will cost too much time for a
regression test).

Find the attached taking the first attempt at this view.

[1] [https://www.postgresql.org/message-id/CAApHDvqQN-B2sQov8nsfZOmx-VeJMauSf4kLa3A8LsK1tUyBNw%40mail.gmail.com]


Thanks for adding this. Applied the patch and the tests passed. I haven't fully reviewed the patch but have a few comments below:

1. Please ass CFI in the function pg_stat_get_autovacuum_priority, as the list of tables can be very long

+ while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
+ {

2. Should we add filtering? The current approach pg_stat_get_autovacuum_priority does a full catalog scan without any filters and can be expensive.

3. Please add tests for tables with autovacuum = off

4. Is the view intended to be exposed to PUBLIC without any ACL restrictions?

5. Catalog version number needs to be increased

-#define CATALOG_VERSION_NO 202603241
+#define CATALOG_VERSION_NO 202603231

Additionally, do you expect this view to be available on the hot_Standby? Because on a hot standby, the view only provides useful wraparound risk data. All activity-based columns are blind. This should either be documented, or the function should check RecoveryInProgress() and raise an error/notice

Thanks,
Satya

Re: Add pg_stat_autovacuum_priority

От
Bharath Rupireddy
Дата:
Hi,

On Fri, Mar 27, 2026 at 4:14 PM Sami Imseih <samimseih@gmail.com> wrote:
>
> This is a quick follow-up to the commit d7965d65f which
> introduced autovacuum prioritization based on a score that
> is the Max of several components, such as vacuum
> thresholds, xid age, etc.

Thanks for sending the patch.

+1 for the visibility into the new autovacuum scoring system and its
impact on prioritization. It would also be nice to add the computed
scores to pg_stat_progress_vacuum to show the current prioritization.
IMHO, we can get that change first. It's relatively smaller.

I quickly reviewed the v1 patches. Here are some comments:

1/
+ * pg_stat_get_autovacuum_priority
+ * Returns the autovacuum priority score for each relation in the
+ * current database.
+ *
Can we have the per-relation prioritization computation function in C
and provide a per-database computation function as a SQL function over
this per-relation function in system_functions.sql? This would
simplify things and also address the concern raised by Satya in this
thread about databases having a large number of relations. Would that
work?

2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority
for all and grant them to pg_monitor or similar? Especially since this
function loops over all the relations in a database, we may not want
everyone to be able to do this.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com



Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
Hi,

Thank you all for the review and comments!

> 1. Please add CFI in the function pg_stat_get_autovacuum_priority, as the list of tables can be very long
good catch. Will add.


> 3. Please add tests for tables with autovacuum = off
Yes, I will add this test as well.


> 4. Is the view intended to be exposed to PUBLIC without any ACL restrictions?

> 2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority
> for all and grant them to pg_monitor or similar? Especially since this
> function loops over all the relations in a database, we may not want
> everyone to be able to do this.

I think you're correct there. While the data is not sensitive, it
should have more controlled usage. It's only taking an AccessShareLock,
but you would not want anyone to be able to run this since it's
doing real computation. I think requiring pg_read_all_stats
is a good idea. Will do.


> 5. Catalog version number needs to be increased
This will be left to the committer at the time.

> 2. Should we add filtering? The current approach pg_stat_get_autovacuum_priority
> does a full catalog scan without any filters and can be expensive.

> Can we have the per-relation prioritization computation function in C
> and provide a per-database computation function as a SQL function over
> this per-relation function in system_functions.sql?

Yes, perhaps we should do this. So we can have a function called
pg_stat_get_autovacuum_priority() that either takes a NULL or an OID
to either return all the tables or just a single table.
This is a similar usage pattern as pg_stat_get_subscription or
pg_stat_get_activity.

pg_stat_autovacuum_priority will be a view that wraps around the NULL
variant of the function.

The case where the OID is passed we just do a SearchSysCache1(RELOID,...)
whereas the other case will do the full catalog scan.

What do you think?

--
Sami



Re: Add pg_stat_autovacuum_priority

От
Bharath Rupireddy
Дата:
Hi,

On Sat, Mar 28, 2026 at 10:54 AM Sami Imseih <samimseih@gmail.com> wrote:
>
> > 4. Is the view intended to be exposed to PUBLIC without any ACL restrictions?
>
> > 2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority
> > for all and grant them to pg_monitor or similar? Especially since this
> > function loops over all the relations in a database, we may not want
> > everyone to be able to do this.
>
> I think you're correct there. While the data is not sensitive, it
> should have more controlled usage. It's only taking an AccessShareLock,
> but you would not want anyone to be able to run this since it's
> doing real computation. I think requiring pg_read_all_stats
> is a good idea. Will do.

+1 for pg_read_all_stats.

> > Can we have the per-relation prioritization computation function in C
> > and provide a per-database computation function as a SQL function over
> > this per-relation function in system_functions.sql?
>
> Yes, perhaps we should do this. So we can have a function called
> pg_stat_get_autovacuum_priority() that either takes a NULL or an OID
> to either return all the tables or just a single table.
> This is a similar usage pattern as pg_stat_get_subscription or
> pg_stat_get_activity.
>
> pg_stat_autovacuum_priority will be a view that wraps around the NULL
> variant of the function.
>
> The case where the OID is passed we just do a SearchSysCache1(RELOID,...)
> whereas the other case will do the full catalog scan.
>
> What do you think?

IMHO, we can have pg_stat_get_relation_autovacuum_priority defined as
a C function to give the autovacuum scoring as of the given moment for
the given table. It's easy for one to write a function to get scoring
for all the relations in a database. This keeps things simple yet
useful.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com



Re: Add pg_stat_autovacuum_priority

От
Robert Treat
Дата:
On Sun, Mar 29, 2026 at 10:09 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> On Sat, Mar 28, 2026 at 10:54 AM Sami Imseih <samimseih@gmail.com> wrote:
> >
> > > 4. Is the view intended to be exposed to PUBLIC without any ACL restrictions?
> >
> > > 2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority
> > > for all and grant them to pg_monitor or similar? Especially since this
> > > function loops over all the relations in a database, we may not want
> > > everyone to be able to do this.
> >
> > I think you're correct there. While the data is not sensitive, it
> > should have more controlled usage. It's only taking an AccessShareLock,
> > but you would not want anyone to be able to run this since it's
> > doing real computation. I think requiring pg_read_all_stats
> > is a good idea. Will do.
>
> +1 for pg_read_all_stats.
>

Is there a gap here where someone may have been granted MAINTAIN on a
relation but they do not have pg_read_all_stats?

> > > Can we have the per-relation prioritization computation function in C
> > > and provide a per-database computation function as a SQL function over
> > > this per-relation function in system_functions.sql?
> >
> > Yes, perhaps we should do this. So we can have a function called
> > pg_stat_get_autovacuum_priority() that either takes a NULL or an OID
> > to either return all the tables or just a single table.
> > This is a similar usage pattern as pg_stat_get_subscription or
> > pg_stat_get_activity.
> >
> > pg_stat_autovacuum_priority will be a view that wraps around the NULL
> > variant of the function.
> >
> > The case where the OID is passed we just do a SearchSysCache1(RELOID,...)
> > whereas the other case will do the full catalog scan.
> >
> > What do you think?
>
> IMHO, we can have pg_stat_get_relation_autovacuum_priority defined as
> a C function to give the autovacuum scoring as of the given moment for
> the given table. It's easy for one to write a function to get scoring
> for all the relations in a database. This keeps things simple yet
> useful.
>

I don't have a strong opinion on the above, but I do suspect that the
most common way people will interact with this is by querying against
the view with a WHERE clause, so optimizing for that case seems
important.


Robert Treat
https://xzilla.net



Re: Add pg_stat_autovacuum_priority

От
Álvaro Herrera
Дата:
On 2026-Mar-27, Sami Imseih wrote:

> 0002 introduces the view with documentation and testing in
> vacuum.c (xid age and mxid age scores are not tested as
> they require xid_wraparound to consume enough XIDs to
> trigger a score, which will cost too much time for a
> regression test).

After this patch, there are three copies of the combined call to
pgstat_fetch_stat_tabentry_ext() and relation_needs_vacanalyze().  Can
we avoid that, ideally reducing to only one such?  Or, at least, it
looks like recheck_relation_needs_vacanalyze() may no longer need to
exist at all.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Ed is the standard text editor."
      http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3



Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> On Sun, Mar 29, 2026 at 10:09 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> > On Sat, Mar 28, 2026 at 10:54 AM Sami Imseih <samimseih@gmail.com> wrote:
> > >
> > > > 4. Is the view intended to be exposed to PUBLIC without any ACL restrictions?
> > >
> > > > 2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority
> > > > for all and grant them to pg_monitor or similar? Especially since this
> > > > function loops over all the relations in a database, we may not want
> > > > everyone to be able to do this.
> > >
> > > I think you're correct there. While the data is not sensitive, it
> > > should have more controlled usage. It's only taking an AccessShareLock,
> > > but you would not want anyone to be able to run this since it's
> > > doing real computation. I think requiring pg_read_all_stats
> > > is a good idea. Will do.
> >
> > +1 for pg_read_all_stats.
> >
>
> Is there a gap here where someone may have been granted MAINTAIN on a
> relation but they do not have pg_read_all_stats?

Yes, that is possible. MAINTAIN is a per-object privilege granted on a relation,
whereas pg_read_all_stats is a global role membership. They operate at
different levels.

I don't think one needs to have MAINTAIN permissions on the table to see the
autovacuum score. DBA Monitoring users are usually separate from the DBA
operational users.

I think pg_read_all_stats is the right permission here and it should
be implemented
similar to how pg_get_shmem_allocations is done where the default permissions
are pg_read_all_stats. pg_monitor inherits pg_read_all_stats so any
user with this
privilege will be able to access this view. A DBA is free to also add
privileges to
to other users if they wish.

This is unlike other pg_stat_* views that have tuple level permission
checks ( i.e.
pg_stat_activity), but in those cases the permissions are needed to
hide sensitive data.
This is not the case here.

> > IMHO, we can have pg_stat_get_relation_autovacuum_priority defined as
> > a C function to give the autovacuum scoring as of the given moment for
> > the given table. It's easy for one to write a function to get scoring
> > for all the relations in a database. This keeps things simple yet
> > useful.
> >
>
> I don't have a strong opinion on the above, but I do suspect that the
> most common way people will interact with this is by querying against
> the view with a WHERE clause, so optimizing for that case seems
> important.

Yeah, after sleeping on it I actually think the most common case will likely be
ORDER BY score DESC LIMIT ... because you usually want to see where your
table priority is relative to everything else in the database.
For the rare case where someone wants to look up an individual table, the caller
can just use a WHERE clause. So, we should just always do the full pg_class
scan. I don't see why we need to complicate the c-function more than this.

Attached v2 implements it as above.

A few other things in v2:

1/ I set autovacuum_enabled = OFF in the tests. This will make
sure the test is both stable and will also test that the score is
returned even in
the case where autovacuum is disabled.

2/ Moved pg_stat_autovacuum_priority to the end of the docs in
"Monitoring Database Activity".

3/ Also added a mention of the extremely high score values when failsafe
is triggered [1]

[1] https://www.postgresql.org/message-id/abGP87A3JPIXDG2I%40nathan

--
Sami

Вложения

Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> > 0002 introduces the view with documentation and testing in
> > vacuum.c (xid age and mxid age scores are not tested as
> > they require xid_wraparound to consume enough XIDs to
> > trigger a score, which will cost too much time for a
> > regression test).
>
> After this patch, there are three copies of the combined call to
> pgstat_fetch_stat_tabentry_ext() and relation_needs_vacanalyze().  Can
> we avoid that, ideally reducing to only one such?  Or, at least, it
> looks like recheck_relation_needs_vacanalyze() may no longer need to
> exist at all.

recheck_relation_needs_vacanalyze() can be removed and replace
with compute_autovac_score(). I was trying to minimize refactoring,
but this seems worthwhile to do.

--
Sami



Re: Add pg_stat_autovacuum_priority

От
Robert Treat
Дата:
On Mon, Mar 30, 2026 at 11:17 AM Sami Imseih <samimseih@gmail.com> wrote:
> > On Sun, Mar 29, 2026 at 10:09 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > On Sat, Mar 28, 2026 at 10:54 AM Sami Imseih <samimseih@gmail.com> wrote:
> > > >
> > > > > 4. Is the view intended to be exposed to PUBLIC without any ACL restrictions?
> > > >
> > > > > 2/ Do we need to revoke permissions on pg_stat_get_autovacuum_priority
> > > > > for all and grant them to pg_monitor or similar? Especially since this
> > > > > function loops over all the relations in a database, we may not want
> > > > > everyone to be able to do this.
> > > >
> > > > I think you're correct there. While the data is not sensitive, it
> > > > should have more controlled usage. It's only taking an AccessShareLock,
> > > > but you would not want anyone to be able to run this since it's
> > > > doing real computation. I think requiring pg_read_all_stats
> > > > is a good idea. Will do.
> > >
> > > +1 for pg_read_all_stats.
> > >
> >
> > Is there a gap here where someone may have been granted MAINTAIN on a
> > relation but they do not have pg_read_all_stats?
>
> Yes, that is possible. MAINTAIN is a per-object privilege granted on a relation,
> whereas pg_read_all_stats is a global role membership. They operate at
> different levels.
>
> I don't think one needs to have MAINTAIN permissions on the table to see the
> autovacuum score. DBA Monitoring users are usually separate from the DBA
> operational users.
>
> I think pg_read_all_stats is the right permission here and it should
> be implemented
> similar to how pg_get_shmem_allocations is done where the default permissions
> are pg_read_all_stats. pg_monitor inherits pg_read_all_stats so any
> user with this
> privilege will be able to access this view. A DBA is free to also add
> privileges to
> to other users if they wish.
>
> This is unlike other pg_stat_* views that have tuple level permission
> checks ( i.e.
> pg_stat_activity), but in those cases the permissions are needed to
> hide sensitive data.
> This is not the case here.
>

I don't think we are in disagreement here, I was just thinking about
it the other way round; someone might have MAINTAIN privileges on a
table and want to see what the relevant "autovacuum score" is before
taking action. If the solution for that is to give those roles
pg_read_all_stats, I guess that's ok, but there was probably a reason
the permissions were limited in the first place. *shrug*

> > > IMHO, we can have pg_stat_get_relation_autovacuum_priority defined as
> > > a C function to give the autovacuum scoring as of the given moment for
> > > the given table. It's easy for one to write a function to get scoring
> > > for all the relations in a database. This keeps things simple yet
> > > useful.
> > >
> >
> > I don't have a strong opinion on the above, but I do suspect that the
> > most common way people will interact with this is by querying against
> > the view with a WHERE clause, so optimizing for that case seems
> > important.
>
> Yeah, after sleeping on it I actually think the most common case will likely be
> ORDER BY score DESC LIMIT ... because you usually want to see where your
> table priority is relative to everything else in the database.
> For the rare case where someone wants to look up an individual table, the caller
> can just use a WHERE clause. So, we should just always do the full pg_class
> scan. I don't see why we need to complicate the c-function more than this.
>

I think we are also in agreement here, although based on my
experience, filtering out things like system and toast tables will be
common, but I don't see that changing what you said above.  On a
similar note, +1 to your changes in v2.


Robert Treat
https://xzilla.net



Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> I think we are also in agreement here, although based on my
> experience, filtering out things like system and toast tables will be
> common, but I don't see that changing what you said above.  On a
> similar note, +1 to your changes in v2.

Thanks!

v3 now includes the refactoring [1] suggestion brought up by Alvarro


[1] [https://www.postgresql.org/message-id/202603301508.up22nvhgnnoj%40alvherre.pgsql]

--
Sami

Вложения

Re: Add pg_stat_autovacuum_priority

От
Nathan Bossart
Дата:
I didn't read any of the thread, but I looked at the patches.

0001:

+ * force_scores set to true forces the computation of a score. This is useful for
+ * tools that wish to inspect scores outside of the do_vacuum() path.

I'm of two minds about this new function parameter.  On one hand, I see the
utility of forcing score calculations even when autovacuum is disabled.  On
the other hand, when autovacuum is disabled, the scores are actually 0.0,
and it's probably a good idea to report exactly what autovacuum workers
see.  I also see that we're not forcing the computation of the (M)XID
scores.  Is that intentional?

I wonder if we can rework this function to always calculate the scores,
even if autovacuum is disabled or !force_vacuum.  This way, both paths are
doing the exact same thing and reporting the same scores.  We might still
want an extra parameter to avoid DEBUG3 in the system view path, but that
seems like a reasonable difference between the two.

0002:

Seems okay to me.

0003:

+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>vacuum_dead_score</structfield> <type>double precision</type>
+      </para>
+      <para>
+       Score component based on the estimated number of dead tuples
+       needing removal by vacuum.
+      </para></entry>
+     </row>

I think we should make sure the column names align with the names given to
the new parameters [0] and the new "Autovacuum Prioritization" section in
the docs [1].

[0] https://www.postgresql.org/docs/devel/runtime-config-vacuum.html#GUC-AUTOVACUUM-VACUUM-SCORE-WEIGHT
[1] https://www.postgresql.org/docs/devel/routine-vacuuming.html#AUTOVACUUM-PRIORITY

-- 
nathan



Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> + * force_scores set to true forces the computation of a score. This is useful for
> + * tools that wish to inspect scores outside of the do_vacuum() path.
>
> I'm of two minds about this new function parameter.  On one hand, I see the
> utility of forcing score calculations even when autovacuum is disabled.  On
> the other hand, when autovacuum is disabled, the scores are actually 0.0,
> and it's probably a good idea to report exactly what autovacuum workers
> see.

I went back and forth on this. Showing 0.0 when autovacuum is disabled
would reflect what autovacuum workers actually see, but I think the more
useful behavior is to always compute the score based on the table's actual
state. This way, a DBA who has disabled autovacuum on a table can still
see that its score is climbing and needs attention. The view shows need,
not eligibility. This will also make the view more useful for maintenance
jobs that wish to supplement autovacuum by looking at high scores
and triggering a manual vacuum for those tables.

> I also see that we're not forcing the computation of the (M)XID
> scores.  Is that intentional?

hmm, the force_score does not need to be in the force_vacuum path
because the score is calculated there naturally when the table is in
need of force_vacuum. The force_score is there to ensure that
we are not existing early in the autovacuum disabled case.

> I wonder if we can rework this function to always calculate the scores,
> even if autovacuum is disabled or !force_vacuum.  This way, both paths are
> doing the exact same thing and reporting the same scores.

I prefer that we still calculate the score as if autovacuum is enabled
for the reason above. I do think one potential middle ground is to have
needs_analyze, needs_vacuum, eligible_analyze, eligible_vacuum
fields to differentiate. I just rather not hide a score because a/v
is disabled on a table.

> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>vacuum_dead_score</structfield> <type>double precision</type>
> +      </para>
> +      <para>
> +       Score component based on the estimated number of dead tuples
> +       needing removal by vacuum.
> +      </para></entry>
> +     </row>
>
> I think we should make sure the column names align with the names given to
> the new parameters [0] and the new "Autovacuum Prioritization" section in
> the docs [1].

I will look into this in the next rev.

--
Sami



Re: Add pg_stat_autovacuum_priority

От
Nathan Bossart
Дата:
On Tue, Mar 31, 2026 at 11:15:35AM -0500, Sami Imseih wrote:
>> + * force_scores set to true forces the computation of a score. This is useful for
>> + * tools that wish to inspect scores outside of the do_vacuum() path.
>>
>> I'm of two minds about this new function parameter.  On one hand, I see the
>> utility of forcing score calculations even when autovacuum is disabled.  On
>> the other hand, when autovacuum is disabled, the scores are actually 0.0,
>> and it's probably a good idea to report exactly what autovacuum workers
>> see.
> 
> I went back and forth on this. Showing 0.0 when autovacuum is disabled
> would reflect what autovacuum workers actually see, but I think the more
> useful behavior is to always compute the score based on the table's actual
> state. This way, a DBA who has disabled autovacuum on a table can still
> see that its score is climbing and needs attention. The view shows need,
> not eligibility. This will also make the view more useful for maintenance
> jobs that wish to supplement autovacuum by looking at high scores
> and triggering a manual vacuum for those tables.

That's a fair point.

>> I also see that we're not forcing the computation of the (M)XID
>> scores.  Is that intentional?
> 
> hmm, the force_score does not need to be in the force_vacuum path
> because the score is calculated there naturally when the table is in
> need of force_vacuum. The force_score is there to ensure that
> we are not existing early in the autovacuum disabled case.

So, unless the table is beyond a freeze-max-age parameter, the (M)XID
scores will always be 0.0?

>> I wonder if we can rework this function to always calculate the scores,
>> even if autovacuum is disabled or !force_vacuum.  This way, both paths are
>> doing the exact same thing and reporting the same scores.
> 
> I prefer that we still calculate the score as if autovacuum is enabled
> for the reason above. I do think one potential middle ground is to have
> needs_analyze, needs_vacuum, eligible_analyze, eligible_vacuum
> fields to differentiate. I just rather not hide a score because a/v
> is disabled on a table.

My point is that instead of introducing a parameter to force score
computations, we could just _always_ do that in this function.  IOW maybe
we could use this as an opportunity to simplify the function while also
preparing it for the system view.

-- 
nathan



Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> >> I also see that we're not forcing the computation of the (M)XID
> >> scores.  Is that intentional?
> >
> > hmm, the force_score does not need to be in the force_vacuum path
> > because the score is calculated there naturally when the table is in
> > need of force_vacuum. The force_score is there to ensure that
> > we are not existing early in the autovacuum disabled case.
>
> So, unless the table is beyond a freeze-max-age parameter, the (M)XID
> scores will always be 0.0?

You're right after thinking about this again. There is no reason why we
should treat the force_vacuum case differently. The score should still
be included so someone monitoring can see the xid|mxid_age climbing
well before it becomes an issue.

--
Sami



Re: Add pg_stat_autovacuum_priority

От
Bharath Rupireddy
Дата:
Hi,

On Mon, Mar 30, 2026 at 11:16 AM Sami Imseih <samimseih@gmail.com> wrote:
>
> > I think we are also in agreement here, although based on my
> > experience, filtering out things like system and toast tables will be
> > common, but I don't see that changing what you said above.  On a
> > similar note, +1 to your changes in v2.
>
> Thanks!
>
> v3 now includes the refactoring [1] suggestion brought up by Alvarro

Thank you for sending the latest patches. Here are some comments:

1/ + while ((classTup = heap_getnext(relScan, ForwardScanDirection)) != NULL)

Missing check_for_interrupts call while scanning the pg_class system catalog.

2/
+ avopts = extract_autovac_opts(classTup, pg_class_desc);
+
+ compute_autovac_score(classTup, pg_class_desc,
+   effective_multixact_freeze_max_age, avopts,
+   true, &dovacuum, &doanalyze,
+   &wraparound, &scores);
+
+ if (avopts)
+ pfree(avopts);
+

When a database has a large number of tables (which is quite common in
production scenarios), I expect the costs of palloc and pfree being
used for fetching autovacuum relopts would make this function slower.
Can we invent a new function or pass a caller-allocated AutoVacOpts
memory to just copy the relopts and use that in this tight loop when
scanning for all the relations?

3/
+ values[8] = Float8GetDatum(scores.vac_ins);
+ values[9] = Float8GetDatum(scores.anl);

Nit: It's a matter of taste. How about using something like below
instead of hardcoded column numbers? I expect this view to grow in the
future, so it helps to keep things simple.

values[i++] = Float8GetDatum(scores.anl);
Assert(i == NUM_AV_SCORE_COLS);

4/
+     The <link linkend="monitoring-pg-stat-autovacuum-priority-view">
+     <structname>pg_stat_autovacuum_priority</structname></link> view can be
+     used to inspect each table's autovacuum need and priority score.

How about adding "as of the moment" to convey that it doesn't report
what currently running autovacuum or pending autovacuum would
consider?

5/ Also, can we add a simple paragraph on how to interpret and take
actions based on the scores reported (like prioritizing one table over
the other - adjust these parameters in the table's relopts or
something like that - no need to cover all the possible cases, but
just one example would be sufficient for the user to understand)?

6/ +  descr => 'statistics: autovacuum priority scores for all relations',

s/"for all relations"/"for all relations in the current database"

7/ Addition of force_scores to relation_needs_vacanalyze makes the
code unreadable (IMO) with a lot of if-else branching. Why not make
force_vacuum an option and pass it as true from the stats function and
leave a note in the function comment on when to use this parameter?
Would something like that work? Also, when autovacuum is disabled
(either via GUC or via relopts), we don't want to calculate and report
any scores. IMHO, this keeps things simple and code readable.

--
Bharath Rupireddy
Amazon Web Services: https://aws.amazon.com



Re: Add pg_stat_autovacuum_priority

От
Nathan Bossart
Дата:
On Tue, Mar 31, 2026 at 11:09:43AM -0700, Bharath Rupireddy wrote:
> 1/ + while ((classTup = heap_getnext(relScan, ForwardScanDirection)) != NULL)
> 
> Missing check_for_interrupts call while scanning the pg_class system catalog.

From a glance I don't see one in the scanning code in do_autovacuum(),
either.  I'm not sure we need to be worried about this.

> + avopts = extract_autovac_opts(classTup, pg_class_desc);
> +
> + compute_autovac_score(classTup, pg_class_desc,
> +   effective_multixact_freeze_max_age, avopts,
> +   true, &dovacuum, &doanalyze,
> +   &wraparound, &scores);
> +
> + if (avopts)
> + pfree(avopts);
> 
> When a database has a large number of tables (which is quite common in
> production scenarios), I expect the costs of palloc and pfree being
> used for fetching autovacuum relopts would make this function slower.
> Can we invent a new function or pass a caller-allocated AutoVacOpts
> memory to just copy the relopts and use that in this tight loop when
> scanning for all the relations?

Before making this code more complicated, I think we ought to demonstrate
there's an actual problem or slowness.  I am quite dubious we need to do
anything here.

> + values[8] = Float8GetDatum(scores.vac_ins);
> + values[9] = Float8GetDatum(scores.anl);
> 
> Nit: It's a matter of taste. How about using something like below
> instead of hardcoded column numbers? I expect this view to grow in the
> future, so it helps to keep things simple.
> 
> values[i++] = Float8GetDatum(scores.anl);
> Assert(i == NUM_AV_SCORE_COLS);

I don't think either way is substantially better.

> +     The <link linkend="monitoring-pg-stat-autovacuum-priority-view">
> +     <structname>pg_stat_autovacuum_priority</structname></link> view can be
> +     used to inspect each table's autovacuum need and priority score.
> 
> How about adding "as of the moment" to convey that it doesn't report
> what currently running autovacuum or pending autovacuum would
> consider?

I don't think "as of the moment" adds any clarity about that.  If we did
want to add something along those lines, I'd add a separate sentence that
says that it doesn't report the values of current autovacuum workers and is
freshly calculated by the current backend (or something like that).

-- 
nathan



Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> > 1/ + while ((classTup = heap_getnext(relScan, ForwardScanDirection)) != NULL)
> >
> > Missing check_for_interrupts call while scanning the pg_class system catalog.
>
> From a glance I don't see one in the scanning code in do_autovacuum(),
> either.  I'm not sure we need to be worried about this.

Yes, I will leave it out. Even for catalogs with thousands of tables, I don't
foresee this being an issue.

> > + values[8] = Float8GetDatum(scores.vac_ins);
> > + values[9] = Float8GetDatum(scores.anl);
> >
> > Nit: It's a matter of taste. How about using something like below
> > instead of hardcoded column numbers? I expect this view to grow in the
> > future, so it helps to keep things simple.
> >
> > values[i++] = Float8GetDatum(scores.anl);
> > Assert(i == NUM_AV_SCORE_COLS);
>
> I don't think either way is substantially better.

I agree. using numbers is more readable IMO.

> > +     <row>
> > +      <entry role="catalog_table_entry"><para role="column_definition">
> > +       <structfield>vacuum_dead_score</structfield> <type>double precision</type>
> > +      </para>
> > +      <para>
> > +       Score component based on the estimated number of dead tuples
> > +       needing removal by vacuum.
> > +      </para></entry>
> > +     </row>
> >
> > I think we should make sure the column names align with the names given to
> > the new parameters [0] and the new "Autovacuum Prioritization" section in
> > the docs [1].

> I will look into this in the next rev.

The field names now match the GUC names (without the _weight). We might as well
also make this name change in the AutoVacuumScore struct.

I attached v4 which includes 4 patches in the set to address earlier comments:

0001:

This modifies relation_needs_vacanalyze() to always compute the score so
the monitoring view can report the score regardless if autovacuum is enabled
on the table. AutoVacuumScores is renamed to AutoVacuumPriority as it
now also tracks needs_analyze, needs_vacuum and is_wraparound which
are different from the dovacuum, doanalyze and wraparound output parameters
which are acted on by autovacuum. This is a clean separation of reporting
the scores/needs for autovacuum/analyze and the eligibility for autovacuum
which is based on the state of autovacuum being enabled and set on the table
level.

0002:

relation_needs_vacanalyze() takes a new elevel argument to control logging.
Callers pass DEBUG3 for autovacuum or 0 to suppress logging.

0003:

Alvaro's earlier suggestion to factor out recheck_relation_needs_vacanalyze()
and introduce compute_autovac_score()

0004:

The view implementation with field names that better match the GUCs as suggested
by Nathan in an earlier comment.

--
Sami

Вложения

Re: Add pg_stat_autovacuum_priority

От
Nathan Bossart
Дата:
0001:

-                                      AutoVacuumScores *scores);
+                                      AutoVacuumPriority *priority);

IMHO we need to minimize these kinds of extraneous changes in this patch
set.  AutoVacuumScores still seems accurate enough, even when the struct
contains some extra bool members.

- * A table whose autovacuum_enabled option is false is
- * automatically skipped (unless we have to vacuum it due to freeze_max_age).
- * Thus autovacuum can be disabled for specific tables. Also, when the cumulative
- * stats system does not have data about a table, it will be skipped.
+ * A table whose autovacuum_enabled option is false is automatically skipped
+ * by autovacuum (unless we have to vacuum it due to freeze_max_age),
+ * but scores are still computed.  Also, when the cumulative stats system does
+ * not have data about a table, threshold-based scores will be zero.

I don't think we need to update this comment.

- * One exception to the previous paragraph is for tables nearing wraparound,
- * i.e., those that have surpassed the effective failsafe ages.  In that case,
- * the relfrozen/relminmxid-based score is scaled aggressively so that the
- * table has a decent chance of sorting to the front of the list.
+ * Furthermore, for tables nearing wraparound, i.e., those that have surpassed
+ * the effective failsafe ages, the relfrozen/relminmxid-based score is scaled
+ * aggressively so that the table has a decent chance of sorting to the front
+ * of the list.

Or this one.

+ * Priority scores are always computed.  dovacuum and doanalyze are only set when
+ * autovacuum is active and enabled for the relation.

I think we should more explicitly state that while scores->needs_vacuum and
friends are always set regardless of whether autovacuum is enabled, the
return parameters dovacuum, etc., are not.  Or perhaps we should return
whether autovacuum is enabled in the struct and consolidate the return
parameters and the struct members.  WDYT?

0002:

Seems fine.

0003:

Seems fine.

0004:

+    FROM pg_stat_get_autovacuum_priority() S
+         JOIN pg_class C ON C.oid = S.relid
+         LEFT JOIN pg_namespace N ON N.oid = C.relnamespace;

What do you think about ordering by score so this view automatically shows
the tables most in need of vacuuming/analyzing first?

-- 
nathan



Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> 0001:
>
> -                                      AutoVacuumScores *scores);
> +                                      AutoVacuumPriority *priority);
>
> IMHO we need to minimize these kinds of extraneous changes in this patch
> set.  AutoVacuumScores still seems accurate enough, even when the struct
> contains some extra bool members.
>
> - * A table whose autovacuum_enabled option is false is
> - * automatically skipped (unless we have to vacuum it due to freeze_max_age).
> - * Thus autovacuum can be disabled for specific tables. Also, when the cumulative
> - * stats system does not have data about a table, it will be skipped.
> + * A table whose autovacuum_enabled option is false is automatically skipped
> + * by autovacuum (unless we have to vacuum it due to freeze_max_age),
> + * but scores are still computed.  Also, when the cumulative stats system does
> + * not have data about a table, threshold-based scores will be zero.
>
> I don't think we need to update this comment.
>
> - * One exception to the previous paragraph is for tables nearing wraparound,
> - * i.e., those that have surpassed the effective failsafe ages.  In that case,
> - * the relfrozen/relminmxid-based score is scaled aggressively so that the
> - * table has a decent chance of sorting to the front of the list.
> + * Furthermore, for tables nearing wraparound, i.e., those that have surpassed
> + * the effective failsafe ages, the relfrozen/relminmxid-based score is scaled
> + * aggressively so that the table has a decent chance of sorting to the front
> + * of the list.
>
> Or this one.

Fixed both.

> + * Priority scores are always computed.  dovacuum and doanalyze are only set when
> + * autovacuum is active and enabled for the relation.
>
> I think we should more explicitly state that while scores->needs_vacuum and
> friends are always set regardless of whether autovacuum is enabled, the
> return parameters dovacuum, etc., are not.

I made this more explicit by saying " All fields in AutoVacuumScores
are always computed
regardless of autovacuum settings...." I think that is clear enough.

> Or perhaps we should return
> whether autovacuum is enabled in the struct and consolidate the return
> parameters and the struct members.  WDYT?

dovacuum, doanalyze will be unused by callers like the sql function, and putting
them in the struct could make this cleaner, but I don't think it's
worth it to blur
the purpose of the struct. I rather keep it just for score computation purposes.

> 0002:
>
> Seems fine.

I found a bug in my v4 that I fixed.

+        if (elevel > 0 && vac_ins_base_thresh >= 0)

is wrong. It should be:

if (elevel > 0)
{
      if (vac_ins_base_thresh >= 0)


> 0004:
>
> +    FROM pg_stat_get_autovacuum_priority() S
> +         JOIN pg_class C ON C.oid = S.relid
> +         LEFT JOIN pg_namespace N ON N.oid = C.relnamespace;
>
> What do you think about ordering by score so this view automatically shows
> the tables most in need of vacuuming/analyzing first?

I thought about that initially, but we don't really have an example where the
data is ordered in a stats view ( or any other catalog view ), and I prefer not
to impose that on the user automatically.

See v5.

--
Sami

Вложения

Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
v6 attached addresses a silly bug with v5-0001 that Nathan pointed out
offline. v5 was not computing the scores for vacuum,vacuum insert
and analyze thresholds for the !wraparound case.

--
Sami

Вложения

Re: Add pg_stat_autovacuum_priority

От
Nathan Bossart
Дата:
+    bool        needs_vacuum;   /* threshold exceeded for vacuum */
+    bool        needs_analyze;  /* threshold exceeded for analyze */
+    bool        is_wraparound;  /* at risk of XID/MXID wraparound */

I've been thinking about whether to also return whether autovacuum is
enabled in the view, i.e., AutoVacuumingActive() && av_enabled.  My
instinct was that would help explain why autovacuum isn't picking up tables
that are eligible for autovacuum based on the view.  One wrinkle is that
it'd report false even when autovacuum is working on a table for wraparound
prevention.  I'm also not especially excited about further complicating
this stuff for folks who disable autovacuum.  Furthermore, the value of the
reloptions and autovacuum GUC are discoverable elsewhere.  So, I'm
currently leaning towards leaving that information out for now.

+        if (vactuples > vacthresh)
+        {
+            scores->needs_vacuum = true;
+            if (av_enabled && AutoVacuumingActive())
                 *dovacuum = true;
         }

nitpick:  We might be able to simplify this a bit by 1) storing "av_enabled
&& AutoVacuumingActive()" in a variable and 2) reworking the code to look
more like this:

    scores->needs_vacuum = (vactuples > vacthresh);
    *do_vacuum |= (av_enabled && scores->needs_vacuum);

... but others might find your version easier to read.

Otherwise, 0001 looks good.

In 0003, I think you missed renaming the last argument to
compute_autovac_score() in table_recheck_autovac().

I didn't see anything else in this read-through.  I'm planning to start
preparing this for commit tomorrow.

-- 
nathan



Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> I've been thinking about whether to also return whether autovacuum is
> enabled in the view, i.e., AutoVacuumingActive() && av_enabled.

I don't think we can rely on AutoVacuumingActive() being stable since a
backend that does a SET track_counts = off for whatever reason and
then calls the view will get false. av_enabled will likely be the only
thing we can report.

> So, I'm
> currently leaning towards leaving that information out for now.

I agree.

>
>         scores->needs_vacuum = (vactuples > vacthresh);
>         *do_vacuum |= (av_enabled && scores->needs_vacuum);
>
> ... but others might find your version easier to read.

yeah, for readability, I'll stick with the current.

> Otherwise, 0001 looks good.
>
> In 0003, I think you missed renaming the last argument to
> compute_autovac_score() in table_recheck_autovac().

Earlier, I did not compile with 0003 only. Fixed.

> I didn't see anything else in this read-through.  I'm planning to start
> preparing this for commit tomorrow.

Thanks! here is v7

--
Sami

Вложения

Re: Add pg_stat_autovacuum_priority

От
Nathan Bossart
Дата:
On Wed, Apr 01, 2026 at 06:04:36PM -0500, Sami Imseih wrote:
> Thanks! here is v7

Alright, I've been preparing these for commit.  Most changes are cosmetic,
but there are a couple of bigger ones I should note:

* I added a prerequisite patch for relation_needs_vacanalyze() that saves a
level of indentation on a chunk of code.  This simplifies 0001 (now 0002) a
bit.

* I noticed that if autovacuum decides to force a vacuum for
anti-wraparound purposes, it might also decide to analyze the table even if
autovacuum is disabled for it.  AFAICT this is accidental, but since it's
behaved this way since commit 48188e1621 (2006) [0], I am slightly worried
that this bug may have become a feature.  In 0002, I separated this edge
case in the code and added a comment, and I intend to start a new thread
about removing it.

* I removed the booleans in the view in favor of just noting that scores >=
1.0 means the table needs processing.  IMHO trying to distinguish
needs_vacuum from do_vacuum is just going to confuse folks more than
anything, and IIUC this information is redundant with "score >= 1.0",
anyway.

* I renamed the view to pg_autovacuum_scores.  While some of the
information in the view depends on cumulative statistics, not all of it
does, and what does is quite heavily modified from the original stats.  So,
I didn't think the pg_stat_* prefix was appropriate, although I can see how
reasonable people might disagree.

* I considered whether to make the backing function per-table and
ultimately decided against it.  The initialization logic is a bit
expensive, and I assume most folks will be interested in the full picture
of the current database.  Maybe we could add a per-table function down the
road, but I don't see any strong need for that for now.

I'm planning to commit 0001-0004 this afternoon, assuming cfbot is happy.
I'm hoping to commit 0005 on Monday or Tuesday.  Please take a look at v8
if you have time.

[0] https://postgr.es/m/23710.1162661716%40sss.pgh.pa.us

-- 
nathan

Вложения

Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> Alright, I've been preparing these for commit.  Most changes are cosmetic,
> but there are a couple of bigger ones I should note:

Thanks!

> * I added a prerequisite patch for relation_needs_vacanalyze() that saves a
> level of indentation on a chunk of code.  This simplifies 0001 (now 0002) a
> bit.

I like this this.

> * I noticed that if autovacuum decides to force a vacuum for
> anti-wraparound purposes, it might also decide to analyze the table even if
> autovacuum is disabled for it.  AFAICT this is accidental, but since it's
> behaved this way since commit 48188e1621 (2006) [0], I am slightly worried
> that this bug may have become a feature.  In 0002, I separated this edge
> case in the code and added a comment, and I intend to start a new thread
> about removing it.

hmm yeah, I think this just needs to be documented clearly. I always
thought it was expected for auto-analyze to run in this case, and I don't
see why it shouldn't. If this needs to be clarified in docs, we should
do that in a separate discussion.

> * I removed the booleans in the view in favor of just noting that scores >=
> 1.0 means the table needs processing.  IMHO trying to distinguish
> needs_vacuum from do_vacuum is just going to confuse folks more than
> anything, and IIUC this information is redundant with "score >= 1.0",
> anyway.

That's fine by me.

> * I renamed the view to pg_autovacuum_scores.  While some of the
> information in the view depends on cumulative statistics, not all of it
> does, and what does is quite heavily modified from the original stats.  So,
> I didn't think the pg_stat_* prefix was appropriate, although I can see how
> reasonable people might disagree.

Initially I thought about moving this away from the cumulative stats section,
but this view does need to lookup relation stats and if relation stats
are reset,
the same rules will apply to this view.

Also not all views under "cumulative stats" are necessarily cumulative.
Some just show real-time data; pg_stat_activity, pg_stat_progress_*, etc.

This view does not have precedent in the type of work it does, but I do
really think it belongs under pg_stat_*, and not be too far away conceptually
from the vacuum stats in pg_stat_all_tables.

> * I considered whether to make the backing function per-table and
> ultimately decided against it.  The initialization logic is a bit
> expensive, and I assume most folks will be interested in the full picture
> of the current database.  Maybe we could add a per-table function down the
> road, but I don't see any strong need for that for now.

Yes, I did not proceed with this since the common use-case will be comparing
tables in a broader context. I don't see a string single table use-case at this
point.

Besides the above, I have one comment on 0005:

Where it says "indicate the table needs analyzing" or "needs processing"
or "needs vacuuming", we should instead say "may need". Since the
actually processing depends on the thresholds or force vacuum conditions,
but no need to go into that level of detail in the row descriptions. That is
all explained in the existing autovacuum prioritization docs.

--
Sami Imseih
Amazon Web Services (AWS)



Re: Add pg_stat_autovacuum_priority

От
Nathan Bossart
Дата:
I committed 0001-0003.  Here is a new version of the last two patches.
Some notes:

* Instead of renaming recheck_relation_needs_vacanalyze(), I followed
Álvaro's suggestion to remove that function and instead fetch the stats
within relation_needs_vacanalyze() itself.

* Per your feedback, I renamed the view back to pg_stat_autovacuum_scores.

* Instead of limiting the view to pg_read_all_stats, I've left it
accessible to all users.  AFAICT there's nothing sensitive here.

* I reworked the column descriptions a bit to make it clear that values >=
1.0 mean autovacuum will process it (except if autovacuum is disabled).

WDYT?

-- 
nathan

Вложения

Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> WDYT?

"Scores greater than or equal to <literal>1.0</literal>" in the comments
of each field are misleading. This conflates scoring with vacuum/analyze
eligibility and it's possible with a autovacuum_*_weight < 1.0 to trigger an
autovacuum/analyze.

I suggest the attached changes on top of v9.

--
Sami

Вложения

Re: Add pg_stat_autovacuum_priority

От
Nathan Bossart
Дата:
On Sat, Apr 04, 2026 at 08:25:26AM -0500, Sami Imseih wrote:
> "Scores greater than or equal to <literal>1.0</literal>" in the comments
> of each field are misleading. This conflates scoring with vacuum/analyze
> eligibility and it's possible with a autovacuum_*_weight < 1.0 to trigger an
> autovacuum/analyze.

Ah, that's unfortunate.  I think it'd be good to give folks some idea of
what autovacuum will actually process.  I wonder if we could adjust the
documentation accordingly.

-- 
nathan



Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> On Sat, Apr 04, 2026 at 08:25:26AM -0500, Sami Imseih wrote:
> > "Scores greater than or equal to <literal>1.0</literal>" in the comments
> > of each field are misleading. This conflates scoring with vacuum/analyze
> > eligibility and it's possible with a autovacuum_*_weight < 1.0 to trigger an
> > autovacuum/analyze.
>
> Ah, that's unfortunate.  I think it'd be good to give folks some idea of
> what autovacuum will actually process.  I wonder if we could adjust the
> documentation accordingly.

That's why I thought having the bool fields made sense in the earlier
versions of the view. Since autovacuum is dealing with 2 concepts:

eligibility: is av enabled and is the table meeting thresholds
score: The priority of how the eligible tables will be processed.

So, while this could be explained in docs, I think it's better we report
these fields.

We might as well just call the view pg_stat_autovacuum in that case.

What do you think?

--
Sami



Re: Add pg_stat_autovacuum_priority

От
Nathan Bossart
Дата:
On Sat, Apr 04, 2026 at 10:32:07AM -0500, Sami Imseih wrote:
>> On Sat, Apr 04, 2026 at 08:25:26AM -0500, Sami Imseih wrote:
>> > "Scores greater than or equal to <literal>1.0</literal>" in the comments
>> > of each field are misleading. This conflates scoring with vacuum/analyze
>> > eligibility and it's possible with a autovacuum_*_weight < 1.0 to trigger an
>> > autovacuum/analyze.
>>
>> Ah, that's unfortunate.  I think it'd be good to give folks some idea of
>> what autovacuum will actually process.  I wonder if we could adjust the
>> documentation accordingly.
> 
> That's why I thought having the bool fields made sense in the earlier
> versions of the view. Since autovacuum is dealing with 2 concepts:
> 
> eligibility: is av enabled and is the table meeting thresholds
> score: The priority of how the eligible tables will be processed.
> 
> So, while this could be explained in docs, I think it's better we report
> these fields.

I understand your position, but I still worry about potential confusion if
when dovacuum and needs_vacuum differ.  And I don't know how much we really
ought to be tailoring this stuff to clusters where autovacuum is disabled
or where the scores are being adjusted.  Also, I think we ought to go into
more detail in the documentation, anyway.  So my instinct was to do
something more like the attached.

If we did report booleans, I would probably argue for just reporting
dovacuum and doanalyze and calling out the criteria for why they may be
false even when it looks like the table needs processing.

-- 
nathan

Вложения

Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> > eligibility: is av enabled and is the table meeting thresholds
> > score: The priority of how the eligible tables will be processed.
> >
> > So, while this could be explained in docs, I think it's better we report
> > these fields.
>
> I understand your position, but I still worry about potential confusion if
> when dovacuum and needs_vacuum differ.  And I don't know how much we really
> ought to be tailoring this stuff to clusters where autovacuum is disabled
> or where the scores are being adjusted.  Also, I think we ought to go into
> more detail in the documentation, anyway.

After looking at it a bit more, I don't think we can avoid putting
the needs_analyze and needs_vacuum fields, because without them
there will still be a gap in understanding of if a table is eligible for
autovacuum. A score on its own will not draw the full picture. For example,
if I tune the _weight settings, I want to know if this table now becomes
a candidate for vacuum/analyze, irrespective if autovacuum will actually
process it due to some bad configuration. I can see someone seaching
this view by just the bool fields to see the eligibile tables also without
caring about the score.

> If we did report booleans, I would probably argue for just reporting
> dovacuum and doanalyze and calling out the criteria for why they may be
> false even when it looks like the table needs processing.

Yes, we only require a needs_analyze and needs_vacuum. The latter
can be set to true due to thresholds or wraparound. But, I don't think we
should rely on the dovacuum or doanalyze, instead we can just have a flag
in AutoVacuumScores->needs and track what is needed. This will separate
the autovacuum processing from the reporting.

Here is what I am thinking we should do.

v11-0001 is same as v10-0001
v11-0002 introduces the additional bit field for reporting. This sets the
need for vacuum regardless of av_enabled.
v11-0003 the view with slightly more detailed documentation.

--
Sami

Вложения

Re: Add pg_stat_autovacuum_priority

От
Nathan Bossart
Дата:
On Sat, Apr 04, 2026 at 12:48:28PM -0500, Sami Imseih wrote:
>> If we did report booleans, I would probably argue for just reporting
>> dovacuum and doanalyze and calling out the criteria for why they may be
>> false even when it looks like the table needs processing.
> 
> Yes, we only require a needs_analyze and needs_vacuum. The latter
> can be set to true due to thresholds or wraparound. But, I don't think we
> should rely on the dovacuum or doanalyze, instead we can just have a flag
> in AutoVacuumScores->needs and track what is needed. This will separate
> the autovacuum processing from the reporting.

Sorry for going in circles about this, but I'm not seeing why we wouldn't
just return the booleans that relation_needs_vacanalyze() already returns.
I think the question people will have is "what will autovacuum process and
in what order?", and if we aren't giving them the exact same information
that autovacuum is using to make its decisions, then I'm not sure what is
the point.  It's true that someone might disable autovacuum for a table and
that it would otherwise be processed, but so be it.

Concretely, like the attached 0003.  IMHO this feels much more natural than
giving folks booleans that usually represents dovacuum/doanalyze but that
don't in certain cases.

-- 
nathan

Вложения

Re: Add pg_stat_autovacuum_priority

От
Sami Imseih
Дата:
> >> If we did report booleans, I would probably argue for just reporting
> >> dovacuum and doanalyze and calling out the criteria for why they may be
> >> false even when it looks like the table needs processing.
> >
> > Yes, we only require a needs_analyze and needs_vacuum. The latter
> > can be set to true due to thresholds or wraparound. But, I don't think we
> > should rely on the dovacuum or doanalyze, instead we can just have a flag
> > in AutoVacuumScores->needs and track what is needed. This will separate
> > the autovacuum processing from the reporting.
>
> Sorry for going in circles about this, but I'm not seeing why we wouldn't
> just return the booleans that relation_needs_vacanalyze() already returns.
> I think the question people will have is "what will autovacuum process and
> in what order?", and if we aren't giving them the exact same information
> that autovacuum is using to make its decisions, then I'm not sure what is
> the point.  It's true that someone might disable autovacuum for a table and
> that it would otherwise be processed, but so be it.

Maybe there’s no need to worry too much about the autovacuum disabled
case or track_counts being disabled when querying the view. Both are
edge cases, and it seemed fairly trivial to compensate for this with what
I had attached earlier. Anyhow, I will not push this point further. I am ok
with proceeding with what you have in v12. The patches overall LGTM.

Thanks!

--
Sami



Re: Add pg_stat_autovacuum_priority

От
Nathan Bossart
Дата:
Committed after some more editorialization.

-- 
nathan