Обсуждение: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

Поиск

Список

Период

Сортировка

Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

От

VASUKI M

Дата:

17 февраля, 13:42:58

Hi all,

Following up on the recent discussion around ANALYZE (MISSING_STATS_ONLY):[1]

I would like to start a separate discussion about a potential ANALYZE (MODIFIED_STATS) option.

The idea is to allow manual ANALYZE to reuse the same threshold logic that autoanalyze uses, so that when a user explicitly runs ANALYZE, only relations that have crossed the modification threshold are processed.

Conceptually, this would use the existing formula:

analyze threshold = analyze_base_threshold
+ analyze_scale_factor * reltuples

and compare it against n_mod_since_analyze, similar to how autovacuum decides when to trigger analyze.

The goal is not to replace autoanalyze, but to expose its decision model at SQL level for deterministic, user-controlled execution. For example:

-Running maintenance in scripted environments
-Triggering analysis immediately after batch data loads
-Avoiding unnecessary work when running manual ANALYZE across many relations

- Environments where autovacuum is tuned conservatively or partially disabled

Autoanalyze runs opportunistically in the background. This proposal would allow a user to apply the same threshold logic explicitly and immediately.

A possible usage would look like: ANALYZE (MODIFIED_STATS);

I understand there is conceptual overlap with autovacuum, so I would especially appreciate feedback on:

-Whether exposing the threshold logic at SQL level makes sense architecturally
-Whether this should remain an explicit opt-in option
-Naming (e.g., MODIFIED_STATS vs SKIP_UNMODIFIED or something clearer)
-Whether the thresholds should reuse existing GUCs or accept per-command overrides

I intentionally kept this separate from MISSING_STATS_ONLY, since that option answers a different question (“are stats missing?”) while this one would answer (“have enough rows changed to justify re-analysis?”).

I would greatly appreciate thoughts before working on a prototype patch.

Thanks again for all the feedback so far — it has been very helpful ,expecting here the same.

Regards,
Vasuki M
C-DAC,Chennai

[1][https://www.postgresql.org/message-id/CAE2r8H61ZtT4ek3JmLKDPmr7ALQ0uE9WSwwJRFHbXm0WdOJnEQ@mail.gmail.com]

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

От

Nathan Bossart

Дата:

17 февраля, 20:35:43

On Tue, Feb 17, 2026 at 04:12:58PM +0530, VASUKI M wrote:
> I would greatly appreciate thoughts before working on a prototype patch.

At the risk of substantially expanding the scope of these patches, I wonder
if a better long-term approach would be to first centralize the
autovacuum/autoanalyze prioritization code and share it via a system view
that vacuumdb could use in place of the giant query for
--missing-stats-only.  That would also give users visibility into
auto{vacuum,analyze}'s decisions.  TBH I'm not totally sold on the idea of
giving ANALYZE more options for this stuff, if for no other reason than I'm
not really following the concrete use-cases, but I wouldn't say I'm
mortally opposed to it.

-- 
nathan

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

От

David Rowley

Дата:

18 февраля, 02:12:38

On Wed, 18 Feb 2026 at 06:35, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Feb 17, 2026 at 04:12:58PM +0530, VASUKI M wrote:
> > I would greatly appreciate thoughts before working on a prototype patch.
>
> At the risk of substantially expanding the scope of these patches, I wonder
> if a better long-term approach would be to first centralize the
> autovacuum/autoanalyze prioritization code and share it via a system view
> that vacuumdb could use in place of the giant query for
> --missing-stats-only.  That would also give users visibility into
> auto{vacuum,analyze}'s decisions.  TBH I'm not totally sold on the idea of
> giving ANALYZE more options for this stuff, if for no other reason than I'm
> not really following the concrete use-cases, but I wouldn't say I'm
> mortally opposed to it.

Yeah, I'm not at all excited about adding options to ANALYZE for this
sort of thing either. I agree with the VIEW idea. If we had the vacuum
scoring stuff, I imagined it'd be useful to have a view that lists
tables and their vacuum/analyze score. If people want to script this
sort of thing then querying that view and running analyze on the
returned tables seems simple enough. I imagined a common thing that
people might want to do would be freeze tables that have a freeze age
somewhere close to autovacuum_freeze_max_age while off-peak so that
autovacuum doesn't trigger for that on-peak. If we allow ANALYZE to
have the proposed option, then we're opening a can of worms for
various other possible requirements for similar options in the VACUUM
command.

I also agree that vacuumdb could be a good place to code this up.

David

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

От

Corey Huinker

Дата:

24 февраля, 16:45:26

Yeah, I'm not at all excited about adding options to ANALYZE for this
sort of thing either. I agree with the VIEW idea. If we had the vacuum
scoring stuff, I imagined it'd be useful to have a view that lists
tables and their vacuum/analyze score.

Several of the stats import iterations that never made it to commits had exportable stats view, so I took a stab at making pg_missing_stats, and a couple of problems emerged.

1. The view would need to expose pg_class.oid so that it could be joined to the listed_object CTE. Our pattern seems to be schemaname+tablename with no exposed oids, something that has caused a problem in the past, namely in pg_dump when batching attribute stats. Is exposing oids in these views a no-no?

2. The view either needs the pg_class.oid to join back to pg_class to get relkind, relpersistence, relfrozenxid, relminmxid, etc...or it needs to expose those columns as a part of the view, which would be problematic when vacuumdb and other apps decide that they need to filter on some other part of pg_class, and then we've got different versions of the system view which makes that option highly unattractive.

If we instead did a system function `pg_rel_is_missing_stats(oid) returns boolean`, but it would still need to sanity check on relkind and filter on relpersistence and inherited.

So either way we're doing some self-joins on pg_class, probably with a security barrier.

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

От

Sami Imseih

Дата:

24 февраля, 17:19:55

Hi,

>> Yeah, I'm not at all excited about adding options to ANALYZE for this
>> sort of thing either. I agree with the VIEW idea. If we had the vacuum
>> scoring stuff, I imagined it'd be useful to have a view that lists
>> tables and their vacuum/analyze score.
>
> If we instead did a system function `pg_rel_is_missing_stats(oid) returns boolean`, but it would still need to sanity
checkon relkind and filter on relpersistence and inherited.
 
>
> So either way we're doing some self-joins on pg_class, probably with a security barrier.

It seems like this thread is now discussing both the modified stats
option ( this thread )
and the skipped stats option [1], which will be hard to follow.

But from a high-level, I am not sure how providing a VIEW or functions
in which users
will be required to script out the ANALYZE commands to run is better
than providing
ANALYZE options? the difference is putting the burden on the user to
come up with
the SQL script vs a simple ANALYZE option. IMO, the latter is more appealing
from a user perspective.

I think the view is a good idea, and this was discussed a bit here [2], if we
do implement a scoring algorithm, but this view will be more for
monitoring/visibility.


[1][https://www.postgresql.org/message-id/CAE2r8H61ZtT4ek3JmLKDPmr7ALQ0uE9WSwwJRFHbXm0WdOJnEQ@mail.gmail.com]
[2][https://www.postgresql.org/message-id/CAApHDvpVE5F-_8rpPC%2B-L98mA0yK0S_jtQGqLn69fkRevf726g%40mail.gmail.com]

--
Sami Imseih
Amazon Web Services (AWS)

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

От

Nathan Bossart

Дата:

24 февраля, 19:27:25

On Tue, Feb 24, 2026 at 08:45:26AM -0500, Corey Huinker wrote:
> 1. The view would need to expose pg_class.oid so that it could be joined to
> the listed_object CTE. Our pattern seems to be schemaname+tablename with no
> exposed oids, something that has caused a problem in the past, namely in
> pg_dump when batching attribute stats. Is exposing oids in these views a
> no-no?

I don't think so.  Counterexamples include pg_stat_activity,
pg_stat_subscription, pg_stat_database, pg_stat_all_tables, and more.  If
anything, maybe we should consider _adding_ oid columns to views like
pg_stats_ext...

> 2. The view either needs the pg_class.oid to join back to pg_class to get
> relkind, relpersistence, relfrozenxid, relminmxid, etc...or it needs to
> expose those columns as a part of the view, which would be problematic when
> vacuumdb and other apps decide that they need to filter on some other part
> of pg_class, and then we've got different versions of the system view which
> makes that option highly unattractive.

Hm.  I don't know how much to worry about this.  System view/catalog
changes do happen from time to time, so unless we have a bunch of possible
changes that will happen soon, I'd expect it to probably be okay.

> If we instead did a system function `pg_rel_is_missing_stats(oid) returns
> boolean`, but it would still need to sanity check on relkind and filter on
> relpersistence and inherited.

Couldn't the function handle those sanity checks?

-- 
nathan

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

От

Nathan Bossart

Дата:

24 февраля, 20:01:45

On Tue, Feb 24, 2026 at 08:19:55AM -0600, Sami Imseih wrote:
> But from a high-level, I am not sure how providing a VIEW or functions in
> which users will be required to script out the ANALYZE commands to run is
> better than providing ANALYZE options? the difference is putting the
> burden on the user to come up with the SQL script vs a simple ANALYZE
> option. IMO, the latter is more appealing from a user perspective.

My principle objection isn't with the option itself, it's that _just_
adding the option papers over the current state of affairs without taking
the opportunity to do a bunch of really useful centralization/refactoring.
By exposing the auto{vacuum,analyze} prioritization code and building off
that, we get a consistent picture wherever it is needed (e.g.,
auto{vacuum,analyze}, vacuumdb, monitoring tools, custom scripts, and any
new VACUUM/ANALYZE options) without having to reinvent the wheel each time.
Sure, we could have vacuumdb just use MISSING_STATS_ONLY for every command
in --missing-stats-only mode, but that seems like a lot of overhead, and we
still aren't helping with observability (which I expect to become even more
important as autovacuum's prioritization code evolves).

I'm admittedly waving my hands a bit here, but in any case, I think this is
an opportunity to do a bunch of super useful improvements that will have
much bigger benefits down the road.

-- 
nathan

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds