Обсуждение: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

Поиск
Список
Период
Сортировка

Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

От
VASUKI M
Дата:

Hi all,

Following up on the recent discussion around ANALYZE (MISSING_STATS_ONLY):[1]

I would like to start a separate discussion about a potential ANALYZE (MODIFIED_STATS) option.

The idea is to allow manual ANALYZE to reuse the same threshold logic that autoanalyze uses, so that when a user explicitly runs ANALYZE, only relations that have crossed the modification threshold are processed.

Conceptually, this would use the existing formula:


analyze threshold = analyze_base_threshold
                    + analyze_scale_factor * reltuples


and compare it against n_mod_since_analyze, similar to how autovacuum decides when to trigger analyze.

The goal is not to replace autoanalyze, but to expose its decision model at SQL level for deterministic, user-controlled execution. For example:

-Running maintenance in scripted environments
-Triggering analysis immediately after batch data loads
-Avoiding unnecessary work when running manual ANALYZE across many relations
- Environments where autovacuum is tuned conservatively or partially disabled

Autoanalyze runs opportunistically in the background. This proposal would allow a user to apply the same threshold logic explicitly and immediately.

A possible usage would look like: ANALYZE (MODIFIED_STATS);

I understand there is conceptual overlap with autovacuum, so I would especially appreciate feedback on:

-Whether exposing the threshold logic at SQL level makes sense architecturally
-Whether this should remain an explicit opt-in option
-Naming (e.g., MODIFIED_STATS vs SKIP_UNMODIFIED or something clearer)
-Whether the thresholds should reuse existing GUCs or accept per-command overrides

I intentionally kept this separate from MISSING_STATS_ONLY, since that option answers a different question (“are stats missing?”) while this one would answer (“have enough rows changed to justify re-analysis?”).

I would greatly appreciate thoughts before working on a prototype patch.

Thanks again for all the feedback so far — it has been very helpful ,expecting here the same.

Regards,
Vasuki M
C-DAC,Chennai

[1][https://www.postgresql.org/message-id/CAE2r8H61ZtT4ek3JmLKDPmr7ALQ0uE9WSwwJRFHbXm0WdOJnEQ@mail.gmail.com]

Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

От
Nathan Bossart
Дата:
On Tue, Feb 17, 2026 at 04:12:58PM +0530, VASUKI M wrote:
> I would greatly appreciate thoughts before working on a prototype patch.

At the risk of substantially expanding the scope of these patches, I wonder
if a better long-term approach would be to first centralize the
autovacuum/autoanalyze prioritization code and share it via a system view
that vacuumdb could use in place of the giant query for
--missing-stats-only.  That would also give users visibility into
auto{vacuum,analyze}'s decisions.  TBH I'm not totally sold on the idea of
giving ANALYZE more options for this stuff, if for no other reason than I'm
not really following the concrete use-cases, but I wouldn't say I'm
mortally opposed to it.

-- 
nathan



Re: Proposal: ANALYZE (MODIFIED_STATS) using autoanalyze thresholds

От
David Rowley
Дата:
On Wed, 18 Feb 2026 at 06:35, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Feb 17, 2026 at 04:12:58PM +0530, VASUKI M wrote:
> > I would greatly appreciate thoughts before working on a prototype patch.
>
> At the risk of substantially expanding the scope of these patches, I wonder
> if a better long-term approach would be to first centralize the
> autovacuum/autoanalyze prioritization code and share it via a system view
> that vacuumdb could use in place of the giant query for
> --missing-stats-only.  That would also give users visibility into
> auto{vacuum,analyze}'s decisions.  TBH I'm not totally sold on the idea of
> giving ANALYZE more options for this stuff, if for no other reason than I'm
> not really following the concrete use-cases, but I wouldn't say I'm
> mortally opposed to it.

Yeah, I'm not at all excited about adding options to ANALYZE for this
sort of thing either. I agree with the VIEW idea. If we had the vacuum
scoring stuff, I imagined it'd be useful to have a view that lists
tables and their vacuum/analyze score. If people want to script this
sort of thing then querying that view and running analyze on the
returned tables seems simple enough. I imagined a common thing that
people might want to do would be freeze tables that have a freeze age
somewhere close to autovacuum_freeze_max_age while off-peak so that
autovacuum doesn't trigger for that on-peak. If we allow ANALYZE to
have the proposed option, then we're opening a can of worms for
various other possible requirements for similar options in the VACUUM
command.

I also agree that vacuumdb could be a good place to code this up.

David