Обсуждение: [PATCH] Add sampling statistics to autoanalyze log output

Поиск

Список

Период

Сортировка

[PATCH] Add sampling statistics to autoanalyze log output

От

河田達也

Дата:

06 декабря 2025 г., 20:39:45

Hi,

I would like to propose a patch to add sampling statistics to autoanalyze log output, addressing an inconsistency between ANALYZE VERBOSE and autoanalyze logging.

## Problem

Currently, ANALYZE VERBOSE displays sampling statistics, but autoanalyze does not log this information.
This makes it harder to diagnose issues with automatic statistics collection.

Example (current behavior):
- ANALYZE VERBOSE: Shows "INFO: "pg_class": scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434 rows in sample, 434 estimated total rows."
- autoanalyze: No sampling information

## Solution

This patch unifies the logging output by moving sampling statistics from acquire_sample_rows() to do_analyze_rel()'s instrumentation section. Now
both ANALYZE VERBOSE and autoanalyze output the same sampling information in a consolidated log message.

Key changes:
1. Updated AcquireSampleRowsFunc typedef to include 4 new output parameters
2. Modified acquire_sample_rows() and acquire_inherited_sample_rows() to populate these parameters
3. Added sampling statistics output in do_analyze_rel()
4. Updated postgres_fdw and file_fdw implementations

## Example Output

After the patch(adding both ANALYZE VERBOSE and autoanalyze) :
sampling: scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434 rows in sample, 434 estimated total rows

For inherited tables, statistics are accumulated across all children.

## Design Question

For inherited tables, the current patch shows only the accumulated total.
An alternative approach would be to show per-child statistics followed by the total.
I wanted to align with do_analyze_rel()'s structure to properly support autoanalyze (autovacuum) logging.
However, I haven't found a clean way to preserve per-child output while maintaining this structure.
I would appreciate any advice or suggestions on how to achieve both goals if there's a better approach I'm missing.

I would appreciate your feedback!

Regards,

Вложения

v1-0001-Add-sampling-statistics-to-autoanalyze-log-output.patch

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

河田達也

Дата:

07 декабря 2025 г., 06:37:08

Hi,

Here is the updated version (v2).
This revision fixes the CI failure reported in the previous patch.
No other functional changes.

Regards,

Вложения

v2-0001-Add-sampling-statistics-to-autoanalyze-log-output.patch

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Fujii Masao

Дата:

07 января, 16:01:12

On Sun, Dec 7, 2025 at 2:40 AM 河田達也 <kawatatatsuya0913@gmail.com> wrote:
>
> Hi,
>
> I would like to propose a patch to add sampling statistics to autoanalyze log output, addressing an inconsistency
betweenANALYZE VERBOSE and autoanalyze logging. 
>
> ## Problem
>
> Currently, ANALYZE VERBOSE displays sampling statistics, but autoanalyze does not log this information.
> This makes it harder to diagnose issues with automatic statistics collection.
>
> Example (current behavior):
> - ANALYZE VERBOSE: Shows "INFO:  "pg_class": scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434
rowsin sample, 434 estimated total rows." 
> - autoanalyze: No sampling information
>
> ## Solution
>
> This patch unifies the logging output by moving sampling statistics from acquire_sample_rows() to do_analyze_rel()'s
instrumentationsection. Now 
> both ANALYZE VERBOSE and autoanalyze output the same sampling information in a consolidated log message.

+1 to including sampling information in the autoanalyze log message as well.


> Key changes:
> 1. Updated AcquireSampleRowsFunc typedef to include 4 new output parameters
> 2. Modified acquire_sample_rows() and acquire_inherited_sample_rows() to populate these parameters
> 3. Added sampling statistics output in do_analyze_rel()
> 4. Updated postgres_fdw and file_fdw implementations
>
> ## Example Output
>
> After the patch(adding both ANALYZE VERBOSE and autoanalyze) :
> sampling: scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434 rows in sample, 434 estimated total
rows
>
> For inherited tables, statistics are accumulated across all children.
>
> ## Design Question
>
> For inherited tables, the current patch shows only the accumulated total.
> An alternative approach would be to show per-child statistics followed by the total.
> I wanted to align with do_analyze_rel()'s structure to properly support autoanalyze (autovacuum) logging.
> However, I haven't found a clean way to preserve per-child output while maintaining this structure.
> I would appreciate any advice or suggestions on how to achieve both goals if there's a better approach I'm missing.
>
> I would appreciate your feedback!

I noticed an issue with the sampling information shown when
analyzing foreign tables. Based on my testing, the reported values seem
incorrect. Is this expected behavior?

For example, running ANALYZE VERBOSE on a regular table and its foreign
table produces different sampling output:

    CREATE EXTENSION postgres_fdw ;
    CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw ;
    CREATE USER MAPPING FOR public SERVER loopback ;
    CREATE TABLE t (i int) ;
    CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 't') ;
    INSERT INTO ft SELECT n FROM generate_series(1, 100000) n ;

    The sampling output by ANALYZE VERBOSE t:
    sampling: scanned 443 of 443 pages, containing 100000 live rows
and 0 dead rows; 30000 rows in sample, 100000 estimated total rows

    The sampling output by ANALYZE VERBOSE ft:
    sampling: scanned 1000 of 1000 pages, containing 30261 live rows
and 0 dead rows; 30000 rows in sample, 100000 estimated total rows

In particular, the reported number of scanned pages and live rows for
the foreign table look wrong.


Also, the patch moves the sampling information to appear after
buffer usage, WAL usage, etc. In my opinion, it's more intuitive to
report analyze activity before buffer and WAL usage, as ANALYZE VERBOSE
currently does. VACUUM VERBOSE follows the same pattern,
reporting activity details before buffer and WAL usage.

Regards,

--
Fujii Masao

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Sami Imseih

Дата:

08 января, 01:22:02

Hi,

Thanks for the work on this, and I agree that the logging between a manual
ANALYZE and autoanalyze should be unified. +1

I have a few comments:

1/ Wouldn’t it be better to combine the new sampling output fields into a
single struct? This is more idiomatic, similar to BufferUsage or WalUsage,
and it would also make the API easier to extend in the future. Right now,
changing the function signature breaks the AcquireSampleRowsFunc ABI,
which is acceptable for a major release, but using a struct would help
avoid future breaks if we ever add more sampling data in later releases.

```
@@ -535,11 +546,15 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
        if (inh)
                numrows = acquire_inherited_sample_rows(onerel, elevel,

                         rows, targrows,
-
                         &totalrows, &totaldeadrows);
+
                         &totalrows, &totaldeadrows,
+
                         &totalpages, &scannedpages,
+
                         &sampleliverows, &sampledeadrows);
        else
                numrows = (*acquirefunc) (onerel, elevel,

rows, targrows,
-
&totalrows, &totaldeadrows);
+
&totalrows, &totaldeadrows,
+
&totalpages, &scannedpages,
+
&sampleliverows, &sampledeadrows);

```

2/ I think this patch should just focus on unifying the existing logging only.

The " estimated total rows" for a foreign table should be a separate
thread/patch, IMO.

3/ You should also run pgindent.

Regards,

--
Sami Imseih
Amazon Web Services (AWS)

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Chao Li

Дата:

08 января, 05:53:46

> On Dec 7, 2025, at 11:37, 河田達也 <kawatatatsuya0913@gmail.com> wrote:
>
> Hi,
>
> Here is the updated version (v2).
> This revision fixes the CI failure reported in the previous patch.
> No other functional changes.
>
> Regards,
>
> <v2-0001-Add-sampling-statistics-to-autoanalyze-log-output.patch>

I echo Fujii-san’s test result, I think that is because:

```
+    *scannedpages = relpages;    /* use estimate for foreign tables */
```

You are using estimated value for foreign tables. I also have a concern about assuming 100 rows per page. IMO, if
somethingwe don’t know, we should not pretend that we know. Because from the code comment, we read that that’s assumed,
butfrom an end user’s point of view, they may consider the number is real. 

The other nit comment is, as you add 4 parameters to acquire_sample_rows(), they are all output parameters, I think you
needto update the header comment to describe them. 

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Tatsuya Kawata

Дата:

08 января, 17:13:51

Hi Fujii-san, Sami-san, Chao-san,

Thank you for the valuable feedback!
I plan to work on addressing these comments this weekend and will post an update.

Regards,
Tatsuya Kawata

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Tatsuya Kawata

Дата:

10 января, 21:25:16

Hi Fujii-san, Sami-san, Chao-san,

Thank you for the valuable feedback!

I addressed these comments and updated the patch.

> I noticed an issue with the sampling information shown when
> analyzing foreign tables. Based on my testing, the reported values seem
> incorrect. Is this expected behavior?
>
>
>
> For example, running ANALYZE VERBOSE on a regular table and its foreign
> table produces different sampling output:
>
>
>
> CREATE EXTENSION postgres_fdw ;
> CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw ;
> CREATE USER MAPPING FOR public SERVER loopback ;
> CREATE TABLE t (i int) ;
> CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 't') ;
> INSERT INTO ft SELECT n FROM generate_series(1, 100000) n ;
>
>
>
> The sampling output by ANALYZE VERBOSE t:
> sampling: scanned 443 of 443 pages, containing 100000 live rows
> and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
>
>
>
> The sampling output by ANALYZE VERBOSE ft:
> sampling: scanned 1000 of 1000 pages, containing 30261 live rows
> and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
>
>
>
> In particular, the reported number of scanned pages and live rows for
> the foreign table look wrong.

> 2/ I think this patch should just focus on unifying the existing logging only.
>
>
>
> The " estimated total rows" for a foreign table should be a separate
> thread/patch, IMO.

> echo Fujii-san’s test result, I think that is because:
>
>
>
> ```
> + *scannedpages = relpages; /* use estimate for foreign tables */
> ```
>
>
>
> You are using estimated value for foreign tables. I also have a concern about assuming 100 rows per page. IMO, if something we don’t know, we should n> ot pretend that we know. Because from the code comment, we read that that’s assumed, but from an end user’s point of view, they may consider the numb> er is real.

Thank you for pointing this out. I agree that the displayed values were inappropriate. I have removed the foreign table sampling estimation logic from this patch. To avoid misleading, when analyzing foreign tables, the log now omits the page and row details:

```
scanned 0 of 0 pages, containing 0 live rows and 0 dead rows
```

> 1/ Wouldn’t it be better to combine the new sampling output fields into a
> single struct? This is more idiomatic, similar to BufferUsage or WalUsage,
> and it would also make the API easier to extend in the future. Right now,
> changing the function signature breaks the AcquireSampleRowsFunc ABI,
> which is acceptable for a major release, but using a struct would help
> avoid future breaks if we ever add more sampling data in later releases.
>
>
>
> ```
> @@ -535,11 +546,15 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
> if (inh)
> numrows = acquire_inherited_sample_rows(onerel, elevel,
>
>
>
> rows, targrows,
> -
> &totalrows, &totaldeadrows);
> +
> &totalrows, &totaldeadrows,
> +
> &totalpages, &scannedpages,
> +
> &sampleliverows, &sampledeadrows);
> else
> numrows = (*acquirefunc) (onerel, elevel,
>
>
>
> rows, targrows,
> -
> &totalrows, &totaldeadrows);
> +
> &totalrows, &totaldeadrows,
> +
> &totalpages, &scannedpages,
> +
> &sampleliverows, &sampledeadrows);
>
>
>
> ```

> The other nit comment is, as you add 4 parameters to acquire_sample_rows(), they are all output parameters, I think you need to update the header comment to describe them.

I agree with consolidating the sampling statistics into a single struct, and I have implemented this change. I considered whether to include the existing totalrows and totaldeadrows in the new struct, but decided against it for now. Since totalrows in particular is used extensively throughout the codebase, including it would expand the scope beyond the original goal of improving log output. This can be addressed in a future patch if needed.

> Also, the patch moves the sampling information to appear after
> buffer usage, WAL usage, etc. In my opinion, it's more intuitive to
> report analyze activity before buffer and WAL usage, as ANALYZE VERBOSE
> currently does. VACUUM VERBOSE follows the same pattern,
> reporting activity details before buffer and WAL usage.

Fixed as suggested. The sampling information now appears before buffer and WAL usage, consistent with ANALYZE VERBOSE and VACUUM VERBOSE.

> 3/ You should also run pgindent.

I ran pgindent, but the results differed from the existing coding style in the same files:
- Three tabs were inserted before the struct name at the end of typedef (existing style uses one space)
- A space was inserted between * and the variable name (existing style has no space)
So, I have reverted the pgindent changes. If there is a specific configuration needed, please let me know.

Regarding the log output, I would like to add some context. Originally, when running ANALYZE VERBOSE on a parent table with inheritance, the parent table's sampling information was displayed twice - once for the parent alone and once at the beginning of the inheritance tree output. This may appear to be unintentional:

```
ANALYZE VERBOSE parent_table;
INFO: analyzing "public.parent_table"
INFO: "parent_table": scanned 4 of 4 pages, containing 500 live rows and 0 dead rows; 500 rows in sample, 500 estimated total rows
...
INFO: analyzing "public.parent_table" inheritance tree
INFO: "parent_table": scanned 4 of 4 pages, containing 500 live rows and 0 dead rows; 500 rows in sample, 500 estimated total rows
INFO: "child_table1": scanned 16 of 16 pages, containing 2500 live rows and 0 dead rows; 2500 rows in sample, 2500 estimated total rows
INFO: "child_table2": scanned 13 of 13 pages, containing 2000 live rows and 0 dead rows; 2000 rows in sample, 2000 estimated total rows
INFO: "child_table3": scanned 10 of 10 pages, containing 1500 live rows and 0 dead rows; 1500 rows in sample, 1500 estimated total rows
...
```

In this patch, I have added "inheritance tree" to the autoanalyze log message as well, and the inheritance tree statistics now show the aggregated totals from all child tables:
```
2026-01-11 02:05:34.620 JST [864411] LOG: automatic analyze of table "postgres.public.parent_table"
sampling: scanned 6 of 6 pages, containing 1000 live rows and 0 dead rows; 1000 rows in sample, 1000 estimated total rows
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 215 hits, 0 reads, 0 dirtied
WAL usage: 5 records, 0 full page images, 3839 bytes, 0 full page image bytes, 0 buffers full
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2026-01-11 02:05:34.649 JST [864411] LOG: automatic analyze of table "postgres.public.parent_table" inheritance tree
sampling: scanned 84 of 84 pages, containing 14500 live rows and 0 dead rows; 14500 rows in sample, 14500 estimated total rows
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 169 hits, 0 reads, 0 dirtied
WAL usage: 6 records, 0 full page images, 3213 bytes, 0 full page image bytes, 0 buffers full
system usage: CPU: user: 0.01 s, system: 0.01 s, elapsed: 0.02 s
2026-01-11 02:05:34.659 JST [864411] LOG: automatic analyze of table "postgres.public.child_table1"
sampling: scanned 29 of 29 pages, containing 5000 live rows and 0 dead rows; 5000 rows in sample, 5000 estimated total rows
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 64 hits, 0 reads, 0 dirtied
WAL usage: 4 records, 0 full page images, 3211 bytes, 0 full page image bytes, 0 buffers full
```

I have attached the updated patch v3.

Regards,
Tatsuya Kawata

Вложения

v3-0001-Add-sampling-statistics-to-autoanalyze-log-output.patch

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Chao Li

Дата:

12 января, 12:21:49


> On Jan 11, 2026, at 02:25, Tatsuya Kawata <kawatatatsuya0913@gmail.com> wrote:
>
> Hi Fujii-san, Sami-san, Chao-san,
>
> Thank you for the valuable feedback!
> I addressed these comments and  updated the patch.
>
> > I noticed an issue with the sampling information shown when
> > analyzing foreign tables. Based on my testing, the reported values seem
> > incorrect. Is this expected behavior?
> >
> >
> >
> > For example, running ANALYZE VERBOSE on a regular table and its foreign
> > table produces different sampling output:
> >
> >
> >
> >     CREATE EXTENSION postgres_fdw ;
> >     CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw ;
> >     CREATE USER MAPPING FOR public SERVER loopback ;
> >     CREATE TABLE t (i int) ;
> >     CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 't') ;
> >     INSERT INTO ft SELECT n FROM generate_series(1, 100000) n ;
> >
> >
> >
> >     The sampling output by ANALYZE VERBOSE t:
> >     sampling: scanned 443 of 443 pages, containing 100000 live rows
> > and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
> >
> >
> >
> >     The sampling output by ANALYZE VERBOSE ft:
> >     sampling: scanned 1000 of 1000 pages, containing 30261 live rows
> > and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
> >
> >
> >
> > In particular, the reported number of scanned pages and live rows for
> > the foreign table look wrong.
>
> > 2/ I think this patch should just focus on unifying the existing logging only.
> >
> >
> >
> > The " estimated total rows" for a foreign table should be a separate
> > thread/patch, IMO.
>
> >  echo Fujii-san’s test result, I think that is because:
> >
> >
> >
> > ```
> > + *scannedpages = relpages; /* use estimate for foreign tables */
> > ```
> >
> >
> >
> > You are using estimated value for foreign tables. I also have a concern about assuming 100 rows per page. IMO, if
somethingwe don’t know, we should n> ot pretend that we know. Because from the code comment, we read that that’s
assumed,but from an end user’s point of view, they may consider the numb> er is real. 
>
> Thank you for pointing this out. I agree that the displayed values were inappropriate. I have removed the foreign
tablesampling estimation logic from this patch. To avoid misleading, when analyzing foreign tables, the log now omits
thepage and row details: 
>
> ```
> scanned 0 of 0 pages, containing 0 live rows and 0 dead rows
> ```
>
>
> > 1/ Wouldn’t it be better to combine the new sampling output fields into a
> > single struct? This is more idiomatic, similar to BufferUsage or WalUsage,
> > and it would also make the API easier to extend in the future. Right now,
> > changing the function signature breaks the AcquireSampleRowsFunc ABI,
> > which is acceptable for a major release, but using a struct would help
> > avoid future breaks if we ever add more sampling data in later releases.
> >
> >
> >
> > ```
> > @@ -535,11 +546,15 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
> >         if (inh)
> >                 numrows = acquire_inherited_sample_rows(onerel, elevel,
> >
> >
> >
> >                          rows, targrows,
> > -
> >                          &totalrows, &totaldeadrows);
> > +
> >                          &totalrows, &totaldeadrows,
> > +
> >                          &totalpages, &scannedpages,
> > +
> >                          &sampleliverows, &sampledeadrows);
> >         else
> >                 numrows = (*acquirefunc) (onerel, elevel,
> >
> >
> >
> > rows, targrows,
> > -
> > &totalrows, &totaldeadrows);
> > +
> > &totalrows, &totaldeadrows,
> > +
> > &totalpages, &scannedpages,
> > +
> > &sampleliverows, &sampledeadrows);
> >
> >
> >
> > ```
>
> > The other nit comment is, as you add 4 parameters to acquire_sample_rows(), they are all output parameters, I think
youneed to update the header comment to describe them. 
>
> I agree with consolidating the sampling statistics into a single struct, and I have implemented this change. I
consideredwhether to include the existing totalrows and totaldeadrows in the new struct, but decided against it for
now.Since totalrows in particular is used extensively throughout the codebase, including it would expand the scope
beyondthe original goal of improving log output. This can be addressed in a future patch if needed. 
>
>
> > Also, the patch moves the sampling information to appear after
> > buffer usage, WAL usage, etc. In my opinion, it's more intuitive to
> > report analyze activity before buffer and WAL usage, as ANALYZE VERBOSE
> > currently does. VACUUM VERBOSE follows the same pattern,
> > reporting activity details before buffer and WAL usage.
>
> Fixed as suggested. The sampling information now appears before buffer and WAL usage, consistent with ANALYZE VERBOSE
andVACUUM VERBOSE. 
>
>
> > 3/ You should also run pgindent.
>
> I ran pgindent, but the results differed from the existing coding style in the same files:
>  - Three tabs were inserted before the struct name at the end of typedef (existing style uses one space)
>  - A space was inserted between * and the variable name (existing style has no space)
> So, I have reverted the pgindent changes. If there is a specific configuration needed, please let me know.
>
>
> Regarding the log output, I would like to add some context. Originally, when running ANALYZE VERBOSE on a parent
tablewith inheritance, the parent table's sampling information was displayed twice - once for the parent alone and once
atthe beginning of the inheritance tree output. This may appear to be unintentional: 
>
> ```
> ANALYZE VERBOSE parent_table;
> INFO:  analyzing "public.parent_table"
> INFO:  "parent_table": scanned 4 of 4 pages, containing 500 live rows and 0 dead rows; 500 rows in sample, 500
estimatedtotal rows 
> ...
> INFO:  analyzing "public.parent_table" inheritance tree
> INFO:  "parent_table": scanned 4 of 4 pages, containing 500 live rows and 0 dead rows; 500 rows in sample, 500
estimatedtotal rows 
> INFO:  "child_table1": scanned 16 of 16 pages, containing 2500 live rows and 0 dead rows; 2500 rows in sample, 2500
estimatedtotal rows 
> INFO:  "child_table2": scanned 13 of 13 pages, containing 2000 live rows and 0 dead rows; 2000 rows in sample, 2000
estimatedtotal rows 
> INFO:  "child_table3": scanned 10 of 10 pages, containing 1500 live rows and 0 dead rows; 1500 rows in sample, 1500
estimatedtotal rows 
> ...
> ```
>
> In this patch, I have added "inheritance tree" to the autoanalyze log message as well, and the inheritance tree
statisticsnow show the aggregated totals from all child tables: 
> ```
> 2026-01-11 02:05:34.620 JST [864411] LOG:  automatic analyze of table "postgres.public.parent_table"
> sampling: scanned 6 of 6 pages, containing 1000 live rows and 0 dead rows; 1000 rows in sample, 1000 estimated total
rows
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> buffer usage: 215 hits, 0 reads, 0 dirtied
> WAL usage: 5 records, 0 full page images, 3839 bytes, 0 full page image bytes, 0 buffers full
> system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> 2026-01-11 02:05:34.649 JST [864411] LOG:  automatic analyze of table "postgres.public.parent_table" inheritance tree
> sampling: scanned 84 of 84 pages, containing 14500 live rows and 0 dead rows; 14500 rows in sample, 14500 estimated
totalrows 
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> buffer usage: 169 hits, 0 reads, 0 dirtied
> WAL usage: 6 records, 0 full page images, 3213 bytes, 0 full page image bytes, 0 buffers full
> system usage: CPU: user: 0.01 s, system: 0.01 s, elapsed: 0.02 s
> 2026-01-11 02:05:34.659 JST [864411] LOG:  automatic analyze of table "postgres.public.child_table1"
> sampling: scanned 29 of 29 pages, containing 5000 live rows and 0 dead rows; 5000 rows in sample, 5000 estimated
totalrows 
> avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
> buffer usage: 64 hits, 0 reads, 0 dirtied
> WAL usage: 4 records, 0 full page images, 3211 bytes, 0 full page image bytes, 0 buffers full
> ```
>
> I have attached the updated patch v3.
>
> Regards,
> Tatsuya Kawata
>
> <v3-0001-Add-sampling-statistics-to-autoanalyze-log-output.patch>

Thanks for updating the patch. Just have a few comments on v3:

1
```
 static int
 acquire_inherited_sample_rows(Relation onerel, int elevel,
                               HeapTuple *rows, int targrows,
-                              double *totalrows, double *totaldeadrows)
+                              double *totalrows, double *totaldeadrows,
+                              SamplingStats *sampling_stats)
 {
     List       *tableOIDs;
     Relation   *rels;
@@ -1408,10 +1426,12 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
                 i;
     ListCell   *lc;
     bool        has_child;
+    SamplingStats child_sampling_stats;
```

child_sampling_stats is only used inside if (childblocks > 0), so it would be better to just define it there.

2
```
+typedef struct SamplingStats
+{
+    BlockNumber totalpages;        /* total pages in relation */
+    BlockNumber scannedpages;    /* pages actually scanned */
+    double        liverows;        /* live rows found during sampling */
+    double        deadrows;        /* dead rows found during sampling */
+} SamplingStats;
```

SamplingStats is a very generic name, maybe rename to AnalyzeSamplingStats or something else.

BTW, from reviewing this patch, I found an issue and proposed a patch [1], would you please help review that?

[1] https://www.postgresql.org/message-id/CAEoWx2mAwXnxVWTP%3DdgwA98dJ7JxzxM8qToaxAGuByHN1-%3DoNQ%40mail.gmail.com

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Tatsuya Kawata

Дата:

12 января, 19:04:28

Hi Chao-san,

Thank you for your review!
I will take a look this week.

> BTW, from reviewing this patch, I found an issue and proposed a patch [1], would you please help review that?
>
> [1] https://www.postgresql.org/message-id/CAEoWx2mAwXnxVWTP%3DdgwA98dJ7JxzxM8qToaxAGuByHN1-%3DoNQ%40mail.gmail.com

Thank you for pointing me to the related issue and patch!
I've just added a few comments, but this was my first time doing a review, so please excuse me if there are any mistakes or things I may have overlooked.

Regards,
Tatsuya Kawata

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Fujii Masao

Дата:

13 января, 11:22:01

On Sun, Jan 11, 2026 at 3:25 AM Tatsuya Kawata
<kawatatatsuya0913@gmail.com> wrote:
> Thank you for the valuable feedback!
> I addressed these comments and  updated the patch.

Thanks for updating the patch!


> Thank you for pointing this out. I agree that the displayed values were inappropriate. I have removed the foreign
tablesampling estimation logic from this patch. To avoid misleading, when analyzing foreign tables, the log now omits
thepage and row details: 
>
> ```
> scanned 0 of 0 pages, containing 0 live rows and 0 dead rows
> ```

With this change, we would end up reporting nearly the same sampling
information twice, for example:

    INFO:  analyzing "public.ft"
    INFO:  "ft": table contains 10 rows, 10 rows in sample
    INFO:  finished analyzing table "postgres.public.ft"
    sampling: 10 rows in sample, 10 estimated total rows

Wouldn't it be less confusing to avoid reporting the second sampling line?


> > The other nit comment is, as you add 4 parameters to acquire_sample_rows(), they are all output parameters, I think
youneed to update the header comment to describe them. 
>
> I agree with consolidating the sampling statistics into a single struct, and I have implemented this change. I
consideredwhether to include the existing totalrows and totaldeadrows in the new struct, but decided against it for
now.Since totalrows in particular is used extensively throughout the codebase, including it would expand the scope
beyondthe original goal of improving log output. This can be addressed in a future patch if needed. 

I'm not sure it's acceptable to change the FDW API and require
FDW authors to update their extensions, especially since
the benefit on the FDW side seems limited at this point.

OTOH, *if* we decide to change the API, we should clearly define
how the SamplingStats returned by AcquireSampleRowsFunc is
processed in core, update fdwhandler.sgml, and consider
whether this API change is actually useful for FDWs.


> I ran pgindent, but the results differed from the existing coding style in the same files:
>  - Three tabs were inserted before the struct name at the end of typedef (existing style uses one space)
>  - A space was inserted between * and the variable name (existing style has no space)
> So, I have reverted the pgindent changes. If there is a specific configuration needed, please let me know.

You may need to update src/tools/pgindent/typedefs.list.


> In this patch, I have added "inheritance tree" to the autoanalyze log message as well, and the inheritance tree
statisticsnow show the aggregated totals from all child tables: 

Based on my testing, the aggregated values look incorrect.
In the example below, both t_0 and t_1 report 5,000,000 live rows,
but the aggregated result is 6,779,052. Is that the expected
aggregation behavior?

    INFO:  analyzing "public.t" inheritance tree
    sampling: scanned 30000 of 44253 pages, containing 6779052 live
rows and 0 dead rows; 30000 rows in sample, 9999780 estimated total
rows
    ...
    INFO:  analyzing "public.t_0"
    sampling: scanned 22126 of 22126 pages, containing 5000000 live
rows and 0 dead rows; 30000 rows in sample, 5000000 estimated total
rows
    ...
    INFO:  analyzing "public.t_1"
    sampling: scanned 22127 of 22127 pages, containing 5000000 live
rows and 0 dead rows; 30000 rows in sample, 5000000 estimated total
rows

    =# \d+ t
                                         Partitioned table "public.t"
    Column |  Type   | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
    --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
    i      | integer |           |          |         | plain   |
       |              |
    j      | integer |           |          |         | plain   |
       |              |
    Partition key: RANGE (i)
    Partitions: t_0 FOR VALUES FROM (1) TO (5000001),
               t_1 FOR VALUES FROM (5000001) TO (10000001)

    =# select count(*) from t_0;
     count
    ---------
    5000000
    (1 row)

    =# select count(*) from t_1;
     count
    ---------
    5000000
    (1 row)

Regards,

--
Fujii Masao

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Tatsuya Kawata

Дата:

14 января, 18:43:32

Hi Fujii-san, Chao-san,

Thank you for your review!

> With this change, we would end up reporting nearly the same sampling
> information twice, for example:
>
> INFO: analyzing "public.ft"
> INFO: "ft": table contains 10 rows, 10 rows in sample
> INFO: finished analyzing table "postgres.public.ft"
> sampling: 10 rows in sample, 10 estimated total rows
>
> Wouldn't it be less confusing to avoid reporting the second sampling line?

I agree, that would be better. I'll fix this.

> I'm not sure it's acceptable to change the FDW API and require
> FDW authors to update their extensions, especially since
> the benefit on the FDW side seems limited at this point.

I agree that it's premature to require FDW-side changes at this stage. I'll remove those modifications.

> You may need to update src/tools/pgindent/typedefs.list.

Thank you! I'll check and update it.

> Based on my testing, the aggregated values look incorrect.
> In the example below, both t_0 and t_1 report 5,000,000 live rows,
> but the aggregated result is 6,779,052. Is that the expected
> aggregation behavior?

I had misunderstood the sampling behavior for inheritance tables. I initially thought it would be straightforward to display the sum of child table values for the parent table. However, the parent table's sampling logic proportionally distributes samples from inherited child tables based on their relative block counts, and stores that as the parent's statistics. Therefore, simply changing the log output to show summed values would imply a change in the sampling methodology itself. While such a modification might be possible in the future, it would deviate from the original purpose of this patch, which is to align the log output between ANALYZE VERBOSE and autoanalyze. For now, I'll exclude this from the current patch.

I plan to post an updated version addressing Chao-san's feedback as well.

Regards,
Tatsuya Kawata

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Tatsuya Kawata

Дата:

19 января, 19:33:21

Hi Fujii-san, Chao-san,

Sorry for the late reply.
I have addressed your comments in the attached v4 patch.

> child_sampling_stats is only used inside if (childblocks > 0), so it would be better to just define it there.

Fixed.

> SamplingStats is a very generic name, maybe rename to AnalyzeSamplingStats or something else.

Fixed. I changed the struct name.

> > With this change, we would end up reporting nearly the same sampling
> > information twice, for example:
> >
> > INFO: analyzing "public.ft"
> > INFO: "ft": table contains 10 rows, 10 rows in sample
> > INFO: finished analyzing table "postgres.public.ft"
> > sampling: 10 rows in sample, 10 estimated total rows
> >
> > Wouldn't it be less confusing to avoid reporting the second sampling line?
>
> I agree, that would be better. I'll fix this.

Fixed. I removed the duplicated messages.

> > I'm not sure it's acceptable to change the FDW API and require
> > FDW authors to update their extensions, especially since
> > the benefit on the FDW side seems limited at this point.
>
> I agree that it's premature to require FDW-side changes at this stage. I'll remove those modifications.

Fixed. I reverted the FDW API changes, so no modifications are required for the FDW API.

> > You may need to update src/tools/pgindent/typedefs.list.
>
> Thank you! I'll check and update it.

Done. Thank you for pointing this out!

> > Based on my testing, the aggregated values look incorrect.
> > In the example below, both t_0 and t_1 report 5,000,000 live rows,
> > but the aggregated result is 6,779,052. Is that the expected
> > aggregation behavior?
>
> I had misunderstood the sampling behavior for inheritance tables. I initially thought it would be straightforward to display the sum of child table values for the parent table. However, the parent table's sampling logic proportionally distributes samples from inherited child tables based on their relative block counts, and stores that as the parent's statistics. Therefore, simply changing the log output to show summed values would imply a change in the sampling methodology itself. While such a modification might be possible in the future, it would deviate from the original purpose of this patch, which is to align the log output between ANALYZE VERBOSE and autoanalyze. For now, I'll exclude this from the current patch.

I have excluded this modification from v4 patch. With v4, the log output now shows per-child sampling statistics for inheritance trees and partitioned tables, which matches the current behavior.
I also conducted several patterns of tests to ensure new log output matches current output. I have attached a SETUP SQL script(setup_tables.sql) and test results(CURRENT:verboselog_current.log, APPLIED PATCH:verboselog_after_patch.log) demonstrating that the ANALYZE VERBOSE output which is the same as the autoanalyze log format.

Regards,
Tatsuya Kawata

Вложения

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Tatsuya Kawata

Дата:

10 февраля, 18:18:26

Hi,

If anyone has had a chance to look at this patch and has any comments or suggestions, I'd be more than happy to address them.
Thanks!

Regards,
Tatsuya Kawata

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Chao Li

Дата:

12 февраля, 05:49:18


> On Feb 10, 2026, at 23:18, Tatsuya Kawata <kawatatatsuya0913@gmail.com> wrote:
>
> Hi,
>
> If anyone has had a chance to look at this patch and has any comments or suggestions, I'd be more than happy to
addressthem. 
> Thanks!
>
> Regards,
> Tatsuya Kawata

I reviewed v4, then applied it locally and played a little bit with it. I don’t have further comment.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

Re: [PATCH] Add sampling statistics to autoanalyze log output

От

Tatsuya Kawata

Дата:

12 февраля, 17:33:55

Hi Chao-san,

Thank you for taking the time to review and test the patch!

Regards,

Tatsuya Kawata

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: [PATCH] Add sampling statistics to autoanalyze log output

Вложения

Вложения

Вложения

Вложения