Обсуждение: [PATCH] Add sampling statistics to autoanalyze log output
Hi,
I would like to propose a patch to add sampling statistics to autoanalyze log output, addressing an inconsistency between ANALYZE VERBOSE and autoanalyze logging.
## Problem
Currently, ANALYZE VERBOSE displays sampling statistics, but autoanalyze does not log this information.
This makes it harder to diagnose issues with automatic statistics collection.
Example (current behavior):
- ANALYZE VERBOSE: Shows "INFO: "pg_class": scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434 rows in sample, 434 estimated total rows."
- autoanalyze: No sampling information
## Solution
This patch unifies the logging output by moving sampling statistics from acquire_sample_rows() to do_analyze_rel()'s instrumentation section. Now
both ANALYZE VERBOSE and autoanalyze output the same sampling information in a consolidated log message.
Key changes:
1. Updated AcquireSampleRowsFunc typedef to include 4 new output parameters
2. Modified acquire_sample_rows() and acquire_inherited_sample_rows() to populate these parameters
3. Added sampling statistics output in do_analyze_rel()
4. Updated postgres_fdw and file_fdw implementations
## Example Output
After the patch(adding both ANALYZE VERBOSE and autoanalyze) :
sampling: scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434 rows in sample, 434 estimated total rows
For inherited tables, statistics are accumulated across all children.
## Design Question
For inherited tables, the current patch shows only the accumulated total.
An alternative approach would be to show per-child statistics followed by the total.
I wanted to align with do_analyze_rel()'s structure to properly support autoanalyze (autovacuum) logging.
However, I haven't found a clean way to preserve per-child output while maintaining this structure.
I would appreciate any advice or suggestions on how to achieve both goals if there's a better approach I'm missing.
I would appreciate your feedback!
Regards,
I would like to propose a patch to add sampling statistics to autoanalyze log output, addressing an inconsistency between ANALYZE VERBOSE and autoanalyze logging.
## Problem
Currently, ANALYZE VERBOSE displays sampling statistics, but autoanalyze does not log this information.
This makes it harder to diagnose issues with automatic statistics collection.
Example (current behavior):
- ANALYZE VERBOSE: Shows "INFO: "pg_class": scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434 rows in sample, 434 estimated total rows."
- autoanalyze: No sampling information
## Solution
This patch unifies the logging output by moving sampling statistics from acquire_sample_rows() to do_analyze_rel()'s instrumentation section. Now
both ANALYZE VERBOSE and autoanalyze output the same sampling information in a consolidated log message.
Key changes:
1. Updated AcquireSampleRowsFunc typedef to include 4 new output parameters
2. Modified acquire_sample_rows() and acquire_inherited_sample_rows() to populate these parameters
3. Added sampling statistics output in do_analyze_rel()
4. Updated postgres_fdw and file_fdw implementations
## Example Output
After the patch(adding both ANALYZE VERBOSE and autoanalyze) :
sampling: scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434 rows in sample, 434 estimated total rows
For inherited tables, statistics are accumulated across all children.
## Design Question
For inherited tables, the current patch shows only the accumulated total.
An alternative approach would be to show per-child statistics followed by the total.
I wanted to align with do_analyze_rel()'s structure to properly support autoanalyze (autovacuum) logging.
However, I haven't found a clean way to preserve per-child output while maintaining this structure.
I would appreciate any advice or suggestions on how to achieve both goals if there's a better approach I'm missing.
I would appreciate your feedback!
Regards,
Вложения
Hi,
Here is the updated version (v2).
This revision fixes the CI failure reported in the previous patch.
No other functional changes.
Regards,
Here is the updated version (v2).
This revision fixes the CI failure reported in the previous patch.
No other functional changes.
Regards,
Вложения
On Sun, Dec 7, 2025 at 2:40 AM 河田達也 <kawatatatsuya0913@gmail.com> wrote:
>
> Hi,
>
> I would like to propose a patch to add sampling statistics to autoanalyze log output, addressing an inconsistency
betweenANALYZE VERBOSE and autoanalyze logging.
>
> ## Problem
>
> Currently, ANALYZE VERBOSE displays sampling statistics, but autoanalyze does not log this information.
> This makes it harder to diagnose issues with automatic statistics collection.
>
> Example (current behavior):
> - ANALYZE VERBOSE: Shows "INFO: "pg_class": scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434
rowsin sample, 434 estimated total rows."
> - autoanalyze: No sampling information
>
> ## Solution
>
> This patch unifies the logging output by moving sampling statistics from acquire_sample_rows() to do_analyze_rel()'s
instrumentationsection. Now
> both ANALYZE VERBOSE and autoanalyze output the same sampling information in a consolidated log message.
+1 to including sampling information in the autoanalyze log message as well.
> Key changes:
> 1. Updated AcquireSampleRowsFunc typedef to include 4 new output parameters
> 2. Modified acquire_sample_rows() and acquire_inherited_sample_rows() to populate these parameters
> 3. Added sampling statistics output in do_analyze_rel()
> 4. Updated postgres_fdw and file_fdw implementations
>
> ## Example Output
>
> After the patch(adding both ANALYZE VERBOSE and autoanalyze) :
> sampling: scanned 14 of 14 pages, containing 434 live rows and 11 dead rows; 434 rows in sample, 434 estimated total
rows
>
> For inherited tables, statistics are accumulated across all children.
>
> ## Design Question
>
> For inherited tables, the current patch shows only the accumulated total.
> An alternative approach would be to show per-child statistics followed by the total.
> I wanted to align with do_analyze_rel()'s structure to properly support autoanalyze (autovacuum) logging.
> However, I haven't found a clean way to preserve per-child output while maintaining this structure.
> I would appreciate any advice or suggestions on how to achieve both goals if there's a better approach I'm missing.
>
> I would appreciate your feedback!
I noticed an issue with the sampling information shown when
analyzing foreign tables. Based on my testing, the reported values seem
incorrect. Is this expected behavior?
For example, running ANALYZE VERBOSE on a regular table and its foreign
table produces different sampling output:
CREATE EXTENSION postgres_fdw ;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw ;
CREATE USER MAPPING FOR public SERVER loopback ;
CREATE TABLE t (i int) ;
CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 't') ;
INSERT INTO ft SELECT n FROM generate_series(1, 100000) n ;
The sampling output by ANALYZE VERBOSE t:
sampling: scanned 443 of 443 pages, containing 100000 live rows
and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
The sampling output by ANALYZE VERBOSE ft:
sampling: scanned 1000 of 1000 pages, containing 30261 live rows
and 0 dead rows; 30000 rows in sample, 100000 estimated total rows
In particular, the reported number of scanned pages and live rows for
the foreign table look wrong.
Also, the patch moves the sampling information to appear after
buffer usage, WAL usage, etc. In my opinion, it's more intuitive to
report analyze activity before buffer and WAL usage, as ANALYZE VERBOSE
currently does. VACUUM VERBOSE follows the same pattern,
reporting activity details before buffer and WAL usage.
Regards,
--
Fujii Masao
Hi,
Thanks for the work on this, and I agree that the logging between a manual
ANALYZE and autoanalyze should be unified. +1
I have a few comments:
1/ Wouldn’t it be better to combine the new sampling output fields into a
single struct? This is more idiomatic, similar to BufferUsage or WalUsage,
and it would also make the API easier to extend in the future. Right now,
changing the function signature breaks the AcquireSampleRowsFunc ABI,
which is acceptable for a major release, but using a struct would help
avoid future breaks if we ever add more sampling data in later releases.
```
@@ -535,11 +546,15 @@ do_analyze_rel(Relation onerel, const VacuumParams params,
if (inh)
numrows = acquire_inherited_sample_rows(onerel, elevel,
rows, targrows,
-
&totalrows, &totaldeadrows);
+
&totalrows, &totaldeadrows,
+
&totalpages, &scannedpages,
+
&sampleliverows, &sampledeadrows);
else
numrows = (*acquirefunc) (onerel, elevel,
rows, targrows,
-
&totalrows, &totaldeadrows);
+
&totalrows, &totaldeadrows,
+
&totalpages, &scannedpages,
+
&sampleliverows, &sampledeadrows);
```
2/ I think this patch should just focus on unifying the existing logging only.
The " estimated total rows" for a foreign table should be a separate
thread/patch, IMO.
3/ You should also run pgindent.
Regards,
--
Sami Imseih
Amazon Web Services (AWS)
> On Dec 7, 2025, at 11:37, 河田達也 <kawatatatsuya0913@gmail.com> wrote: > > Hi, > > Here is the updated version (v2). > This revision fixes the CI failure reported in the previous patch. > No other functional changes. > > Regards, > > <v2-0001-Add-sampling-statistics-to-autoanalyze-log-output.patch> I echo Fujii-san’s test result, I think that is because: ``` + *scannedpages = relpages; /* use estimate for foreign tables */ ``` You are using estimated value for foreign tables. I also have a concern about assuming 100 rows per page. IMO, if somethingwe don’t know, we should not pretend that we know. Because from the code comment, we read that that’s assumed, butfrom an end user’s point of view, they may consider the number is real. The other nit comment is, as you add 4 parameters to acquire_sample_rows(), they are all output parameters, I think you needto update the header comment to describe them. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
Hi Fujii-san, Sami-san, Chao-san,
Thank you for the valuable feedback!
I plan to work on addressing these comments this weekend and will post an update.
Regards,
Tatsuya Kawata
Thank you for the valuable feedback!
I plan to work on addressing these comments this weekend and will post an update.
Regards,
Tatsuya Kawata