Discussion: Preserve index stats during ALTER TABLE ... TYPE ...
Hi hackers,
while working on relfilenode statistics [1], I observed that index stats
are not preserved during ALTER TABLE ... TYPE ....
Indeed, for example:
postgres=# CREATE TABLE test_tab(a int primary key, b int, c int);
CREATE INDEX test_b_idx ON test_tab(b);
-- Force an index scan on test_b_idx
SELECT * FROM test_tab WHERE b = 2;
CREATE TABLE
CREATE INDEX
a | b | c
---+---+---
(0 rows)
postgres=# select indexrelname, idx_scan from pg_stat_all_indexes where indexrelname in ('test_b_idx',
'test_tab_pkey');
indexrelname | idx_scan
---------------+----------
test_tab_pkey | 0
test_b_idx | 1
(2 rows)
postgres=# select idx_scan from pg_stat_all_tables where relname = 'test_tab';
idx_scan
----------
1
(1 row)
postgres=# ALTER TABLE test_tab ALTER COLUMN b TYPE int;
ALTER TABLE
postgres=# select indexrelname, idx_scan from pg_stat_all_indexes where indexrelname in ('test_b_idx',
'test_tab_pkey');
indexrelname | idx_scan
---------------+----------
test_tab_pkey | 0
test_b_idx | 0
(2 rows)
postgres=# select idx_scan from pg_stat_all_tables where relname = 'test_tab';
idx_scan
----------
0
(1 row)
During ALTER TABLE ... TYPE ... on an indexed column, a new index is created and
the old one is dropped.
As you can see, the index stats (linked to the column that has been altered) are
not preserved. I think that they should be preserved (like a REINDEX does).
Note that the issue is the same if a rewrite is involved (ALTER TABLE test_tab
ALTER COLUMN b TYPE bigint).
PFA, a patch to $SUBJECT.
A few remarks:
- We cannot use pgstat_copy_relation_stats() because the old index is dropped
before the new one is created, so the patch adds a new PgStat_StatTabEntry
pointer in AlteredTableInfo.
- The stats are saved in ATPostAlterTypeParse() (before the old index is dropped)
and restored in ATExecAddIndex() once the new index is created.
- Note that pending statistics (if any) are not preserved, only the
accumulated stats from previous transactions. I think this is
acceptable since the accumulated stats represent the historical usage patterns we
want to maintain.
- The patch adds a few tests to cover multiple scenarios (with and without
rewrites, and indexes with and without associated constraints).
- I'm not familiar with this area of the code; the patch is an attempt to fix
the issue, and maybe there is a more elegant way to solve it.
- The issue exists as far back as v13, but I'm not sure it's serious enough for
back-patching.
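The save/restore flow described in the first two remarks can be modeled in a few lines of C. This is a minimal, self-contained sketch under stated assumptions, not the patch itself: `IndexStats`, `StatsStore`, `AlterState` and the `store_*` / `*_index_stats` helpers are hypothetical stand-ins for PgStat_StatTabEntry, the shared pgstats hash table and the new AlteredTableInfo field. As in the patch, pending (not yet flushed) counters are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;

/* Hypothetical, much-reduced stand-in for PgStat_StatTabEntry. */
typedef struct IndexStats
{
    int64_t numscans;
    int64_t tuples_returned;
    int64_t tuples_fetched;
} IndexStats;

/* Toy cumulative-stats store keyed by relation OID. */
#define MAX_ENTRIES 16

typedef struct StatsStore
{
    Oid        oids[MAX_ENTRIES];
    IndexStats stats[MAX_ENTRIES];
    int        n;
} StatsStore;

static IndexStats *
store_lookup(StatsStore *s, Oid oid)
{
    for (int i = 0; i < s->n; i++)
        if (s->oids[i] == oid)
            return &s->stats[i];
    return NULL;
}

/* A newly created relation starts from zeroed counters. */
static IndexStats *
store_create(StatsStore *s, Oid oid)
{
    assert(s->n < MAX_ENTRIES);
    s->oids[s->n] = oid;
    memset(&s->stats[s->n], 0, sizeof(IndexStats));
    return &s->stats[s->n++];
}

/* Dropping a relation discards its entry (swap-remove). */
static void
store_drop(StatsStore *s, Oid oid)
{
    for (int i = 0; i < s->n; i++)
        if (s->oids[i] == oid)
        {
            s->n--;
            s->oids[i] = s->oids[s->n];
            s->stats[i] = s->stats[s->n];
            return;
        }
}

/* Hypothetical stand-in for the pointer added to AlteredTableInfo. */
typedef struct AlterState
{
    bool       have_saved;
    IndexStats saved;
} AlterState;

/* Cf. ATPostAlterTypeParse(): copy the counters before the old index is dropped. */
static void
save_index_stats(AlterState *st, StatsStore *s, Oid old_index)
{
    IndexStats *e = store_lookup(s, old_index);

    st->have_saved = (e != NULL);
    if (e)
        st->saved = *e;         /* plain copy of fixed-size counters */
}

/* Cf. ATExecAddIndex(): overwrite the new index's zeroed entry. */
static void
restore_index_stats(AlterState *st, StatsStore *s, Oid new_index)
{
    IndexStats *e = store_lookup(s, new_index);

    if (st->have_saved && e)
        *e = st->saved;
}
```

The new index keeps its own OID; it simply starts from the old index's counters instead of from zero.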
Looking forward to your feedback,
Regards,
[1]: https://postgr.es/m/ZlGYokUIlERemvpB%40ip-10-97-1-34.eu-west-3.compute.internal
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
Thanks for raising this issue and for the patch!
> As you can see, the index stats (linked to the column that has been altered) are
> not preserved. I think that they should be preserved (like a REINDEX does).
I agree.
> - We can not use pgstat_copy_relation_stats() because the old index is dropped
> before the new one is created, so the patch adds a new PgStat_StatTabEntry
> pointer in AlteredTableInfo.
I wonder if it would be good to have a pgstat_save_relation_stats() routine that
gets called in all code paths that need to restore the stats. This way
pgstat_copy_relation_stats() could also be used. Wouldn't this be cleaner than
having each code path that needs this deal with pgstat_fetch_stat_tabentry()?
I have not thought this through thoroughly, but it seems like it might be a more
general approach.
> - The patch adds a few tests to cover multiple scenarios (with and without
> rewrites, and indexes with and without associated constraints).
The current patch does not work for partitioned tables because
the "oldId" is that of the parent index which has no stats. So we
are just copying zeros to the new entry.
```
DROP TABLE test_tab;
CREATE TABLE test_tab(a int primary key, b int, c int) partition by range (a);
CREATE TABLE test_tab_p1 PARTITION OF test_tab
FOR VALUES FROM (0) TO (100);
CREATE TABLE test_tab_p2 PARTITION OF test_tab
FOR VALUES FROM (100) TO (200);
CREATE INDEX test_b_idx ON test_tab(b);
-- Force an index scan on test_b_idx
SELECT * FROM test_tab WHERE b = 2;
test=# select indexrelname, idx_scan from pg_stat_all_indexes where
indexrelname like '%test%';
indexrelname | idx_scan
-------------------+----------
test_tab_p1_pkey | 0
test_tab_p2_pkey | 0
test_tab_p1_b_idx | 1
test_tab_p2_b_idx | 1
(4 rows)
test=# ALTER TABLE test_tab ALTER COLUMN b TYPE int;
ALTER TABLE
test=# select indexrelname, idx_scan from pg_stat_all_indexes where
indexrelname like '%test%';
indexrelname | idx_scan
-------------------+----------
test_tab_p1_pkey | 0
test_tab_p2_pkey | 0
test_tab_p1_b_idx | 0
test_tab_p2_b_idx | 0
(4 rows)
```
Regards,
--
Sami Imseih
Amazon Web Services (AWS)
Hi,
On Fri, Oct 10, 2025 at 07:37:59AM -0500, Sami Imseih wrote:
> Hi,
>
> Thanks for raising this issue and for the patch!
Thanks for looking at it!
> > As you can see, the index stats (linked to the column that has been altered) are
> > not preserved. I think that they should be preserved (like a REINDEX does).
>
> I agree.
>
> > - We can not use pgstat_copy_relation_stats() because the old index is dropped
> > before the new one is created, so the patch adds a new PgStat_StatTabEntry
> > pointer in AlteredTableInfo.
>
> I wonder if it will be good to have a pgstat_save_relation_stats() routine that
> gets called in all code paths that will need to restore the stats. This way
> pgstat_copy_relation_stats can also be used. This will be cleaner than code
> paths that need this having to deal with pgstat_fetch_stat_tabentry?
pgstat_copy_relation_stats() needs two Relations, so I'm not sure how a new
pgstat_save_relation_stats() would let us use pgstat_copy_relation_stats() here.
> The current patch does not work for partitioned tables because
> the "oldId" is that of the parent index which has no stats. So we
> are just copying zeros to the new entry.
Doh, of course. I've spent some time on it and now have something working.
The idea is to:
- store a List of savedIndexStats. The savedIndexStats struct would get the
PgStat_StatTabEntry + all the information needed to be able to use
CompareIndexInfo() when restoring the stats (so that we can restore each PgStat_StatTabEntry
in the right index).
- Iterate on all the indexes and populate this new list in AlteredTableInfo in
ATPostAlterTypeParse().
- Iterate on all the indexes and use the list above and CompareIndexInfo() to
restore the stats in ATExecAddIndex().
Will polish and share next week.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Fri, Oct 10, 2025 at 03:52:58PM +0000, Bertrand Drouvot wrote:
> The idea is to:
>
> - store a List of savedIndexStats. The savedIndexStats struct would get the
> PgStat_StatTabEntry + all the information needed to be able to use
> CompareIndexInfo() when restoring the stats (so that we can restore each PgStat_StatTabEntry
> in the right index).
>
> - Iterate on all the indexes and populate this new list in AlteredTableInfo in
> ATPostAlterTypeParse().
>
> - Iterate on all the indexes and use the list above and CompareIndexInfo() to
> restore the stats in ATExecAddIndex().
>
> Will polish and share next week.
PFA v2 that handles partitioned tables/indexes.
A few words about its design:
I started by just creating a list of PgStat_StatTabEntry + all the information
needed to be able to use CompareIndexInfo() when restoring the stats.
But that led to O(P²) restoration (for each new partition index (P), it was
scanning through all saved ones (P)), which could be non-negligible.
For example, with 20K partitions and no rewrite:
- 89.64% 0.00% postgres postgres [.] ATController
- ATController
- 79.23% ATRewriteCatalogs
- 64.43% ATExecCmd
- 56.53% ATExecAddIndex
+ 46.34% DefineIndex
+ 10.19% ATExecAddIndex_RestoreStats
+ 5.29% ATExecAlterColumnType
+ 2.60% CommandCounterIncrement
+ 11.91% ATPostAlterTypeCleanup
+ 2.77% relation_open
+ 8.79% ATPrepCmd
+ 1.62% ATRewriteTables
We can see ATExecAddIndex_RestoreStats was not negligible at that time. That was
less of an issue when a rewrite was involved:
- 89.35% 0.00% postgres postgres [.] ATController
- ATController
+ 51.24% ATRewriteTables
- 33.89% ATRewriteCatalogs
- 26.98% ATExecCmd
- 22.16% ATExecAddIndex
+ 17.44% DefineIndex
4.71% ATExecAddIndex_RestoreStats
+ 3.58% ATExecAlterColumnType
+ 1.24% CommandCounterIncrement
+ 5.53% ATPostAlterTypeCleanup
+ 1.32% relation_open
+ 4.22% ATPrepCmd
So I added a hash table keyed by partition table OID, with each entry containing
a list of saved index stats for that partition. This way, restoration is now O(P)
instead of O(P²).
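A minimal C sketch of that data structure, assuming simplified types: `SavedIndexStats`, `PartEntry`, `SavedStatsHash` and the helpers below are hypothetical and do not reproduce the patch's code (in particular, the sketch uses a hand-rolled open-addressing table rather than PostgreSQL's own hash table facilities). Saving costs one insertion per old partition index; restoring costs one hash lookup plus a scan of that partition's own short list, so the total work is O(P) rather than O(P²).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t Oid;                /* 0 plays the role of InvalidOid */

/* Hypothetical saved record: index name plus a single counter. */
typedef struct SavedIndexStats
{
    char                    name[64];
    int64_t                 numscans;
    struct SavedIndexStats *next;    /* other saved indexes of the same partition */
} SavedIndexStats;

/* One slot per partition OID; relid == 0 marks an empty slot. */
typedef struct PartEntry
{
    Oid              relid;
    SavedIndexStats *saved;
} PartEntry;

typedef struct SavedStatsHash
{
    PartEntry *slots;
    size_t     nslots;               /* power of two, >= 2x the partition count */
} SavedStatsHash;

static SavedStatsHash
hash_create(size_t npartitions)
{
    SavedStatsHash h;

    h.nslots = 16;
    while (h.nslots < npartitions * 2)
        h.nslots <<= 1;
    h.slots = calloc(h.nslots, sizeof(PartEntry));
    assert(h.slots != NULL);
    return h;
}

/* Linear probing; with create != 0 an empty slot is claimed for relid. */
static PartEntry *
hash_slot(SavedStatsHash *h, Oid relid, int create)
{
    size_t mask = h->nslots - 1;
    size_t i = (size_t) (relid * 2654435761u) & mask;

    assert(relid != 0);
    for (;;)
    {
        PartEntry *e = &h->slots[i];

        if (e->relid == relid)
            return e;
        if (e->relid == 0)
        {
            if (!create)
                return NULL;
            e->relid = relid;
            return e;
        }
        i = (i + 1) & mask;
    }
}

/* Save phase: one insertion per old partition index. */
static void
save_stats(SavedStatsHash *h, Oid partrelid, const char *idxname, int64_t numscans)
{
    PartEntry       *e = hash_slot(h, partrelid, 1);
    SavedIndexStats *s = calloc(1, sizeof(*s));

    assert(s != NULL);
    strncpy(s->name, idxname, sizeof(s->name) - 1);
    s->numscans = numscans;
    s->next = e->saved;
    e->saved = s;
}

/* Restore phase: one lookup, then only this partition's saved entries. */
static const SavedIndexStats *
find_saved(SavedStatsHash *h, Oid partrelid, const char *idxname)
{
    PartEntry *e = hash_slot(h, partrelid, 0);

    for (const SavedIndexStats *s = e ? e->saved : NULL; s != NULL; s = s->next)
        if (strcmp(s->name, idxname) == 0)
            return s;
    return NULL;
}

static void
hash_destroy(SavedStatsHash *h)
{
    for (size_t i = 0; i < h->nslots; i++)
        for (SavedIndexStats *s = h->slots[i].saved, *next; s != NULL; s = next)
        {
            next = s->next;
            free(s);
        }
    free(h->slots);
}
```

The sketch frees its memory explicitly only because it runs outside PostgreSQL; inside the server, allocating in a short-lived memory context would make that unnecessary.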
With the attached, the perf profile (again 20K partitions and no rewrite) is:
- 89.06% 0.00% postgres postgres [.] ATController
- ATController
- 77.65% ATRewriteCatalogs
- 61.57% ATExecCmd
- 52.63% ATExecAddIndex
+ 51.16% DefineIndex
+ 1.47% ATExecAddIndex_RestoreStats
+ 5.96% ATExecAlterColumnType
+ 2.98% CommandCounterIncrement
+ 13.26% ATPostAlterTypeCleanup
+ 2.73% relation_open
+ 9.59% ATPrepCmd
+ 1.82% ATRewriteTables
As we can see, the ATExecAddIndex_RestoreStats impact is now around 1.5%, which
I think is acceptable given the benefit of preserving historical statistics.
Additional remarks:
- I initially tried using only CompareIndexInfo() for matching, but this fails
when multiple indexes exist on the same column(s). So I added the index name as
the primary matching check with CompareIndexInfo() kept as a sanity check (I think
that it could be removed).
- The new resources are allocated in PortalContext, which is short-lived, so
the patch does not free them explicitly.
- Many more tests have been added compared to v1.
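To see why the index name is needed as the primary key for matching, consider two indexes with identical definitions on the same column: a definition-only comparison cannot tell them apart. The sketch below is hypothetical: `IndexDef` is a drastically reduced stand-in for what CompareIndexInfo() actually compares (it ignores expressions, predicates, collations and more), and the OIDs used with it are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t Oid;

#define MAX_KEYS 4

/* Hypothetical, much-reduced description of an index definition. */
typedef struct IndexDef
{
    int  nkeys;
    int  attnums[MAX_KEYS];
    Oid  opfamilies[MAX_KEYS];
    bool unique;
} IndexDef;

typedef struct SavedIndex
{
    const char *name;
    IndexDef    def;
    int64_t     numscans;
} SavedIndex;

static bool
same_definition(const IndexDef *a, const IndexDef *b)
{
    if (a->nkeys != b->nkeys || a->unique != b->unique)
        return false;
    for (int i = 0; i < a->nkeys; i++)
        if (a->attnums[i] != b->attnums[i] ||
            a->opfamilies[i] != b->opfamilies[i])
            return false;
    return true;
}

/* Definition-only matching: the first hit wins, even if another index matches too. */
static const SavedIndex *
match_by_definition_only(const SavedIndex *saved, size_t n, const IndexDef *def)
{
    for (size_t i = 0; i < n; i++)
        if (same_definition(&saved[i].def, def))
            return &saved[i];
    return NULL;
}

/* Name first; the definition comparison is only a sanity check. */
static const SavedIndex *
match_saved(const SavedIndex *saved, size_t n,
            const char *name, const IndexDef *def)
{
    assert(name != NULL && def != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(saved[i].name, name) != 0)
            continue;
        return same_definition(&saved[i].def, def) ? &saved[i] : NULL;
    }
    return NULL;
}
```

With two identically defined indexes, `match_by_definition_only()` always returns the first one, while `match_saved()` returns the entry whose name matches.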
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Fri, Oct 10, 2025 at 07:37:59AM -0500, Sami Imseih wrote:
>> As you can see, the index stats (linked to the column that has been altered) are
>> not preserved. I think that they should be preserved (like a REINDEX does).
>
> I agree.
Hmm. Why should it be always OK to preserve the stats of an index
when one of its attributes is changed so as a relation is rewritten?
A REINDEX (including CONCURRENTLY), while it initiates a rewrite of
the index, does not change the definition of the underlying index.
A type alteration, on the contrary, does. Hence, the planner may decide
to treat a given index differently (doesn't it? Tuple width or
whole-row references come into mind). Keeping the past stats may
actually lead to confusing conclusions when overlapping them with some
of the new number generated under the new type? Could there be more
benefits in always resetting them as we do now?
Any thoughts from others?
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
> On Fri, Oct 10, 2025 at 07:37:59AM -0500, Sami Imseih wrote:
>>> As you can see, the index stats (linked to the column that has been altered) are
>>> not preserved. I think that they should be preserved (like a REINDEX does).
> Hmm. Why should it be always OK to preserve the stats of an index
> when one of its attributes is changed so as a relation is rewritten?
Right offhand, this proposal seems utterly unsafe, to the point of
maybe introducing security-grade bugs. I see that the patch compares
opfamilies but that seems insufficient, since "same opfamily" does not
mean "binary compatible". We could easily be restoring stats whose
binary content is incompatible with the new column type.
regards, tom lane
On Thu, Oct 16, 2025 at 01:38:19AM -0400, Tom Lane wrote:
> Michael Paquier <michael@paquier.xyz> writes:
>> Hmm. Why should it be always OK to preserve the stats of an index
>> when one of its attributes is changed so as a relation is rewritten?
>
> Right offhand, this proposal seems utterly unsafe, to the point of
> maybe introducing security-grade bugs. I see that the patch compares
> opfamilies but that seems insufficient, since "same opfamily" does not
> mean "binary compatible". We could easily be restoring stats whose
> binary content is incompatible with the new column type.
The point of the thread is about copying the aggregated numbers stored
in pgstats. These numbers have a fixed size, for contents in
PgStat_StatTabEntry. The point of the patch is about copying these
entries in the pgstats hash table across rewrites, so I am not sure to
follow your argument.
My point was slightly different: I am questioning if a reset does not
make more sense in most cases as an attribute type change may cause
the planner to choose a different Path, making the new stats generated
leading to decisions that are inconsistent when aggregated with the
numbers copied across the rewrites.
--
Michael
Hi,
On Thu, Oct 16, 2025 at 02:06:01PM +0900, Michael Paquier wrote:
> On Fri, Oct 10, 2025 at 07:37:59AM -0500, Sami Imseih wrote:
> >> As you can see, the index stats (linked to the column that has been altered) are
> >> not preserved. I think that they should be preserved (like a REINDEX does).
> >
> > I agree.
>
> Hmm. Why should it be always OK to preserve the stats of an index
> when one of its attributes is changed so as a relation is rewritten?
I agree that in this case the stats (namely idx_scan, idx_tup_read and idx_tup_fetch)
would represent a mixture of two different index structures.
> Hence, the planner may decide
> to treat a given index differently (doesn't it? Tuple width or
> whole-row references come into mind).
I do think so, yes.
> Keeping the past stats may
> actually lead to confusing conclusions when overlapping them with some
> of the new number generated under the new type? Could there be more
> benefits in always resetting them as we do now?
The issue is that these stats are also exposed at the table level
(idx_scan, last_idx_scan, idx_tup_fetch in pg_stat_all_tables).
That's valuable information for understanding table access patterns
that is currently lost.
It would make more sense to reset the index stats if table level
stats were tracked independently from the underlying index stats.
Also, users already have pg_stat_reset_single_table_counters() if they want
to reset the index stats. This patch gives users the choice to preserve stats
or reset them. Currently, they have no choice: the stats are always lost.
Also, when the rewrite also occurs on the table (type changes) a stat like
seq_scan is preserved (because the table Oid does not change, only the
relfilenode does). Why would it be ok to preserve seq_scan and not idx_scan?
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Hi,
On Thu, Oct 16, 2025 at 03:09:24PM +0900, Michael Paquier wrote:
> On Thu, Oct 16, 2025 at 01:38:19AM -0400, Tom Lane wrote:
> > Michael Paquier <michael@paquier.xyz> writes:
> >> Hmm. Why should it be always OK to preserve the stats of an index
> >> when one of its attributes is changed so as a relation is rewritten?
> >
> > Right offhand, this proposal seems utterly unsafe, to the point of
> > maybe introducing security-grade bugs. I see that the patch compares
> > opfamilies but that seems insufficient, since "same opfamily" does not
> > mean "binary compatible". We could easily be restoring stats whose
> > binary content is incompatible with the new column type.
>
> The point of the thread is about copying the aggregated numbers stored
> in pgstats. These numbers have a fixed size, for contents in
> PgStat_StatTabEntry. The point of the patch is about copying these
> entries in the pgstats hash table across rewrites, so I am not sure to
> follow your argument.
Same here.
> My point was slightly different: I am questioning if a reset does not
> make more sense in most cases as an attribute type change may cause
> the planner to choose a different Path, making the new stats generated
> leading to decisions that are inconsistent when aggregated with the
> numbers copied across the rewrites.
See my reply in [1].
[1]: https://www.postgresql.org/message-id/aPCVvWZjvvC1ZO78%40ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>> Hence, the planner may decide
>> to treat a given index differently (doesn't it? Tuple width or
>> whole-row references come into mind).
> I do think so, yes.
The planner may also treat the index differently after ALTER INDEX ... ALTER COLUMN ...
SET STATISTICS ...; ANALYZE, but I am not sure the planner aspect is a good reason
not to preserve cumulative stats for an index.
In the case where the table is not rewritten, isn't that a clear case in which
stats should be preserved?
> > Keeping the past stats may
> > actually lead to confusing conclusions when overlapping them with some
> > of the new number generated under the new type? Could there be more
> > benefits in always resetting them as we do now?
>
> The issue is that these stats are also exposed at the table level
> (idx_scan, last_idx_scan, idx_tup_fetch in pg_stat_all_tables).
> That's valuable information for understanding table access patterns
> that is currently lost.
>
> It would make more sense to reset the index stats if table level
> stats were tracked independently from the underlying index stats.
This sounds like a good enhancement. This will also take care of the
index stats being preserved on a table in the case an index is dropped.
But that means we will need some new fields to aggregate index access
in PgStat_StatTabEntry, which may not be so good in terms of memory
and performance.
--
Sami
Hi,
On Thu, Oct 16, 2025 at 03:39:59PM -0500, Sami Imseih wrote:
> >
> > The issue is that these stats are also exposed at the table level
> > (idx_scan, last_idx_scan, idx_tup_fetch in pg_stat_all_tables).
> > That's valuable information for understanding table access patterns
> > that is currently lost.
> >
> > It would make more sense to reset the index stats if table level
> > stats were tracked independently from the underlying index stats.
>
> This sounds like a good enhancement. This will also take care of the
> index stats being preserved on a table in the case an index is dropped.
>
> But that means we will need some new fields to aggregate index access
> in PgStat_StatTabEntry,
Yeah, we'd need to add, say:
- total_idx_numscans
- idx_lastscan
- total_tuples_idx_fetched
to get rid of the pg_stat_get_*() calls on the indexes in pg_stat_all_tables.
That way we don't need to worry about copying the statistics during the alter
command.
> which may not be so good in
> terms of memory and performance.
Performance: We could populate those fields at the "table" level when we flush
the index stats (similar to what we do currently for some tables stats that
populate some database stats at flush time). That would avoid double incrementing.
Memory: Adding those 3 extra fields to PgStat_StatTabEntry does not worry me that
much given the number of fields already in PgStat_StatTabEntry.
The thing that is not ideal is that as PgStat_StatTabEntry is currently used for
both tables and indexes stats then we'll add fields that would be only used for
the table case. But that's already the case for some other fields and this will
be "solved" once we resume working on "Split index and table statistics into
different types of stats" ([1]), meaning after relfilenode stats ([2]) are
implemented (I'm currently working on it).
I prefer this approach as compared to the current proposal (copying the stats
during the alter command).
Thoughts?
[1]: https://www.postgresql.org/message-id/flat/f572abe7-a1bb-e13b-48c7-2ca150546822%40gmail.com
[2]: https://www.postgresql.org/message-id/flat/ZlGYokUIlERemvpB@ip-10-97-1-34.eu-west-3.compute.internal
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
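Under the flush-time aggregation proposed above, the table entry would accumulate index activity as it is flushed, so table-level numbers survive an index being dropped and recreated. A rough C model: `IndexEntry`, `TableEntry`, `flush_index_pending()` and `derived_idx_scan()` are hypothetical; the three aggregate field names are the ones suggested in the message, everything else is an assumption. `derived_idx_scan()` only approximates how pg_stat_all_tables derives idx_scan today (a sum over the table's existing indexes).

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t PgStat_Counter;   /* PostgreSQL defines PgStat_Counter as int64 */

/* Hypothetical per-index entry (index counters only). */
typedef struct IndexEntry
{
    PgStat_Counter numscans;
    PgStat_Counter tuples_fetched;
    int            dropped;        /* 1 once the index no longer exists */
} IndexEntry;

/* Hypothetical table entry extended with the three proposed aggregates. */
typedef struct TableEntry
{
    PgStat_Counter total_idx_numscans;
    PgStat_Counter total_tuples_idx_fetched;
    int64_t        idx_lastscan;   /* stand-in for a TimestampTz */
} TableEntry;

/* Flush pending index counters once, into both the index and its table. */
static void
flush_index_pending(IndexEntry *idx, TableEntry *tab,
                    PgStat_Counter numscans, PgStat_Counter fetched, int64_t now)
{
    assert(!idx->dropped);
    idx->numscans += numscans;
    idx->tuples_fetched += fetched;
    tab->total_idx_numscans += numscans;
    tab->total_tuples_idx_fetched += fetched;
    if (numscans > 0 && now > tab->idx_lastscan)
        tab->idx_lastscan = now;
}

/* Current behavior, simplified: derive the table value from live indexes. */
static PgStat_Counter
derived_idx_scan(const IndexEntry *idxs, int n)
{
    PgStat_Counter sum = 0;

    for (int i = 0; i < n; i++)
        if (!idxs[i].dropped)
            sum += idxs[i].numscans;
    return sum;
}
```

Dropping the old index makes the derived value fall to zero, while the aggregated table-level counters are unaffected, which is why no copying would be needed during ALTER TABLE.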
On Thu, Oct 16, 2025 at 03:39:59PM -0500, Sami Imseih wrote:
> This sounds like a good enhancement. This will also take care of the
> index stats being preserved on a table in the case an index is dropped.
>
> But that means we will need some new fields to aggregate index access
> in PgStat_StatTabEntry, which may not be so good in
> terms of memory and performance.
Putting aside the should-we-preserve-index-stats-on-relation-rewrite
problem for a minute.
FWIW, I think that aiming at less memory per entry is better in the
long term, because we are that it's going to be cheaper. One thing
that's been itching me quite a bit with pgstat_relation.c lately is
that PgStat_StatTabEntry is being used by both tables and indexes, but
we don't care about the most of its fields for indexes. The ones I
can see as used for indexes are:
- blocks_hit
- blocks_fetched
- reset_time
- tuples_returned
- tuples_fetched
- lastscan
- numscan
This means that we don't care about the business around HOT, vacuum
(we could care about the vacuum timings for individual index
cleanups), analyze, live/dead tuples.
It may be time to do a clean split, even if the current state of
business in pgstat.h is a kind of historical thing.
--
Michael
Hi,
On Mon, Oct 20, 2025 at 10:53:37AM +0900, Michael Paquier wrote:
> On Thu, Oct 16, 2025 at 03:39:59PM -0500, Sami Imseih wrote:
> > This sounds like a good enhancement. This will also take care of the
> > index stats being preserved on a table in the case an index is dropped.
> >
> > But that means we will need some new fields to aggregate index access
> > in PgStat_StatTabEntry, which may not be so good in
> > terms of memory and performance.
>
> Putting aside the should-we-preserve-index-stats-on-relation-rewrite
> problem for a minute.
Okay.
> FWIW, I think that aiming at less memory per entry is better in the
> long term, because we are that it's going to be cheaper. One thing
> that's been itching me quite a bit with pgstat_relation.c lately is
> that PgStat_StatTabEntry is being used by both tables and indexes, but
> we don't care about the most of its fields for indexes. The ones I
> can see as used for indexes are:
> - blocks_hit
> - blocks_fetched
> - reset_time
> - tuples_returned
> - tuples_fetched
> - lastscan
> - numscan
>
> This means that we don't care about the business around HOT, vacuum
> (we could care about the vacuum timings for individual index
> cleanups), analyze, live/dead tuples.
Exactly, and that's one of the reasons why the "Split index and table statistics
into different types of stats" work ([1]) started.
> It may be time to do a clean split, even if the current state of
> business in pgstat.h is a kind of historical thing.
Yeah, but maybe it would make more sense to look at this once the relfilenode
stats one ([2]) is done? (see [3]).
[1]: https://www.postgresql.org/message-id/f572abe7-a1bb-e13b-48c7-2ca150546822%40gmail.com
[2]: https://www.postgresql.org/message-id/flat/ZlGYokUIlERemvpB@ip-10-97-1-34.eu-west-3.compute.internal
[3]: https://www.postgresql.org/message-id/20230105002733.ealhzubjaiqis6ua%40awork3.anarazel.de
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mon, Oct 20, 2025 at 06:22:00AM +0000, Bertrand Drouvot wrote:
> On Mon, Oct 20, 2025 at 10:53:37AM +0900, Michael Paquier wrote:
>> It may be time to do a clean split, even if the current state of
>> business in pgstat.h is a kind of historical thing.
>
> Yeah, but maybe it would make more sense to look at this once the relfilenode
> stats one ([2]) is done? (see [3]).
Ah, right, that rings a bell now. So as you mention the history of
events is that the refactoring related to relfilenodes should happen
first. Maybe we should just focus on that for now, then.
TBH, I cannot get excited for the moment in making tablecmds.c more
complex regarding its stats handling on rewrite without knowing if it
could become actually simpler. This is also assuming that we actually
do something about it, at the end, which is not something I am sure is
worth the extra complications in ALTER TABLE. And perhaps we could get
some nice side effects of the other discussion for what you are
proposing (first answer points to no, but it's hard to say as well if
that would be a definitive answer).
--
Michael