Re: relfilenode statistics
| From | Michael Paquier |
|---|---|
| Subject | Re: relfilenode statistics |
| Date | |
| Msg-id | aRGoGcOdutTHQfpn@paquier.xyz |
| In reply to | Re: relfilenode statistics (Michael Paquier <michael@paquier.xyz>) |
| Responses | Re: relfilenode statistics |
| List | pgsql-hackers |
On Sun, Nov 09, 2025 at 08:33:54AM +0900, Michael Paquier wrote:
> Looking at this part of the patch set for now, not looked at the rest
> yet. This new stats_1.out is 2k lines long, introduced for the tests
> related to rewrites as an effect of 2PC. It seems to me that a split
> into a new stats_rewrite would be justified for this case, to reduce
> the output duplication.

The first patch had an issue with some of the tests checking for dead tuples: if an autovacuum kicks in before querying the stats, we would get a dead tuple number of 0. So I have expanded the tests a bit to avoid autovacuum interactions, which should be enough to avoid noise, did a split into a new file, which should also be fine because we don't rely on a system-wide stats reset, then applied the result.

The patch is spending a great deal of effort on three fronts:
- Making sure that the statistics are copied over after a relation rewrite.
- Making sure that we assign a "correct" object ID, assigning the fields of RelFileLocator based on a relation ID. Mapped and shared relations make the exercise a bit more difficult. It would be nice to avoid this kind of duplication with other code paths that assign a RelFileLocator.
- Partitioned tables, where we don't have a relfilenode but we need to track statistics. The patch relies on the relation OID to assign a key, as far as I've read.

Among the three points, the first one is the most invasive in the patch, it seems, and do we actually want to keep the stats across rewrites at all? The main reason for doing the relfilenode move would be to rebuild these stats on a WAL-record basis, because the relfile locator is the only thing we know in the startup process, and once rewritten the state of the data is different.

relation_needs_vacanalyze() then cares about three fields:
- Number of dead tuples, which would be 0 after a rewrite.
- ins_since_vacuum, which would be 0 after a rewrite.
- mod_since_analyze, for analyze, again 0.
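
To make the role of these three counters concrete, here is a simplified sketch of the kind of threshold comparison that relation_needs_vacanalyze() performs. The struct and function names below are hypothetical, the settings in the usage note are the documented defaults of the autovacuum_* parameters, and wraparound forcing, per-table reloptions and newer refinements are deliberately left out, so this is an illustration under those assumptions and not the actual autovacuum.c code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-table counters discussed above. */
typedef struct TableCounters
{
    double reltuples;          /* estimated number of tuples in the table */
    double dead_tuples;        /* dead tuples seen since the last vacuum */
    double ins_since_vacuum;   /* tuples inserted since the last vacuum */
    double mod_since_analyze;  /* tuples modified since the last analyze */
} TableCounters;

/* Hypothetical bundle of the autovacuum settings used by the checks. */
typedef struct AvSettings
{
    double vac_base_thresh;      /* autovacuum_vacuum_threshold */
    double vac_scale_factor;     /* autovacuum_vacuum_scale_factor */
    double vac_ins_base_thresh;  /* autovacuum_vacuum_insert_threshold */
    double vac_ins_scale_factor; /* autovacuum_vacuum_insert_scale_factor */
    double anl_base_thresh;      /* autovacuum_analyze_threshold */
    double anl_scale_factor;     /* autovacuum_analyze_scale_factor */
} AvSettings;

/*
 * Simplified decision: each threshold is "base + scale factor * reltuples",
 * and a counter has to exceed its threshold to trigger work.
 */
void
needs_vacanalyze(const TableCounters *c, const AvSettings *s,
                 bool *dovacuum, bool *doanalyze)
{
    double vacthresh = s->vac_base_thresh +
        s->vac_scale_factor * c->reltuples;
    double vacinsthresh = s->vac_ins_base_thresh +
        s->vac_ins_scale_factor * c->reltuples;
    double anlthresh = s->anl_base_thresh +
        s->anl_scale_factor * c->reltuples;

    /* A negative insert threshold disables insert-triggered vacuums. */
    *dovacuum = (c->dead_tuples > vacthresh) ||
        (s->vac_ins_base_thresh >= 0 &&
         c->ins_since_vacuum > vacinsthresh);
    *doanalyze = (c->mod_since_analyze > anlthresh);
}
```

With settings of {50, 0.2, 1000, 0.2, 50, 0.1} and a table of 10000 tuples, 3000 dead tuples and 1500 modifications would trigger both a vacuum and an analyze. With all three counters at 0, as they would be right after a rewrite, neither comparison can fire until new activity accumulates.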

I have not checked the recent autovacuum scheduling thread to see if this set changes there. Are these numbers worth the effort of copying over at the end? Was this particular point discussed? I've seen this mentioned once here, but I am wondering what the arguments are in favor of copying the stats data versus not copying it across rewrites:
https://www.postgresql.org/message-id/20240607031736.7izmr2yirznvidka%40awork3.anarazel.de
--
Michael