Re: relfilenode statistics

Поиск

Список

Период

Сортировка

От	Michael Paquier
Тема	Re: relfilenode statistics
Дата	10 ноября 11:53:45
Msg-id	aRGoGcOdutTHQfpn@paquier.xyz обсуждение исходный текст
Ответ на	Re: relfilenode statistics (Michael Paquier <michael@paquier.xyz>)
Ответы	Re: relfilenode statistics
Список	pgsql-hackers

Дерево обсуждения

On Sun, Nov 09, 2025 at 08:33:54AM +0900, Michael Paquier wrote:
> Looking at this part of the patch set for now, not looked at the rest
> yet.  This new stats_1.out is 2k lines long, introduced for the tests
> related to rewrites as an effect of 2PC.  It seems to me that a split
> into a new stats_rewrite would be justified for this case, to reduce
> the output duplication.

The first patch had an issue with some of the tests checking for dead
tuples: if an autovacuum kicks in before querying the stats, we would
get a dead tuple number of 0.  So I have expanded the tests a bit to
avoid autovacuum interactions, which should be enough to avoid noise,
did a split into a new file, which should also be fine because we
don't rely on a system-wide stats reset, then applied the result.

The patch is spending a great deal of effort on three fronts:
- making sure that the statistics are copied over after a relation
rewrite.
- making sure that we assign a "correct" object ID, assigning
the fields of RelFileLocator based on a relation ID.  Mapped and
shared relations make the exercise a bit more difficult.  It would be
nice to avoid this kind of duplication with other code paths that
assign a RelFileLocator.
- Partitioned tables, where we don't have a relfilenode but we need to
track statistics.  The patch relies on the relation oid to assign a
key, as far as I've read.

Among the three points, the first one is the most invasive in the
patch, it seems, and do we actually want to keep the stats across
rewrites at all?  The main reason of doing the relfilenode move
would be to rebuild these stats on a WAL-record basis because the
relfile locator is the only thing we know in the startup process, and
once rewritten the state of the data is different.
relation_needs_vacanalyze() then cares about three fields:
- Number of dead tuples, which would be 0 after a rewrite.
- ins_since_vacuum, which would be 0 after a rewrite.
- mod_since_analyze, for analyze, again 0.

I have not checked the recent autovacuum scheduling thread to see if
this set changes there.

Are these numbers worth the effort of copying over at the end?  Was
this particular point discussed?  I've seen this mentioned once here,
but I am wondering what are the arguments in favor of copying the
stats data versus not copying it across rewrites:
https://www.postgresql.org/message-id/20240607031736.7izmr2yirznvidka%40awork3.anarazel.de
--
Michael

Вложения

signature.asc

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: relfilenode statistics

Вложения