Re: Partial vacuum versus pg_class.reltuples

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Partial vacuum versus pg_class.reltuples
Дата
Msg-id 17119.1244402651@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Partial vacuum versus pg_class.reltuples  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: Partial vacuum versus pg_class.reltuples  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> Am I wrong to be frightened by the implications of updating this value
> only once in a blue moon?

It's not great, but I think it's probably not catastrophic either.
Keep in mind that all we need from reltuples is that the ratio
reltuples/relpages be a reasonable estimate of the density of live
tuples, because what the planner actually uses is GetRelationNumberOfBlocks() * reltuples / relpages.
So for example an append-only table isn't a big problem, even if it's
been quite a while since we updated reltuples and relpages.

There was some mention of having a partial vacuum extrapolate a value of
reltuples and update pg_class with that.  I'm afraid that that could be
a seriously bad idea; because there is no very good reason to suppose
that the subset of recently-modified pages forms a good sample of the
whole table as far as live-tuple density goes.

[ thinks a bit and reads the code some more ... ]  There is a
considerably safer alternative, which is to let ANALYZE update the
reltuples estimate based on the pages it sampled; which should be a
considerably less biased sample than the pages a partial vacuum would
have looked at.  And we have already got the code doing that, either
in a standalone ANALYZE or an ANALYZE attached to a vacuum that turned
out to be partial.

So actually I think we are in better shape than I realized, so far as
the heap reltuples numbers go.  The case that's a bit nasty is where
we are propagating the heap reltuples number to the index reltuples
number for a GIN index.  (Remember this only matters for a partial
index.)  As the code stands today, what we'll be propagating is the
reltuples estimate from the most recent ANALYZE, not the ANALYZE that
we might be about to conduct.  This is not great; but considering that
we are completely ignoring the first-order problem of the partial index
predicate's selectivity, quibbling about a second-order effect like the
estimate being out of date is pretty pointless.

> Do we have any reasonable manual way of forcing
> VACUUM to scan the entire heap?

You could use VACUUM FREEZE, for instance.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Josh Berkus
Дата:
Сообщение: Re: pg_migrator issue with contrib
Следующее
От: Stefan Kaltenbrunner
Дата:
Сообщение: Re: pg_migrator issue with contrib