Re: Remaining case where reltuples can become distorted across multiple VACUUM operations

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Remaining case where reltuples can become distorted across multiple VACUUM operations
Дата
Msg-id CAH2-WznSniwVEQv7pT03SGytNj6EKw1-YOv4N6WZgYuWrwmZvQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Remaining case where reltuples can become distorted across multiple VACUUM operations  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Ответы Re: Remaining case where reltuples can become distorted across multiple VACUUM operations  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Список pgsql-hackers
On Mon, Aug 8, 2022 at 9:17 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> Because if a subset of the pages of a relation contains more tuples
> than your current total expected tuples in the table, you should
> update your expectations regardless of which blocks or which number of
> blocks you've scanned - the previous stored value is a strictly worse
> estimation than your last measurement.

The previous stored value could be -1, which represents the idea that
we don't know the tuple density yet. So it doesn't necessarily follow
that the new estimate is strictly better, even in this exact scenario.

> A 33-block relation with first 32 1-tuple pages is still enough to
> have a last page with 250 tuples, which would be ignored in that
> scheme and have a total tuple count of 33 or so.

The simple fact is that there is only so much we can do with the
limited information/context that we have. Heuristics are not usually
free of all bias. Often the bias is the whole point -- the goal can be
to make sure that we have the bias that we know we can live with, and
not the opposite bias, which is much worse. Details of which are
usually very domain specific.

I presented my patch with a very simple test case -- a very clear
problem. Can you do the same for this scenario?

I accept that it is possible that we'll keep an old reltuples which is
provably less accurate than doing something with the latest
information from vacuumlazy.c. But the conditions under which this can
happen are *very* narrow. I am not inclined to do anything about it
for that reason.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Justin Pryzby
Дата:
Сообщение: Re: 2022-08-11 release announcement draft
Следующее
От: Mark Wong
Дата:
Сообщение: Re: Asking for feedback on Pgperffarm