Re: vacuum -vs reltuples on insert only index

From: Peter Geoghegan
Subject: Re: vacuum -vs reltuples on insert only index
Msg-id: CAH2-Wz=cY7hOt7pZNLumvf5Ug0RzRXkPKWjSSx=m=v7o98pF_A@mail.gmail.com
In response to: vacuum -vs reltuples on insert only index  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Responses: Re: vacuum -vs reltuples on insert only index  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers

On Fri, Oct 23, 2020 at 8:51 AM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:
> Before 0d861bbb70, btvacuumpage was adding to relation stats the number of
> leaving lines in the block using:
>
>   stats->num_index_tuples += maxoff - minoff + 1;
>
> After 0d861bbb70, it is set using new variable nhtidslive:
>
>   stats->num_index_tuples += nhtidslive
>
> However, nhtidslive is only incremented if callback (IndexBulkDeleteCallback)
> is set, which seems not to be the case on select-only workload.

I agree that that's a bug.

> A naive fix might be to use "maxoff - minoff + 1" when callback==NULL.

The problem with that is that we really should use nhtidslive (or
something like it), and we're not really willing to do the work to get
that information when callback==NULL. We could use "maxoff - minoff +
1" in the way you suggest, but that will be only ~30% of what
nhtidslive would be in pages where deduplication is maximally
effective (which is not at all uncommon -- you only need about 10 TIDs
per distinct value for the space savings to saturate like this).
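
To illustrate the gap between the two counts, here is a minimal sketch that
models a leaf page as an array of items, each carrying one or more heap TIDs.
SketchItem, count_items and count_heap_tids are hypothetical names made up for
this example; they are not nbtree structures or functions, and the sketch
assumes no TID on the page is dead.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for a leaf page item.
 * This is NOT nbtree's IndexTuple; it only records how many heap TIDs
 * the item points to.
 */
typedef struct SketchItem
{
	int			nhtids;		/* 1 for a plain tuple, >1 for a posting list */
} SketchItem;

/* Item-based count: one per line pointer, like "maxoff - minoff + 1". */
long
count_items(size_t nitems)
{
	return (long) nitems;
}

/*
 * TID-based count: the figure nhtidslive would reach on this page if
 * every TID were live (an assumption of this sketch).
 */
long
count_heap_tids(const SketchItem *items, size_t nitems)
{
	long		total = 0;

	for (size_t i = 0; i < nitems; i++)
	{
		assert(items[i].nhtids >= 1);
		total += items[i].nhtids;
	}
	return total;
}
```

On a page holding one plain tuple and two posting lists with 10 and 3 TIDs,
the item-based count is 3 while the TID-based count is 14, so the more
deduplication helps, the further the item-based figure falls short.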

GIN does this for cleanup (but not for delete, which has a real count
available):

/*
 * XXX we always report the heap tuple count as the number of index
 * entries.  This is bogus if the index is partial, but it's real hard to
 * tell how many distinct heap entries are referenced by a GIN index.
 */
stats->num_index_tuples = Max(info->num_heap_tuples, 0);
stats->estimated_count = info->estimated_count;

I suspect that we need to move in this direction within nbtree. I'm a
bit concerned about the partial index problem, though. I suppose maybe
we could do it the old way (which won't account for posting list
tuples) during cleanup as you suggest, but only use the final figure
when it turns out to have been a partial index. For other indexes we
could do what GIN does here.
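
A rough sketch of that decision, under stated assumptions: the struct and
function names below are hypothetical and do not exist in nbtree, the
"partial index" flag is assumed to be known to the caller, and marking the
item-based figure as an estimate is this sketch's own choice rather than
something settled in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs available at cleanup time (no bulk delete callback). */
typedef struct SketchCleanupInput
{
	double		num_heap_tuples;	/* heap tuple count reported by VACUUM */
	bool		heap_count_estimated;	/* is that heap count an estimate? */
	bool		index_is_partial;	/* assumed to be known to the caller */
	double		item_count;		/* sum of "maxoff - minoff + 1" per page */
} SketchCleanupInput;

/* Hypothetical result, mirroring the two fields set in the GIN snippet. */
typedef struct SketchStats
{
	double		num_index_tuples;
	bool		estimated_count;
} SketchStats;

SketchStats
sketch_cleanup_stats(const SketchCleanupInput *in)
{
	SketchStats out;

	assert(in != NULL);

	if (in->index_is_partial)
	{
		/*
		 * Old way: count items. This ignores posting list tuples, so the
		 * sketch flags the result as an estimate (an assumption).
		 */
		out.num_index_tuples = in->item_count;
		out.estimated_count = true;
	}
	else
	{
		/* GIN-style: report the heap tuple count, clamped at zero. */
		out.num_index_tuples =
			in->num_heap_tuples > 0 ? in->num_heap_tuples : 0;
		out.estimated_count = in->heap_count_estimated;
	}
	return out;
}
```

The point of the split is that the heap tuple count is only a sensible proxy
when every heap tuple has an index entry, which a partial index does not
guarantee.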

Anybody else have thoughts on this?

--
Peter Geoghegan


