Re: [PERFORM] encouraging index-only scans

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [PERFORM] encouraging index-only scans
Дата
Msg-id CA+Tgmoaf_+kXH6g2syaqaUETJx7=UY47badxv1P16rpPM-oeKw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PERFORM] encouraging index-only scans  (Andres Freund <andres@2ndquadrant.com>)
Ответы Re: [PERFORM] encouraging index-only scans  (Andres Freund <andres@2ndquadrant.com>)
Список pgsql-hackers
On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> The reason I suggested keeping track of the xids of unremovable tuples
> is that the current logic doesn't handle that at all. We just
> unconditionally set n_dead_tuples to zero after a vacuum even if not a
> single row could actually be cleaned out. Which has the effect that we
> will not start a vacuum until enough bloat (or after changing this, new
> inserts) has collected to start vacuum anew. Which then will do twice
> the work.
>
> Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do
> much good either - we would just immediately trigger a new vacuum the
> next time we check, even if the xmin horizon is still the same.

One idea would be to store the xmin we used for the vacuum somewhere.
Could we make that part of the pgstats infrastructure?  Or store it in
a new pg_class column?  Then we could avoid re-triggering until it
advances.  Or, maybe better, we could remember the oldest XID that we
weren't able to remove due to xmin considerations and re-trigger when
the horizon passes that point.

>> However, I do have one concern: it might lead to excessive
>> index-vacuuming.  Right now, we skip the index vac step only if there
>> ZERO dead tuples are found during the heap scan.  Even one dead tuple
>> (or line pointer) will cause an index vac cycle, which may easily be
>> excessive.  So I further propose that we introduce a threshold for
>> index-vac; so that we only do index vac cycle if the number of dead
>> tuples exceeds, say 0.1% of the table size.
>
> Yes, that's a pretty valid concern. But we can't really do it that
> easily. a) We can only remove dead line pointers when we know there's no
> index pointing to it anymore. Which we only know after the index has
> been removed. b) We cannot check the validity of an index pointer if
> there's no heap tuple for it. Sure, we could check whether we're
> pointing to a dead line pointer, but the random io costs of that are
> prohibitive.
> Now, we could just mark line pointers as dead and not mark that page as
> all-visible and pick it up again on the next vacuum cycle. But that
> would suck long-term.
>
> I think the only real solution here is to store removed tuples tids
> (i.e. items where we've marked as dead) somewhere. Whenever we've found
> sufficient tuples to-be-removed from indexes we do phase 2.

I don't really agree with that.  Yes, we could make that change, and
yes, it might be better than what we're doing today, but it would be
complex and have its own costs.  And it doesn't mean that lesser steps
are without merit.  A vacuum pass over the heap buys us a LOT of space
for reuse even without touching the indexes: we don't reclaim the line
pointers, but we do reclaim the space for the tuples themselves, which
is a big deal.  So being able to do that more frequently without
causing problems has a lot of value, I think.  The fact that we get to
set all-visible bits along the way makes future vacuums cheaper, and
makes index scans work better, so that's good too.  And the first
vacuum to find a dead tuple will dirty the page to truncate it to a
dead line pointer, while any subsequent revisits prior to the index
vac cycle will only examine the page without dirtying it.  All in all,
just leaving the page to be caught be a future vacuum doesn't seem
that bad to me, at least for a first cut.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: record identical operator
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: Assertions in PL/PgSQL