Re: Single pass vacuum - take 1

From: Pavan Deolasee
Subject: Re: Single pass vacuum - take 1
Date:
Msg-id: CABOikdM80mzK4e9gXZa+30dmEF1Xu6UpH6Jr=7sN9xhoze5D_Q@mail.gmail.com
In reply to: Re: Single pass vacuum - take 1  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Single pass vacuum - take 1
List: pgsql-hackers


On Thu, Jul 21, 2011 at 12:17 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jul 14, 2011 at 12:43 PM, Heikki Linnakangas
> I think you can sidestep that
> if you check that the page's vacuum LSN <= vacuum LSN in pg_class, instead
> of equality.

I don't think that works, because the point of storing the LSN in
pg_class is to verify that the vacuum completed the index cleanup
without error.  The fact that a newer vacuum accomplished that goal
does not mean that all older ones did.


Because we force the subsequent vacuum to revisit the pages scanned and pruned by a previous failed vacuum, every page holding dead-vacuumed line pointers gets a new stamp once that vacuum finishes successfully, and pg_class ends up with the same stamp.
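To sketch the invariant being relied on here (the names below are illustrative, not the patch's actual code; the real patch stores an LSN in the page header and in pg_class):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stamp type standing in for the vacuum LSN/counter. */
typedef uint32_t VacStamp;

/*
 * Dead-vacuumed line pointers on a page are reclaimable only when the
 * page carries exactly the stamp that the last *successful* vacuum
 * recorded in pg_class.  Equality matters, not <=: a newer vacuum
 * completing its index cleanup says nothing about whether an older,
 * failed vacuum completed its own.
 */
static bool
dead_vacuumed_reclaimable(VacStamp page_stamp, VacStamp pgclass_stamp)
{
    return page_stamp == pgclass_stamp;
}
```

Re-stamping every affected page on a successful vacuum is what restores this equality after an earlier failure.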
 
> Ignoring the issue stated in previous paragraph, I think you wouldn't
> actually need an 64-bit LSN. A smaller counter is enough, as wrap-around
> doesn't matter. In fact, a single bit would be enough. After a successful
> vacuum, the counter on each heap page (with dead line pointers) is N, and
> the value in pg_class is N. There are no other values on the heap, because
> vacuum will have cleaned them up. When you begin the next vacuum, it will
> stamp pages with N+1. So at any stage, there is only one of two values on
> any page, so a single bit is enough. (But as I said, that doesn't hold if
> vacuum skips some pages thanks to the visibility map)

If this can be made to work, it's a very appealing idea.

I thought more about this and for a moment believed we could do it with just a bit, since we rescan the pages with dead and dead-vacuumed line pointers after an aborted vacuum. But I concluded that a bit or a small counter is not good enough: other backends might be running with a stale value and would be fooled into believing they can collect the dead-vacuumed line pointers before the index pointers are actually removed. We can still use a 32-bit counter, though, since its wrap-around period is far too long for any backend to still be running with a counter that stale (you would need more than a billion vacuums on the same table in the meantime to hit this).
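The staleness argument can be made concrete with a small sketch (hypothetical names, not patch code): a stamp of width w bits repeats every 2^w successful vacuums, so a backend whose cached pg_class value is that many vacuums behind would spuriously "match" the on-page stamp and reclaim pointers whose index entries may still exist.

```c
#include <assert.h>
#include <stdint.h>

/*
 * On-page stamp after 'nvacuums' successful vacuums, for a stamp that
 * is 'width_bits' wide.  With width 1 the value repeats every other
 * vacuum; with width 32 a stale backend would have to lag by ~4
 * billion vacuums on the same table before a false match is possible.
 */
static uint32_t
stamp_after_vacuums(uint32_t start, unsigned nvacuums, unsigned width_bits)
{
    uint32_t mask = (width_bits >= 32) ? UINT32_MAX
                                       : ((1u << width_bits) - 1);
    return (start + nvacuums) & mask;
}
```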
 
The patch as
submitted uses lp_off to store a single bit, to distinguish between
vacuum and dead-vacuumed, but we could actually have (for greater
safety and debuggability) a 15-byte counter that just wraps around
from 32,767 to 1.  (Maybe it would be wise to reserve a few counter
values, or a few bits, or both, for future projects.)  That would
eliminate the need to touch PageRepairFragmentation() or use the
special space, since all the information would be in the line pointer
itself.  Not having to rearrange the page to reclaim dead line
pointers is appealing, too.


I'm not sure I follow you here. We need a mechanism to distinguish dead from dead-vacuumed line pointers. How would the counter (which I assume you mean to be 15 bits, not bytes) help with that? Or are you just suggesting replacing the LSN in the page header with the counter?
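For reference, the line pointer layout under discussion looks roughly like the sketch below (mirroring itemid.h; the marking helpers are hypothetical, not the patch's code). Since LP_DEAD pointers have no tuple storage, lp_off is otherwise unused, which is the space being borrowed for the extra state:

```c
#include <assert.h>
#include <stdbool.h>

#define LP_DEAD 3               /* dead; index pointers may remain */

/* PostgreSQL's line pointer: 15 bits offset, 2 flag bits, 15 bits length. */
typedef struct ItemIdData
{
    unsigned lp_off   : 15;     /* tuple offset (unused when LP_DEAD) */
    unsigned lp_flags : 2;      /* state of the line pointer */
    unsigned lp_len   : 15;     /* tuple byte length */
} ItemIdData;

/* Record that a dead pointer's index entries have been removed by
 * stashing a nonzero token in the unused lp_off field. */
static void
mark_dead_vacuumed(ItemIdData *lp)
{
    lp->lp_flags = LP_DEAD;
    lp->lp_len = 0;
    lp->lp_off = 1;             /* 1 = dead-vacuumed */
}

static bool
is_dead_vacuumed(const ItemIdData *lp)
{
    return lp->lp_flags == LP_DEAD && lp->lp_off != 0;
}
```

A wrapping 15-bit counter rather than a single token value would fit in the same field, which seems to be the "greater safety and debuggability" variant suggested above.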
 
> Is there something in place to make sure that pruning uses an up-to-date
> relindxvacxlogid/off value? I guess it doesn't matter if it's out-of-date,
> you'll just miss the opportunity to remove some dead tuples.

This seems like a tricky problem, because it could cause us to
repeatedly fail to remove the same dead line pointers, which would be
poor.  We could do something like this: after updating pg_class,
vacuum send an interrupt to any backend which holds RowExclusiveLock
or higher on that relation.  The interrupt handler just sets a flag.
If that backend does heap_page_prune() and sees the flag set, it knows
that it needs to recheck pg_class.  This is a bit grotty and doesn't
completely close the race condition (the signal might not arrive in
time), but it ought to make it narrow enough not to matter in
practice.


I am not too excited about adding that complexity to the code. Even if a backend does not have an up-to-date value, it will merely fail to collect the dead-vacuumed pointers; soon it will either catch up, or some other backend will remove them, or the next vacuum will take care of it.

Thanks,
Pavan
 
--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
