Re: Avoiding second heap scan in VACUUM

From: Pavan Deolasee
Subject: Re: Avoiding second heap scan in VACUUM
Msg-id: 2e78013d0805282127g27c9e8c0re25010bcbd221753@mail.gmail.com
In response to: Re: Avoiding second heap scan in VACUUM  (Simon Riggs <simon@2ndquadrant.com>)
Responses: Re: Avoiding second heap scan in VACUUM  (Simon Riggs <simon@2ndquadrant.com>)
Re: Avoiding second heap scan in VACUUM  (Simon Riggs <simon@2ndquadrant.com>)
List: pgsql-hackers
On Thu, May 29, 2008 at 2:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>
> I'm not happy that the VACUUM waits. It might wait a very long time and
> cause worse overall performance than the impact of the second scan.
>

Let's not get too paranoid about the wait. It's a minor detail in the
whole theory. I would suggest that the benefit of avoiding the second
scan would be huge. Remember, it's not just a scan; it also dirties
those blocks again, forcing them to be written to disk. Also, if you
really have a situation where vacuum needs to wait for very long, then
you are already in trouble: the long-running transactions would prevent
vacuum from removing many tuples anyway.

I think we can easily tweak the "wait" so that it doesn't wait
indefinitely. If the "wait" times out, vacuum can still proceed, but
it can mark the DEAD line pointers as DEAD_RECLAIMED. It would then
have a choice of making a second pass and reclaiming the DEAD line
pointers (like it does today).
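
To make the timed wait concrete, here is a rough sketch in Python rather
than backend C. It is only a toy model: the names wait_with_timeout,
choose_vacuum_strategy and blockers_gone are made up for illustration,
and blockers_gone stands in for whatever condition vacuum would actually
be waiting on. The fallback on timeout is the second pass that vacuum
does today.

```python
import time
from enum import Enum


class Outcome(Enum):
    SINGLE_PASS = "single_pass"  # wait succeeded; second heap scan avoided
    SECOND_PASS = "second_pass"  # wait timed out; fall back to a second pass


def wait_with_timeout(condition, timeout_s, poll_interval_s=0.01,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` until it returns True or `timeout_s` seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


def choose_vacuum_strategy(blockers_gone, timeout_s):
    """Hypothetical decision: never wait indefinitely.

    `blockers_gone` is an assumed callable that reports whether the thing
    vacuum is waiting for has finished.
    """
    if wait_with_timeout(blockers_gone, timeout_s):
        return Outcome.SINGLE_PASS
    return Outcome.SECOND_PASS
```

The point of the bounded wait is that the worst case is no worse than
today's behaviour plus the timeout.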


>
> So the idea is to have one pass per VACUUM, but make that one pass do
> the first pass of *this* VACUUM and the second pass of the *last*
> VACUUM.
>
> We mark the xid of the VACUUM in pg_class as you suggest, but we do it
> after VACUUM has completed the pass.
>

The trick is to know reliably whether the last vacuum removed the index
pointers or not. There could be several ways to do that. But you need
to explain in detail how it would work in the case of vacuum failures
and database crashes.

> In single pass we mark DEAD line pointers as RECENTLY_DEAD. If the last
> VACUUM xid is old enough we mark RECENTLY_DEAD as UNUSED, as well,
> during this first pass. If last xid is not old enough we do second pass
> to remove them.
>

Let's not call them RECENTLY_DEAD :-) DEAD is already stricter than
that. We need something even stronger. That's why I used
DEAD_RECLAIMED, to note that the line pointer is DEAD and the index
pointer may have been removed as well.
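
As a rough illustration of how the states relate, here is a toy model in
Python. It is one reading of the idea, not backend code: the enum and
the functions mark_for_reclaim and try_reuse are hypothetical names, and
the index_pointers_removed flag stands for "we know the last vacuum
finished removing the index pointers", however that ends up being
tracked.

```python
from enum import Enum


class LinePointerState(Enum):
    NORMAL = "normal"
    DEAD = "dead"                      # tuple is dead; index entries may still exist
    DEAD_RECLAIMED = "dead_reclaimed"  # proposed: dead, index entries may be gone
    UNUSED = "unused"                  # slot can be reused


def mark_for_reclaim(state):
    """During the heap pass, a DEAD line pointer becomes DEAD_RECLAIMED."""
    if state is LinePointerState.DEAD:
        return LinePointerState.DEAD_RECLAIMED
    return state


def try_reuse(state, index_pointers_removed):
    """A DEAD_RECLAIMED pointer becomes UNUSED only once index cleanup is known done.

    `index_pointers_removed` is an assumed flag; how to set it safely
    across vacuum failures and crashes is the open question.
    """
    if state is LinePointerState.DEAD_RECLAIMED and index_pointers_removed:
        return LinePointerState.UNUSED
    return state
```

Note that a plain DEAD pointer is never turned into UNUSED directly in
this model, which is why a state stronger than DEAD is needed.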


> That has the effect that large tables that are infrequently VACUUMed
> will need only a single scan. Smaller tables that require almost
> continual VACUUMing will probably do two scans, but who cares?
>

Yeah, I think we need to target the large table case. The second pass
is obviously much more costly for large tables. I think the timed-wait
answers your concern.

Thanks,
Pavan


-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

