Re: Dead Space Map for vacuum

From ITAGAKI Takahiro
Subject Re: Dead Space Map for vacuum
Date
Msg-id 20070117142353.5AC5.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
In reply to Re: Dead Space Map for vacuum  ("Simon Riggs" <simon@2ndquadrant.com>)
List pgsql-hackers
I can see two issues in the design of the Dead Space Map
in the recent discussions:
  1. accuracy of the dead-space information
  2. memory management

I'll write up the discussion of the first issue for now.

----
We need more page-tracking statuses for an effective vacuum. 1 bit per
block is not enough.

"Simon Riggs" <simon@2ndquadrant.com> wrote:
> I would suggest that we tracked whether a block has had 0, 1 or 1+
> updates/deletes against it. When a block has 1+ it can then be
> worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is
> really the minimum space worth reclaiming on any block.

The suggestion is to classify pages by vacuum priority. There are 3 tracking
statuses in the model:
  [A1] Clean (all tuples in the page are frozen)
  [A2] Low priority to vacuum
  [A3] High priority to vacuum

In another discussion, there is an idea to avoid aggressive freezing.
Normal VACUUM scans only pages marked in the B3 bitmap.
  [B1] Clean
  [B2] Unfrozen (some tuples need to be frozen)
  [B3] Unvacuumed (some tuples need to be vacuumed)
 


Both of the above have only 3 statuses, so either of them can be described
in 2 bits. I would suggest a 4-status DSM model:
  [C1] Clean
  [C2] Unfrozen (all tuples can be frozen, but are not yet)
  [C3] Low priority to vacuum
  [C4] High priority to vacuum
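
To make the 2-bit claim concrete, here is a minimal sketch of a map that
packs the four statuses into 2 bits per page (4 pages per byte). The class
name, the method names and the numeric encoding of C1..C4 are assumptions
for illustration only; nothing here is taken from an actual patch.

```python
from enum import IntEnum


class PageStatus(IntEnum):
    # Numeric values are an assumed encoding; any 2-bit assignment works.
    CLEAN = 0          # [C1]
    UNFROZEN = 1       # [C2]
    LOW_PRIORITY = 2   # [C3]
    HIGH_PRIORITY = 3  # [C4]


class DeadSpaceMap:
    """Hypothetical in-memory map: 2 bits per page, 4 pages per byte."""

    def __init__(self, n_pages):
        self.n_pages = n_pages
        # Round up so the last partial group of pages still gets a byte.
        self.bits = bytearray((n_pages + 3) // 4)

    def get(self, page):
        shift = (page % 4) * 2
        return PageStatus((self.bits[page // 4] >> shift) & 0b11)

    def set(self, page, status):
        shift = (page % 4) * 2
        # Clear the page's 2-bit slot, then store the new status there.
        cleared = self.bits[page // 4] & ~(0b11 << shift) & 0xFF
        self.bits[page // 4] = cleared | (int(status) << shift)
```

With this layout, a table of 1,000,000 pages needs 250,000 bytes of map.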
 

INSERT or after-UPDATE tuples are marked with C3 status -- they need
only to be frozen on commit. On the other hand, DELETE or before-UPDATE
tuples are marked with C4 status -- to be vacuumed on commit.
If the transaction is rolled back, the need for freeze/vacuum is
inverted, but we can assume COMMIT is more common than ROLLBACK.

We can lower the priority from C4 to C3 for pages whose free space is too
small to reuse, as in the original idea by Simon. We can then refer to the
C3 status to find pages that have had 0 or 1 dead tuples. Whether to mark
C3 or C4 is an optimization issue.


We need to add two new VACUUM modes that use the Dead Space Map.
Most users and autovacuum would use only mode 5.

1. VACUUM FULL       (scan all pages)
2. VACUUM FREEZE ALL (scan all pages)
3. VACUUM ALL        (scan all pages)
4. VACUUM FREEZE     (scan C2, C3, C4)
5. VACUUM            (scan only C4, basically)
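
As a rough sketch, the five modes above can be read as a lookup from mode
to the set of page statuses that get scanned. The table and the helper
pages_to_scan are hypothetical names for illustration; the mode strings
are the proposed ones from the list, not existing syntax.

```python
ALL_STATUSES = frozenset({"C1", "C2", "C3", "C4"})

# Which DSM statuses each proposed mode scans, as listed above.
SCAN_SET = {
    "VACUUM FULL": ALL_STATUSES,
    "VACUUM FREEZE ALL": ALL_STATUSES,
    "VACUUM ALL": ALL_STATUSES,
    "VACUUM FREEZE": frozenset({"C2", "C3", "C4"}),
    "VACUUM": frozenset({"C4"}),
}


def pages_to_scan(mode, page_status):
    """Return the page numbers whose status falls in the mode's scan set.

    page_status is a list of status labels indexed by page number.
    """
    wanted = SCAN_SET[mode]
    return [page for page, status in enumerate(page_status) if status in wanted]
```

For example, with page statuses C1, C4, C2, C3, C4 for pages 0 to 4, plain
VACUUM would visit only pages 1 and 4, while VACUUM FREEZE would visit
pages 1 to 4.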

VACUUM downgrades the status of scanned pages from C4 to another status.
If a page has any dead tuples, VACUUM also tries to freeze all tuples in
the page and change its status to C1 (Clean): the page becomes dirty
anyway, so freezing is almost free (no additional I/O). When unfrozen or
unvacuumed tuples remain, the status becomes C2 or C3.
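
The downgrade rule could be sketched as below. The function name and its
two boolean inputs are hypothetical, and the precedence when both kinds of
tuples remain (C3 wins over C2) is my assumption; the text above does not
decide it.

```python
def status_after_vacuum(unvacuumed_remain, unfrozen_remain):
    """Pick the new DSM status for a page that VACUUM has just processed."""
    if unvacuumed_remain:
        # Some dead tuples could not be removed yet: keep it on the
        # low-priority vacuum list (assumed to take precedence).
        return "C3"
    if unfrozen_remain:
        # Nothing left to vacuum, but some tuples could not be frozen yet.
        return "C2"
    # Everything was vacuumed and frozen in the same pass.
    return "C1"
```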

Normal VACUUM (with DSM) basically scans only pages marked with C4 status,
but it may be good to vacuum other pages in some cases: in maintenance
windows, when we can retrieve several pages in one disk read, etc.
This is also an optimization issue.


Any ideas?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



