Re: HOT documentation README

Поиск
Список
Период
Сортировка
От Pavan Deolasee
Тема Re: HOT documentation README
Дата
Msg-id 2e78013d0709120107o1a35a7d3y98da165d4f89ef7f@mail.gmail.com
обсуждение исходный текст
Ответ на Re: HOT documentation README  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: HOT documentation README  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Список pgsql-patches


On 9/12/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:

VACUUM FULL
-----------

To make vacuum full work, any DEAD tuples in the middle of an update
chain need to be removed (see comments at the top of
heap_prune_hotchain_hard() for details). Vacuum full performs a more
aggressive pruning that not only removes dead tuples at the beginning of
an update chain, but scans the whole chain and removes any intermediate
dead tuples as well.  It also removes redirected line pointers by making
them directly point to the first tuple in the HOT chain. This causes
a user-visible change in the tuple's CTID, but since VACUUM FULL has
always moved tuple CTIDs, that should not break anything.

XXX any extra complexity here needs justification --- a lot of it.


We hard prune the chains and also clear up redirect line pointers
because doing so is safe within VACUUM FULL and it reduces addition
complexity in the actual VACUUM FULL work.
 
When we move tuples and tuple chains, we don't try to preserve
their HOT properties. So when tuples in a HOT chain are moved,
we reset their HEAP_ONLY_TUPLE and HEAP_HOT_UPDATED flags
and each tuple has its own index entry. This requires us to some
more book keeping work in terms on number of indexed tuples expected
etc because they are checked at the end of the index scan.


Statistics
----------

XXX: How do HOT-updates affect statistics? How often do we need to run
autovacuum and autoanalyze?



Auotovacuum needs to be run much less frequently with HOT. This is because
defragmentation reclaims dead space in a page, thus reducing total dead
space in a table. Right now we don't update FSM information about the page
after defragmenting it, so a UPDATE on a different page can still cause
relation extension even though there is free space in some other page.
The rational for not updating FSM is to let subsequent UPDATEs on the page
to use the freed up space. But one can argue that we should let the free
space to be used for other UPDATEs/INSERTs after leaving fillfactor worth
of space.

Another significant change regarding autovacuum is that we now track
the total dead space in the table instead of number of dead tuples. This
seems like a better approach because it takes into account varying tuple sizes
into account. The tracked dead space is increased whenever we update/delete
a tuple (or insert is aborted) and reduced when a page is defragmented.
autovacuum_vacuum_scale_factor considers the percentage of dead space
to the size of the relation whereas autovacuum_vacuum_threshold
considers the absolute amount of dead space in terms of blocks.

Every UPDATE (HOT or COLD) contributes to the autoanalyze stats and
defragmentation/pruning has no effect on autoanalyze. IOW autoanalyze
would work just the way it does today. One change that is worh mentioning
and discussing is that we don't follow HOT chains while fetching tuples during
autoanalyze and autoanalyze would consider all such tuples as DEAD.
In the worst case when all the tuples in the table are reachable  via
redirected line pointers, this would confuse autoanalyze since it would
consider all tuples in the table as DEAD.

I think we should change this to follow HOT chain in analyze. Since we
fetch using SnapshotNow, if there is a live tuple at the end of the
chain, analyze would use that. Otherwise the tuple is considered as
DEAD.

Thanks,
Pavan



--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

В списке pgsql-patches по дате отправления:

Предыдущее
От: "Pavan Deolasee"
Дата:
Сообщение: Re: HOT documentation README
Следующее
От: "Pavan Deolasee"
Дата:
Сообщение: Re: HOT patch - version 15