Re: Free space management within heap page

From: Heikki Linnakangas
Subject: Re: Free space management within heap page
Date:
Msg-id: 45B5DEBF.3090200@enterprisedb.com
In reply to: Re: Free space management within heap page  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
Responses: Re: Free space management within heap page  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
Re: Free space management within heap page  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
List: pgsql-hackers
ITAGAKI Takahiro wrote:
> "Pavan Deolasee" <pavan.deolasee@gmail.com> wrote:
> 
>>> The overwhelming majority of tuples are going to be in one or more
>>> indexes. Which means nearly all tuples are going to fall into this
>>> category. So where's the benefit?
>> The line pointers cannot be reused, but the space consumed by the tuple can be.
>> So the benefit is in utilizing that space for newer tuples and thus reducing the
>> bloat.
> 
> I think your idea is the same as the following TODO item, which I suggested before.
> 
> * Consider shrinking expired tuples to just their headers.
>         http://archives.postgresql.org/pgsql-patches/2006-03/msg00142.php
>         http://archives.postgresql.org/pgsql-patches/2006-03/msg00166.php

Yeah, same idea. You suggested in that thread that we should keep the 
headers because of line pointer bloat, but I don't see how that's 
better. You're still going to get some line pointer bloat, and you're 
not able to reclaim as much free space.

In that thread, Tom mentioned that we may need to keep the header 
because the dead tuple might be part of an update chain. Reading back 
the discussion on the vacuum bug, I can't see how removing the header 
would be a problem, but maybe I'm missing something.

>> One assumption I am making here is that it's sufficient to mark the line pointer
>> "unused" (reset the LP_USED flag) even though there is an index entry pointing to
>> the tuple. During an index scan, we check ItemIdIsUsed() anyway before
>> proceeding further. I know it might break the ctid chain, but does that really
>> matter? I don't see any reason why somebody would need to follow the ctid chain
>> past a dead tuple.
> 
> Keeping only the line pointers is not a problem in itself, but it might lead
> to bloating of line pointers. If a particular tuple in a page is replaced
> repeatedly, the line pointer area bloats up to 1/4 of the page.

Where does the 1/4 figure come from?

> We need to work around the problem.

If a row is updated many times before vacuum comes along, what currently 
happens is that we end up with a bunch of pages full of dead tuples. 
With the truncation scheme, we could fit way more dead tuples on each 
page, reducing the need to vacuum. If a row is for example 40 bytes 
long, including header (a quite narrow one), you could fit 10 line 
pointers (4 bytes each) in the space of one row, which means that you 
could ideally multiply your vacuum interval by a factor of 10. That's a 
huge benefit, though indexes would still bloat unless selects marking 
index pointers as dead keep the bloat under control.
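
The arithmetic above can be sketched roughly as follows. This is an idealized illustration, not PostgreSQL code: it assumes an 8192-byte page, a 24-byte page header and 4-byte line pointers, ignores alignment padding and any per-page limit on the number of line pointers, and the function name is made up for the example.

```python
PAGE_SIZE = 8192     # assumed default block size
PAGE_HEADER = 24     # assumed page header size
LINE_POINTER = 4     # assumed size of one line pointer


def dead_versions_per_page(tuple_len, truncate):
    """Estimate how many dead row versions fit on one page.

    Without truncation each dead version costs its line pointer plus the
    whole tuple; with truncation only the line pointer stays behind.
    """
    usable = PAGE_SIZE - PAGE_HEADER
    cost = LINE_POINTER if truncate else LINE_POINTER + tuple_len
    return usable // cost


full = dead_versions_per_page(40, truncate=False)
truncated = dead_versions_per_page(40, truncate=True)
print(full, truncated, round(truncated / full, 1))
```

Under these assumptions the ratio comes out slightly above 10, because the untruncated case pays for the line pointer as well as the 40-byte tuple.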

The problem is that if a tuple is updated, say, hundreds of times before 
vacuum, but then isn't updated anymore, you'll have a page full of 
useless line pointers that are not reclaimed. Clearly we should start 
reclaiming line pointers, but we can only do that for unused line 
pointers after the last used one.
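
As a toy model of that restriction, assume a page's line pointer array is represented as a list of booleans (True meaning "in use"). Only the unused entries after the last used one can be dropped, since tuples are addressed by their position in the array and removing an earlier entry would renumber the ones behind it. The function name and representation are hypothetical.

```python
def reclaim_trailing_unused(used_flags):
    """Return the line pointer array with trailing unused entries removed.

    used_flags[i] is True if line pointer i is still in use. Unused
    entries before the last used one are kept so that the positions of
    the used entries do not change.
    """
    last_used = -1
    for i, used in enumerate(used_flags):
        if used:
            last_used = i
    return used_flags[: last_used + 1]


page = [True, False, False, True, False, False, False]
print(reclaim_trailing_unused(page))
```

In this example the two unused entries between the used ones survive, which is exactly the bloat that cannot be reclaimed this way.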

Would it be enough to cap the number of dead line pointers with a simple 
rule like "max 20% of line pointers can be dead"? I'd be happy with that.
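
One possible reading of such a rule, as a minimal sketch: before truncating another tuple down to its line pointer, check that the page would stay within the cap. The function name is hypothetical, and what to do when the cap is hit (for example, leaving the tuple for vacuum) is an assumption not settled above.

```python
MAX_DEAD_PERCENT = 20  # the "max 20%" figure from the proposed rule


def may_add_dead_pointer(n_pointers, n_dead):
    """Return True if one more of the page's n_pointers line pointers
    may become dead without exceeding the cap.

    Integer arithmetic avoids floating-point rounding at the boundary.
    """
    return (n_dead + 1) * 100 <= MAX_DEAD_PERCENT * n_pointers


print(may_add_dead_pointer(10, 1))  # a second dead pointer out of 10
print(may_add_dead_pointer(10, 2))  # a third dead pointer out of 10
```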

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

