Re: Reducing tuple overhead

From: Simon Riggs
Subject: Re: Reducing tuple overhead
Msg-id: CANP8+jK=rZw6XrweSjZmsmkU30yef+HCw2KOnEf1cS3PhWVerQ@mail.gmail.com
In reply to: Re: Reducing tuple overhead  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Reducing tuple overhead  (Robert Haas <robertmhaas@gmail.com>)
           Re: Reducing tuple overhead  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers

On 25 April 2015 at 01:12, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sat, Apr 25, 2015 at 1:58 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> >
> > On 4/23/15 10:40 PM, Amit Kapila wrote:
> >>
> >> I agree with you and what I think one of the major reasons for bloat is that
> >> Index segment doesn't have visibility information due to which clearing of
> >> Index needs to be tied along with heap.  Now if we can move transaction
> >> information at page level, then we can even think of having it in Index
> >> segment as well and then Index can delete/prune its tuples on its own
> >> which can reduce the bloat in index significantly and there is a benefit
> >> to Vacuum as well.
> >
> >
> > I don't see how putting visibility at the page level helps indexes at all. We could already put XMIN in indexes if we wanted, but it won't help, because...
> >
>
> We can do that by putting transaction info at tuple level in index as
> well but that will be a huge increase in size of index unless we devise
> a way to have a variable index tuple header rather than a fixed one.
>
> >> Now this has some downsides as well like Delete
> >> needs to traverse Index segment as well to Delete mark the tuples, but
> >> I think the upsides of reducing bloat can certainly outweigh the downsides.
> >
> >
> > ... which isn't possible. You can not go from a heap tuple to an index tuple.
>
> We will have access to the index value during delete, so why do you
> think that we need linkage between heap and index tuple to perform
> Delete operation?  I think we need to think more to design Delete
> .. by CTID, but that should be doable.

I see some assumptions here that need to be challenged.

We can keep xmin and/or xmax on index entries. The above discussion assumes that the information needs to be updated synchronously. We already store visibility information on index entries using the lazily updated killtuple mechanism, so I don't see much problem in setting the xmin in a similar lazy manner. That way, when we use the index, if xmax is set we use it; if it is not, we check the heap. (And then you get to freeze indexes as well ;-( )
Anyway, I have no objection to making the index AM pass visibility information to indexes that wish to know it, as long as it is provided lazily.
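
To make the lazy scheme concrete, here is a toy sketch in Python rather than PostgreSQL code. Every name in it (HeapTuple, IndexEntry, xmax_hint, index_scan) is hypothetical, and the visibility rule is deliberately simplified: it ignores aborted transactions, commit status and wraparound. It only shows the control flow: use the hint if it is set, otherwise visit the heap and copy xmax back onto the index entry on the way out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HeapTuple:
    xmin: int
    xmax: Optional[int] = None  # None means "not deleted"


@dataclass
class IndexEntry:
    key: str
    ctid: int  # position of the heap tuple (hypothetical flat addressing)
    xmax_hint: Optional[int] = None  # lazily copied from the heap; None = unknown


def is_dead(xmax: Optional[int], oldest_running_xid: int) -> bool:
    # Toy rule: dead if deleted by a transaction older than every running one.
    return xmax is not None and xmax < oldest_running_xid


def index_scan(entries: List[IndexEntry], heap: Dict[int, HeapTuple],
               key: str, oldest_running_xid: int,
               stats: Dict[str, int]) -> List[int]:
    live = []
    for entry in entries:
        if entry.key != key:
            continue
        if entry.xmax_hint is not None:
            # Hint already set by an earlier scan: no heap visit needed.
            stats["heap_skipped"] += 1
            dead = is_dead(entry.xmax_hint, oldest_running_xid)
        else:
            # No hint yet: check the heap, then set the hint lazily.
            stats["heap_visited"] += 1
            tup = heap[entry.ctid]
            if tup.xmax is not None:
                entry.xmax_hint = tup.xmax
            dead = is_dead(tup.xmax, oldest_running_xid)
        if not dead:
            live.append(entry.ctid)
    return live
```

Nothing is written to the index at delete time in this sketch; the hint only appears the next time a scan happens to pass over the entry, which is the same shape as the killtuple hint.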

The second assumption is that having visibility information in the index would make a difference to bloat. Since, as I mentioned, we already do have visibility information, I don't see that adding xmax or xmin would make any difference at all to bloat. So -1 to adding it **for that reason**.


A much better idea is to work out how to avoid index bloat at its cause. If we are running an UPDATE and we cannot get a cleanup lock, we give up and do a non-HOT update, causing the index to bloat. It seems better to wait for a short period to see if we can get the cleanup lock. The short period is currently 0, so let's start there and vary the duration of the wait upwards in proportion to how bloated the index is.
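
As a rough illustration of that policy, again a Python sketch and not a patch: the names cleanup_wait_ms and try_cleanup_then_update, the linear scaling and the 50 ms cap are all assumptions made up for the example, and a plain threading.Lock stands in for the buffer cleanup lock, which in reality depends on buffer pins rather than a simple mutex.

```python
import threading


def cleanup_wait_ms(bloat_fraction: float, max_wait_ms: float = 50.0) -> float:
    """Wait budget that grows linearly with index bloat.

    With no bloat the budget is 0, which matches the current behaviour
    of giving up immediately.
    """
    clamped = min(max(bloat_fraction, 0.0), 1.0)
    return clamped * max_wait_ms


def try_cleanup_then_update(cleanup_lock: threading.Lock,
                            bloat_fraction: float,
                            max_wait_ms: float = 50.0) -> str:
    wait_ms = cleanup_wait_ms(bloat_fraction, max_wait_ms)
    if wait_ms == 0:
        # Zero budget: a single non-blocking attempt.
        got_lock = cleanup_lock.acquire(blocking=False)
    else:
        got_lock = cleanup_lock.acquire(timeout=wait_ms / 1000.0)
    if not got_lock:
        # Gave up: fall back to a non-HOT update, which adds index entries.
        return "non-HOT"
    try:
        # Lock obtained: the page could be pruned here and the update kept HOT.
        return "HOT"
    finally:
        cleanup_lock.release()
```

With bloat_fraction at 0 this behaves like the current code; an index estimated at 50% bloat would wait up to 25 ms under the assumed cap. How the bloat estimate would be obtained is left open here.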

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services