Re: Reducing tuple overhead

From Amit Kapila
Subject Re: Reducing tuple overhead
Date
Msg-id CAA4eK1K41rLseG5kJTNYpxdneoCWHxLOrNN9sLq0cA4Us5cymg@mail.gmail.com
In response to Re: Reducing tuple overhead  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Thu, Apr 30, 2015 at 5:35 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> On 25 April 2015 at 01:12, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Sat, Apr 25, 2015 at 1:58 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> >
>> > On 4/23/15 10:40 PM, Amit Kapila wrote:
>> >>
>> >> I agree with you. I think one of the major reasons for bloat is that the
>> >> index segment doesn't have visibility information, so cleaning up the
>> >> index has to be tied to the heap.  If we can move transaction
>> >> information to the page level, then we can even think of having it in the
>> >> index segment as well, and then the index can delete/prune its tuples on
>> >> its own, which can reduce index bloat significantly and benefits
>> >> Vacuum as well.
>> >
>> >
>> > I don't see how putting visibility at the page level helps indexes at all. We could already put XMIN in indexes if we wanted, but it won't help, because...
>> >
>>
>> We can do that by putting transaction info at the tuple level in the index
>> as well, but that would be a huge increase in index size unless we devise
>> a way to have a variable index tuple header rather than a fixed one.
>>
>> >> Now this has some downsides as well: for example, Delete needs to
>> >> traverse the index segment as well to delete-mark the tuples, but
>> >> I think the upsides of reducing bloat can certainly outweigh the downsides.
>> >
>> >
>> > ... which isn't possible. You can not go from a heap tuple to an index tuple.
>>
>> We will have access to the index value during delete, so why do you
>> think that we need a linkage between the heap and index tuples to perform
>> the Delete operation?  I think we need to think more about how to design
>> Delete .. by CTID, but that should be doable.
>
>
> I see some assumptions here that need to be challenged.
>
> We can keep xmin and/or xmax on index entries. The above discussion assumes that the information needs to be updated synchronously. We already store visibility information on index entries using the lazily updated killtuple mechanism, so I don't see much problem in setting the xmin in a similar lazy manner. That way when we use the index if xmax is set we use it, if it is not we check the heap. (And then you get to freeze indexes as well ;-( )
> Anyway, I have no objection to making index AM pass visibility information to indexes that wish to know the information, as long as it is provided lazily.
>

Providing such information lazily can help to an extent, but I think
it won't help much in reducing bloat. For example, when an insert
tries to add a row to an index page and finds that there is no space,
it can't kill or overwrite any tuple that is actually dead unless that
tuple has already been marked lazily by then, which I think is one of
the main reasons for index bloat.

> The second assumption is that if we had visibility information in the index that it would make a difference to bloat. Since as I mention, we already do have visibility information, I don't see that adding xmax or xmin would make any difference at all to bloat. So -1 to adding it **for that reason**.
>

The visibility information is only updated when such an index item
is accessed (lazy update), and by that time new space for the index
insertion will already have been used, whereas if the information is
provided synchronously, the dead space could be reclaimed much earlier
(for insertions on a page which has dead tuples), which will reduce the
chances of bloat.
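
To make the difference concrete, here is a toy model in Python (not
PostgreSQL code). All names in it (IndexEntry, IndexPage, known_dead,
fill_and_insert) are made up for illustration, and it assumes that a
deleted tuple is immediately dead to every snapshot, which glosses over
commit status and snapshot visibility. It only shows that an insert into
a full page can reuse space when the page itself already knows which
entries are dead.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    key: int
    dead: bool = False        # ground truth: the heap tuple is dead
    known_dead: bool = False  # what the index page itself has recorded


@dataclass
class IndexPage:
    capacity: int
    entries: list = field(default_factory=list)

    def insert(self, key):
        """Return True if the key fit on this page, False if it would split."""
        if len(self.entries) >= self.capacity:
            # Only entries the page already knows to be dead can be reclaimed.
            self.entries = [e for e in self.entries if not e.known_dead]
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(IndexEntry(key))
        return True


def delete(page, key, synchronous):
    """Mark a key dead; only the synchronous variant tells the index page."""
    for e in page.entries:
        if e.key == key:
            e.dead = True
            if synchronous:
                e.known_dead = True


def fill_and_insert(synchronous):
    """Fill a 4-entry page, delete two keys, then try one more insert."""
    page = IndexPage(capacity=4)
    for k in range(4):
        page.insert(k)
    delete(page, 0, synchronous)
    delete(page, 1, synchronous)
    return page.insert(99)
```

In this model fill_and_insert(True) finds room on the page, while
fill_and_insert(False) does not, because no scan has visited the dead
entries yet to mark them.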

>
> A much better idea is to work out how to avoid index bloat at its cause. If we are running an UPDATE and we cannot get a cleanup lock, we give up and do a non-HOT update, causing the index to bloat. It seems better to wait for a short period to see if we can get the cleanup lock. The short period is currently 0, so let's start there and vary the duration of the wait upwards proportionally as the index gets more bloated.
>
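
A rough sketch of the wait policy described above, in Python rather than
PostgreSQL code. The function name try_hot_update, the bloat_fraction
input, and the max_wait_s ceiling are all assumptions made for
illustration, and a plain threading.Lock stands in for the buffer cleanup
lock. The wait is 0 when there is no bloat and grows linearly with it.

```python
import threading


def try_hot_update(cleanup_lock, bloat_fraction, max_wait_s=0.01):
    """Return "HOT" if the cleanup lock was obtained in time, else "non-HOT".

    bloat_fraction is a hypothetical estimate in [0.0, 1.0] of index bloat.
    """
    if not 0.0 <= bloat_fraction <= 1.0:
        raise ValueError("bloat_fraction must be between 0 and 1")

    wait_s = max_wait_s * bloat_fraction  # proportional wait, 0 when unbloated
    if wait_s > 0:
        got_lock = cleanup_lock.acquire(timeout=wait_s)
    else:
        got_lock = cleanup_lock.acquire(blocking=False)  # today's behaviour

    if not got_lock:
        return "non-HOT"
    try:
        return "HOT"  # page cleanup and the HOT update would happen here
    finally:
        cleanup_lock.release()
```

With bloat_fraction at 0 this behaves like the current "give up
immediately" case; the more bloated the index, the longer the update is
willing to wait before falling back to a non-HOT update.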

I think this is a separate and good way of avoiding bloat, but
independently of it, having something like what we discussed
above will further reduce the chances of bloat for a non-HOT
update in the scenario you describe.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
