Re: Locking B-tree leafs immediately in exclusive mode

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: Locking B-tree leafs immediately in exclusive mode
Дата
Msg-id CAPpHfdsX-ubk0LNd3a5bWtoAWKTmPwMwJvxijEGofKwpJkyShA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Locking B-tree leafs immediately in exclusive mode  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: Locking B-tree leafs immediately in exclusive mode  (Claudio Freire <klaussfreire@gmail.com>)
Re: Locking B-tree leafs immediately in exclusive mode  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Thu, Jun 14, 2018 at 1:01 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Jun 11, 2018 at 9:30 AM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > On Mon, Jun 11, 2018 at 1:06 PM Simon Riggs <simon@2ndquadrant.com> wrote:
> >> It's a good idea. How does it perform with many duplicate entries?
>
> I agree that this is a good idea. It independently occurred to me to
> do this. The L&Y algorithm doesn't take a position on this at all,
> supposing that *no* locks are needed at all (that's why I recommend
> people skip L&Y and just read the Lanin & Shasha paper -- L&Y is
> unnecessarily confusing). This idea seems relatively low risk.

Great.  Unfortunately, we don't have access to 72-cores machine
anymore.  But I'll do benchmarks on smaller 40-cores machine, which we
have access to.  I hope there should be some win from this patch too.

> > Our B-tree is currently maintaining duplicates unordered.  So, during insertion
> > we can traverse rightlinks in order to find page, which would fit new
> > index tuple.
> > However, in this case we're traversing pages in exclusive lock mode, and
> > that happens already after re-lock.  _bt_search() doesn't do anything with that.
> > So, I assume there shouldn't be any degradation in the case of many
> > duplicate entries.
>
> BTW, I have a prototype patch that makes the keys at the leaf level
> unique. It is actually an implementation of nbtree suffix truncation
> that sometimes *adds* a new attribute in pivot tuples, rather than
> truncating away non-distinguishing attributes -- it adds a special
> heap TID attribute when it must. The patch typically increases fan-in,
> of course, but the real goal was to make all entries at the leaf level
> truly unique, as L&Y intended (we cannot do it without suffix
> truncation because that will kill fan-in). My prototype has full
> amcheck coverage, which really helped me gain confidence in my
> approach.

Could you, please, clarify what do you mean by "fan-in"?  As I
understood, higher "fan-in" means more children on single non-leaf
page, and in turn "better branching".  Is my understanding correct?

If my understanding is correct, then making leaf level truly unique
without suffix truncation will kill fan-in, because additional heap
TID attribute will increase pivot tuple size.  However, that doesn't
look like inevitable shortcoming, because we could store heap TID in
t_tid of pivot index tuples.  Yes, we've used t_tid offset for storing
number of attributes in truncated tuples.  But heap TID is going to be
virtually the last attribute of tuple.  So, pivot tuples containing
heap TID are not going to be suffix-truncated anyway.  So, for such
tuples we can just don't set INDEX_ALT_TID_MASK, and they would be
assumed to have all the attributes.

So, my idea that it's not necessary to couple suffix truncation with
making leaf tuples unique.  Regarding just making leaf tuples unique,
I understand that it wouldn't be always win.  When we have a lot of
duplicates, current implementation searches among the pages containing
those duplicates for the first page containing enough of free space.
While with unique index, we would have to always insert into
particular page.  Thus, current implementation gives us better filling
of pages, and (probably) faster insertion.  But, unique index would
give us much better performance of retail tuple delete and update
(currently we don't support both).  But I believe that future
pluggable table access methods will use both.

Therefore, my idea is that there is a tradeoff between making btree
index unique or non-unique.  I think we will need an option for that.
I'm not yet sure if this option should be user-visible or
auto-decision based on table access method used.

> There are still big problems with my patch, though. It seems to hurt
> performance more often than it helps when there is a lot of
> contention, so as things stand I don't see a path to making it
> commitable. I am still going to clean it up some more and post it to
> -hackers, though. I still think it's quite interesting. I am not
> pursuing it much further now because it seems like my timing is bad --
> not because it seems like a bad idea. I can imagine something like
> zheap making this patch important again. I can also imagine someone
> seeing something that I missed.

I think that joint community efforts are necessary, when single person
don't see path to make patch committable.  So, I'm looking forward
seeing your patch posted.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: why partition pruning doesn't work?
Следующее
От: Robert Haas
Дата:
Сообщение: Re: WAL prefetch