Re: WIP: Covering + unique indexes.

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: WIP: Covering + unique indexes.
Дата
Msg-id CAPpHfdscXb8Rya=yBKTV_8qCnw=6CxcBLNH99t7j5THyPwDQpg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WIP: Covering + unique indexes.  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: WIP: Covering + unique indexes.  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
Hi, Peter!

On Sat, Mar 31, 2018 at 8:39 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Fri, Mar 30, 2018 at 4:08 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
>> I'll try it.  But I'm afraid that it's not as easy as you expect.
>
>
> So, I have some implementation of storage of number of attributes inside
> index tuple itself.  I made it as additional patch on top of previous
> patchset.
> I attach the whole patchset in order to make commitfest.cputube.org happy.

Looks like 0004-* became mangled. Can you send a version that is not
mangled, please?
 
Oh, sorry for that.  I forgot to remove some .orig and .rej files and they
accidentally appear in patch.  Correct verion is attached.

> I decided not to use 13th bit of IndexTuple flags.  Instead I use only high
> bit
> of offset which is also always free on regular tuples.  In fact, we already
> use
> assumption that there is at most 11 significant bits of index tuple offset
> in
> GIN (see ginpostinglist.c).

So? GIN doesn't have the same legacy at all. The GIN posting lists
*don't* have regular heap TID pointers at all. They started out
without them, and still don't have them.

Yes, GIN never stored heap TID pointers in t_tid of index tuple.  But GIN
assumes that heap TID pointer has at most 11 significant bits during
posting list encoding.

> However, I find following arguments against implementing this feature
> in covering indexes.
>
>  * We write number of attributes present into tuple.  But how to prove that
> it's correct.  I add appropriate checks to amcheck.  But I don't think all
> the
> users runs amcheck frequent enough.  Thus, in order to be sure that it's
> correct we should check number of attributes is written correct everywhere
> in the B-tree code.

Use an assertion. Problem solved.
 
I don't think we should use assertions, because they are typically disabled on
production PostgreSQL builds.  But we can have some explicit check in some
common path.  In the attached patch I've such check to _bt_compare().  Probably,
together with amcheck, that would be sufficient.

>  * Offset number is used now for parent refind (see BTEntrySame() macro).
> In the attached patch, this condition is relaxed.  But I don't think I
> really like
> that.  This shoud be thought out very carefully...

It's safe, although I admit that that's a bit hard to see.
Specifically, look at this code in _bt_insert_parent():

        /*
         * Find the parent buffer and get the parent page.
         *
         * Oops - if we were moved right then we need to change stack item! We
         * want to find parent pointing to where we are, right ?    - vadim
         * 05/27/97
         */
        ItemPointerSet(&(stack->bts_btentry.t_tid), bknum, P_HIKEY);
        pbuf = _bt_getstackbuf(rel, stack, BT_WRITE);

Vadim doesn't seem too sure of why he did it that way. What's clear is
that the offset on all internal pages is always P_HIKEY (that is, 1),
because this is the one and only place where new IndexTuples get
generated for internal pages. That's unambiguous. So how could
BTEntrySame() truly need to care about offset? How could there ever be
an internal page offset that wasn't just P_HIKEY? You can look
yourself, using pg_hexedit or pageinspect.

The comments above BTTidSame()/BTEntrySame() are actually wrong,
including "New Comments". Vadim wanted to make TIDs part of the
keyspace [1], beginning in around 1997. The idea was that we'd have
truly unique keys by including TID, as L&Y intended, but that never
happened. Instead, we got commit 9e85183bf in 2000, which among many
other things changed the L&Y invariant to deal with duplicates. I
think that Tom should have changed BTTidSame() to not care about
offset number in that same commit from 2000.

I actually think that Vadim was correct to want to make heap TID a
unique-ifier, and that that's the best long term solution [2].
Unfortunately, the code that he committed in the late 1990s didn't
really help -- how could it help without including the *entire* heap
TID? This BTTidSame() offset thing seems to be related to some weird
logic for duplicates that Tom killed in 9e85183bf, if it ever made
sense. Note that _bt_getstackbuf(), the only code that uses
BTEntrySame(), does not look at the offset directly -- because it's
always P_HIKEY.

Anyway...

OK, thank for the explanation.  I agree that check of offset is redundant here.
 
>  * Now, hikeys are copied together with original t_tid's.  That makes it
> possible
> to find the origin of this hikey.  If we override offset in t_tid, that
> becomes not
> always possible.

....that just leaves the original high key at the leaf level, as you
say here. You're right that there is theoretically a loss of forensic
information from actually storing something in the offset at the leaf
level, and storing something interesting in the offset during the
first phase of a page split (not the second, where the aforementioned
_bt_insert_parent() function gets called). I don't think it's worth
worrying about, though.

The fact is that that information can go out of date almost
immediately, whereas high keys usually last forever. The only reason
that there is a heap TID in the high key is because we'd have to add
special code to remove it; not because it has any real value. I find
it very hard to imagine it being used in a forensic situation. If you
actually wanted to do this, the key itself is probably enough -- you
probably wouldn't need the TID.

I don't know,  When I wrote my own implementation of B-tree and debug
it, I found saving hikeys "as is" to be very valuable for debugging.
However, B-trees in PostgreSQL are quite mature, and probably
don't need so much debug information.
 
> * When index tuple is truncated, then pageinspect probably shouldn't show
> offset for it, because it meaningless.  Should it rather show number of
> attributes in separate column?  Anyway that should be part of suffix
> truncation
> patch.  Not part of covering indexes patch, especially added at the last
> moment.

Nobody asked you to write a suffix truncation patch. That has
complexity above and beyond what the covering index patch needs. I
just expect it to be compatible with an eventual suffix truncation
patch, which you've now shown is quite possible. It is clearly a
complimentary technique.
 
OK, but change of on-disk tuple format also changes what people
see in pageinspect.  Right now, they see "1" as offset for tuples in intenal
page and hikeys.  After patch, they would see some large values
(assuming we set some of hi bits) in offset.  I'm not sure it's OK.
We probably should change display of index tuples in pageinspect.

> * I don't really see how does covering indexes without storing number of
> index tuple attributes in the tuple itself blocks future work on suffix
> truncation.

It makes it harder. Your new version gives amcheck a way of
determining the expected number of attributes. That's the main reason
to have it, more so than the suffix truncation issue.
 
I'm sorry, I do not understand.  New version of amcheck determines
the expected number of attributes and compares that to the numer of
attributes stored in the offset number.  But I can get *expected* number of
attributes even wihtout storing them also in the offset number...

Suffix truncation matters a lot too, though.
 
Sure, that's great feature.

> So, taking into account the arguments of above, I propose to give up with
> idea to stick covering indexes and suffix truncation features together.
> That wouldn't accelerate appearance one feature after another, but rather
> likely would RIP both of them...

I think that the thing that's more likely to kill this patch is the
fact that after the first year, it only ever got discussed in the
final CF. That's not something that happened because of my choices. I
made several offers of my time. I did not create this urgency.

I'm sorry my comment was only about particular feature which I'm not
biggest fan of (that doesn't mean I'm strictly against of it).

I'd like to note that I really appreciate your attention to this patch
as well as other patches.  

Anyway, let me know whay do you think about
0004-Covering-natts-v10.patch attached.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: legrand legrand
Дата:
Сообщение: Re: Diagonal storage model
Следующее
От: Alexander Korotkov
Дата:
Сообщение: Re: Diagonal storage model