Re: WIP: Covering + unique indexes.

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: WIP: Covering + unique indexes.
Дата
Msg-id CAPpHfdtv7PXnt3Se4Etr94f13Q=VYBzXg5X0Vvdw7FtA9_aZ3A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: WIP: Covering + unique indexes.  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: WIP: Covering + unique indexes.  (Peter Geoghegan <pg@bowt.ie>)
Re: WIP: Covering + unique indexes.  (Erik Rijkers <er@xs4all.nl>)
Список pgsql-hackers
Hi!

Thank you for review!  Revised patchset is attached.

On Wed, Apr 4, 2018 at 6:08 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Tue, Apr 3, 2018 at 7:02 AM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Great, I'm looking forward your feedback.

I took a look at V11 (0001-Covering-core-v11.patch,
0002-Covering-btree-v11.patch, 0003-Covering-amcheck-v11.patch,
0004-Covering-natts-v11.patch) today.

* What's a pivot tuple?

This is the same thing as what I call a "separator key", I think --
you're talking about the set of IndexTuples including all high keys
(including leaf level high keys), as well as internal items
(downlinks). I think that it's a good idea to have a standard word
that describes this set of keys, to formalize the two categories
(pivot tuples vs. tuples that point to the heap itself). Your word is
just as good as mine, so we can go with that.
 
Good, let's use "pivot tuple" term.

Let's put this somewhere central. Maybe in the nbtree README, and/or
nbtree.h. Also, verify_nbtree.c should probably get some small
explanation of pivot tuples. offset_is_negative_infinity() is a nice
place to mention pivot tuples, since that already has a bit of
high-level commentary about them.

I've added some explanation to nbtree README, nbtree.h and
offset_is_negative_infinity().
 
* Compiler warning:

/home/pg/postgresql/root/build/../source/src/backend/catalog/index.c:
In function ‘index_create’:
/home/pg/postgresql/root/build/../source/src/backend/catalog/index.c:476:45:
warning: ‘opclassTup’ may be used uninitialized in this function
[-Wmaybe-uninitialized]
   if (keyType == ANYELEMENTOID && opclassTup->opcintype == ANYARRAYOID)
                                             ^
/home/pg/postgresql/root/build/../source/src/backend/catalog/index.c:332:19:
note: ‘opclassTup’ was declared here
   Form_pg_opclass opclassTup;
                   ^

Thank yo for pointing, fixed.
 
* Your new amcheck tests should definitely use the new
"heapallindexed" option. There were a number of bugs I can remember
seeing in earlier versions of this patch that that would catch
(probably not during regression tests, but let's at least do that
much).

Good point.  Tests with "heapallindexed" were added.  I also find that it's useful to
check both index built by sorting, and index built by insertions, because there are
different ways of forming tuples.

* The modified amcheck contrib regression tests don't actually pass. I
see these unexpected errors:

10037/2018-04-03 16:31:12 PDT ERROR:  wrong number of index tuple
attributes for index "bttest_multi_idx"
10037/2018-04-03 16:31:12 PDT DETAIL:  Index tid=(290,2) points to
index tid=(289,2) page lsn=0/162407A8.
10037/2018-04-03 16:31:12 PDT ERROR:  wrong number of index tuple
attributes for index "bttest_multi_idx"
10037/2018-04-03 16:31:12 PDT DETAIL:  Index tid=(290,2) points to
index tid=(289,2) page lsn=0/162407A8.
 
Right.  Sorry, that appears that I've posted patch with non-working regression tests.
Not they seem to pass.

* I see that we use "- 1" with attribute number, like this:

> +/* Get number of attributes in B-tree index tuple */
> +#define BtreeTupGetNAtts(itup, index)  \
> +   ( \
> +       (itup)->t_info & INDEX_ALT_TID_MASK ? \
> +       ( \
> +           AssertMacro((ItemPointerGetOffsetNumber(&(itup)->t_tid) & BT_RESERVED_OFFSET_MASK) == 0), \
> +           ItemPointerGetOffsetNumber(&(itup)->t_tid) & BT_N_KEYS_OFFSET_MASK - 1 \
> +       ) \
> +       : \
> +       IndexRelationGetNumberOfAttributes(index) \
> +   )

Is this left behind from before you decided to adopt
INDEX_ALT_TID_MASK? Is it your intention here to encode
InvalidOffsetNumber() without tripping up assertions? Or is it
something else?

Maybe we should follow the example of GinItemPointerGetOffsetNumber(),
and use ItemPointerGetOffsetNumberNoCheck() instead of
ItemPointerGetOffsetNumber(). What do you think? That would allow us
to get rid of the -1 thing, which might be nice. Just because we use
ItemPointerGetOffsetNumberNoCheck() in places that use an alternative
offset representation does not mean we need to use it in existing
places. If existing places had a regression tests failure because of
this, that would probably be due to a real bug. No?
 
Ok.  I've tried to remove both assertions and "+1" hack.  That works
for me.  However, I've to touch a lot of places, not sure if that's a problem.

* ISTM that the "points to index tid=(289,2)" part of the message just
shown would be a bit clearer if I didn't have to know that 2 actually
means 1 when we talk about the pointed-to offset (yeah, it will
probably become unclear in the future when we start using the reserved
offset status bits, but why not make the low bits of offset
simple/logical way?). Your new amcheck error message should spell it
out (it should say the number of attributes indicated by the offset,
if any) -- regardless of what we do about the "must apply - 1 to
offset" question.

Right, since error is related to number of attributes, then we should report
observed number of attributes explicitly here.

* "Minus infinity" items do not have the new status bit
INDEX_ALT_TID_MASK set in at least some cases. They should.

* _bt_sortaddtup() should not do "trunctuple.t_info =
sizeof(IndexTupleData)", since that destroys useful information. Maybe
that's the reason for the last bug?

* Ditto for _bt_pgaddtup().
 
Yes, "minus infinity" items hadn't new status bit INDEX_ALT_TID_MASK set.
And that's because errors in _bt_sortaddtup() and _bt_pgaddtup() that you
pointed.  Fixed. thanks.

* Why expose _bt_pgaddtup() so that nbtsort.c/_bt_buildadd() can call
it? The only reason we have _bt_sortaddtup() is because we cannot
trust P_RIGHTMOST() within _bt_pgaddtup() when called in the context
of CREATE INDEX (from nbtsort.c/_bt_buildadd()). There is no real
change needed, because _bt_sortaddtup() knows that it's inserting on a
non-rightmost page both without this patch, and when this patch needs
to truncate and then add the high key back.

It's clear that you can just use _bt_sortaddtup() (and leave
_bt_pgaddtup() private) because _bt_sortaddtup() is only different to
_bt_pgaddtup() when !P_ISLEAF(), but we only call _bt_pgaddtup() when
P_ISLEAF(). Or have I missed something?
 
Agreed.  I also see no point of exposing _bt_pgaddtup().  I've replaced it
with _bt_sortaddtup(), and it appears to work.

* For inserts, this patch performs an extra truncation step on the
same high key that we'd use with a plain (non-covering/include) index.
That's pretty clean. But it seems more complicated for
nbtsort.c/_bt_buildadd(). I think that a comment should say that we
cannot just rearrange item pointers for high key on the old page when
we also truncate, because overwriting the P_HIKEY position ItemId with
the old page's former final ItemId (whose tuple ended up becoming the
first tuple on new/right page) fails to actually save any space. We
need to truly shift around IndexTuples on the page in order to save
space (both PageIndexTupleDelete() and PageAddItem() end up shifting
both the ItemId array and some IndexTuple space).

Also, maybe say that the performance here really isn't so bad, because
we reclaim IndexTuple space close to the middle of the hole in the
page with our PageIndexTupleDelete(), and then use almost the *same*
space within PageAddItem(). There is not actually that much physical
shifting around for IndexTuples. It turns out that it's not that
different. (You can probably find a better, more succinct way of
putting this -- I'm tired now.)

I wrote some comment there.  Please, check it.
 
* I suggest that you teach _bt_check_natts() to expect zero attributes
for "minus infinity" items. It looks like amcheck contrib regression
tests don't pass because you don't look for that (P_FIRSTDATAKEY() is
the "minus infinity" item on internal pages).

Sure, thank you for catching.
 
* bt_target_page_check() should also have a !P_ISLEAF() check, since
with a covering index every tuple will have INDEX_ALT_TID_MASK. This
should call _bt_check_natts() for each item, including the "minus
infinity" items.

Yes, every item including the "minus infinity" should be checked for
number of attributes.  However, I didn't get how that related to !P_ISLEAF().
In order to check "minus infinity" item I've just pull _bt_check_natts() up
before offset_is_negative_infinity() check.

Regarding !P_ISLEAF(), I think we should check every item on both
leaf and non-leaf pages.  I that is how code now works unless I'm missing
something.

* "minus infinity" items don't have the right number of attributes
set, in at least some cases that I saw. The number matched other
internal items, and wasn't 0 or whatever. Maybe the
ItemPointerGetOffsetNumberNoCheck() idea would leave things so that it
actually could be 0 safely, rather than natts + 1 as you said, which
would be nice.)

Yes, "minus infinity" items didn't have number of attributes set, because
_bt_sortaddtup() and _bt_pgaddtup() didn't handle it as you pointed
above.
 
* I would reorder the comment to match the order of the code:

> +   /*
> +    * Pivot tuples stored in non-leaf pages and hikeys of leaf pages should
> +    * have nkeyatts number of attributes.  While regular tuples of leaf pages
> +    * should have natts number of attributes.
> +    */
> +   if (P_ISLEAF(opaque) && offnum >= P_FIRSTDATAKEY(opaque))
> +       return (BtreeTupGetNAtts(itup, index) == natts);
> +   else
> +       return (BtreeTupGetNAtts(itup, index) == nkeyatts);
 
Thanks for pointing.  Since there are now three cases including handling of
"minus infinity" items, comment is now split to three.

* Please add BT_N_KEYS_OFFSET_MASK + INDEX_MAX_KEYS static assertion.
Maybe add it to _bt_check_natts().
 
Done.

* README-SSI says:

    * The effects of page splits, overflows, consolidations, and
removals must be carefully reviewed to ensure that predicate locks
aren't "lost" during those operations, or kept with pages which could
get re-used for different parts of the index.

Do we need to worry about that here? I guess not, because this is just
like having many duplicates. But a note just above the _bt_doinsert()
call to CheckForSerializableConflictIn() might be a good idea.

I don't seerelations between this patchset and SSI.  We just
change representation of some index tuples in pages.  However,
we didn't change the the order of page modification, the order
of page lookup and so on.  Yes, we change size of some tuples,
but B-tree already worked with tuples of variable sizes.  So, the fact
that tuples now have different size shouldn't affect SSI.  Right now,
I'm not sure if CheckForSerializableConflictIn() just above the
_bt_doinsert() is good idea.  But even if so, I think that should be
a subject of separate patch.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: comments around heap_lock_tuple confus{ing,ed} around deletedtuples
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS