Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Peter Geoghegan
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Msg-id: CAH2-Wz=fnMJ8-z4iAyL2X_x-giiOg82+RCRS-PXSeW3P+OM5tQ@mail.gmail.com
In response to: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List: pgsql-hackers
Hi Alexander,

On Fri, Jan 4, 2019 at 7:40 AM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> I'm starting to look at this patchset.  Not ready to post a detailed
> review, but I have a couple of questions.

Thanks for taking a look!

> Yes, it shouldn't be too hard, but it seems like we have to keep two
> branches of code for different handling of duplicates.  Is that true?

Not really. If you take a look at v9, you'll see the approach I've
taken is to make insertion scan keys aware of which rules apply (the
"heapkeyspace" field field controls this). I think that there are
about 5 "if" statements for that outside of amcheck. It's pretty
manageable.
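
To make that concrete, here's a minimal standalone sketch of the idea
(not the actual patch code -- the "heapkeyspace" flag and the notion of
an insertion scan key come from the patchset, but the struct layouts
and the compare_tiebreak() helper are simplified stand-ins):

#include <stdbool.h>
#include <stdio.h>
#include <stdint.h>

/* Simplified stand-in for the real Postgres ItemPointerData */
typedef struct ItemPointerData
{
    uint32_t block;             /* heap block number */
    uint16_t offset;            /* heap line pointer offset */
} ItemPointerData;

/* Simplified stand-in for an insertion scan key */
typedef struct BTScanInsertData
{
    bool            heapkeyspace;   /* do heap TIDs participate in the key space? */
    ItemPointerData *scantid;       /* heap TID tiebreaker, or NULL */
    /* ... ordinary key attributes would follow ... */
} BTScanInsertData;

/*
 * Tiebreaker comparison, reached only once all user attributes compare
 * as equal.  This is one of the few places that must know which rules
 * apply: on a pre-upgrade ("!heapkeyspace") index, equal user keys are
 * simply equal, because heap TIDs never participated there.
 */
int
compare_tiebreak(BTScanInsertData *key, ItemPointerData *tuptid)
{
    if (!key->heapkeyspace || key->scantid == NULL)
        return 0;               /* old rules, or no TID to compare */

    if (key->scantid->block != tuptid->block)
        return key->scantid->block < tuptid->block ? -1 : 1;
    if (key->scantid->offset != tuptid->offset)
        return key->scantid->offset < tuptid->offset ? -1 : 1;
    return 0;
}

int
main(void)
{
    ItemPointerData a = { 10, 1 };
    ItemPointerData b = { 10, 2 };
    BTScanInsertData key = { true, &a };

    /* Prints -1: TID (10,1) sorts before (10,2) under the new rules */
    printf("%d\n", compare_tiebreak(&key, &b));
    return 0;
}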

I like to imagine that the existing code already has unique keys, but
nobody ever gets to look at the final attribute. It works that way
most of the time -- the only exception is insertion with user keys
that aren't unique already. Note that moving left on pivot tuples
equal to the search key, rather than right (that is, rather than
following the equal pivot's downlink), wasn't invented by Postgres to
deal with the lack of unique keys. It's actually part of the Lehman
and Yao design itself.
Almost all of the special cases are optimizations rather than truly
necessary infrastructure.
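
Here's a toy illustration of that descent rule, using plain integer
keys (the real code does this with binary search over index tuples;
choose_downlink() is a hypothetical stand-in, not an nbtree function):

#include <stdio.h>

/*
 * On an internal page, choose the downlink of the LAST pivot strictly
 * less than the search key.  A pivot equal to the search key is
 * therefore never followed; the descent lands in the child to its
 * left, which may still contain equal entries.
 */
int
choose_downlink(const int *pivots, int npivots, int searchkey)
{
    int lo = 0, hi = npivots;

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;

        if (pivots[mid] < searchkey)    /* strict: equal pivots stay >= hi */
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo - 1;      /* last pivot < searchkey; -1 means leftmost child */
}

int
main(void)
{
    int pivots[] = {10, 20, 30};
    int idx = choose_downlink(pivots, 3, 20);

    /*
     * Prints 0: the search key 20 equals pivots[1], but we follow the
     * downlink of pivots[0] (= 10), whose subtree holds 10 < v <= 20
     * and so may contain entries equal to 20.
     */
    printf("follow downlink of pivots[%d]\n", idx);
    return 0;
}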

> I didn't get the point of this paragraph.  Might it happen that the
> first right tuple is under the tuple size restriction, but the new
> pivot tuple is beyond that restriction?  If so, would we get an error
> because the pivot tuple is too long?  If not, I think this needs to be
> explained better.

The v9 version of the function _bt_check_third_page() shows what it
means (comments on this will be improved in v10, too). The old limit
of 2712 bytes still applies to pivot tuples, while a new, lower limit
of 2704 bytes applies to non-pivot tuples. This difference is
necessary because an extra MAXALIGN() quantum could be needed to add a
heap TID to a pivot tuple during truncation in the worst case. To
users, the limit is 2704 bytes, because that's the limit that actually
needs to be enforced during insertion.
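
Here's the arithmetic, assuming a standard 8KB page and 8-byte
MAXALIGN (typical of 64-bit platforms).  The macro names other than
MAXALIGN() are illustrative, not the patch's actual symbols:

#include <stdio.h>

#define MAXIMUM_ALIGNOF     8
#define MAXALIGN(LEN)       (((LEN) + MAXIMUM_ALIGNOF - 1) & ~(MAXIMUM_ALIGNOF - 1))

#define SIZEOF_ITEMPOINTER  6   /* sizeof(ItemPointerData): 4-byte block + 2-byte offset */

/* Historical "1/3 of a page" ceiling; still what a pivot tuple may occupy */
#define PIVOT_TUPLE_LIMIT   2712

/*
 * Truncation may have to append a heap TID to the new pivot, growing
 * it by one MAXALIGN() quantum: MAXALIGN(6) = 8 bytes.  Non-pivot
 * tuples must leave room for that, so their limit is lower:
 *
 *     2712 - MAXALIGN(sizeof(ItemPointerData)) = 2712 - 8 = 2704
 */
#define NONPIVOT_TUPLE_LIMIT (PIVOT_TUPLE_LIMIT - MAXALIGN(SIZEOF_ITEMPOINTER))

int
main(void)
{
    printf("pivot limit:     %d\n", PIVOT_TUPLE_LIMIT);     /* 2712 */
    printf("non-pivot limit: %d\n", NONPIVOT_TUPLE_LIMIT);  /* 2704 */
    return 0;
}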

We never actually say "1/3 of a page means 2704 bytes" in the docs,
since the definition was always a bit fuzzy. There will need to be a
compatibility note in the release notes, though.
-- 
Peter Geoghegan

