Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location
Дата
Msg-id CAM3SWZRnK1KC56BuZ2B7Ny6GR9HY02D91vQAKa8mxQKo6SHEsA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location  (Claudio Freire <klaussfreire@gmail.com>)
Ответы Re: [WIP] [B-Tree] Keep indexes sorted by heap physical location
Список pgsql-hackers
On Mon, Nov 21, 2016 at 9:42 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> I realized today, quite suddenly, why I disliked your original patch:
>> it didn't go far enough with embracing the idea of
>> heap-item-pointer-as-key. In other words, I didn't like that you
>> didn't change anything in the internal pages.
>
> But it did. In fact it only changed internal pages.

Oh, okay.

>> Maybe you should put
>> heap TIDs someplace in new internal pages, so that even there we treat
>> them as part of the key.
>
> That is indeed what's happening (both in the old and new version).

Good.

> The new version also opens up to the possibility of omitting the heap
> TID in inner pages, if they're redundant (ie: if two consecutive keys
> are different already, the heap TID part of the key is redundant,
> since it's not necessary to make tree descent decisions).

Makes sense; this is just another aspect of suffix truncation.

>> This may significantly benefit UPDATE-heavy
>> workloads where HOT doesn't work out so well. Particularly for
>> non-unique indexes, we currently have to wade through a lot of bloat
>> from the leaf level, rather than jumping straight to the correct leaf
>> page when updating or inserting.
>
> That is improved in the patch as well. The lookup for an insertion
> point for a heavily bloated (unique or not) index is done efficiently,
> instead of the O(N^2) way it used before.

Cool.

> It's done even for unique indexes. Locking in that case becomes
> complex, true, but I believe it's not an insurmountable problem.

I actually don't think that that would be all that complicated. It's
just one case where you have to mostly match the existing behavior.

>> Or, you might want to make
>> sure that B-Tree tuple insertions still find that "first page" to
>> check, despite still generally treating heap item pointer as part of
>> the key proper (in unique indexes, it can still behave like NULL, and
>> not participate in uniqueness checking, while still essentially being
>> part of the key/sort order).
>
> It behaves like that (even though it's not coded like that)

Why not? What do you mean?

What I'm interested in evaluating is an implementation where an index
on (foo) has a sort order/key space that is precisely the same as an
index on (foo, ctid), with zero exceptions.

>> It also occurs to me that our performance for updates in the event of
>> many NULL values in indexes is probably really bad, and could be
>> helped by this. You'll want to test all of this with a workload that
>> isn't very sympathetic to HOT, of course -- most of these benefits are
>> seen when HOT doesn't help.
>
> I haven't really tested with an overabundance of NULLs, I'll add that
> to the tests. I have tested low-cardinality indexes though, but only
> for correctness.

I think we'll have to model data distribution to evaluate the idea
well. After all, if there is a uniform distribution of values anyway,
you're going to end up with many dirty leaf pages. Whereas, in the
more realistic case where updates are localized, we can hope to better
amortize the cost of inserting index tuples, and recycling index
tuples.

> What do you mean with "first class part"?
>
> It's not included in the scankey for a variety of reasons, not the
> least of which is not breaking the interface for current uses, and
> because the tid type is quite limited in its capabilities ATM. Also,
> the heap TID is usually there on most AM calls that care about it (ie:
> insertions have it, of course), so adding it to the scankey also
> seemed redundant.

I mean that insertions into indexes that are low cardinality should be
largely guided by TID comparisons.

> If not adding to the scankey, what do you mean by "first class" then?

Just what I said about the sort order of an index on "(foo)" now
precisely matching what we'd get for "(foo, ctid)". There are a couple
of tricky issues with that that you'd have to look out for, like
making sure that the high key continues to hold a real TID, which at a
glance looks like something that just happens already. I'm not sure
that we preserve the item pointer TID in all cases, since the current
assumption is that a high key's TID is just filler -- maybe we can
lose that at some point.

You should use amcheck to specifically verify that that happens
reliably in all cases. Presumably, its use of an insertion scankey
would automatically see the use of TID as a tie-breaker with patched
Postgres amcheck verification, and so amcheck will work for this
purpose unmodified.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Joe Conway
Дата:
Сообщение: Re: dblink get_connect_string() passes FDW option "updatable" to the connect string, connection fails.
Следующее
От: Robert Haas
Дата:
Сообщение: Re: condition variables