Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Peter Eisentraut
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date:
Msg-id: 6d83ec20-2a55-c051-0159-e92d39bdfe20@2ndquadrant.com
In reply to: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On 19/09/2018 20:23, Peter Geoghegan wrote:
> Attached is v5,

So.  I don't know much about the btree code, so don't believe anything I
say.

I was very interested in the bloat test case that you posted on
2018-07-09 and I tried to understand it better.  The current method for
inserting a duplicate value into a btree is to go to the leftmost leaf
position for that value and then move right until we find some free
space or get "tired" of searching, in which case we just make some
space right there by splitting the page.  The problem is that it's
tricky to decide when to stop searching: there are scenarios where we
stop too soon and repeatedly miss all the good free space to the
right, leading to bloat even though the index is perhaps quite empty.
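
For concreteness, the move-right loop in _bt_findinsertloc() looks
roughly like this (a paraphrased sketch, not the literal code;
_bt_step_right() is an invented stand-in for the lock-coupled move to
the right sibling):

    while (PageGetFreeSpace(page) < itemsz)
    {
        /*
         * Stop if no right sibling can contain this key, or if we
         * randomly "get tired" (about a 1% chance per page visited).
         */
        if (P_RIGHTMOST(opaque) ||
            _bt_compare(rel, keysz, scankey, page, P_HIKEY) != 0 ||
            random() <= (MAX_RANDOM_VALUE / 100))
            break;

        buf = _bt_step_right(rel, buf);
        page = BufferGetPage(buf);
        opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    }
    /* if we stopped on a full page, a page split makes space here */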

I tried playing with the getting-tired factor (it could plausibly be a
reloption), but that wasn't very successful.  You can use it to
postpone the bloat, but not to stop it, and performance becomes terrible.
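
The factor I mean is the give-up probability in the loop sketched
above; a hypothetical reloption version (the accessor name is
invented) would be something like:

    int     tired_factor = BTGetTiredFactor(rel);   /* percent, default 1 */

    if (random() <= (MAX_RANDOM_VALUE / 100) * tired_factor)
        break;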

You propose to address this by appending the tid to the index key:
each key, even if its "payload" is a duplicate value, is then unique
and has a unique place in the index, so we never have to do this
"tiresome" search.  This makes a lot of sense, and the results in the
bloat test you posted are impressive and reproducible.
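
Put differently, comparisons get an implicit trailing heap-TID column.
A minimal sketch of the tiebreaker (the names here are mine, not the
patch's):

    /*
     * Resolve otherwise-equal keys by heap TID, making every index
     * tuple distinct.  keycmp is the result of comparing the
     * user-visible key columns.
     */
    static int
    bt_compare_with_tid_tiebreak(int keycmp, ItemPointer a, ItemPointer b)
    {
        if (keycmp != 0)
            return keycmp;
        return ItemPointerCompare(a, b);
    }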

I tried a silly alternative approach by placing a new duplicate key in a
random location.  This should be equivalent since tids are effectively
random.  I haven't quite gotten this to work fully yet, but at least it
doesn't blow up, and it gets the same regression test ordering
differences for pg_depend scans that you are trying to paper over. ;-)

As far as the code is concerned, I agree with Andrey Lepikhov that one
more abstraction layer that somehow combines the scankey and the tid
(or some such) would be useful, instead of passing the tid as a
separate argument everywhere.
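
Just as a sketch (all names invented), something like:

    typedef struct BTInsertKeyData
    {
        ScanKeyData scankeys[INDEX_MAX_KEYS];   /* key column comparisons */
        int         keysz;                      /* number of key columns */
        ItemPointerData scantid;                /* heap TID tiebreaker */
        bool        havetid;                    /* is scantid set? */
    } BTInsertKeyData;

    typedef BTInsertKeyData *BTInsertKey;

This could then be the single argument threaded through _bt_search(),
_bt_binsrch(), _bt_compare() and friends.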

I think it might help this patch move along if it were split up a bit,
for example 1) suffix truncation, 2) tid stuff, 3) new split strategies.
That way, it would also be easier to test out each piece separately.
For example, how much space does suffix truncation save in what
scenario, are there any performance regressions, etc.  In the last few
versions, the patches have still been growing significantly in size and
functionality, and most of the supposed benefits are not readily visible
in tests.

And of course we need to think about how to handle upgrades, but you
have already started a separate discussion about that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

