Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Peter Geoghegan
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date:
Msg-id: CAH2-Wz=+ztshnJA39rRUi4yv90o17s3PdqRLOZ1q1tdkDw-OAw@mail.gmail.com
In response to: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Mon, Jan 28, 2019 at 1:41 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Thanks for going to the trouble of implementing what you have in mind!
>
> I agree that the code that I wrote within nbtsplitloc.c is very
> subtle, and I do think that I have further work to do to make its
> design clearer. I think that this high level description of the goals
> of the algorithm is inaccurate in subtle but important ways, though.
> Hopefully there will be a way of making it more understandable that
> preserves certain important characteristics.

Heikki and I had the opportunity to talk about this recently. We found
an easy way forward. I believe that the nbtsplitloc.c algorithm itself
is fine -- the code will need to be refactored, though.

nbtsplitloc.c can be refactored to assemble a list of legal split
points up front, before deciding which one to go with in a separate
pass (using one of two "alternative modes", as before). I now
understand that Heikki simply wants to separate the questions of "Is
this candidate split point legal?" from "Is this known-legal candidate
split point good/ideal based on my current criteria?". This seems like
a worthwhile goal to me. Heikki accepts the need for multiple
modes/passes, provided recursion isn't used in the implementation.
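The separation described above can be sketched roughly as follows. This is a hypothetical illustration only, not the actual nbtsplitloc.c code: the function names, the flat list-of-item-sizes model of a page, and the "closest to an even split" scoring criterion are all invented for the example.

```python
# Two-pass structure: first enumerate every legal split point,
# then choose among the known-legal candidates in a separate pass.
# (Illustrative sketch; names and page model are invented.)

def legal_split_points(item_sizes, page_capacity):
    """Return all split offsets where both halves fit on a page."""
    points = []
    for off in range(1, len(item_sizes)):
        left_size = sum(item_sizes[:off])
        right_size = sum(item_sizes[off:])
        if left_size <= page_capacity and right_size <= page_capacity:
            points.append(off)
    return points

def choose_split(item_sizes, page_capacity):
    """Second pass: pick the legal split point closest to an even split."""
    candidates = legal_split_points(item_sizes, page_capacity)
    total = sum(item_sizes)
    # Score each known-legal candidate; legality was settled up front.
    return min(candidates, key=lambda off: abs(sum(item_sizes[:off]) - total / 2))
```

The point of the structure is that `choose_split` never has to re-ask whether a candidate is legal; swapping in a different scoring criterion (a second "mode") touches only the second pass.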

It's clear to me that the algorithm should start off trying to split
towards the middle of the page (or towards the end in the rightmost
case), while possibly making a small compromise on the exact split
point to maximize the effectiveness of suffix truncation. We must
change strategy entirely if and only if the middle of the page (or
wherever we'd like to split initially) is found to be completely full
of duplicates -- that's where the need for a second pass comes in.
In most applications this should almost never happen. Even when it does, we only care about not splitting inside a group of duplicates. That's not the same thing as caring about maximizing the number of attributes truncated away. Those two goals seem similar, but are actually very different.
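The distinction in the paragraph above can be made concrete with a small sketch. Again, this is a hypothetical illustration (the helper names and the bare list-of-keys model are invented); it shows only the narrow goal of the fallback: detect a split landing inside a run of duplicates, and nudge it to the nearest boundary between distinct keys, rather than searching for the split that truncates the most attributes.

```python
# Illustrative fallback: avoid splitting inside a group of duplicates.
# (Names hypothetical; not the actual nbtsplitloc.c code.)

def needs_second_pass(keys, split):
    """True when the split would land inside a group of duplicate keys."""
    return 0 < split < len(keys) and keys[split - 1] == keys[split]

def avoid_duplicates(keys, split):
    """Move the split to the nearest boundary between distinct keys."""
    left, right = split, split
    while left > 0 and keys[left - 1] == keys[left]:
        left -= 1
    while right < len(keys) and keys[right - 1] == keys[right]:
        right += 1
    # Prefer whichever duplicate-group boundary is closer to the
    # originally desired split point.
    return left if split - left <= right - split else right
```

For example, with keys `[1, 1, 2, 2, 2, 3]` and a desired split at offset 3 (inside the run of 2s), the fallback moves the split to offset 2, the left edge of the group. Nothing here tries to maximize truncation; it only refuses to split a group of duplicates.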

It might have sounded like Heikki and I disagreed on the design of the
algorithm at a high level, or what its goals ought to be. That is not
the case, though. (At least not so far.)

-- 
Peter Geoghegan


In the pgsql-hackers list, by date:

Previous
From: Haribabu Kommi
Date:
Message: Re: Libpq support to connect to standby server as priority
Next
From: Alvaro Herrera
Date:
Message: Re: [HACKERS] PATCH: multivariate histograms and MCV lists