Split interval (used by nbtree suffix truncation) and posting list tuples

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Split interval (used by nbtree suffix truncation) and posting list tuples
Дата
Msg-id CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com
обсуждение исходный текст
Список pgsql-hackers
We choose a split point in nbtsplitloc.c primarily based on evenly
dividing space among left and right halves of the split, while giving
secondary consideration to suffix truncation (there is other logic
that kicks in when there are many duplicates, which isn't related to
what I want to talk about). See my commit fab25024 from Postgres 12
for background information.

The larger the split interval, the more unbalanced page splits are
allowed to be, which is a cost that we may be willing to pay to
truncate more suffix attributes in the new high key. Split interval
represents a trade-off between two competing considerations (a
potential cost versus a potential benefit). Unfortunately, commit
fab25024 defined "split interval" based on logic that makes the
assumption that tuples on the same page are more or less of a uniform
size. That assumption was questionable when suffix truncation went in,
but now that deduplication exists the assumption seems quite risky. I
am concerned that suffix truncation will be far more aggressive than
appropriate when the delta-optimal split point is near large posting
list tuples, that are size outliers on the page. The logic in
nbtsplitloc.c might accept a much more unbalanced page split than is
truly reasonable because the average width of tuples on the page isn't
so large, even though the tuples around where we want to split the
page are very large.

Fortunately it's pretty easy to nail this down. We can determine the
default strategy split interval based on the cost that we actually
care about (not a fuzzy proxy of that cost): a leftfree and rightfree
space tolerance from the space optimal candidate split point,
expressed in bytes (actually, expressed as a proportion of the total
space that is used for data items on the page, which is converted into
bytes to form a tolerance). That's the way it's done in the attached
patch.

I plan to commit this patch next week, barring any objections.

-- 
Peter Geoghegan

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: Build errors in VS
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Poll: are people okay with function/operator table redesign?