pgsql: Consider outliers in split interval calculation.

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема pgsql: Consider outliers in split interval calculation.
Дата
Msg-id E1jQwFb-0001CQ-QI@gemulon.postgresql.org
обсуждение исходный текст
Список pgsql-committers
Consider outliers in split interval calculation.

Commit 0d861bbb, which introduced deduplication to nbtree, added some
logic to take large posting list tuples into account when choosing a
split point.  We subtract firstright posting list overhead from the
projected new high key size when calculating leftfree/rightfree values
for an affected candidate split point.  Posting list tuples aren't
special to nbtsplitloc.c, but taking them into account like this makes a
huge difference in practice.  Posting list tuples are frequently tuple
size outliers.

However, commit 0d861bbb missed a closely related issue: split interval
itself is calculated based on the assumption that tuples on the page
being split are roughly equisized.  That assumption was acceptable back
when commit fab25024 taught the logic for choosing a split point about
suffix truncation, but it's pretty questionable now that very large
tuple sizes are common.  This oversight led to unbalanced page splits in
low cardinality multi-column indexes when deduplication was used: page
splits that don't give sufficient weight to how unbalanced the split is
when the interval happens to include some large posting list tuples (and
when most other tuples on the page are not so large).

Nail this down by calculating an initial split interval in a way that's
attuned to the actual cost that we want to keep under control (not a
fuzzy proxy for the cost): apply a leftfree + rightfree evenness test to
each candidate split point that actually gets included in the split
interval (for the default strategy).  This replaces logic that used a
percentage of all legal split points for the page as the basis of the
initial split interval.

Discussion: https://postgr.es/m/CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/1542e16f2ca257656717ab8ea263bc310f1b6d51

Modified Files
--------------
src/backend/access/nbtree/nbtsplitloc.c | 104 +++++++++++++++++++++++++-------
1 file changed, 83 insertions(+), 21 deletions(-)


В списке pgsql-committers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: pgsql: Allow matchingsel() to be used with operators that might return
Следующее
От: Alvaro Herrera
Дата:
Сообщение: pgsql: Fix detaching partitions with cloned row triggers