Обсуждение: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)

Поиск
Список
Период
Сортировка
Hi,

Currently, nbtree code compares each and every column of an index
tuple during the binary search on the index page. With large indexes
that have many duplicate prefix column values (e.g. an index on (bool,
bool, uuid) ) that means a lot of wasted time getting to the right
column.

The attached patch improves on that by doing per-page dynamic prefix
truncation: If we know that on both the right and left side there are
index tuples where the first two attributes are equal to the scan key,
we skip comparing those attributes at the current index tuple and
start with comparing attribute 3, saving two attribute compares. We
gain performance whenever comparing prefixing attributes is expensive
and when there are many tuples with a shared prefix - in unique
indexes this doesn't gain much, but we also don't lose much in this
case.

This patch was originally suggested at [0], but it was mentioned that
they could be pulled out into it's own thread. Earlier, the
performance gains were not clearly there for just this patch, but
after further benchmarking this patch stands on its own for
performance: it sees no obvious degradation of performance, while
gaining 0-5% across various normal indexes on the cc-complete sample
dataset, with the current worst-case index shape getting a 60%+
improved performance on INSERTs in the tests at [0].

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

PS. Best served with the downlink right separator/HIKEY optimization
(separate patch to be submitted soon), and specialization over at [0].

[0] https://www.postgresql.org/message-id/CAEze2WiqOONRQTUT1p_ZV19nyMA69UU2s0e2dp+jSBM=j8snuw@mail.gmail.com

Вложения
Hi

út 31. 10. 2023 v 22:12 odesílatel Matthias van de Meent <boekewurm+postgres@gmail.com> napsal:
Hi,

Currently, nbtree code compares each and every column of an index
tuple during the binary search on the index page. With large indexes
that have many duplicate prefix column values (e.g. an index on (bool,
bool, uuid) ) that means a lot of wasted time getting to the right
column.

The attached patch improves on that by doing per-page dynamic prefix
truncation: If we know that on both the right and left side there are
index tuples where the first two attributes are equal to the scan key,
we skip comparing those attributes at the current index tuple and
start with comparing attribute 3, saving two attribute compares. We
gain performance whenever comparing prefixing attributes is expensive
and when there are many tuples with a shared prefix - in unique
indexes this doesn't gain much, but we also don't lose much in this
case.

This patch was originally suggested at [0], but it was mentioned that
they could be pulled out into it's own thread. Earlier, the
performance gains were not clearly there for just this patch, but
after further benchmarking this patch stands on its own for
performance: it sees no obvious degradation of performance, while
gaining 0-5% across various normal indexes on the cc-complete sample
dataset, with the current worst-case index shape getting a 60%+
improved performance on INSERTs in the tests at [0].

+1

This can be nice functionality. I had a customer with a very slow index scan - the main problem was a long common prefix like prg010203xxxx.

Regards

Pavel


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

PS. Best served with the downlink right separator/HIKEY optimization
(separate patch to be submitted soon), and specialization over at [0].

[0] https://www.postgresql.org/message-id/CAEze2WiqOONRQTUT1p_ZV19nyMA69UU2s0e2dp+jSBM=j8snuw@mail.gmail.com
On Wed, 1 Nov 2023 at 07:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> út 31. 10. 2023 v 22:12 odesílatel Matthias van de Meent <boekewurm+postgres@gmail.com> napsal:
>> This patch was originally suggested at [0], but it was mentioned that
>> they could be pulled out into it's own thread. Earlier, the
>> performance gains were not clearly there for just this patch, but
>> after further benchmarking this patch stands on its own for
>> performance: it sees no obvious degradation of performance, while
>> gaining 0-5% across various normal indexes on the cc-complete sample
>> dataset, with the current worst-case index shape getting a 60%+
>> improved performance on INSERTs in the tests at [0].
>
>
> +1

Thanks for showing interest.

> This can be nice functionality. I had a customer with a very slow index scan - the main problem was a long common
prefixlike prg010203xxxx. 

I'll have to note that this patch doesn't cover cases where e.g. text
attributes have large shared prefixes, but are still unique: the
dynamic prefix compression in this patch is only implemented at the
tuple attribute level; it doesn't implement type aware dynamic prefix
compression inside the attributes. So, a unique index on a column of
int32 formatted like '%0100i' would not materially benefit from this
patch.

While would certainly be possible to add some type-level prefix
truncation in the framework of this patch, adding that would require
significant code churn in btree compare operators, because we'd need
an additional return argument to contain a numerical "shared prefix",
and that is not something I was planning to implement in this patch.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)





st 1. 11. 2023 v 11:32 odesílatel Matthias van de Meent <boekewurm+postgres@gmail.com> napsal:
On Wed, 1 Nov 2023 at 07:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> út 31. 10. 2023 v 22:12 odesílatel Matthias van de Meent <boekewurm+postgres@gmail.com> napsal:
>> This patch was originally suggested at [0], but it was mentioned that
>> they could be pulled out into it's own thread. Earlier, the
>> performance gains were not clearly there for just this patch, but
>> after further benchmarking this patch stands on its own for
>> performance: it sees no obvious degradation of performance, while
>> gaining 0-5% across various normal indexes on the cc-complete sample
>> dataset, with the current worst-case index shape getting a 60%+
>> improved performance on INSERTs in the tests at [0].
>
>
> +1

Thanks for showing interest.

> This can be nice functionality. I had a customer with a very slow index scan - the main problem was a long common prefix like prg010203xxxx.

I'll have to note that this patch doesn't cover cases where e.g. text
attributes have large shared prefixes, but are still unique: the
dynamic prefix compression in this patch is only implemented at the
tuple attribute level; it doesn't implement type aware dynamic prefix
compression inside the attributes. So, a unique index on a column of
int32 formatted like '%0100i' would not materially benefit from this
patch.

While would certainly be possible to add some type-level prefix
truncation in the framework of this patch, adding that would require
significant code churn in btree compare operators, because we'd need
an additional return argument to contain a numerical "shared prefix",
and that is not something I was planning to implement in this patch.

Thanks for the explanation. 

Pavel


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
On Wed, Nov 1, 2023 at 2:42 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> Hi,
>
> Currently, nbtree code compares each and every column of an index
> tuple during the binary search on the index page. With large indexes
> that have many duplicate prefix column values (e.g. an index on (bool,
> bool, uuid) ) that means a lot of wasted time getting to the right
> column.
>
> The attached patch improves on that by doing per-page dynamic prefix
> truncation: If we know that on both the right and left side there are
> index tuples where the first two attributes are equal to the scan key,
> we skip comparing those attributes at the current index tuple and
> start with comparing attribute 3, saving two attribute compares. We
> gain performance whenever comparing prefixing attributes is expensive
> and when there are many tuples with a shared prefix - in unique
> indexes this doesn't gain much, but we also don't lose much in this
> case.
>
> This patch was originally suggested at [0], but it was mentioned that
> they could be pulled out into it's own thread. Earlier, the
> performance gains were not clearly there for just this patch, but
> after further benchmarking this patch stands on its own for
> performance: it sees no obvious degradation of performance, while
> gaining 0-5% across various normal indexes on the cc-complete sample
> dataset, with the current worst-case index shape getting a 60%+
> improved performance on INSERTs in the tests at [0].

+1 for the idea, I have some initial comments while reading through the patch.

1.
Commit message refers to a non-existing reference '(see [0])'.


2.
+When we do a binary search on a sorted set (such as a BTree), we know that a
+tuple will be smaller than its left neighbour, and larger than its right
+neighbour.

I think this should be 'larger than left neighbour and smaller than
right neighbour' instead of the other way around.

3.
+With the above optimizations, dynamic prefix truncation improves the worst
+case complexity of indexing from O(tree_height * natts * log(tups_per_page))
+to O(tree_height * (3*natts + log(tups_per_page)))

Where do the 3*natts come from?  Is it related to setting up the
dynamic prefix at each level?

4.
+ /*
+ * All tuple attributes are equal to the scan key, only later attributes
+ * could potentially not equal the scan key.
+ */
+ *compareattr = ntupatts + 1;

Can you elaborate on this more? If all tuple attributes are equal to
the scan key then what do those 'later attributes' mean?


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Fri, 19 Jan 2024 at 05:55, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Nov 1, 2023 at 2:42 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> >
> > Hi,
> >
> > Currently, nbtree code compares each and every column of an index
> > tuple during the binary search on the index page. With large indexes
> > that have many duplicate prefix column values (e.g. an index on (bool,
> > bool, uuid) ) that means a lot of wasted time getting to the right
> > column.
> >
> > The attached patch improves on that by doing per-page dynamic prefix
> > truncation: If we know that on both the right and left side there are
> > index tuples where the first two attributes are equal to the scan key,
> > we skip comparing those attributes at the current index tuple and
> > start with comparing attribute 3, saving two attribute compares. We
> > gain performance whenever comparing prefixing attributes is expensive
> > and when there are many tuples with a shared prefix - in unique
> > indexes this doesn't gain much, but we also don't lose much in this
> > case.
> >
> > This patch was originally suggested at [0], but it was mentioned that
> > they could be pulled out into it's own thread. Earlier, the
> > performance gains were not clearly there for just this patch, but
> > after further benchmarking this patch stands on its own for
> > performance: it sees no obvious degradation of performance, while
> > gaining 0-5% across various normal indexes on the cc-complete sample
> > dataset, with the current worst-case index shape getting a 60%+
> > improved performance on INSERTs in the tests at [0].
>
> +1 for the idea, I have some initial comments while reading through the patch.

Thank you for the review.

> 1.
> Commit message refers to a non-existing reference '(see [0])'.

Noted, I'll update that.

> 2.
> +When we do a binary search on a sorted set (such as a BTree), we know that a
> +tuple will be smaller than its left neighbour, and larger than its right
> +neighbour.
>
> I think this should be 'larger than left neighbour and smaller than
> right neighbour' instead of the other way around.

Noted, will be fixed, too.

> 3.
> +With the above optimizations, dynamic prefix truncation improves the worst
> +case complexity of indexing from O(tree_height * natts * log(tups_per_page))
> +to O(tree_height * (3*natts + log(tups_per_page)))
>
> Where do the 3*natts come from?  Is it related to setting up the
> dynamic prefix at each level?

Yes: We need to establish prefixes for both a tuple that's ahead of
the to-be-compared tuple, and one that's after the to-be-compared
tuple. Assuming homogenous random distribution of scan key accesses
across the page (not always the case, but IMO a reasonable starting
point) this would average to 3 unprefixed compares before you have
established both a higher and a lower prefix.

> 4.
> + /*
> + * All tuple attributes are equal to the scan key, only later attributes
> + * could potentially not equal the scan key.
> + */
> + *compareattr = ntupatts + 1;
>
> Can you elaborate on this more? If all tuple attributes are equal to
> the scan key then what do those 'later attributes' mean?

In inner pages, tuples may not have all key attributes, as some may
have been truncated away in page splits. So, tuples that have at least
the same prefix as this (potentially truncated) tuple will need to be
compared starting at the first missing attribute of this tuple, i.e.
ntupatts + 1.

Kind regards,

Matthias van de Meent



On Wed, 24 Jan 2024 at 13:02, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> > 1.
> > Commit message refers to a non-existing reference '(see [0])'.
>
> Noted, I'll update that.
>
> > 2.
> > +When we do a binary search on a sorted set (such as a BTree), we know that a
> > +tuple will be smaller than its left neighbour, and larger than its right
> > +neighbour.
> >
> > I think this should be 'larger than left neighbour and smaller than
> > right neighbour' instead of the other way around.
>
> Noted, will be fixed, too.

Attached is version 15 of this patch, with the above issues fixed.
It's also rebased on top of 655dc310 of this morning, so that should
keep good for some time again.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Вложения