Discussion: Speed up collation cache

Speed up collation cache

From:

Jeff Davis

Date:

14 June 2024, 23:46:39

The blog post here (thank you depesz!):

https://www.depesz.com/2024/06/11/how-much-speed-youre-leaving-at-the-table-if-you-use-default-locale/

showed an interesting result where the builtin provider is not quite as
fast as "C" for queries like:

   SELECT * FROM a WHERE t = '...';

The reason is that it's calling varstr_cmp() many times, which does a
lookup in the collation cache for each call. For sorts, it only does a
lookup in the collation cache once, so the effect is not significant.
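To make the difference concrete, here is a minimal sketch that counts cache lookups under the two access patterns. Every name in it (`lookup_collation`, `filter_lookups`, `sort_style_lookups`) is a hypothetical stand-in, not a PostgreSQL function. The "lookup" is only a counter, and the rows are ints rather than text.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real API. */
typedef unsigned int Oid;

static long cache_lookups = 0;

/* Stand-in for the collation cache lookup that varstr_cmp() performs. */
static Oid lookup_collation(Oid collid)
{
    cache_lookups++;
    return collid;              /* pretend this is the resolved locale */
}

/* Three-way compare that resolves the collation on every call, like an
 * operator evaluated once per row in a WHERE clause. */
static int cmp_lookup_each_time(int a, int b, Oid collid)
{
    (void) lookup_collation(collid);
    return (a > b) - (a < b);
}

/* Three-way compare that reuses an already-resolved locale. */
static int cmp_with_locale(int a, int b, Oid locale)
{
    (void) locale;              /* a real comparator would use it */
    return (a > b) - (a < b);
}

/* Filter: one comparison, and therefore one lookup, per row. */
long filter_lookups(const int *rows, size_t n, int key, Oid collid)
{
    cache_lookups = 0;
    for (size_t i = 0; i < n; i++)
        (void) cmp_lookup_each_time(rows[i], key, collid);
    return cache_lookups;
}

/* Sort-style: resolve the collation once up front, then reuse it. */
long sort_style_lookups(const int *rows, size_t n, int key, Oid collid)
{
    cache_lookups = 0;
    Oid locale = lookup_collation(collid);
    for (size_t i = 0; i < n; i++)
        (void) cmp_with_locale(rows[i], key, locale);
    return cache_lookups;
}
```

With 100 rows, the filter-style loop performs 100 lookups while the sort-style loop performs one. That is why the per-lookup cost shows up in a WHERE-clause scan but not in a sort.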

The reason looking up "C" is faster is because there's a special check
for C_COLLATION_OID, so it doesn't even need to do the hash lookup. If
you create an equivalent collation like:

   CREATE COLLATION libc_c(PROVIDER = libc, LOCALE = 'C');

it will perform the same as a collation with the builtin provider.
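The shape of that fast path can be sketched as follows. `sketch_collate_is_c()` is a hypothetical name, and the linear scan stands in for the real hash lookup. The constant 950 is assumed to match C_COLLATION_OID. The OIDs 16384 and 16385 for user-created collations are made up.

```c
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Assumed to equal C_COLLATION_OID; illustrative only. */
#define SKETCH_C_COLLATION_OID 950

/* Hypothetical cached metadata: which collations behave like "C". */
typedef struct
{
    Oid  collid;
    bool collate_is_c;
} SketchEntry;

static const SketchEntry sketch_cache[] = {
    {SKETCH_C_COLLATION_OID, true},
    {16384, true},              /* e.g. libc_c created with LOCALE = 'C' */
    {16385, false},             /* some non-C collation */
};

static long cache_probes = 0;

/* Stand-in for the hash lookup; counts every call. */
static bool cache_lookup_is_c(Oid collid)
{
    cache_probes++;
    for (size_t i = 0; i < sizeof(sketch_cache) / sizeof(sketch_cache[0]); i++)
        if (sketch_cache[i].collid == collid)
            return sketch_cache[i].collate_is_c;
    return false;
}

/* Fast path: the built-in "C" OID is answered without touching the cache.
 * An equivalent user-created collation still pays for the lookup. */
bool sketch_collate_is_c(Oid collid)
{
    if (collid == SKETCH_C_COLLATION_OID)
        return true;
    return cache_lookup_is_c(collid);
}
```

Both 950 and 16384 report "C" behavior, but only the second one probes the cache. That difference is exactly what the libc_c comparison isolates.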

Attached is a patch to use simplehash.h instead, which speeds things up
enough to make them fairly close (from around 15% slower to around 8%).
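Compared with PostgreSQL's general-purpose dynahash tables, simplehash.h generates a specialized open-addressing table with linear probing, so the hash function and the key comparison can be inlined. The following is a self-contained sketch of that technique for an Oid-keyed cache. It is not simplehash.h itself: it has no growth or deletion, all names are hypothetical, and the Fibonacci multiplicative hash is only a stand-in for whatever hash function the patch uses.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef unsigned int Oid;

/* OID 0 (InvalidOid) is never a real collation, so it marks empty slots. */
#define SKETCH_EMPTY 0u

typedef struct
{
    Oid  collid;                /* key; SKETCH_EMPTY if the slot is free */
    bool collate_is_c;          /* example of a cached attribute */
} CollEntry;

typedef struct
{
    CollEntry *slots;           /* one flat array, no per-entry allocation */
    uint32_t   capacity;        /* always a power of two */
    uint32_t   shift;           /* 32 - log2(capacity) */
    uint32_t   used;
} CollCache;

/* Home slot via Fibonacci hashing (multiply, keep the top bits).
 * Stand-in hash function; not what the patch uses. */
static inline uint32_t coll_home(const CollCache *c, Oid key)
{
    return ((uint32_t) key * UINT32_C(2654435761)) >> c->shift;
}

/* Allocate 2^bits zeroed (i.e. empty) slots, 1 <= bits <= 31. No resizing. */
bool coll_cache_init(CollCache *c, uint32_t bits)
{
    c->capacity = UINT32_C(1) << bits;
    c->shift = 32 - bits;
    c->used = 0;
    c->slots = calloc(c->capacity, sizeof(CollEntry));
    return c->slots != NULL;
}

/* Linear probing: scan forward from the home slot until the key or an
 * empty slot is found. Returns NULL if the key is absent. */
CollEntry *coll_cache_lookup(const CollCache *c, Oid key)
{
    uint32_t mask = c->capacity - 1;
    uint32_t i = coll_home(c, key);

    if (key == SKETCH_EMPTY)
        return NULL;
    for (uint32_t n = 0; n < c->capacity; n++, i = (i + 1) & mask)
    {
        if (c->slots[i].collid == key)
            return &c->slots[i];
        if (c->slots[i].collid == SKETCH_EMPTY)
            return NULL;
    }
    return NULL;
}

/* Find or insert; *found says which. Returns NULL if the table is full
 * or the key is InvalidOid. */
CollEntry *coll_cache_insert(CollCache *c, Oid key, bool *found)
{
    uint32_t mask = c->capacity - 1;
    uint32_t i = coll_home(c, key);

    if (key == SKETCH_EMPTY)
        return NULL;
    for (uint32_t n = 0; n < c->capacity; n++, i = (i + 1) & mask)
    {
        if (c->slots[i].collid == key)
        {
            *found = true;
            return &c->slots[i];
        }
        if (c->slots[i].collid == SKETCH_EMPTY)
        {
            c->slots[i].collid = key;   /* other fields stay zeroed */
            c->used++;
            *found = false;
            return &c->slots[i];
        }
    }
    return NULL;
}

void coll_cache_free(CollCache *c)
{
    free(c->slots);
    c->slots = NULL;
}
```

In this layout a cache hit costs a multiply, a shift, and usually one or two adjacent array reads, all of which the compiler can inline. The real simplehash.h also grows the table and supports deletion, which this sketch omits.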

The patch is based on the series here:

https://postgr.es/m/f1935bc481438c9d86c2e0ac537b1c110d41a00a.camel@j-davis.com

which does some refactoring in a related area, but I can make them
independent.

We can also consider what to do about those special cases:

  * add a special case for PG_C_UTF8?
  * instead of a hardwired set of special collation IDs, have a single-
element "last collation ID" to check before doing the hash lookup?
  * remove the special cases entirely if we can close the performance
gap enough that it's not important?
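The second option could look roughly like this: a one-entry memo consulted before the hash lookup. This is a minimal sketch with hypothetical names (`lookup_with_last`, `slow_lookup`, `CollInfo`) and a made-up cached attribute. It is not the PoC patch posted later in the thread, and it ignores cache invalidation.

```c
#include <stdbool.h>

typedef unsigned int Oid;

/* Hypothetical cached result for one collation. */
typedef struct
{
    Oid  collid;
    bool collate_is_c;
} CollInfo;

static long slow_lookups = 0;

/* Stand-in for the full hash-table lookup; counts every call. */
static CollInfo slow_lookup(Oid collid)
{
    CollInfo info = {collid, collid % 2 == 0};  /* made-up attribute */
    slow_lookups++;
    return info;
}

/* Single-element cache: remember the most recent collation looked up.
 * A real version would also have to reset this on invalidation. */
static CollInfo last_info;
static bool     last_valid = false;

CollInfo lookup_with_last(Oid collid)
{
    if (last_valid && last_info.collid == collid)
        return last_info;       /* hit: skip the hash lookup entirely */
    last_info = slow_lookup(collid);
    last_valid = true;
    return last_info;
}
```

Repeated lookups of the same collation, which is the common case inside a single scan, reduce to one integer comparison.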

(Note: the special case in lc_ctype_is_c() is currently required for
correctness because hba.c uses C_COLLATION_OID for regexes before the
syscache is initialized. That can be fixed pretty easily a couple of
different ways, though.)

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachments

v2-0007-Change-collation-cache-to-use-simplehash.h.patch

Re: Speed up collation cache

From:

Peter Eisentraut

Date:

19 June 2024, 08:10:09

On 15.06.24 01:46, Jeff Davis wrote:
>    * instead of a hardwired set of special collation IDs, have a single-
> element "last collation ID" to check before doing the hash lookup?

I'd imagine that method could be very effective.

Re: Speed up collation cache

From:

John Naylor

Date:

20 June 2024, 10:07:23

On Sat, Jun 15, 2024 at 6:46 AM Jeff Davis <pgsql@j-davis.com> wrote:
> Attached is a patch to use simplehash.h instead, which speeds things up
> enough to make them fairly close (from around 15% slower to around 8%).

+#define SH_HASH_KEY(tb, key)   hash_uint32((uint32) key)

For a static inline hash for speed reasons, we can use murmurhash32
here, which is also inline.
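For reference, here is a sketch of the mixer being suggested. As far as I know, `murmurhash32()` in PostgreSQL's `common/hashfn.h` is the 32-bit MurmurHash3 finalizer shown below; it is copied here only to keep the example self-contained. As I understand it, the `hash_uint32()` path ends in an out-of-line function call, whereas this small `static inline` function can be inlined into the generated simplehash code.

```c
#include <stdint.h>

/* 32-bit MurmurHash3 finalizer ("fmix32"), believed to match
 * murmurhash32() in PostgreSQL's common/hashfn.h. Each step (xor-shift,
 * multiply by an odd constant) is invertible, so distinct keys never
 * collide before the table reduces the hash to a bucket index.
 *
 * The suggestion amounts to:
 *   #define SH_HASH_KEY(tb, key)   murmurhash32((uint32) key)
 */
static inline uint32_t
murmurhash32(uint32_t data)
{
    uint32_t h = data;

    h ^= h >> 16;
    h *= 0x85ebca6b;
    h ^= h >> 13;
    h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}
```

Because OIDs are often small and sequential, a cheap but thorough bit mixer like this spreads them across buckets without the cost of a full byte-oriented hash.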

Re: Speed up collation cache

From:

Jeff Davis

Date:

26 July 2024, 21:00:31

On Thu, 2024-06-20 at 17:07 +0700, John Naylor wrote:
> On Sat, Jun 15, 2024 at 6:46 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > Attached is a patch to use simplehash.h instead, which speeds things up
> > enough to make them fairly close (from around 15% slower to around 8%).
>
> +#define SH_HASH_KEY(tb, key)   hash_uint32((uint32) key)
>
> For a static inline hash for speed reasons, we can use murmurhash32
> here, which is also inline.

Thank you, that brings it down a few more percentage points.

New patches attached, still based on the setlocale-removal patch
series.

Setup:

  create collation libc_c (provider=libc, locale='C');
  create table collation_cache_test(t text);
  insert into collation_cache_test
    select g::text||' '||g::text
      from generate_series(1,200000000) g;

Queries:

  select * from collation_cache_test where t < '0' collate "C";
  select * from collation_cache_test where t < '0' collate libc_c;

The two collations are identical except that the former benefits from
the optimization for C_COLLATION_OID, and the latter does not, so these
queries measure the overhead of the collation cache lookup.

Results (in ms):

              "C"   "libc_c"   overhead
   master:    6350     7855     24%
   v4-0001:   6091     6324      4%

(Note: I don't have an explanation for the difference in performance of
the "C" locale -- probably just some noise in the test.)
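The overhead column is the extra time of the `libc_c` query relative to the `"C"` query, i.e. (libc_c / C - 1) × 100. A one-line helper with a hypothetical name reproduces it from the timings.

```c
/* Overhead of the libc_c run relative to the "C" run, in percent:
 * (libc_c_ms / c_ms - 1) * 100. Hypothetical helper name. */
double overhead_pct(double c_ms, double libc_c_ms)
{
    return (libc_c_ms / c_ms - 1.0) * 100.0;
}
```

For example, `overhead_pct(6350, 7855)` is about 23.7 and `overhead_pct(6091, 6324)` is about 3.8, which round to the 24% and 4% shown above.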

Considering that simplehash brings the worst case overhead under 5%, I
don't see a big reason to use the single-element cache also.

Regards,
    Jeff Davis

Attachments

v4-0006-Change-collation-cache-to-use-simplehash.h.patch

Re: Speed up collation cache

From:

Andreas Karlsson

Date:

27 July 2024, 22:14:56

On 7/26/24 11:00 PM, Jeff Davis wrote:
> Results (in ms):
> 
>                "C"   "libc_c"   overhead
>     master:    6350     7855     24%
>     v4-0001:   6091     6324      4%

I got more overhead in my quick benchmarking when I ran the same
benchmark. Also tried your idea with caching the last lookup (PoC patch
attached) and it basically removed all overhead, but I guess it will not
help if you have two different non-default locales in the same query.

              "C"   "libc_c"   overhead
before:       6695    8376      25%
after:        6605    7340      11%
cache last:   6618    6677       1%
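The limitation Andreas mentions follows from the access pattern. The small simulation below assumes a single-entry cache that starts empty and uses hypothetical collation OIDs. It counts how many lookups in a sequence would still fall through to the hash table.

```c
#include <stddef.h>

typedef unsigned int Oid;

/* Count lookups in a sequence that would miss a single-element
 * "last collation" cache and fall through to the hash table. */
size_t last_cache_misses(const Oid *seq, size_t n)
{
    size_t misses = 0;
    Oid    last = 0;            /* 0 = InvalidOid: nothing cached yet */

    for (size_t i = 0; i < n; i++)
    {
        if (seq[i] != last)
        {
            misses++;           /* would do a full hash lookup */
            last = seq[i];
        }
    }
    return misses;
}
```

Six lookups on one collation miss once. Six lookups alternating between two collations miss every time, so the hash table's own speed still matters in that case.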

But even without that extra optimization I think this patch is worth
merging and the patch is small, simple and clean and easy to understand
and just a clear speedup. Feels like a no-brainer. I think that it is
ready for committer.

And then we can discuss after committing if an additional cache of the 
last locale is worth it or not.

Andreas

Attachments

0001-WIP-Ugly-caching-of-last-locale.patch

Re: Speed up collation cache

From:

Jeff Davis

Date:

28 July 2024, 20:02:20

On Sun, 2024-07-28 at 00:14 +0200, Andreas Karlsson wrote:
> But even without that extra optimization I think this patch is worth
> merging and the patch is small, simple and clean and easy to
> understand and just a clear speedup. Feels like a no-brainer. I think
> that it is ready for committer.

Committed, thank you.

> And then we can discuss after committing if an additional cache of
> the last locale is worth it or not.

Yeah, I'm holding off on that until refactoring in the area settles,
and we'll see if it's still worth it.

Regards,
    Jeff Davis
