Discussion: Speed up collation cache
The blog post here (thank you depesz!):

https://www.depesz.com/2024/06/11/how-much-speed-youre-leaving-at-the-table-if-you-use-default-locale/

showed an interesting result where the builtin provider is not quite as fast as "C" for queries like:

  SELECT * FROM a WHERE t = '...';

The reason is that it's calling varstr_cmp() many times, which does a lookup in the collation cache for each call. For sorts, it only does a lookup in the collation cache once, so the effect is not significant.

Looking up "C" is faster because there's a special check for C_COLLATION_OID, so it doesn't even need to do the hash lookup. If you create an equivalent collation like:

  CREATE COLLATION libc_c(PROVIDER = libc, LOCALE = 'C');

it will perform the same as a collation with the builtin provider.

Attached is a patch to use simplehash.h instead, which speeds things up enough to make them fairly close (from around 15% slower to around 8%).

The patch is based on the series here:

https://postgr.es/m/f1935bc481438c9d86c2e0ac537b1c110d41a00a.camel@j-davis.com

which does some refactoring in a related area, but I can make them independent.

We can also consider what to do about those special cases:

* add a special case for PG_C_UTF8?
* instead of a hardwired set of special collation IDs, have a single-element "last collation ID" to check before doing the hash lookup?
* remove the special cases entirely if we can close the performance gap enough that it's not important?

(Note: the special case in lc_ctype_is_c() is currently required for correctness because hba.c uses C_COLLATION_OID for regexes before the syscache is initialized. That can be fixed pretty easily in a couple of different ways, though.)

--
Jeff Davis
PostgreSQL Contributor Team - AWS
On 15.06.24 01:46, Jeff Davis wrote:
> * instead of a hardwired set of special collation IDs, have a single-
> element "last collation ID" to check before doing the hash lookup?

I'd imagine that method could be very effective.
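To make the "last collation ID" idea concrete, here is a minimal standalone sketch. It is not taken from the patch: the CollationEntry struct, the slow_lookup() stand-in for the hash table, the lookup_collation() wrapper, and the example OIDs are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;			/* stand-in for PostgreSQL's Oid type */

/* Hypothetical cache entry; a real one would carry locale data. */
typedef struct CollationEntry
{
	Oid			collid;
	int			collate_is_c;
} CollationEntry;

/* Toy backing store standing in for the hash table (arbitrary example OIDs). */
static CollationEntry entries[] = {
	{16384, 1},
	{16385, 0},
	{16386, 0},
};

/* Counts how often the slow path runs, so the effect is observable. */
static unsigned long slow_lookups = 0;

/* Slow path: in the real code this would be the hash table probe. */
static CollationEntry *
slow_lookup(Oid collid)
{
	slow_lookups++;
	for (size_t i = 0; i < sizeof(entries) / sizeof(entries[0]); i++)
	{
		if (entries[i].collid == collid)
			return &entries[i];
	}
	return NULL;
}

/* Single-element cache remembering the most recent successful lookup. */
static Oid	last_collid = 0;
static CollationEntry *last_entry = NULL;

static CollationEntry *
lookup_collation(Oid collid)
{
	CollationEntry *entry;

	/* Fast path: same collation as last time, skip the hash lookup. */
	if (last_entry != NULL && collid == last_collid)
		return last_entry;

	entry = slow_lookup(collid);
	assert(entry == NULL || entry->collid == collid);

	/* Only remember hits, so a miss never poisons the fast path. */
	if (entry != NULL)
	{
		last_collid = collid;
		last_entry = entry;
	}
	return entry;
}
```

With a query like the one from the blog post, every comparison uses the same collation, so after the first call each lookup would be answered by the one-element cache regardless of which collation it is.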
On Sat, Jun 15, 2024 at 6:46 AM Jeff Davis <pgsql@j-davis.com> wrote:
> Attached is a patch to use simplehash.h instead, which speeds things up
> enough to make them fairly close (from around 15% slower to around 8%).

+#define SH_HASH_KEY(tb, key) hash_uint32((uint32) key)

For a static inline hash for speed reasons, we can use murmurhash32 here, which is also inline.
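For reference, a standalone sketch of what that would look like. The murmurhash32() body below is the 32-bit finalizer step of MurmurHash3, written out here so the example compiles on its own; in PostgreSQL one would use the inline function from common/hashfn.h rather than a copy. The collation_bucket() helper is hypothetical and only shows how the hash would map an OID to a slot in a power-of-two table.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t Oid;			/* stand-in for PostgreSQL's Oid type */

/*
 * 32-bit finalizer from MurmurHash3.  It is a handful of shifts, xors and
 * multiplies, so a compiler can inline it directly into the lookup path.
 */
static inline uint32_t
murmurhash32(uint32_t data)
{
	uint32_t	h = data;

	h ^= h >> 16;
	h *= 0x85ebca6b;
	h ^= h >> 13;
	h *= 0xc2b2ae35;
	h ^= h >> 16;
	return h;
}

/*
 * Hypothetical helper: map a collation OID to a bucket index.  The
 * simplehash.h equivalent of the change would be something like
 *     #define SH_HASH_KEY(tb, key) murmurhash32((uint32) key)
 */
static inline uint32_t
collation_bucket(Oid collid, uint32_t nbuckets)
{
	/* The mask below only works for a power-of-two bucket count. */
	assert(nbuckets != 0 && (nbuckets & (nbuckets - 1)) == 0);
	return murmurhash32((uint32_t) collid) & (nbuckets - 1);
}
```

Because every step of the finalizer is invertible, distinct OIDs always produce distinct 32-bit hashes; collisions can only come from masking down to the bucket count.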