Обсуждение: Re: Speed up ICU case conversion by using ucasemap_utf8To*()
On Fri, 2024-12-20 at 06:20 +0100, Andreas Karlsson wrote:
> SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
> "sv-SE-x-icu") FROM generate_series(1, 1000000) i);
>
> master: ~540 ms
> Patched: ~460 ms
> glibc: ~410 ms
It looks like you are opening and closing the UCaseMap object each
time. Why not save it in pg_locale_t? That should speed it up even more
and hopefully beat libc.
Also, to support older ICU versions consistently, we need to fix up the
locale name to support "und"; cf. pg_ucol_open(). Perhaps factor out
that logic?
Regards,
Jeff Davis
On 12/20/24 8:24 PM, Jeff Davis wrote:
> On Fri, 2024-12-20 at 06:20 +0100, Andreas Karlsson wrote:
>> SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
>> "sv-SE-x-icu") FROM generate_series(1, 1000000) i);
>>
>> master: ~540 ms
>> Patched: ~460 ms
>> glibc: ~410 ms
>
> It looks like you are opening and closing the UCaseMap object each
> time. Why not save it in pg_locale_t? That should speed it up even more
> and hopefully beat libc.
Fixed. New benchmarks are:
SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
"sv-SE-x-icu") FROM generate_series(1, 1000000) i);
master: ~570 ms
Patched: ~340 ms
glibc: ~400 ms
So it does indeed seem like we got a further speedup and now are faster
than glibc.
> Also, to support older ICU versions consistently, we need to fix up the
> locale name to support "und"; cf. pg_ucol_open(). Perhaps factor out
> that logic?
Fixed.
Andreas
Вложения
Hi Andreas, On the mailing list, I've noticed this patch. I tested its functionality and it works really well. I have a few minor, non-criticalcomments to share. In the `pg_ucasemap_open` function, the error message `casemap lookup failed:` doesn't seem ideal. This is because we'reopening the `UCaseMap` here, rather than performing a "lookup" operation. In the comment `Additional makes sure we get the right options for case folding.`, the word Additional seems inappropriate— `Additionally` would be a better replacement. -- Regards, Man Zeng www.openhalo.org
On 12/31/25 3:36 AM, zengman wrote: > On the mailing list, I've noticed this patch. I tested its functionality and it works really well. I have a few minor,non-critical comments to share. Thanks for trying it out! > In the `pg_ucasemap_open` function, the error message `casemap lookup failed:` doesn't seem ideal. This is because we'reopening the `UCaseMap` here, rather than performing a "lookup" operation. Fixed. > In the comment `Additional makes sure we get the right options for case folding.`, the word Additional seems inappropriate— `Additionally` would be a better replacement. Fixed. Andreas