Re: Speed up ICU case conversion by using ucasemap_utf8To*()
From | vignesh C |
---|---|
Subject | Re: Speed up ICU case conversion by using ucasemap_utf8To*() |
Date | |
Msg-id | CALDaNm1yY_Jth4TkfLJr88hKEgtC6vPfomNnfPnYebe0QtQECQ@mail.gmail.com |
Responses | Re: Speed up ICU case conversion by using ucasemap_utf8To*() |
List | pgsql-hackers |
On Fri, 20 Dec 2024 at 10:50, Andreas Karlsson <andreas@proxel.se> wrote:
>
> Hi,
>
> Jeff pointed out to me that the case conversion functions in ICU have
> UTF-8 specific versions which means we can call those directly if the
> database encoding is UTF-8 and skip having to convert to and from UChar.
>
> Since most people today run their databases in UTF-8 I think this
> optimization is worth it and when measuring on short to medium length
> strings I got a 15-20% speed up. It is still slower than glibc in my
> benchmarks but the gap is smaller now.
>
> SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
> "sv-SE-x-icu") FROM generate_series(1, 1000000) i);
>
> master: ~540 ms
> Patched: ~460 ms
> glibc: ~410 ms
>
> I have also attached a clean up patch for the non-UTF-8 code paths. I
> thought about doing the same for the new UTF-8 code paths but it turned
> out to be a bit messy due to different function signatures for
> ucasemap_utf8ToUpper() and ucasemap_utf8ToLower() vs ucasemap_utf8ToTitle().

I noticed that Jeff's comments from [1] have not yet been addressed, so I
have changed the commitfest entry status to "Waiting on Author". Please
address them and then change the status to "Needs Review".

[1] - https://www.postgresql.org/message-id/72c7c2b5848da44caddfe0f20f6c7ebc7c0c6e60.camel@j-davis.com

Regards,
Vignesh