Re: Optimization for lower(), upper(), casefold() functions.
От | Alexander Borisov |
---|---|
Тема | Re: Optimization for lower(), upper(), casefold() functions. |
Дата | |
Msg-id | e9057c71-93ec-4a21-b02e-979c7da2b665@gmail.com обсуждение исходный текст |
Ответ на | Re: Optimization for lower(), upper(), casefold() functions. (Jeff Davis <pgsql@j-davis.com>) |
Ответы |
Re: Optimization for lower(), upper(), casefold() functions.
|
Список | pgsql-hackers |
19.02.2025 01:02, Jeff Davis пишет: > On Tue, 2025-02-11 at 23:08 +0300, Alexander Borisov wrote: >> I tried the approach via a range table. The result was worse than >> without the table. With branching in a function, the result is >> better. >> >> Patch v3 — ranges binary search by branches. >> Patch v4 — ranges binary search by table. > > Thoughts on v3: > > It looks like the top 5 bits of the offset are unused. What if we used > those bits for flags to indicate: > > HAS_LOWER > HAS_UPPER > HAS_FOLD > HAS_SPECIAL > HAS_TITLE > > That way, we only need to look in the corresponding table if it > actually has an entry other than the codepoint itself. > > It doesn't leave a lot of room if the tables get larger, but if we are > worried about that, we could eliminate HAS_TITLE, because I don't think > the performance for INITCAP() is as important as LOWER/UPPER/CASEFOLD. > > Regards, > Jeff Davis > It seems to be micro-optimizations. In the sense that the code is very clear and understandable now. These micro-optimizations make the code more complex, and there will be no performance gain. In proposing the patch for v3, I struck a balance between improving performance and reducing binary size, without sacrificing code clarity. We can go into micro-optimizations now and lose the essence of the proposed improvements. And ahead of us we have Unicode Normalization Forms. I'm willing to try different approaches, but there's confidence that we've found a great optimization right now. -- Alexander Borisov
В списке pgsql-hackers по дате отправления: