Re: Improve the performance of Unicode Normalization Forms.
From | Alexander Borisov |
---|---|
Subject | Re: Improve the performance of Unicode Normalization Forms. |
Date | |
Msg-id | 7859e5ef-a574-4199-a69b-6fee26711521@gmail.com |
In reply to | Re: Improve the performance of Unicode Normalization Forms. (Alexander Borisov <lex.borisov@gmail.com>) |
Replies | Re: Improve the performance of Unicode Normalization Forms. |
List | pgsql-hackers |
Hi, Jeff, hackers!

As promised, here is the refactoring of the C code for Unicode Normalization Forms. In general terms, here's what has changed:

1. Recursion has been removed; the data is now generated by a Perl script.
2. Memory is no longer allocated as a uint32 array for the entire size; instead, a uint8 array of the entire size is allocated for the CCC (canonical combining class) cache, which boosts performance significantly.
3. The code of the unicode_normalize() function has been completely rewritten.

I am confident that we have achieved excellent results.

Jeff's test:

| Normalization | Without patch | With patch |
|---|---|---|
| NFC to NFD | 9.121 | 1.580 |
| NFC to NFKD | 9.048 | 1.634 |
| NFD to NFC | 14.525 | 2.979 |
| NFD to NFKC | 14.380 | 3.050 |

Test with ICU (with the patch and ICU):

| Normalization | PG (with patch) | ICU |
|---|---|---|
| NFC to NFD | 1.580 | 1.880 |
| NFC to NFKD | 1.634 | 1.857 |
| NFD to NFC | 2.979 | 1.144 |
| NFD to NFKC | 3.050 | 1.260 |

pgbench: the files were run through pgbench. They contain all code points that need to be normalized.

| Form | With patch (tps) | Without patch (tps) |
|---|---|---|
| NFC | 9701.568161 | 6820.828104 |
| NFD | 2707.155148 | 1745.949174 |
| NFKC | 9893.952804 | 6697.358888 |
| NFKD | 2580.785909 | 1521.058417 |

To make the comparison with ICU fair, I corrected Jeff's patch: since we calculate the size of the final buffer ourselves, I put ICU in the same position.
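Change 1 says the data is now generated by a Perl script instead of being expanded recursively at run time. A minimal sketch of why that removes recursion, assuming the generated table stores each code point's fully expanded canonical decomposition: the run-time lookup becomes a single step, and a first pass can compute the exact output length so the result buffer is allocated once at the right size. The table layout and the names `decomp_entry`, `lookup_decomp()`, `decomposed_length()` and `decompose()` are hypothetical, not taken from the patch; Hangul syllables and compatibility (NFKD) mappings are omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical miniature of a build-time generated table: each entry maps a
 * code point to its FULL canonical decomposition, already expanded by the
 * generator script, so the run-time lookup never has to recurse.
 */
typedef struct
{
	uint32_t	codepoint;
	uint8_t		len;			/* number of code points in decomp[] */
	uint32_t	decomp[3];
} decomp_entry;

static const decomp_entry decomp_table[] = {
	{0x00C5, 2, {0x0041, 0x030A}},	/* A WITH RING ABOVE */
	{0x01D5, 3, {0x0055, 0x0308, 0x0304}},	/* U WITH DIAERESIS AND MACRON */
	{0x1E69, 3, {0x0073, 0x0323, 0x0307}},	/* s WITH DOT BELOW AND DOT ABOVE */
};

/* Linear search keeps the sketch short; a real table would be indexed. */
static const decomp_entry *
lookup_decomp(uint32_t cp)
{
	size_t		n = sizeof(decomp_table) / sizeof(decomp_table[0]);

	for (size_t i = 0; i < n; i++)
	{
		if (decomp_table[i].codepoint == cp)
			return &decomp_table[i];
	}
	return NULL;
}

/* Pass 1: compute the exact number of code points the output will hold. */
static size_t
decomposed_length(const uint32_t *in, size_t n)
{
	size_t		total = 0;

	for (size_t i = 0; i < n; i++)
	{
		const decomp_entry *e = lookup_decomp(in[i]);

		total += e ? e->len : 1;
	}
	return total;
}

/*
 * Pass 2: allocate a buffer of exactly that size and copy each flat
 * decomposition into it.  Returns NULL on allocation failure.
 */
static uint32_t *
decompose(const uint32_t *in, size_t n, size_t *out_len)
{
	size_t		len = decomposed_length(in, n);
	uint32_t   *out = malloc((len > 0 ? len : 1) * sizeof(uint32_t));
	size_t		pos = 0;

	if (out == NULL)
		return NULL;

	for (size_t i = 0; i < n; i++)
	{
		const decomp_entry *e = lookup_decomp(in[i]);

		if (e == NULL)
			out[pos++] = in[i];	/* no decomposition: copy as is */
		else
		{
			for (uint8_t k = 0; k < e->len; k++)
				out[pos++] = e->decomp[k];
		}
	}
	*out_len = pos;
	return out;
}
```

For example, decomposing U+1E69 U+0061 U+00C5 yields six code points (3 + 1 + 2), and the buffer is sized to exactly six before any copying happens.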
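Change 2 mentions a uint8 array used as a CCC cache. The post does not show how the cache is used, so the following is only a sketch under the assumption that it serves the canonical ordering step: within each run of non-starters (CCC != 0), code points are stably sorted by CCC, as the Unicode Standard requires. The names `get_ccc()` and `canonical_order()` are hypothetical, and the CCC lookup covers only a handful of combining marks.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the real CCC lookup.  Only a few combining marks
 * are listed; every other code point is treated as a starter (CCC 0).
 */
static uint8_t
get_ccc(uint32_t cp)
{
	switch (cp)
	{
		case 0x0301: return 230;	/* COMBINING ACUTE ACCENT */
		case 0x0307: return 230;	/* COMBINING DOT ABOVE */
		case 0x0308: return 230;	/* COMBINING DIAERESIS */
		case 0x031B: return 216;	/* COMBINING HORN */
		case 0x0323: return 220;	/* COMBINING DOT BELOW */
		case 0x0327: return 202;	/* COMBINING CEDILLA */
		default: return 0;
	}
}

/*
 * Canonical ordering: within each run of non-starters, stably sort code
 * points by CCC.  Each CCC is looked up once and kept in a uint8_t array
 * parallel to the code points, so the sort compares cached bytes instead of
 * repeating table lookups.  Returns 0 on success, -1 on allocation failure.
 */
static int
canonical_order(uint32_t *cps, size_t n)
{
	uint8_t    *ccc = malloc(n > 0 ? n : 1);

	if (ccc == NULL)
		return -1;

	for (size_t i = 0; i < n; i++)
		ccc[i] = get_ccc(cps[i]);

	/* Insertion sort is stable and cheap for short runs of marks. */
	for (size_t i = 1; i < n; i++)
	{
		uint32_t	cp = cps[i];
		uint8_t		c = ccc[i];
		size_t		j = i;

		if (c == 0)
			continue;			/* starters never move */

		/* Shift earlier marks with a strictly higher CCC one slot right;
		 * a starter (CCC 0) stops the scan, so runs are sorted separately. */
		while (j > 0 && ccc[j - 1] > c)
		{
			cps[j] = cps[j - 1];
			ccc[j] = ccc[j - 1];
			j--;
		}
		cps[j] = cp;
		ccc[j] = c;
	}

	free(ccc);
	return 0;
}
```

Storing the cached classes as uint8_t is enough because CCC values range from 0 to 254, and the cache needs a quarter of the memory a parallel uint32 array would.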
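Specifically, the ICU calls in the test now look like this:

    /* Get the required size (preflight with no output buffer) */
    length = unorm_normalize(u_input, -1, form, 0, NULL, 0, &status);
    /* Preflighting sets U_BUFFER_OVERFLOW_ERROR; reset before the real call */
    status = U_ZERO_ERROR;
    /* Normalize */
    unorm_normalize(u_input, -1, form, 0, u_result, length, &status);

Otherwise, it turned out that we were giving ICU some huge buffer and it was writing into it, while we calculate the buffer size we need ourselves.

--
Regards,
Alexander Borisov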