At Wed, 14 Oct 2020 23:06:28 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in
> John Naylor <john.naylor@enterprisedb.com> writes:
> > With those points in mind and thinking more broadly, I'd like to try harder
> > on recomposition. Even several times faster, recomposition is still orders
> > of magnitude slower than ICU, as measured by Daniel Verite [1].
>
> Huh. Has anyone looked into how they do it?
I'm not sure that's what they do, but it could be. Perhaps ICU uses
separate tables for decomposition and composition, referenced from a
trie?
I think I've seen a trie recommended somewhere, maybe the official
website.

That said, I was able to get the hash working for recomposition (split
into a separate patch; both patches now leave the frontend alone), and
I'm pleased to say it's 50-75x faster than linear search in simple
tests. I'd be curious how it compares to ICU now. Perhaps Daniel Verite
would be interested in testing again? (CC'd)
select count(normalize(t, NFC)) from (
  select md5(i::text) as t from
  generate_series(1,100000) as i
) s;

master    patch
18800ms   257ms
select count(normalize(t, NFC)) from (
  select repeat(U&'\00E4\00C5\0958\00F4\1EBF\3300\1FE2\3316\2465\322D', i % 3 + 1) as t from
  generate_series(1,100000) as i
) s;

master    patch
13000ms   254ms
--