Re: Patch: add conversion from pg_wchar to multibyte

From	Alexander Korotkov
Subject	Re: Patch: add conversion from pg_wchar to multibyte
Date	1 May 2012, 21:46:33
Msg-id	CAPpHfduU=dm8hJFWrYUeSG-H6YYPGu1pjQD-CDU9D10_4Bwn_w@mail.gmail.com
In response to	Re: Patch: add conversion from pg_wchar to multibyte ("Erik Rijkers" <er@xs4all.nl>)
List	pgsql-hackers

Hi Erik

On Sun, Apr 29, 2012 at 4:12 PM, Erik Rijkers <er@xs4all.nl> wrote:

> Perhaps I'm too early with these tests, but FWIW I reran my earlier test program against three
> instances. (the patches compiled fine, and make check was without problem).
>
> -- 3 instances:
> HEAD                 port 6542
> trgm_regex           port 6547  HEAD + trgm-regexp patch (22 Nov 2011) [1]
> trgm_regex_wchar2mb  port 6549  HEAD + trgm-regexp + wchar2mb patch (23 Apr 2012) [2]

Actually, the wchar2mb patch doesn't affect the behaviour of trgm-regexp. It provides a correct way to do some of the encoding conversion work that the last published version of trgm-regexp does internally. So "HEAD + trgm-regexp patch" and "HEAD + trgm-regexp + wchar2mb patch" should behave similarly.
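
To illustrate what a wide-character-to-multibyte conversion involves, here is a minimal Python sketch. It assumes UTF-8 as the target encoding and treats each wide character as a Unicode code point. The function name `wchars_to_utf8` is hypothetical and is not the patch's API; the patch itself is C code in PostgreSQL and is not reproduced here.

```python
def wchars_to_utf8(wchars):
    """Encode a sequence of Unicode code points as UTF-8 bytes.

    Illustrative only: the name and interface are assumptions, and
    surrogate code points are not rejected.
    """
    out = bytearray()
    for cp in wchars:
        if cp < 0:
            raise ValueError("negative code point: %r" % cp)
        if cp < 0x80:
            # 1 byte: 0xxxxxxx
            out.append(cp)
        elif cp < 0x800:
            # 2 bytes: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x110000:
            # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            raise ValueError("code point out of range: %r" % cp)
    return bytes(out)


# Usage: convert the code points of a short string back to UTF-8 bytes.
encoded = wchars_to_utf8([ord(c) for c in "abc"])
```

The point of the sketch is only that the conversion is a per-character, encoding-specific mapping, which is why it makes sense to have it in one shared place rather than reimplemented inside a module.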

> [1] http://archives.postgresql.org/pgsql-hackers/2011-11/msg01297.php
> [2] http://archives.postgresql.org/pgsql-hackers/2012-04/msg01095.php
>
> -- table sizes:
> azjunk4  10^4 rows     1 MB
> azjunk5  10^5 rows    11 MB
> azjunk6  10^6 rows   112 MB
> azjunk7  10^7 rows  1116 MB
>
> for table creation/structure, see:
> [3] http://archives.postgresql.org/pgsql-hackers/2012-01/msg01094.php
>
> Results for three instances with 4 repetitions per instance are attached.
>
> Although the regexes I chose are somewhat arbitrary, it does show some of the good, the bad and
> the ugly of the patch(es). (Also: I've limited the tests to a range of 'workable' regexps, i.e.
> avoiding unbounded regexps)

Thank you for testing!

Such synthetic tests are very valuable for finding corner cases of the patch, bugs, etc.

But it would also be nice to run some tests on real-life datasets with real-life regexps, in order to see the real benefit of this indexing approach and to compare it with other approaches. Maybe you or somebody else could obtain such datasets?

Also, I did some optimizations in the algorithm. The automaton analysis stage should become less CPU- and memory-consuming. I'll publish a new version soon.

------

With best regards,
Alexander Korotkov.
