Re: Patch: add conversion from pg_wchar to multibyte

From: Alexander Korotkov
Subject: Re: Patch: add conversion from pg_wchar to multibyte
Date:
Msg-id: CAPpHfduU=dm8hJFWrYUeSG-H6YYPGu1pjQD-CDU9D10_4Bwn_w@mail.gmail.com
In reply to: Re: Patch: add conversion from pg_wchar to multibyte  ("Erik Rijkers" <er@xs4all.nl>)
List: pgsql-hackers
Hi Erik,


On Sun, Apr 29, 2012 at 4:12 PM, Erik Rijkers <er@xs4all.nl> wrote:
Perhaps I'm too early with these tests, but FWIW I reran my earlier test program against three
instances.  (the patches compiled fine, and make check was without problem).

-- 3 instances:
HEAD                 port 6542
trgm_regex           port 6547  HEAD + trgm-regexp patch (22 Nov 2011) [1]
trgm_regex_wchar2mb  port 6549  HEAD + trgm-regexp + wchar2mb patch (23 Apr 2012) [2]

Actually, the wchar2mb patch doesn't affect the behaviour of trgm-regexp. It provides a correct way to do some of the encoding conversion work that the last published version of trgm-regexp does internally. So "HEAD + trgm-regexp patch" and "HEAD + trgm-regexp + wchar2mb patch" should behave similarly.
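
To illustrate what kind of conversion is meant, here is a minimal sketch in Python. It is not the code from the patch. It assumes the wide characters are plain Unicode code points and the target multibyte encoding is UTF-8; other server encodings would need their own rules. The function name wchars_to_utf8 is hypothetical.

```python
def wchars_to_utf8(code_points):
    """Encode a sequence of integer code points as UTF-8 bytes.

    Illustration only: assumes each wide character is a Unicode
    code point, which is an assumption of this sketch.
    """
    out = bytearray()
    for cp in code_points:
        if cp < 0:
            raise ValueError("negative code point")
        if cp < 0x80:
            # 1 byte: 0xxxxxxx
            out.append(cp)
        elif cp < 0x800:
            # 2 bytes: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:
            # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        elif cp <= 0x10FFFF:
            # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            raise ValueError("code point out of Unicode range")
    return bytes(out)


if __name__ == "__main__":
    sample = "azñ€"
    wide = [ord(ch) for ch in sample]
    # Compare against Python's built-in encoder as a sanity check.
    print(wchars_to_utf8(wide) == sample.encode("utf-8"))
```

The point of having such a routine in one place is that callers working on wide characters (as a regex automaton does) can turn them back into multibyte strings without each reimplementing the encoding rules.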
 
[1] http://archives.postgresql.org/pgsql-hackers/2011-11/msg01297.php
[2] http://archives.postgresql.org/pgsql-hackers/2012-04/msg01095.php

-- table sizes:
 azjunk4  10^4 rows     1 MB
 azjunk5  10^5 rows    11 MB
 azjunk6  10^6 rows   112 MB
 azjunk7  10^7 rows  1116 MB

for table creation/structure, see:
[3] http://archives.postgresql.org/pgsql-hackers/2012-01/msg01094.php

Results for three instances with 4 repetitions per instance are attached.

Although the regexes I chose are somewhat arbitrary, it does show some of the good, the bad and
the ugly of the patch(es).  (Also: I've limited the tests to a range of 'workable' regexps, i.e.
avoiding unbounded regexps)

Thank you for testing!
Such synthetic tests are very valuable for finding corner cases of the patch, bugs, etc.
It would also be nice to run some tests on real-life datasets with real-life regexps, in order to see the real benefit of this indexing approach and to compare it with other approaches. Maybe you or somebody else could obtain such datasets?

Also, I made some optimizations to the algorithm. The automaton analysis stage should become less CPU- and memory-consuming. I'll publish a new version soon.

------
With best regards,
Alexander Korotkov.
