Hi Erik
On Sun, Apr 29, 2012 at 4:12 PM, Erik Rijkers
<er@xs4all.nl> wrote:
Perhaps I'm too early with these tests, but FWIW I reran my earlier test program against three
instances. (the patches compiled fine, and make check was without problem).
-- 3 instances:
HEAD port 6542
trgm_regex port 6547 HEAD + trgm-regexp patch (22 Nov 2011) [1]
trgm_regex_wchar2mb port 6549 HEAD + trgm-regexp + wchar2mb patch (23 Apr 2012) [2]
Actually, the wchar2mb patch doesn't affect the behaviour of trgm-regexp. It provides a correct way to do some of the encoding conversion work that the last published version of trgm-regexp does internally. So "HEAD + trgm-regexp patch" and "HEAD + trgm-regexp + wchar2mb patch" should behave similarly.
[1] http://archives.postgresql.org/pgsql-hackers/2011-11/msg01297.php
[2] http://archives.postgresql.org/pgsql-hackers/2012-04/msg01095.php
-- table sizes:
azjunk4 10^4 rows 1 MB
azjunk5 10^5 rows 11 MB
azjunk6 10^6 rows 112 MB
azjunk7 10^7 rows 1116 MB
for table creation/structure, see:
[3] http://archives.postgresql.org/pgsql-hackers/2012-01/msg01094.php
Results for three instances with 4 repetitions per instance are attached.
Although the regexes I chose are somewhat arbitrary, it does show some of the good, the bad and
the ugly of the patch(es). (Also: I've limited the tests to a range of 'workable' regexps, i.e.
avoiding unbounded regexps)
Thank you for testing!
Such synthetic tests are very valuable for finding corner cases in the patch, bugs, etc.
It would also be nice to run some tests on real-life datasets with real-life regexps, in order to see the real benefit of this indexing approach and to compare it with other approaches. Maybe you or somebody else could obtain such datasets?
Also, I've made some optimizations to the algorithm. The automaton analysis stage should now consume less CPU and memory. I'll publish a new version soon.
------
With best regards,
Alexander Korotkov.