Re: pg_trgm version 1.2

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: pg_trgm version 1.2
Дата
Msg-id CAPpHfdskHSDixhh6-95vDefNwUuwTRYwnNs0xfRCE6BtSxUw+Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pg_trgm version 1.2  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-hackers
On Wed, Jul 15, 2015 at 12:31 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Jul 7, 2015 at 6:33 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:


See Tom Lane's comment about downgrade scripts. I think just remove it is a right solution.

The new patch removes the downgrade path and the ability to install the old version.

(If anyone wants an easy downgrade path for testing, they can keep using the prior patch--no functional changes)

It also added a comment before the trigramsMatchGraph call.

I retained the palloc and the loop to promote the ternary array to a binary array.  While I also think it is tempting to get rid of that by abusing the type system and would do it that way in my own standalone code, it seems contrary to way the project usually does things.  And I couldn't measure a difference in performance.

Ok!
 
....

 
Let's consider '^(?!.*def).*abc' regular expression as an example. It matches strings which contains 'abc' and don't contains 'def'.

# SELECT 'abc' ~ '^(?!.*def).*abc';
 ?column?
----------
 t
(1 row)

# SELECT 'def abc' ~ '^(?!.*def).*abc';
 ?column?
----------
 f
(1 row)

# SELECT 'abc def' ~ '^(?!.*def).*abc';
 ?column?
----------
 f
(1 row)

Theoretically, our trigram regex processing could support negative matching of 'def' trigram, i.e. trigramsMatchGraph(abc = true, def = false) = true but trigramsMatchGraph(abc = true, def = true) = false. Actually, it doesn't because trigramsMatchGraph() implements a monotonic function. I just think it should be stated explicitly.

Do you think it is likely to change to stop being monotonic and so support the (def=GIN_TRUE) => false case?

^(?!.*def)   seems like a profoundly niche situation.  (Although one that I might actually start using myself now that I know it isn't just a Perl-ism).

It doesn't make any difference to this patch, other than perhaps how to word the comments.

Yes, it was just about the comments.

I also run few tests on real-life dataset: set of dblp paper titles. As it was expected, there is huge speedup when pattern is long enough.

# SELECT count(*) FROM dblp_titles WHERE s ILIKE '%Multidimensional Data%';
 count
-------
   318
(1 row)

Time: 29,114 ms

# SELECT count(*) FROM dblp_titles WHERE s ~* '.*Multidimensional Data.*';
 count
-------
   318
(1 row)

Time: 26,273 ms

# SELECT count(*) FROM dblp_titles WHERE s ILIKE '%Multidimensional Data%';
 count
-------
   318
(1 row)

Time: 249,725 ms

# SELECT count(*) FROM dblp_titles WHERE s ~* '.*Multidimensional Data.*';
 count
-------
   318
(1 row)

Time: 218,627 ms

For me, patch is in the pretty good shape. I'm going to mark it "Ready for committer".

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: All-zero page in GIN index causes assertion failure
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: First Aggregate Funtion?