Re: Fuzzy substring searching with the pg_trgm extension
| From | Teodor Sigaev |
|---|---|
| Subject | Re: Fuzzy substring searching with the pg_trgm extension |
| Date | |
| Msg-id | 56BC7EF4.2030903@sigaev.ru |
| In response to | Re: Fuzzy substring searching with the pg_trgm extension (Alvaro Herrera <alvherre@2ndquadrant.com>) |
| List | pgsql-hackers |
>>> The behavior of this function is surprising to me.
>>>
>>> select substring_similarity('dog' , 'hotdogpound') ;
>>>
>>> substring_similarity
>>> ----------------------
>>> 0.25
>>>
>> Substring search was designed to search for a similar word in a string:
>> contrib_regression=# select substring_similarity('dog' , 'hot dogpound') ;
>> substring_similarity
>> ----------------------
>> 0.75
>>
>> contrib_regression=# select substring_similarity('dog' , 'hot dog pound') ;
>> substring_similarity
>> ----------------------
>> 1
>
> Hmm, this behavior looks too much like magic to me. I mean, a substring
> is a substring -- why are we treating the space as a special character
> here?
Because it isn't a regex for substring search. Ever since it was first
implemented, pg_trgm has worked on the words in a string.
contrib_regression=# select similarity('block hole', 'hole black');
 similarity
------------
   0.571429

contrib_regression=# select similarity('block hole', 'black hole');
 similarity
------------
   0.571429
It ignores the spaces between words and the order of the words.
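To see why both queries give the same number, here is a minimal Python sketch of the trigram comparison. It is not the pg_trgm C code: it assumes ASCII input, words made of alphanumeric runs, lowercasing, padding of each word with two leading spaces and one trailing space, and similarity computed as shared trigrams divided by all distinct trigrams. The function names are made up for illustration.

```python
import re


def trigrams(text):
    """Return the set of trigrams of text, built one padded word at a time."""
    result = set()
    # Assumption: a word is a run of ASCII letters/digits, lowercased and
    # padded with two spaces in front and one behind.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Word order does not matter because the trigrams end up in one set.
print(round(similarity("block hole", "hole black"), 6))  # 0.571429
print(round(similarity("block hole", "black hole"), 6))  # 0.571429
```

Each string yields 11 trigrams, 8 of them are shared, so the result is 8/14 in both cases.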
I agree that substring_similarity is a confusing name, but what it actually
does is search the second argument for the word most similar to the first
argument and return their similarity.
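A rough way to reproduce the three results quoted earlier in this thread is a brute-force search over every contiguous run of trigrams in the second string. This is only a sketch under assumptions, not the patch's algorithm: it uses the same word splitting and padding assumptions as pg_trgm's documented trigram extraction (ASCII only here), and the name extent_similarity is hypothetical.

```python
import re


def word_trigrams(text):
    """Return the trigrams of text as an ordered list, word by word."""
    seq = []
    # Assumption: ASCII alphanumeric words, lowercased, padded with two
    # spaces in front and one behind.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        seq.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return seq


def extent_similarity(needle, haystack):
    """Best set similarity between needle and any contiguous trigram run."""
    query = set(word_trigrams(needle))
    seq = word_trigrams(haystack)
    if not query:
        return 0.0
    best = 0.0
    for start in range(len(seq)):
        extent = set()
        for end in range(start, len(seq)):
            extent.add(seq[end])  # grow the run one trigram at a time
            score = len(query & extent) / len(query | extent)
            best = max(best, score)
    return best


print(extent_similarity("dog", "hotdogpound"))    # 0.25
print(extent_similarity("dog", "hot dogpound"))   # 0.75
print(extent_similarity("dog", "hot dog pound"))  # 1.0
```

In 'hotdogpound' only the trigram "dog" matches, so one of the four trigrams of 'dog' is found. A space before 'dog' adds the two leading padded trigrams, and a space after it adds the trailing one.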
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/