Discussion: sound index

sound index

From: Nikolay Samokhvalov
Date: Tue, Apr 11, 2006 05:28:12 -0700
Hello.

Does anybody know of any solutions to the problem of searching for
words/phrases that sound similar to each other, e.g. a Soundex index
or something like that?

The problem I have: a tag suggestion mechanism, similar to Google
Suggest, which is intended to suggest people's names (the "person's
name" search field in a web form). It would be great if it did its
job smarter than a simple LIKE.

Also, I'd be happy to hear opinions from people who have experience
using things like Soundex.

--
Best regards,
Nikolay

Re: sound index

From: Martijn van Oosterhout
Date:
On Tue, Apr 11, 2006 at 05:28:12AM -0700, Nikolay Samokhvalov wrote:
> Hello.
>
> Does anybody know of any solutions to the problem of searching for
> words/phrases that sound similar to each other, e.g. a Soundex index
> or something like that?

Check out contrib/fuzzystrmatch. It implements a number of such algorithms,
including Soundex, Metaphone and Levenshtein distance.
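
To make the idea concrete, here is a minimal Python sketch of classic American Soundex, plus a score in the spirit of fuzzystrmatch's difference() that counts how many of the four code positions agree. It follows the usual textbook rules and is not guaranteed to reproduce fuzzystrmatch's soundex() output on every edge case; the names CODES, soundex and difference here are this sketch's own.

```python
# Minimal sketch of classic American Soundex (textbook rules; not guaranteed
# to match fuzzystrmatch's soundex() on every edge case).

CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of an ASCII name ("" if no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    result = first                   # the first letter is kept as written
    prev = CODES.get(first, "")      # its code still suppresses an equal next code
    for c in letters[1:]:
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        # H and W do not separate letters with the same code; vowels (and Y)
        # do, so they reset the remembered code.
        if c not in "HW":
            prev = code
    return result.ljust(4, "0")


def difference(a: str, b: str) -> int:
    """Number of positions (0-4) at which the two Soundex codes agree."""
    return sum(x == y for x, y in zip(soundex(a), soundex(b)))
```

With these rules Robert and Rupert both encode to R163, while Knight (K523) and Night (N230) share no positions at all, because Soundex keeps the first letter as written. Metaphone, by contrast, drops the K of an initial KN.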

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: sound index

From: Scott Ribe
Date:
> Also, I'd be happy to hear opinions from people who have experience
> using things like Soundex.


Soundex is grossly outdated. It was designed for manual, paper-based
indexing (most famously of US census records), and I'm always surprised
to see it still used. Metaphone (a Google search for it turns up good
results) does a much better job of matching names, and Double Metaphone
does even better, although having each word mapped to up to two
equivalents might complicate your logic, depending on your queries.
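
A sketch of the extra lookup logic that two codes per word implies: every name is filed under its primary code and, optionally, its alternate, and a query matches if any of its codes hit. This Python sketch assumes the (primary, alternate) pairs were computed elsewhere, for example with fuzzystrmatch's dmetaphone() and dmetaphone_alt(); build_index, lookup and the use_alternate switch are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

# (primary, alternate) Double Metaphone codes for one word; the alternate may
# be missing. Computing the codes (e.g. dmetaphone()/dmetaphone_alt()) is
# assumed to happen elsewhere.
Codes = Tuple[str, Optional[str]]


def build_index(names: Dict[str, Codes], use_alternate: bool = True) -> Dict[str, Set[str]]:
    """Map each phonetic code to the set of names that produce it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, (primary, alternate) in names.items():
        index[primary].add(name)
        # Optionally file the name under its alternate code as well.
        if use_alternate and alternate and alternate != primary:
            index[alternate].add(name)
    return dict(index)


def lookup(index: Dict[str, Set[str]], query: Codes, use_alternate: bool = True) -> Set[str]:
    """Return every indexed name that shares at least one code with the query."""
    primary, alternate = query
    found = set(index.get(primary, set()))
    if use_alternate and alternate:
        found |= index.get(alternate, set())
    return found
```

With the assumed codes Smith = (SM0, XMT) and Schmidt = (XMT, SMT), looking up Smith's codes returns both names when alternates are indexed, but only Smith when use_alternate=False is passed to both functions. That primary-only mode is the simpler fallback for systems that can't handle multiple equivalents, at the cost of missing some matches.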

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: sound index

From: Scott Ribe
Date:
>> Also, I'd be happy to hear opinions from people who have experience
>> using things like Soundex.
>
> Soundex is grossly outdated. It was designed for manual, paper-based
> indexing (most famously of US census records), and I'm always surprised
> to see it still used. Metaphone (a Google search for it turns up good
> results) does a much better job of matching names, and Double Metaphone
> does even better, although having each word mapped to up to two
> equivalents might complicate your logic, depending on your queries.

I remember now that over the years I found a few places where Metaphone
needed improvement. Double Metaphone seemed to incorporate all my revisions,
so the best approach would be to start with it and, if your system can't
accommodate the notion of multiple equivalents, just use the primary code.

--
Scott Ribe
scott_ribe@killerbytes.com
http://www.killerbytes.com/
(303) 722-0567 voice



Re: sound index

From: Teodor Sigaev
Date:
Have a look at contrib/pg_trgm, which matches strings by trigram similarity.
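
pg_trgm compares strings by the trigrams (three-character substrings) they share. The Python sketch below follows the scheme its documentation describes: each word is lower-cased and padded with two spaces in front and one behind before trigrams are extracted, and similarity is taken here as the number of shared trigrams divided by the number of distinct trigrams in either string. Treat it as an approximation; the real module's word splitting and other details may differ.

```python
import re


def trigrams(text: str) -> set[str]:
    """Collect trigrams per word: lower-case, pad with two leading spaces
    and one trailing space, then take every 3-character window."""
    result: set[str] = set()
    # Treat runs of letters/digits as words; everything else separates them.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by distinct trigrams overall (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)
```

For example, similarity("cruz", "santa cruz") is 5/11, about 0.45, because all five trigrams of "cruz" also occur in "santa cruz". That is the kind of partial, multi-word match a single whole-string phonetic key cannot produce.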

Nikolay Samokhvalov wrote:
> Hello.
>
> Does anybody know of any solutions to the problem of searching for
> words/phrases that sound similar to each other, e.g. a Soundex index
> or something like that?
>
> The problem I have: a tag suggestion mechanism, similar to Google
> Suggest, which is intended to suggest people's names (the "person's
> name" search field in a web form). It would be great if it did its
> job smarter than a simple LIKE.
>
> Also, I'd be happy to hear opinions from people who have experience
> using things like Soundex.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: sound index

From: Alex Mayrhofer
Date:
Teodor Sigaev wrote:

>> Also, I'd be happy to hear opinions from people who have experience
>> using things like Soundex.

I'm using metaphone() together with levenshtein() to search a gazetteer
database of place names and to order the results. That works reasonably
well and gives interesting results ("places with similar names"). However,
it does not cover "partial" matches: it only compares whole strings, so it
does not find multi-word names when just a single word is entered (e.g. it
would not find "santa cruz" when you enter just "cruz").

Regarding the db structure: I've specifically added a column that contains
the metaphone string (populated with "UPDATE places set pname_metaphone =
metaphone(pname, 11)", where 11 is the maximum length of the generated
code). This column is obviously indexed (and, with functional indexes,
actually redundant ;). I'm then using "SELECT * from places where
pname_metaphone = metaphone('searchstring', 11)" to retrieve similar
names, and levenshtein() is used to order those rows by string distance.
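
The ordering step can be sketched in Python with the standard dynamic-programming Levenshtein distance (insertions, deletions and substitutions each cost 1, as in fuzzystrmatch's levenshtein() with default costs), applied to rows that already matched on the metaphone key. rank_by_distance is a hypothetical helper, and lower-casing both sides first is an assumption of this sketch, not part of the setup described above.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    # Rolling-row DP: prev[j] is the distance from the previous prefix of a to b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def rank_by_distance(query: str, candidates: list[str]) -> list[str]:
    """Order candidates (e.g. rows sharing the query's metaphone key) by edit
    distance to the query, closest first; ties keep their input order."""
    q = query.lower()
    return sorted(candidates, key=lambda name: levenshtein(q, name.lower()))
```

For example, rank_by_distance("Cruz", ["Crusoe", "Cruse", "Cruz"]) returns ["Cruz", "Cruse", "Crusoe"], with distances 0, 2 and 3.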

Try it at http://nona.net/features/map/

I haven't yet attempted to combine tsearch2 and metaphone results; that
would probably be the PerfectSolution(tm).

Hope that helps.

Alex