Re: Fuzzy string matching of product names

Поиск
Список
Период
Сортировка
От Bill Moran
Тема Re: Fuzzy string matching of product names
Дата
Msg-id 20100405161818.a301e038.wmoran@potentialtech.com
обсуждение исходный текст
Ответ на Re: Fuzzy string matching of product names  (Peter Geoghegan <peter.geoghegan86@gmail.com>)
Список pgsql-general
In response to Peter Geoghegan <peter.geoghegan86@gmail.com>:

> > http://www.postgresql.org/docs/8.4/static/fuzzystrmatch.html
> >
> > --
> > Bill Moran
> > http://www.potentialtech.com
> > http://people.collaborativefusion.com/~wmoran/
>
> Fuzzystrmatch is generally used to compare two single words for how
> similar they sound. How can that actually be applied to get the
> functionality that I've described?

Well, it really depends on your particular situation and what you
want to support.  You could break the name down into individual
words and generate metaphones, then use like to match on metaphone:

'The Candlestick Corporation, Limited' -> 'TE CDSK CPRN LMTD'

Searching for "candlestick" -> WHERE metaphone column like '%CDSK%'

Or you could create an array column that has all the metaphones in
it and use an ANY() or ALL() match to find ones that apply.

Exactly how you implement depends on how far you want to go.  Do you
want to support OR matches, AND matches, or both?  Can the words be
out of order?

You could also use Levenshtein as a percentage function to find matches,
even on long strings with multiple words.  Since Levenshtein gives you
the number of alterations between two strings, using that as a percentage
of the total string length gives you a pretty good gauge of how close
they are overall, and would allow you to set a threshold, or possibly even
list results by relevance.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

В списке pgsql-general по дате отправления:

Предыдущее
От: John R Pierce
Дата:
Сообщение: Re: windows 7 compatiblity?
Следующее
От: Leif Biberg Kristensen
Дата:
Сообщение: Re: Fuzzy string matching of product names