Re: Fuzzy string matching of product names

Поиск
Список
Период
Сортировка
От Brian Modra
Тема Re: Fuzzy string matching of product names
Дата
Msg-id k2t5a9699851004051223l8bd966d2yf81ae71b1610a4a4@mail.gmail.com
обсуждение исходный текст
Ответ на Fuzzy string matching of product names  (Peter Geoghegan <peter.geoghegan86@gmail.com>)
Ответы Re: Fuzzy string matching of product names  (George Silva <georger.silva@gmail.com>)
Список pgsql-general
On 05/04/2010, Peter Geoghegan <peter.geoghegan86@gmail.com> wrote:
> Hello,
>
> At the moment, users of my application, which runs on 8.4.3, may
> search for products in a way that is implemented roughly like this:
>
> SELECT * FROM products WHERE description ILIKE '%%usr_string%%';
>
> This works reasonably well. However, I thought it would be a nice
> touch to give my users leeway to spell product names incorrectly when
> searching, or to not have to remember if a product is entered as "coca
> cola", "CocaCola" or "Coca-cola". At the moment, they don't have to
> worry about case sensitivity because I use ILIKE - I'd like to
> preserve that. I'd also like to not have it weigh against them heavily
> when they don't search for a specific product, but just a common
> substring. For example, if they search for "coca-cola", there may be a
> number of different coca-cola products: "CocaCola 330ml can",
> "Coca-Cola 2 litre bottle", but no actual plain "cocacola". That ought
> to not matter too much - all cocacola products should be returned.
>
> This isn't important enough for me to be willing to add a big
> dependency to my application. I'd really prefer to limit myself to the
> contrib modules. pg_trgm and fuzzystrmatch look very promising, but
> it's not obvious how I can use either to achieve what I want.
> Postgres's built-in regex support may have a role to play too.
>
> I can live with it not being indexable, because typically there are
> only tens of thousands of products in a production system.
>
> Could someone suggest an approach that is reasonably simple and
> reasonably generic ?

What I do is to create another column that has a simplified version of
the string in it.
(I created a function to simplify strings, and when the source column
is changed or inserted, I also update the "simplified" column.
Then when searching, I use the same function to "simplify" the search
string and use "=" to test against the "simplified" column.

E.g.
if the table has a column called "name" that you want to search, you
create a name_simplified column, and fill it as so:
update your_table set name_simplified=yourSimplifyFunction(name);

Then to search:
select * from your_table where simplified_name =
yourSimplifyFunction('Coca-Cola');

This is really fast, because the match is using the index rather than
a sequential scan.

>
> Thanks,
> Peter Geoghegan
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


--
Brian Modra   Land line: +27 23 5411 462
Mobile: +27 79 69 77 082
5 Jan Louw Str, Prince Albert, 6930
Postal: P.O. Box 2, Prince Albert 6930
South Africa
http://www.zwartberg.com/

В списке pgsql-general по дате отправления:

Предыдущее
От: Bill Moran
Дата:
Сообщение: Re: Fuzzy string matching of product names
Следующее
От: Garry Saddington
Дата:
Сообщение: Re: windows 7 compatiblity?