Обсуждение: General guidance: Levenshtein distance versus other similarity algorithms

Поиск

Список

Период

Сортировка

General guidance: Levenshtein distance versus other similarity algorithms

От

Rachel Owsley

Дата:

23 июля 2012 г., 17:17:12

Hi,

I am hoping you can give me some guidance here. I’m using postgresql 9.1.

Basically, I’m trying to create a query on a table of businesses that will return all similar matches to a business name. This is a huge table, and there is a lot of variation in names. The length of the string can be up to 255. I’ve used regex, but there are always some variations of the name that are missed when I do a regex. So I decided to look at distance measures.

Has anyone compared the fuzzstrmatch package to pgsimilarity?

Would the levenshtein function in postgresql be the best way to go here? If so, should I use levenshtein in the contribution package or install the pgsimilarity package? Has anyone tried both implementations?

This would be my query:

Select * from table

WHERE levenshtein (column_name,’Name of the business’) <= 3

ORDER BY levenshtein (column_name, ‘Name of the business’)

Limit 10;

Thank you so much for your help.

Rachel

Re: General guidance: Levenshtein distance versus other similarity algorithms

От

Merlin Moncure

Дата:

25 июля 2012 г., 15:31:50

On Mon, Jul 23, 2012 at 11:55 AM, Rachel Owsley
<Rachel.Owsley@edointeractive.com> wrote:
> Hi,
>
> I am hoping you can give me some guidance here. I’m using postgresql 9.1.
>
> Basically, I’m trying to create a query on a table of businesses that will
> return all similar matches to a business name. This is a huge table, and
> there is a lot of variation in names. The length of the string can be up to
> 255. I’ve used regex, but there are always some variations of the name that
> are missed when I do a regex. So I decided to look at distance measures.
>
> Has anyone compared the fuzzstrmatch package to pgsimilarity?
>
> Would the levenshtein function in postgresql be the best way to go here? If
> so, should I use levenshtein in the contribution package or install the
> pgsimilarity package? Has anyone tried both implementations?

Another option that works with 9.1 is the pg_trgm module
(http://www.postgresql.org/docs/9.1/static/pgtrgm.html).   It works
very well for 9.1 and has the advantage of having built-in gist and
gin operator support.

Can't speak on pg_similarity, haven't used it.

merlin

Re: General guidance: Levenshtein distance versus other similarity algorithms

От

Rachel Owsley

Дата:

25 июля 2012 г., 17:15:51

Thanks, Merlin. I will give that one a try.

-----Original Message-----
From: Merlin Moncure [mailto:mmoncure@gmail.com]
Sent: Wednesday, July 25, 2012 1:32 PM
To: Rachel Owsley
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] General guidance: Levenshtein distance versus other similarity algorithms

On Mon, Jul 23, 2012 at 11:55 AM, Rachel Owsley <Rachel.Owsley@edointeractive.com> wrote:
> Hi,
>
> I am hoping you can give me some guidance here. I'm using postgresql 9.1.
>
> Basically, I'm trying to create a query on a table of businesses that
> will return all similar matches to a business name. This is a huge
> table, and there is a lot of variation in names. The length of the
> string can be up to 255. I've used regex, but there are always some
> variations of the name that are missed when I do a regex. So I decided to look at distance measures.
>
> Has anyone compared the fuzzstrmatch package to pgsimilarity?
>
> Would the levenshtein function in postgresql be the best way to go
> here? If so, should I use levenshtein in the contribution package or
> install the pgsimilarity package? Has anyone tried both implementations?

Another option that works with 9.1 is the pg_trgm module
(http://www.postgresql.org/docs/9.1/static/pgtrgm.html).   It works
very well for 9.1 and has the advantage of having built-in gist and gin operator support.

Can't speak on pg_similarity, haven't used it.

merlin

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: General guidance: Levenshtein distance versus other similarity algorithms

General guidance: Levenshtein distance versus other similarity algorithms

Re: General guidance: Levenshtein distance versus other similarity algorithms

Re: General guidance: Levenshtein distance versus other similarity algorithms