Re: Merge rows based on Levenshtein distance

Поиск
Список
Период
Сортировка
От David G Johnston
Тема Re: Merge rows based on Levenshtein distance
Дата
Msg-id 1417481381890-5828847.post@n5.nabble.com
обсуждение исходный текст
Ответ на Merge rows based on Levenshtein distance  (mongoose <akarargyris@gmail.com>)
Ответы Re: Merge rows based on Levenshtein distance  (mongoose <akarargyris@gmail.com>)
Список pgsql-general
mongoose wrote
> I am new to PostgreSQL and I have the following table:
>
> Name, City
> "Alex", "Washington"
> "Aleex1", "Washington"
> "Bob", "NYC"
> "Booob", "NYC"
>
> I want to "merge" similar rows based on levenshtein distance between names
> so that I have the following table:
>
> id, Name, City
> 1,"Alex", "Washington"
> 1,"Aleex1", "Washington"
> 2,"Bob", "NYC"
> 2,"Booob", "NYC"
>
> How could I do that on PostgreSQL? Is there an SQL command for this?
> Thnsls

So you have a table of N names and you want to evaluate (N-1)^2 pairs and
then use the output of the levenshtein calculation to group them together.

SELECT
l_names.name_value,
r_names.name_value, leven[...](l_names.name_value, r_names.name_value) AS
pair_group
FROM table_of_names AS l_names
CROSS JOIN table_of_names AS r_names
WHERE l_names.name_value <> r_names.name_value
;

Feel free to add "group by city" or "WHERE substring(l_names.name_value, 0,
1) = substring(r_names.name_value, 0, 1)" since it seems you need more than
just a name-distance to generate the desired groups.  You'd likely want to
add the same "substring" call to the SELECT-list and "GROUP BY" clauses...

David J.




--
View this message in context:
http://postgresql.nabble.com/Merge-rows-based-on-Levenshtein-distance-tp5828841p5828847.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Irreversible SET ROLE
Следующее
От: Bryn Jeffries
Дата:
Сообщение: Re: Irreversible SET ROLE