Re: Postgresql GROUP BY "SIMILAR" but not equal values

Поиск
Список
Период
Сортировка
От Alban Hertroys
Тема Re: Postgresql GROUP BY "SIMILAR" but not equal values
Дата
Msg-id CAF-3MvN9WLFhZA-XDdZNxTKCuJREYLLe1uMQ+6keEdL9Z5gMjw@mail.gmail.com
обсуждение исходный текст
Ответ на Postgresql GROUP BY "SIMILAR" but not equal values  (alexandros_e <alexandros.ef@gmail.com>)
Ответы Re: Postgresql GROUP BY "SIMILAR" but not equal values  (Sergey Konoplev <gray.ru@gmail.com>)
Re: Postgresql GROUP BY "SIMILAR" but not equal values  (alexandros_e <alexandros.ef@gmail.com>)
Список pgsql-general
On 6 February 2014 16:18, alexandros_e <alexandros.ef@gmail.com> wrote:
> Let's say I have this table foo
>
> ID|G1|T1|
> 1|2|ABC|
> 1|2|ABCD|
> 1|2|DEF|
> 1|2|DEFG|
>
> SELECT * FROM foo
> GROUP BY ID,G1,T1

> Is there a way in SQL or PostgreSQL in general to group by values than are
> not exactly the same but are quite similar (like 'ABC' and 'ABCD') based on
> some distance function (levenshtein for example) if the distance is within
> some threshold (i.e., 1)

Perhaps there is: You can calculate the levenshtein distance between
those values using a self-join and then GROUP BY the result of that
expression and limit the results with HAVING.

For example:
SELECT foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1)
FROM foo foo1
INNER JOIN foo foo2 ON (foo2.ID = foo1.ID AND foo2.G1 = foo1.G1)
GROUP BY foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1)
HAVING levenshtein(foo1.T1, foo2.T1) > 1

Is that what you're looking for?

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: Help with connection issue - started today
Следующее
От: Sergey Konoplev
Дата:
Сообщение: Re: Postgresql GROUP BY "SIMILAR" but not equal values