Re: Postgresql GROUP BY "SIMILAR" but not equal values

Поиск
Список
Период
Сортировка
От alexandros_e
Тема Re: Postgresql GROUP BY "SIMILAR" but not equal values
Дата
Msg-id 1391704024728-5790876.post@n5.nabble.com
обсуждение исходный текст
Ответ на Re: Postgresql GROUP BY "SIMILAR" but not equal values  (Alban Hertroys <haramrae@gmail.com>)
Список pgsql-general
Conceptually, Tom (as always) is right. But Alban's hack help.

DROP TABLE foo;
CREATE TABLE IF NOT EXISTS foo(ID INTEGER,G1 INTEGER, T1 TEXT, ID2 SERIAL
PRIMARY KEY);

INSERT INTO foo(ID,G1,T1) VALUES(1,2,'ABC');
INSERT INTO foo(ID,G1,T1) VALUES(1,2,'ABCD');
INSERT INTO foo(ID,G1,T1) VALUES(1,2,'ABDC');
INSERT INTO foo(ID,G1,T1) VALUES(1,2,'DEF');
INSERT INTO foo(ID,G1,T1) VALUES(1,2,'DEFH');

/* A little editing to remove duplicates a to b and b to a */
SELECT foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1),foo2.T1
FROM foo foo1
INNER JOIN foo foo2 ON (foo2.ID = foo1.ID AND foo2.G1 = foo1.G1)
WHERE foo1.ID2<foo2.ID2
GROUP BY foo1.ID, foo1.G1, foo1.T1, levenshtein(foo1.T1, foo2.T1),foo2.T1
HAVING levenshtein(foo1.T1, foo2.T1) <2;

RETURNS

ID|G1|foo1.T1|foo2.T1
1;2;"ABC";1;"ABCD"
1;2;"ABC";1;"ABDC"
1;2;"ABCD";2;"ABDC"
1;2;"DEF";1;"DEFH"

Then it requires a second grouping but as Tom suggested it would be hard to
somehow group all similar cases together because then it becomes a
clustering problem. With a second grouping we will have 3 records instead of
4, so it is better than the initial case by 25%.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Postgresql-GROUP-BY-SIMILAR-but-not-equal-values-tp5790860p5790876.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


В списке pgsql-general по дате отправления:

Предыдущее
От: Bret Stern
Дата:
Сообщение: Re: Help with connection issue - started today
Следующее
От: Sergey Konoplev
Дата:
Сообщение: Re: Ordering Results by a Supplied Order