Re: Grouping By Similarity (using pg_trgm)?

Поиск
Список
Период
Сортировка
От Cory Tucker
Тема Re: Grouping By Similarity (using pg_trgm)?
Дата
Msg-id CAG_=8kDLfVBXjFB_4h5i8Hx-WqVMzL0yG9_8sSOQCVuVA6zuLA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Grouping By Similarity (using pg_trgm)?  ("David G. Johnston" <david.g.johnston@gmail.com>)
Ответы Re: Grouping By Similarity (using pg_trgm)?
Список pgsql-general
That produces pretty much the same results as the CROSS JOIN I was using before.  Because each "my_value" in the table are different, if I group on just their value then I will always have the full result set and a bunch of essentially duplicated results.

Any other ideas/options?

On Thu, May 14, 2015 at 12:08 PM David G. Johnston <david.g.johnston@gmail.com> wrote:

On Thu, May 14, 2015 at 11:58 AM, Cory Tucker <cory.tucker@gmail.com> wrote:
[pg version 9.3 or 9.4]

Suppose I have a simple table:

create table data (
  my_value  TEXT NOT NULL
);
CREATE INDEX idx_my_value ON data USING gin(my_value gin_trgm_ops);


Now I would like to essentially do group by to get a count of all the values that are sufficiently similar.  I can do it using something like a CROSS JOIN to join the table on itself, but then I still am getting all the rows with duplicate counts.  

Is there a way to do a group by query and only return a single "my_value" column and a count of the number of times other values are similar while also not returning the included similar values in the output, too?


​Concept below - not bothering to lookup the functions/operators for pg_trgm:

SELECT my_value_src, count(*)
FROM (SELECT my_value AS my_value_src FROM data) src
JOIN (SELECT my_value AS my_value_compareto FROM data) comparedto
ON ( func(my_value_src, my_value_compareto) < # )
GROUP BY my_value_src

​David J.

В списке pgsql-general по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: Grouping By Similarity (using pg_trgm)?
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: Grouping By Similarity (using pg_trgm)?