Re: Need magic for identifieing double adresses

From: John DeSoi
Subject: Re: Need magic for identifieing double adresses
Message-ID: CFB262B9-6831-49EA-938C-CBB1B3B36A8D@pgedit.com
In reply to: Need magic for identifieing double adresses (Andreas <maps.on@gmx.net>)
List: pgsql-general

On Sep 15, 2010, at 10:40 PM, Andreas wrote:

> I need to clean up a lot of contact data because of a merge of customer lists that used to be kept separate.
> I already know that there are duplicate entries within the lists, and the lists overlap, too.
>
> Relevant fields could be name, street, zip, city, phone.
>
> Is there a way to do something like this with PostgreSQL?
>
> I fear this will still need a lot of manual sorting and searching even when potential peers get automatically identified.

I recently started working with the pg_trgm contrib module for matching songs based on titles and writers. This is
especially difficult because the writer credits end up in one big field with every possible variation on order and
naming conventions. So far I have been pleased with the results. For example, the algorithm correctly matched these two
song titles:

FONTAINE DI ROMA AKA FOUNTAINS OF ROME

FOUNTAINS OF ROME A/K/A FONTANE DI ROMA
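
To give a feel for why these two titles match, here is a minimal Python sketch of trigram similarity. It only approximates what pg_trgm does (lowercase the text, split it into alphanumeric words, pad each word with two leading blanks and one trailing blank, then compare the sets of 3-character substrings). The function names `trigrams` and `similarity` are chosen for this illustration, and the real module may differ in details such as character handling.

```python
import re


def trigrams(text):
    """Return the set of 3-character substrings of each word in text."""
    grams = set()
    # Assumption: words are runs of ASCII letters and digits; everything
    # else (spaces, slashes, punctuation) acts as a separator.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading blanks, one trailing blank
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


score = similarity(
    "FONTAINE DI ROMA AKA FOUNTAINS OF ROME",
    "FOUNTAINS OF ROME A/K/A FONTANE DI ROMA",
)
print(score)
```

Because the comparison works on sets, word order does not matter, and a small spelling difference such as FONTAINE versus FONTANE only changes a few of the trigrams.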

Trigrams can be indexed, so it is relatively fast to find an initial set of candidates.
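
The idea behind the index can be sketched in Python as an inverted index from each trigram to the rows that contain it. This is only a conceptual model, not how the pg_trgm GiST and GIN operator classes are actually implemented. The sample rows, the function names, and the `min_shared` cutoff are all made up for the illustration.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Set of padded 3-character substrings of each word (approximation)."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def build_index(rows):
    """Map each trigram to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for gram in trigrams(text):
            index[gram].add(row_id)
    return index


def candidates(index, query, min_shared=5):
    """Row ids sharing at least min_shared trigrams with the query."""
    counts = defaultdict(int)
    for gram in trigrams(query):
        # Only rows listed under this trigram are touched, so rows with
        # nothing in common with the query are never examined.
        for row_id in index.get(gram, ()):
            counts[row_id] += 1
    return sorted(rid for rid, n in counts.items() if n >= min_shared)


# Hypothetical contact rows, invented for this example.
rows = {
    1: "Hans Mueller, Hauptstrasse 12, 10115 Berlin",
    2: "Mueller Hans, Hauptstr. 12, Berlin",
    3: "Anna Schmidt, Bahnhofweg 3, 80331 Muenchen",
}
index = build_index(rows)
found = candidates(index, "Hans Muller Hauptstrasse 12 Berlin")
print(found)
```

The candidate set found this way would then be scored more carefully, and for contact data probably still reviewed by hand.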

There is a nice introductory article here:

http://www.postgresonline.com/journal/categories/59-pgtrgm



John DeSoi, Ph.D.




