Re: De-duplicating rows

Поиск
Список
Период
Сортировка
От Frank Bax
Тема Re: De-duplicating rows
Дата
Msg-id 4A609310.4050003@sympatico.ca
обсуждение исходный текст
Ответ на Re: De-duplicating rows  (Richard Huxton <dev@archonet.com>)
Список pgsql-sql
Richard Huxton wrote:
> Christophe wrote:
>> Now, since this database has been production since 7.2 days, cruft has 
>> crept in: in particular, there are duplicate email addresses, some 
>> with mismatched attributes.  The policy decision by the client is that 
>> the correct row is the one with the earliest timestamp.
> 
> Something like (untested):
> 
> CREATE TEMPORARY TABLE earliest_duplicates AS
> SELECT
>   email AS tgt_email,
>   min(create_date) AS tgt_date
> FROM mytable
> GROUP BY email
> HAVING count(*) > 1;
> 
> DELETE FROM mytable USING earliest duplicates
> WHERE email=tgt_email AND create_date > tgt_date;



If it is possible that two rows exist for the same email/date; then you 
will likely need to deal with these manually.

If you rerun the above SELECT after running the delete you should 
identify these rows.


В списке pgsql-sql по дате отправления:

Предыдущее
От: Richard Huxton
Дата:
Сообщение: Re: De-duplicating rows
Следующее
От: Scott Marlowe
Дата:
Сообщение: Re: De-duplicating rows