Richard Huxton wrote:
> Christophe wrote:
>> Now, since this database has been production since 7.2 days, cruft has
>> crept in: in particular, there are duplicate email addresses, some
>> with mismatched attributes. The policy decision by the client is that
>> the correct row is the one with the earliest timestamp.
>
> Something like (untested):
>
> CREATE TEMPORARY TABLE earliest_duplicates AS
> SELECT
> email AS tgt_email,
> min(create_date) AS tgt_date
> FROM mytable
> GROUP BY email
> HAVING count(*) > 1;
>
> DELETE FROM mytable USING earliest duplicates
> WHERE email=tgt_email AND create_date > tgt_date;
If it is possible that two rows exist for the same email/date; then you
will likely need to deal with these manually.
If you rerun the above SELECT after running the delete you should
identify these rows.