De-duplicating rows

Поиск
Список
Период
Сортировка
От Christophe
Тема De-duplicating rows
Дата
Msg-id 87E8E647-4CAD-4491-B687-FA763E0EBE7E@thebuild.com
обсуждение исходный текст
Ответы Re: De-duplicating rows  (Richard Huxton <dev@archonet.com>)
Re: De-duplicating rows  (Scott Marlowe <scott.marlowe@gmail.com>)
Список pgsql-sql
The Subject: is somewhat imprecise, but here's what I'm trying to do.   
For some reason, my brain is locking up over it.

I'm moving a 7.2 (yes) database to 8.4.  In the table in question, the  
structure is along the lines of:
serial_number    SERIAL, PRIMARY KEYemail        TEXTcreate_date    TIMESTAMPattr1        typeattr2        typeattr3
   type...
 

(The point of the "attr" fields is that there are many more columns  
for each row.)

The new structure removes the "serial_number" field, and uses "email"  
as the primary key, but is otherwise unchanged:
email        TEXT, PRIMARY KEYcreate_date    TIMESTAMPattr1        typeattr2        typeattr3        type...

Now, since this database has been production since 7.2 days, cruft has  
crept in: in particular, there are duplicate email addresses, some  
with mismatched attributes.  The policy decision by the client is that  
the correct row is the one with the earliest timestamp.  (The  
timestamps are widely distributed; it's not the case that there is a  
single timestamp above which all the duplicates live.)  Thus, ideally,  
I want to select exactly one row per "email", picking the row with the  
earliest timestamp in the case that there is more than one row with  
that email.

Any suggestions on how to write such a SELECT?  Of course, I could do  
this with an application against the db, but a single SELECT would be  
great if possible.

TIA!


В списке pgsql-sql по дате отправления:

Предыдущее
От: Tim Landscheidt
Дата:
Сообщение: Comparing a string against an XPath result set
Следующее
От: Richard Huxton
Дата:
Сообщение: Re: De-duplicating rows