Re: Removing duplicates
От | Jeff Self |
---|---|
Тема | Re: Removing duplicates |
Дата | |
Msg-id | 1014740376.12600.20.camel@personnel_test обсуждение исходный текст |
Ответ на | Removing duplicates (Matthew Hagerty <matthew@brwholesale.com>) |
Список | pgsql-sql |
On Tue, 2002-02-26 at 10:10, Matthew Hagerty wrote: > Greetings, > > I have a customer database (name, address1, address2, city, state, zip) and > I need a query (or two) that will give me a mailing list with the least > amount of duplicates possible. I know that precise matching is not > possible, i.e. "P.O. Box 123" will never match "PO Box 123" without some > data massaging, but if I can isolate even 50% of any duplicates, that would > help greatly. It depends on which fields you are wanting to check for. If you are referring to just addresses then, you can do a query using 'Distinct' on address1. This will give you only one record where address1 is P.O. Box 123. Of course this doesn't help with those that are PO Box 123. You could always dump your data and write some Perl scripts to alter the addresses so that they become alike. You could also use the 'Like' keyword in your sql statement. "Select * from customer where address1 like '%Box 123%'; After you run this query, you might want to modify the addresses so that they are the same. > Also, any suggestions on which parameters to check the duplicates for? My > first thoughts were to make sure there were no two addresses the same in > the same zip code. Any insight (or examples) would be greatly appreciated. These constraints should be in the table definitions themselves, which would prevent these duplications from taking place. -- Jeff Self Information Technology Analyst Department of Personnel City of Newport News 2400 Washington Ave. Newport News, VA 23607 757-926-6930
В списке pgsql-sql по дате отправления: