identifying duplicates in table with redundancies
От | Tarlika Elisabeth Schmitz |
---|---|
Тема | identifying duplicates in table with redundancies |
Дата | |
Msg-id | 20100923223940.5019475e@dick.coachhouse обсуждение исходный текст |
Список | pgsql-sql |
I loaded data from a spread into a interim table so I can analyze the quality of the data. The table contains an entry for every student (250K records) and his trainer. Eventually, I want to extract a unique list of trainers from it. But first of all I want to check for duplicates: 1) multiples trainer names for same trainer id 2) multiple trainer ids for same trainer name I cobbled together the SQL and it does the job but it seems rather convoluted. I would like to know how I can improve it. CREATE TABLE student ( id INTEGER NOT NULL, name VARCHAR(256) NOT NULL, trainer_id INTEGER, trainer_name VARCHAR(256), ); ==== EXAMPLE DATA 22 John 1 Macdonald 23 Jane 1 MacDonald 24 Paul 1 MacDonald 25 Dick 2 Smith 26 Bill 3 Smith 27 Kate 3 Smith ==== -- outputs trainer ids which appear under different names select trainer_id, trainer_name from ( -- different id/name combinations select distinct on (trainer_name) trainer_id, trainer_name from student wheretrainer_id in ( -- trainer ids with appearing with different names select distinct on (id) id from ( -- distinct trainer id-name select distinct on (trainer_id,trainer_name) trainer_idas id, trainer_name as name from student ) as trainer group by trainer.id having count(trainer.name) > 1 ) ) as y order by trainer_id -- Best Regards, Tarlika Elisabeth Schmitz
В списке pgsql-sql по дате отправления: