identifying duplicates in table with redundancies

Поиск

Список

Период

Сортировка

От	Tarlika Elisabeth Schmitz
Тема	identifying duplicates in table with redundancies
Дата	23 сентября 2010 г. 19:35:40
Msg-id	20100923223940.5019475e@dick.coachhouse обсуждение исходный текст
Список	pgsql-sql

Дерево обсуждения

I loaded data from a spread into a interim table so I can analyze the
quality of the data.

The table contains an entry for every student (250K records) and his
trainer. Eventually, I want to extract a unique list of trainers from
it. But first of all I want to check for duplicates:

1) multiples trainer names for same trainer id
2) multiple trainer ids for same trainer name

I cobbled together the SQL and it does the job but it seems rather
convoluted. I would like to know how I can improve it. 

CREATE TABLE student (
id INTEGER NOT NULL,
name VARCHAR(256) NOT NULL,
trainer_id INTEGER,
trainer_name VARCHAR(256),
);

====
EXAMPLE DATA

22 John 1 Macdonald
23 Jane 1 MacDonald
24 Paul 1 MacDonald
25 Dick 2 Smith
26 Bill 3 Smith
27 Kate 3 Smith
====

-- outputs trainer ids which appear under different names
select trainer_id, trainer_name from 
(   -- different id/name combinations   select distinct on (trainer_name) trainer_id, trainer_name   from student
wheretrainer_id in   (       -- trainer ids with appearing with different names       select distinct on (id) id
from       (           -- distinct trainer id-name           select distinct on (trainer_id,trainer_name)
trainer_idas id,            trainer_name as name from student       ) as trainer       group by trainer.id       having
count(trainer.name) > 1    ) 
 
) as y
order by trainer_id

-- 

Best Regards,
Tarlika Elisabeth Schmitz

В списке pgsql-sql по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

identifying duplicates in table with redundancies