Re: identifying duplicates in table with redundancies

Поиск
Список
Период
Сортировка
От Tarlika Elisabeth Schmitz
Тема Re: identifying duplicates in table with redundancies
Дата
Msg-id 20100928213600.4db60ba4@dick.coachhouse
обсуждение исходный текст
Ответ на Re: identifying duplicates in table with redundancies  ("Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt>)
Ответы Re: identifying duplicates in table with redundancies
Список pgsql-sql
On Tue, 28 Sep 2010 11:34:31 +0100
"Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt> wrote:

>----- Original Message ----- 
>From: "Tarlika Elisabeth Schmitz" <postgresql@numerixtechnology.de>
>To: <pgsql-sql@postgresql.org>
>Sent: Monday, September 27, 2010 5:54 PM
>Subject: Re: [SQL] identifying duplicates in table with redundancies
>
>
>> On Fri, 24 Sep 2010 18:12:18 +0100
>> Oliver d'Azevedo Christina <oliveiros.cristina@gmail.com> wrote:
>>
>>>>> SELECT DISTINCT trainer_id,trainer_name
>>>>> FROM (
>>>>> SELECT trainer_name   -- The field you want to test for duplicates
>>>>> FROM (
>>>>> SELECT DISTINCT "trainer_id","trainer_name"
>>>>> FROM student
>>>>> ) x
>>>>> GROUP BY "trainer_name"    -- the field you want to test for
>>>>> duplicates
>>>>> HAVING (COUNT(*) > 1)
>>>>> ) z
>>>>> NATURAL JOIN student y
>>
>>
>>
>> What indices would you recommend for this operation?
>
>But, on this query in particular I would recomend an indice on
>trainer_name, as this field will be used on the join and on the group
>by. For the other query, the one you get by substituting trainer_name
>by trainer_id, place an index on trainer_id.
>Also, these indexes may help speed up the order by clause, if you use
>one.
>
>If you have a table with lots of data you can try them around and see
>how performance varies (and don't forget there's also EXPLAIN ANALYZE)


Strangely, these indices did not do anything.
Without, the query took about 8500ms. Same with index.

The table has 250000 records. 11000 have trainer_name = null. Only
13000 unique trainer_names.

It is not hugely important as these queries are not time-critical.
This is only a helper table, which I use to analyze the date prior to
populating the destination tables with the data.

Regards,
Tarlika


В списке pgsql-sql по дате отправления:

Предыдущее
От: msi77
Дата:
Сообщение: Re: is there a distinct function for comma lists ?
Следующее
От: "Thomas Andres"
Дата:
Сообщение: Shema with template