Обсуждение: Select duplicated values

Поиск
Список
Период
Сортировка

Select duplicated values

От
jeffrey
Дата:
Lets say that the primary key column is A.  I am trying to select all
the rows with duplicated values in columns B, C, and D.

I am not too experienced in SQL syntax, and I've used the following:
select A from table_name where B+C+D in (select B+C+D from table_name
group by B+C+D having count(*)>1  )

I'm looking for a better way, since I am just adding the three columns
together right now.

Jeffrey

Re: Select duplicated values

От
Edson Richter
Дата:
I think you should not "add columns", but concatenate them.

Instead

select A from table_name where B+C+D in (select B+C+D from table_name
group by B+C+D having count(*)>1  )

use "B || '/' || C || '/' || D"

select A from table_name where B || '/' || C || '/' || D in (select B || '/' || C || '/' || D from table_name
group by 1 having count(*)>1  )

You may need to apply some conversion (cast) depending of data type and output format.

Regards,

Edson Carlos Ericksson Richter
SimKorp Informática Ltda
Fone:(51) 3366-7964
Celular:(51)9318-9766/(51) 8585-0796
Embedded Image


Em 21-11-2011 20:23, jeffrey escreveu:
Lets say that the primary key column is A.  I am trying to select all
the rows with duplicated values in columns B, C, and D.

I am not too experienced in SQL syntax, and I've used the following:
select A from table_name where B+C+D in (select B+C+D from table_name
group by B+C+D having count(*)>1  )

I'm looking for a better way, since I am just adding the three columns
together right now.

Jeffrey

Вложения

Re: Select duplicated values

От
Pete Yunker
Дата:
Another option is to perform a self-join on columns B, C, and D (filtering out the 'same' record where a=a) instead of using the sub-select.  This may yield better performance depending on the size of the table.  Also, I don't believe the concatenation / sub-select will work if all of B, C, and D are all NULL.

SELECT x.a
FROM my_table x
INNER JOIN my_table y
ON ((x.b = y.b) or (x.b IS NULL AND y.b IS NULL))
AND ((x.c = y.c) or (x.c IS NULL AND y.c IS NULL))
AND ((x.d = y.d) or (x.d IS NULL AND y.d IS NULL))
AND x.a <> y.a

Another alternative to handling the NULL values is to COALESCE them to a value that would never exist in columns B, C, or D.  I don't know the datatypes you are using, so I'll just use 'junk' for now.

SELECT x.a
FROM my_table x
INNER JOIN my_table y
ON COALESCE(x.b,'junk') = COALESCE(y.b,'junk')
AND COALESCE(x.c,'junk') = COALESCE(y.c,'junk')
AND COALESCE(x.d,'junk') = COALESCE(y.d,'junk')
AND x.a <> y.a


---
Pete Yunker
Vice President of Data Products
Home Junction, Inc.

On Nov 21, 2011, at 5:23 PM, jeffrey wrote:

Lets say that the primary key column is A.  I am trying to select all
the rows with duplicated values in columns B, C, and D.

I am not too experienced in SQL syntax, and I've used the following:
select A from table_name where B+C+D in (select B+C+D from table_name
group by B+C+D having count(*)>1  )

I'm looking for a better way, since I am just adding the three columns
together right now.

Jeffrey

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Select duplicated values

От
David Johnston
Дата:
On Nov 21, 2011, at 17:23, jeffrey <johjeffrey@hotmail.com> wrote:

> Lets say that the primary key column is A.  I am trying to select all
> the rows with duplicated values in columns B, C, and D.
>
> I am not too experienced in SQL syntax, and I've used the following:
> select A from table_name where B+C+D in (select B+C+D from table_name
> group by B+C+D having count(*)>1  )
>
> I'm looking for a better way, since I am just adding the three columns
> together right now.
>
> Jeffrey
>

Do you really want these to evaluate to equal (B, C, D)?

(1, 0, 1) = (0, 1, 1); they both sum to 2 but both B and C are unequal

There is no need to do anything other than list each field individually in both the SELECT and the GROUP BY.

SELECT B, C, D
FROM table
GROUP BY B, C, D
HAVING count(*) > 1;

Whatever possessed you to consider that you had to add them to get what you need? (assuming you don't want my examples
tobe considered equal) Besides the fact it will give you WRONG RESULTS any decent set of GROUP BY examples will show
youthat you can list/use multiple fields in a grouping query. 

Now, if you truly want the two samples above to evaluate to equal then you do need to do some form of consolidation
(likethe adding in your example).  That, however, would be very unusual. 

David J.



Re: Select duplicated values

От
David Johnston
Дата:
On Nov 21, 2011, at 21:11, David Johnston <polobo@yahoo.com> wrote:

> On Nov 21, 2011, at 17:23, jeffrey <johjeffrey@hotmail.com> wrote:
>
>> Lets say that the primary key column is A.  I am trying to select all
>> the rows with duplicated values in columns B, C, and D.
>>
>> I am not too experienced in SQL syntax, and I've used the following:
>> select A from table_name where B+C+D in (select B+C+D from table_name
>> group by B+C+D having count(*)>1  )
>>
>> I'm looking for a better way, since I am just adding the three columns
>> together right now.
>>
>> Jeffrey
>>
>
> Do you really want these to evaluate to equal (B, C, D)?
>
> (1, 0, 1) = (0, 1, 1); they both sum to 2 but both B and C are unequal
>
> There is no need to do anything other than list each field individually in both the SELECT and the GROUP BY.
>
> SELECT B, C, D
> FROM table
> GROUP BY B, C, D
> HAVING count(*) > 1;
>
> Whatever possessed you to consider that you had to add them to get what you need? (assuming you don't want my
examplesto be considered equal) Besides the fact it will give you WRONG RESULTS any decent set of GROUP BY examples
willshow you that you can list/use multiple fields in a grouping query. 
>
> Now, if you truly want the two samples above to evaluate to equal then you do need to do some form of consolidation
(likethe adding in your example).  That, however, would be very unusual. 
>
> David J.
>
>

So, in a sub-select you would do:

... WHERE ROW(B, C, D) IN (SELECT B, C, D FROM ... GROUP BY B, C, D)

Note that the word ROW is optional ( but not the parentheses )

David J.