Обсуждение: Select duplicated values
Lets say that the primary key column is A. I am trying to select all the rows with duplicated values in columns B, C, and D. I am not too experienced in SQL syntax, and I've used the following: select A from table_name where B+C+D in (select B+C+D from table_name group by B+C+D having count(*)>1 ) I'm looking for a better way, since I am just adding the three columns together right now. Jeffrey
I think you should not "add columns", but concatenate them.
Instead
use "B || '/' || C || '/' || D"
You may need to apply some conversion (cast) depending of data type and output format.
Regards,
Em 21-11-2011 20:23, jeffrey escreveu:
Instead
select A from table_name where B+C+D in (select B+C+D from table_name group by B+C+D having count(*)>1 )
use "B || '/' || C || '/' || D"
select A from table_name where B || '/' || C || '/' || D in (select B || '/' || C || '/' || D from table_name group by 1 having count(*)>1 )
You may need to apply some conversion (cast) depending of data type and output format.
Regards,
Edson Carlos Ericksson Richter SimKorp Informática Ltda | |
Fone: | (51) 3366-7964 |
Celular: | (51)9318-9766/(51) 8585-0796 |
Em 21-11-2011 20:23, jeffrey escreveu:
Lets say that the primary key column is A. I am trying to select all the rows with duplicated values in columns B, C, and D. I am not too experienced in SQL syntax, and I've used the following: select A from table_name where B+C+D in (select B+C+D from table_name group by B+C+D having count(*)>1 ) I'm looking for a better way, since I am just adding the three columns together right now. Jeffrey
Вложения
Another option is to perform a self-join on columns B, C, and D (filtering out the 'same' record where a=a) instead of using the sub-select. This may yield better performance depending on the size of the table. Also, I don't believe the concatenation / sub-select will work if all of B, C, and D are all NULL.
SELECT x.a
FROM my_table x
INNER JOIN my_table y
ON ((x.b = y.b) or (x.b IS NULL AND y.b IS NULL))
AND ((x.c = y.c) or (x.c IS NULL AND y.c IS NULL))
AND ((x.d = y.d) or (x.d IS NULL AND y.d IS NULL))
AND x.a <> y.a
Another alternative to handling the NULL values is to COALESCE them to a value that would never exist in columns B, C, or D. I don't know the datatypes you are using, so I'll just use 'junk' for now.
SELECT x.a
FROM my_table x
INNER JOIN my_table y
ON COALESCE(x.b,'junk') = COALESCE(y.b,'junk')
AND COALESCE(x.c,'junk') = COALESCE(y.c,'junk')
AND COALESCE(x.d,'junk') = COALESCE(y.d,'junk')
AND x.a <> y.a
---
Pete Yunker
Vice President of Data Products
Home Junction, Inc.
On Nov 21, 2011, at 5:23 PM, jeffrey wrote:
Lets say that the primary key column is A. I am trying to select all
the rows with duplicated values in columns B, C, and D.
I am not too experienced in SQL syntax, and I've used the following:
select A from table_name where B+C+D in (select B+C+D from table_name
group by B+C+D having count(*)>1 )
I'm looking for a better way, since I am just adding the three columns
together right now.
Jeffrey
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Nov 21, 2011, at 17:23, jeffrey <johjeffrey@hotmail.com> wrote: > Lets say that the primary key column is A. I am trying to select all > the rows with duplicated values in columns B, C, and D. > > I am not too experienced in SQL syntax, and I've used the following: > select A from table_name where B+C+D in (select B+C+D from table_name > group by B+C+D having count(*)>1 ) > > I'm looking for a better way, since I am just adding the three columns > together right now. > > Jeffrey > Do you really want these to evaluate to equal (B, C, D)? (1, 0, 1) = (0, 1, 1); they both sum to 2 but both B and C are unequal There is no need to do anything other than list each field individually in both the SELECT and the GROUP BY. SELECT B, C, D FROM table GROUP BY B, C, D HAVING count(*) > 1; Whatever possessed you to consider that you had to add them to get what you need? (assuming you don't want my examples tobe considered equal) Besides the fact it will give you WRONG RESULTS any decent set of GROUP BY examples will show youthat you can list/use multiple fields in a grouping query. Now, if you truly want the two samples above to evaluate to equal then you do need to do some form of consolidation (likethe adding in your example). That, however, would be very unusual. David J.
On Nov 21, 2011, at 21:11, David Johnston <polobo@yahoo.com> wrote: > On Nov 21, 2011, at 17:23, jeffrey <johjeffrey@hotmail.com> wrote: > >> Lets say that the primary key column is A. I am trying to select all >> the rows with duplicated values in columns B, C, and D. >> >> I am not too experienced in SQL syntax, and I've used the following: >> select A from table_name where B+C+D in (select B+C+D from table_name >> group by B+C+D having count(*)>1 ) >> >> I'm looking for a better way, since I am just adding the three columns >> together right now. >> >> Jeffrey >> > > Do you really want these to evaluate to equal (B, C, D)? > > (1, 0, 1) = (0, 1, 1); they both sum to 2 but both B and C are unequal > > There is no need to do anything other than list each field individually in both the SELECT and the GROUP BY. > > SELECT B, C, D > FROM table > GROUP BY B, C, D > HAVING count(*) > 1; > > Whatever possessed you to consider that you had to add them to get what you need? (assuming you don't want my examplesto be considered equal) Besides the fact it will give you WRONG RESULTS any decent set of GROUP BY examples willshow you that you can list/use multiple fields in a grouping query. > > Now, if you truly want the two samples above to evaluate to equal then you do need to do some form of consolidation (likethe adding in your example). That, however, would be very unusual. > > David J. > > So, in a sub-select you would do: ... WHERE ROW(B, C, D) IN (SELECT B, C, D FROM ... GROUP BY B, C, D) Note that the word ROW is optional ( but not the parentheses ) David J.