Обсуждение: Adding PRIMARY KEY: Table contains duplicated values
Hello,
when trying to add a forgotten primary key pair
to a PostgreSQL 8.4.13 table I get the error:
# \d pref_rep
Table "public.pref_rep"
Column | Type | Modifiers
-----------+-----------------------------+-----------------------------------------------------------
id | character varying(32) |
author | character varying(32) |
good | boolean |
fair | boolean |
nice | boolean |
about | character varying(256) |
stamp | timestamp without time zone | default now()
author_ip | inet |
rep_id | integer | not null default
nextval('pref_rep_rep_id_seq'::regclass)
Check constraints:
"pref_rep_check" CHECK (id::text <> author::text)
Foreign-key constraints:
"pref_rep_author_fkey" FOREIGN KEY (author) REFERENCES
pref_users(id) ON DELETE CASCADE
"pref_rep_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON
DELETE CASCADE
# alter table pref_rep add primary key(id, author);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pref_rep_pkey" for table "pref_rep"
ERROR: could not create unique index "pref_rep_pkey"
DETAIL: Table contains duplicated values.
How could I find those duplicated pairs of id and author?
I've tried following, but this of course doesn't give me "pairs":
# select id, count(id) from pref_rep group by id order by count desc limit 5;
id | count
----------------+-------
OK408547485023 | 706
OK261593357402 | 582
DE11198 | 561
DE13041 | 560
OK347613386893 | 556
(5 rows)
Thank you
Alex
P.S. I've also asked my question also at SO,
hope it is okay to "crosspost" that way :-)
http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values
On 02/04/2013 06:17 AM, Alexander Farber wrote:
> Hello,
>
> when trying to add a forgotten primary key pair
> to a PostgreSQL 8.4.13 table I get the error:
>
> # \d pref_rep
> Table "public.pref_rep"
> Column | Type | Modifiers
> -----------+-----------------------------+-----------------------------------------------------------
> id | character varying(32) |
> author | character varying(32) |
> good | boolean |
> fair | boolean |
> nice | boolean |
> about | character varying(256) |
> stamp | timestamp without time zone | default now()
> author_ip | inet |
> rep_id | integer | not null default
> nextval('pref_rep_rep_id_seq'::regclass)
> Check constraints:
> "pref_rep_check" CHECK (id::text <> author::text)
> Foreign-key constraints:
> "pref_rep_author_fkey" FOREIGN KEY (author) REFERENCES
> pref_users(id) ON DELETE CASCADE
> "pref_rep_id_fkey" FOREIGN KEY (id) REFERENCES pref_users(id) ON
> DELETE CASCADE
>
> # alter table pref_rep add primary key(id, author);
> NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index
> "pref_rep_pkey" for table "pref_rep"
> ERROR: could not create unique index "pref_rep_pkey"
> DETAIL: Table contains duplicated values.
>
> How could I find those duplicated pairs of id and author?
>
> I've tried following, but this of course doesn't give me "pairs":
>
> # select id, count(id) from pref_rep group by id order by count desc limit 5;
> id | count
> ----------------+-------
> OK408547485023 | 706
> OK261593357402 | 582
> DE11198 | 561
> DE13041 | 560
> OK347613386893 | 556
> (5 rows)
SELECT * FROM (SELECT count(*) AS ct, id, author FROM pref_rep GROUP BY
id, author) AS dup WHERE dup.ct >1;
>
> Thank you
> Alex
--
Adrian Klaver
adrian.klaver@gmail.com
Thank you - On Mon, Feb 4, 2013 at 3:26 PM, Andrew Jaimes <andrewjaimes@hotmail.com> wrote: > SELECT id, author, count(1) > FROM pref_rep > GROUP BY id, author > HAVING count(1) >1 >> From: alexander.farber@gmail.com >> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values this has worked and has delivered me 190 records (I still wonder how they could have happened, because I only used a stored procedure with UPDATE - if NOT FOUND - INSERT Is it maybe pgbouncer's fault?): id | author | count ------------------------+------------------------+------- DE10598 | OK495480409724 | 2 DE12188 | MR17925810634439466500 | 3 DE13529 | OK471161192902 | 2 DE13963 | OK434087948702 | 2 DE14037 | DE7692 | 2 ...... VK45132921 | DE3544 | 2 VK6152782 | OK261593357402 | 2 VK72883921 | OK506067284178 | 2 (190 rows) And then I'm trying to construct a query which would delete the older (the "stamp" column) of such pairs - but this also doesn't work: # SELECT id, author, count(1), stamp FROM pref_rep GROUP BY id, author, stamp HAVING count(1) >1; id | author | count | stamp ----------------+-----------------------+-------+---------------------------- OK14832267156 | OK419052078016 | 2 | 2012-04-11 12:54:02.980239 OK333460361587 | VK151946174 | 2 | 2012-07-04 07:08:22.172663 OK351109431016 | OK165881471481 | 2 | 2011-09-18 18:29:33.51362 OK367507493096 | OK342027384470 | 5 | 2012-02-10 20:58:11.488184 OK430882956135 | OK331014635822 | 2 | 2012-11-21 18:38:23.141298 OK446355841129 | OK353460633855 | 2 | 2012-06-15 21:31:56.791688 OK450700410618 | OK511055704249 | 2 | 2012-03-16 15:19:50.27776 OK458979640673 | OK165881471481 | 2 | 2011-08-18 22:31:17.540112 OK468333888972 | MR5100358507294433874 | 2 | 2012-12-05 14:16:15.870061 OK485109177380 | DE12383 | 2 | 2011-09-16 16:00:38.625038 OK505164304516 | OK165881471481 | 2 | 2012-03-24 13:54:27.968482 (11 rows) Any suggestions please? Should I use a temp table here? Thank you Alex
Trying to delete the older of the duplicated pairs:
# SELECT id, author, count(1), max(stamp) as maxx
FROM pref_rep
GROUP BY id, author
HAVING count(1) >1 and stamp < maxx;
ERROR: column "maxx" does not exist
LINE 4: HAVING count(1) >1 and stamp < maxx;
^
On Mon, Feb 4, 2013 at 3:36 PM, Alexander Farber
<alexander.farber@gmail.com> wrote:
>>> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values
>
On 02/04/2013 06:45 AM, Alexander Farber wrote: > Trying to delete the older of the duplicated pairs: > > # SELECT id, author, count(1), max(stamp) as maxx > FROM pref_rep > GROUP BY id, author > HAVING count(1) >1 and stamp < maxx; > ERROR: column "maxx" does not exist > LINE 4: HAVING count(1) >1 and stamp < maxx; How about: SELECT id, author, count(1), max(stamp) as maxx FROM pref_rep GROUP BY id, author HAVING count(1) >1 and stamp < max(stamp); > ^ > > On Mon, Feb 4, 2013 at 3:36 PM, Alexander Farber > <alexander.farber@gmail.com> wrote: >>>> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values >> > > -- Adrian Klaver adrian.klaver@gmail.com
Unfortunately that fails - On Mon, Feb 4, 2013 at 3:55 PM, Adrian Klaver <adrian.klaver@gmail.com> wrote: > On 02/04/2013 06:45 AM, Alexander Farber wrote: >> >> Trying to delete the older of the duplicated pairs: >> > > How about: > > SELECT id, author, count(1), max(stamp) as maxx > FROM pref_rep > GROUP BY id, author > HAVING count(1) >1 and stamp < max(stamp); >> On Mon, Feb 4, 2013 at 3:36 PM, Alexander Farber >> <alexander.farber@gmail.com> wrote: >>>>> http://stackoverflow.com/questions/14688523/adding-primary-key-table-contains-duplicated-values # SELECT id, author, count(1), max(stamp) as maxx pref-> FROM pref_rep pref-> GROUP BY id, author pref-> HAVING count(1) >1 and stamp < max(stamp); ERROR: column "pref_rep.stamp" must appear in the GROUP BY clause or be used in an aggregate function LINE 4: HAVING count(1) >1 and stamp < max(stamp);
On 02/04/2013 06:45 AM, Alexander Farber wrote: > Trying to delete the older of the duplicated pairs: > > # SELECT id, author, count(1), max(stamp) as maxx > FROM pref_rep > GROUP BY id, author > HAVING count(1) >1 and stamp < maxx; > ERROR: column "maxx" does not exist > LINE 4: HAVING count(1) >1 and stamp < maxx; > ^ Caffeine has not reached critical mass yet, so test before using: SELECT * FROM pref_rep JOIN (SELECT id, author, count(1) AS ct, max(stamp) AS maxx FROM pref_rep GROUP BY id,author) max_time ON max_time.id=pref_rep.id WHERE ct > 1 AND stamp < maxx; > -- Adrian Klaver adrian.klaver@gmail.com
Alexander Farber <alexander.farber@gmail.com> wrote: > # alter table pref_rep add primary key(id, author); > NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index > "pref_rep_pkey" for table "pref_rep" > ERROR: could not create unique index "pref_rep_pkey" > DETAIL: Table contains duplicated values. > > How could I find those duplicated pairs of id and author? similar example: test=*# select * from foo; id1 | id2 -----+----- 1 | 1 1 | 2 1 | 3 2 | 1 2 | 2 2 | 3 1 | 2 3 | 1 3 | 2 3 | 3 3 | 1 (11 rows) Time: 0,151 ms test=*# alter table foo add primary key (id1,id2); NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "foo_pkey" for table "foo" ERROR: could not create unique index "foo_pkey" DETAIL: Key (id1, id2)=(1, 2) is duplicated. Time: 1,394 ms test=*# select id1, id2, count(*) as c from foo group by id1, id2 having count(*) > 1; id1 | id2 | c -----+-----+--- 3 | 1 | 2 1 | 2 | 2 (2 rows) Time: 0,331 ms HTH. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Andreas Kretschmer <akretschmer@spamfence.net> wrote: > Alexander Farber <alexander.farber@gmail.com> wrote: > > > # alter table pref_rep add primary key(id, author); > > NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index > > "pref_rep_pkey" for table "pref_rep" > > ERROR: could not create unique index "pref_rep_pkey" > > DETAIL: Table contains duplicated values. > > > > How could I find those duplicated pairs of id and author? > > similar example: > > test=*# select * from foo; > id1 | id2 > -----+----- > 1 | 1 > 1 | 2 > 1 | 3 > 2 | 1 > 2 | 2 > 2 | 3 > 1 | 2 > 3 | 1 > 3 | 2 > 3 | 3 > 3 | 1 > (11 rows) > > Time: 0,151 ms > test=*# alter table foo add primary key (id1,id2); > NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "foo_pkey" for table "foo" > ERROR: could not create unique index "foo_pkey" > DETAIL: Key (id1, id2)=(1, 2) is duplicated. > Time: 1,394 ms > test=*# select id1, id2, count(*) as c from foo group by id1, id2 having count(*) > 1; > id1 | id2 | c > -----+-----+--- > 3 | 1 | 2 > 1 | 2 | 2 > (2 rows) > > Time: 0,331 ms If your next question is 'how to delete ...', my answer: (yeah, reading SO ;-) ) test=*# select ctid,* from foo; ctid | id1 | id2 --------+-----+----- (0,1) | 1 | 1 (0,2) | 1 | 2 (0,3) | 1 | 3 (0,4) | 2 | 1 (0,5) | 2 | 2 (0,6) | 2 | 3 (0,7) | 1 | 2 (0,8) | 3 | 1 (0,9) | 3 | 2 (0,10) | 3 | 3 (0,11) | 3 | 1 (11 rows) Time: 0,170 ms test=*# delete from foo where ctid in (select min(ctid) from foo where (id1,id2) in (select id1, id2 from foo group by id1,id2 having count(*) > 1) group by id1,id2); DELETE 2 Time: 0,559 ms test=*# select ctid,* from foo; ctid | id1 | id2 --------+-----+----- (0,1) | 1 | 1 (0,3) | 1 | 3 (0,4) | 2 | 1 (0,5) | 2 | 2 (0,6) | 2 | 3 (0,7) | 1 | 2 (0,9) | 3 | 2 (0,10) | 3 | 3 (0,11) | 3 | 1 (9 rows) Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Thank you -
On Mon, Feb 4, 2013 at 5:06 PM, Andreas Kretschmer
<akretschmer@spamfence.net> wrote:
>> # alter table pref_rep add primary key(id, author);
>> NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index
>> "pref_rep_pkey" for table "pref_rep"
>> ERROR: could not create unique index "pref_rep_pkey"
>> DETAIL: Table contains duplicated values.
how do you get this DETAIL, is it a setting for psql prompt?
I've got a nice answer for my question at Stackoverflow:
DELETE FROM pref_rep p USING (
SELECT id, author, max(stamp) stamp
FROM pref_rep
GROUP BY id, author
HAVING count(1) > 1) AS f
WHERE p.id=f.id AND p.author=f.author AND p.stamp<f.stamp;
and learnt about SQL Fiddle too -
http://sqlfiddle.com/#!11/59fbc/11
Regards
Alex
Alexander Farber <alexander.farber@gmail.com> wrote: > Thank you - > > On Mon, Feb 4, 2013 at 5:06 PM, Andreas Kretschmer > <akretschmer@spamfence.net> wrote: > >> # alter table pref_rep add primary key(id, author); > >> NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index > >> "pref_rep_pkey" for table "pref_rep" > >> ERROR: could not create unique index "pref_rep_pkey" > >> DETAIL: Table contains duplicated values. > > how do you get this DETAIL, is it a setting for psql prompt? You means the 'DETAIL: Key (id1, id2)=(1, 2) is duplicated.'? I'm using 9.2, i think, that's the reason. Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknown) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°