Somehow I got duplicate values in a column covered by a unique key.
CREATE TABLE myschema.vocabulary ( phrase_id integer NOT NULL DEFAULT nextval('onto_oper.vocabulary_phraseid_seq'::regclass), phrase text COLLATE pg_catalog."default" NOT NULL, is_to_sync boolean NOT NULL DEFAULT true, last_modified timestamp without time zone NOT NULL DEFAULT timezone('utc'::text, now()), CONSTRAINT pk_vocabulary PRIMARY KEY (phrase_id), CONSTRAINT vocabulary_phrase_key UNIQUE (phrase) ) WITH ( OIDS = FALSE ) TABLESPACE pg_default;
Today I noticed a strange behaviour in a regularly executed routine. The problem seemed to be absolutely irrelevant:
ERROR: duplicate key value violates unique constraint "vocabulary_phrase_key" DETAIL: Key (phrase)=(Ã…sa) already exists. CONTEXT: SQL statement "update myschema.vocabulary set is_to_sync = false where is_to_sync = true"
It was unclear how an update of an independent column could cause such an error, but I started searching duplicate entries. GROUP BY on phrase column did not give any results HAVING count(*) > 1. Then I copied the data into another table, without indices. This gave the answer there is a duplicated value 'Åsa' (I guess the same could be done by switching off index scan).
From the copy I found the IDs of problem entries: 5850963 and 29883128. Some funny queries:
select (select phrase from myschema.vocabulary where phrase_id = 5850963) = (select phrase from myschema.vocabulary where phrase_id = 29883128)
Gives 'true'.
select phrase_id from myschema.vocabulary where phrase = (select phrase from myschema.vocabulary where phrase_id = 5850963)
Gives 29883128.
Is it some misconfiguration problem in my database? Since I see some non-ASCII Unicode characters, I suppose it might be encoding conversion problem. I inspected both values to the best of my abilities and found them equal byte-to byte.
My version() is PostgreSQL 10.10 (Ubuntu 10.10-1.pgdg18.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit