Re: Duplicate data despite unique constraint

Поиск
Список
Период
Сортировка
От Albe Laurenz
Тема Re: Duplicate data despite unique constraint
Дата
Msg-id A737B7A37273E048B164557ADEF4A58B53897A23@ntex2010i.host.magwien.gv.at
обсуждение исходный текст
Ответ на Duplicate data despite unique constraint  (Jonas Tehler <jonas@tehler.se>)
Ответы Re: Duplicate data despite unique constraint  (Daniel Caldeweyher <dcalde@gmail.com>)
Список pgsql-general
Jonas Tehler wrote:
> We’re running Postgresql 9.4.5 on Amazon RDS. One of our tables looks something like this:
> 
> 
> CREATE TABLE users
> (
>   ...
>   email character varying(128) NOT NULL,
>   ...
>   CONSTRAINT users_email_key UNIQUE (email)
> )
> 
> Despite this we have rows with very similar email values. I discovered the problem when I tried to add
> a column and got the following error:
> 
> ActiveRecord::RecordNotUnique: PG::UniqueViolation: ERROR:  could not create unique index
> "users_email_key"
> DETAIL:  Key (email)=(xxx@yyy.com) is duplicated.
> : ALTER TABLE "users" ADD COLUMN "client_branch" character varying(255) DEFAULT ‘beta'
> 
> 
> Now look at this:
> 
> => select email from users where email = 'xxx@yyy.com';
>            email
> ---------------------------
>  xxx@yyy.com
> (1 row)
> 
> 
> => select email from users where email LIKE 'xxx@yyy.com';
>            email
> ---------------------------
>  xxx@yyy.com
>  xxx@yyy.com
> (2 rows)
> 
> 
> I have tried to compare the binary data in various ways, email::bytes, md5(email),
> encode(email::bytea, 'hex’), char_length(email) and it all looks the same for both rows.
> 
> Any suggestions how I can discover the difference between the values and how they could have been
> added without triggering the constraint? I know that the values were added after the constraint was
> added.
> 
> The data was added from a Ruby on Rails app that also has unique constraints on the email field and
> validation on the email format.

That looks very much like data corruption.

I guess there is an index on "users" that is used for one query but not the other.
Can you verify with EXPLAIN?

Assuming that it is a 'text_ops' or 'varchar_ops' index, I'd say it gets used for the first
query, but not for the second.  That would mean that there is an extra entry in the table that
is not in the index.

Did you have any crashes, standby promotion, restore with PITR or other unusual occurrences recently?

Make sure you have a physical backup; there may be other things corrupted.

This is a possible path to proceed:

Once you have made sure that you have a physical backup, try to add the "ctid" column to both queries.

Then delete the extra row from the second query with "DELETE FROM email WHERE ctid = ...".

Then, to make sure there is no other corruption lurking, make a logical backup
with pg_dumpall, create a new database cluster, create a new one with "initdb" and
restore the data.

Yours,
Laurenz Albe

В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: Duplicate data despite unique constraint
Следующее
От: Adrian Klaver
Дата:
Сообщение: Re: UPDATE OR REPLACE?