Re: Duplicate rows during pg_dump

From: Marc Mamin
Subject: Re: Duplicate rows during pg_dump
Date:
Msg-id: B6F6FD62F2624C4C9916AC0175D56D884209D6A3@jenmbs01.ad.intershop.net
In reply to: Re: Duplicate rows during pg_dump  (Adrian Klaver <adrian.klaver@aklaver.com>)
Responses: Re: Duplicate rows during pg_dump
List: pgsql-general
>On 10/24/2015 12:35 PM, Chaz Yoon wrote:
>> I am seeing a duplicate, stale copy of the same row when performing a
>> pg_dump or copying a specific table, but not when directly selecting
>> from it. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10 client
>> tools.
>>
>> It's happening on a users table, which has a primary key and enforces a
>> unique email address:
>>
>>                                          Table "public.users"
>>        Column     |            Type             |                 Modifiers
>>
>> ---------------+-----------------------------+---------------------------------------------------
>>     id            | integer                     | not null default
>> nextval('users_id_seq'::regclass)
>>     email         | character varying(255)      | not null default
>> ''::character varying
>>     last_activity | timestamp without time zone |
>>    Indexes:
>>        "users_pkey" PRIMARY KEY, btree (id)
>>        "users_unique_email" UNIQUE, btree (email)
>>
>> I first noticed the problem when copying the table to another
>> database. Roughly this:
>>
>>    % pg_dump --column-inserts -Fp -h remotedb remote_db > users.txt
>>    % psql test_db < users.txt
>>    [...]
>>    ERROR:  could not create unique index "users_pkey"
>>    DETAIL:  Key (id)=(123) is duplicated.
>>    [...]
>>    ERROR:  could not create unique index "users_unique_email"
>>    DETAIL:  Key (email)=(this_user@xyz.com) is duplicated.
>>
>> It appears there's some sort of duplicate record for a single user in
>> the database. Checking the pg_dump output, I saw that a single user's
>> record was being exported twice:
>>
>>    % grep -i this_user@xyz.com users.txt
>>    INSERT INTO users (id, email, last_activity) VALUES (123,
>> 'this_user@xyz.com', '2015-10-21 10:32:15.997887');
>>    INSERT INTO users (id, email, last_activity) VALUES (123,
>> 'this_user@xyz.com', '2015-10-02 11:32:58.615743');
>>
>> The rows were not exactly the same. Connecting to the source database
>> directly, I tried this:
>>
>>    remote_db=> select count(1) from users where id = 123;
>>     count
>>    -------
>>         1
>>    (1 row)
>>
>>    remote_db=> select count(1) from users where email =
>> 'this_user@xyz.com';
>>     count
>>    -------
>>         1
>>    (1 row)
>>
>> To eliminate any risk of it being a weird locking issue, I restored a
>> snapshot of the database into a new RDS instance but I got the same
>> results. I then tried the following:
>>
>>    remote_db=> create table users_copy_with_indexes (like users
>> including defaults including constraints including indexes including
>> storage including comments);
>>    CREATE TABLE
>>    remote_db=> insert into users_copy_with_indexes select * from users;
>>    ERROR:  duplicate key value violates unique constraint
>> "users_copy_with_indexes_pkey"
>>    DETAIL:  Key (id)=(123) already exists.
>>
>> However, when I created a copy without the indexes, I could see the
>> duplicate rows:
>>
>>    remote_db=> create table users_copy_without_indexes (like users);
>>    CREATE TABLE
>>    remote_db=> insert into users_copy_without_indexes select * from users;
>>    INSERT 0 523342
>>    remote_db=> select count(1) from users_copy_without_indexes where id
>> = 123;
>>     count
>>    -------
>>         2
>>    (1 row)
>>    remote_db=> select count(1) from users_copy_without_indexes where
>> email = 'this_user@xyz.com';
>>     count
>>    -------
>>         2
>>    (1 row)
>>
>> Any suggestions for what to look for next? Is it table corruption?

Most likely the index is corrupt, not the table.
You should check for further duplicates and fix them, then, as Adrian writes,
build a new index and then drop the corrupt one. Sketches for both steps below.
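Something like this should surface every remaining duplicate while bypassing
the suspect indexes (an untested sketch using the column names from your
schema; enable_indexscan/enable_bitmapscan are ordinary planner settings and
only affect the current session, and the ctid in the final delete is a
made-up example value):

   -- force sequential scans so the corrupt indexes are never consulted
   set enable_indexscan = off;
   set enable_bitmapscan = off;

   -- ids stored more than once in the heap, with the physical row
   -- location (ctid) so the stale copy can be deleted precisely
   select ctid, id, email, last_activity
     from users
    where id in (select id from users group by id having count(*) > 1)
    order by id, last_activity;

   -- same check for the unique email column
   select email, count(*) from users group by email having count(*) > 1;

   -- then remove each stale row by its ctid, e.g. (made-up ctid value):
   -- delete from users where ctid = '(1234,5)';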

I've seen this a few times before, and if I recall correctly it was always
after a disk got full. Is AWS running out of space? :)
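Once the stale rows are gone, the rebuild could look roughly like this
(again only a sketch: CREATE INDEX CONCURRENTLY cannot run inside a
transaction block, and the unique builds will simply fail if any
duplicates remain, which makes a handy double check):

   -- build replacement indexes without locking out writers
   create unique index concurrently users_pkey_new on users (id);
   create unique index concurrently users_unique_email_new on users (email);

   -- swap the new index in as the primary key; USING INDEX also
   -- renames users_pkey_new to users_pkey
   begin;
   alter table users drop constraint users_pkey;
   alter table users add constraint users_pkey primary key using index users_pkey_new;
   commit;

   -- drop the corrupt email index and put the new one in its place
   drop index users_unique_email;
   alter index users_unique_email_new rename to users_unique_email;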

regards,
Marc Mamin

>
>I would say the smoking gun is the copy w/o indexes shows both records
>and the one with indexes only one. I would DROP/CREATE index on the
>original table, with the usual caveat that this does place a load on the
>table. Using Concurrently might help, but I would read the information here:
>
>http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
>
>Building Indexes Concurrently
>>
>> Chaz
>>
>
>
>--
>Adrian Klaver
>adrian.klaver@aklaver.com
