Duplicate rows during pg_dump

Поиск
Список
Период
Сортировка
От Chaz Yoon
Тема Duplicate rows during pg_dump
Дата
Msg-id CAHPXOQx0UUeHOEKcf24vVF5ESvwbu=zB+uRgwg-AUVrSeRM2xw@mail.gmail.com
обсуждение исходный текст
Ответы Re: Duplicate rows during pg_dump
Список pgsql-general
I am seeing a duplicate, stale copy of the same row when performing a pg_dump or copying a specific table, but not when directly selecting from it. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10 client tools.

It's happening on a users table, which has a primary key and enforces a unique email address:

                                        Table "public.users"
      Column     |            Type             |                 Modifiers
  ---------------+-----------------------------+---------------------------------------------------
   id            | integer                     | not null default nextval('users_id_seq'::regclass)
   email         | character varying(255)      | not null default ''::character varying
   last_activity | timestamp without time zone |
  Indexes:
      "users_pkey" PRIMARY KEY, btree (id)
      "users_unique_email" UNIQUE, btree (email)

I first noticed the problem when doing copying the table to another database. Roughly this:

  % pg_dump --column-inserts -Fp -h remotedb remote_db > users.txt
  % psql test_db < users.txt
  [...]
  ERROR:  could not create unique index "users_pkey"
  DETAIL:  Key (id)=(123) is duplicated.
  [...]
  ERROR:  could not create unique index "users_unique_email"
  DETAIL:  Key (email)=(this_user@xyz.com) is duplicated.

It appears there's some sort of duplicate record for a single user in the database. Checking the pg_dump output, I saw that a single user's record was being exported twice:

  % grep -i this_user@xyz.com users.txt
  INSERT INTO users (id, email, last_activity) VALUES (123, 'this_user@xyz.com', '2015-10-21 10:32:15.997887');
  INSERT INTO users (id, email, last_activity) VALUES (123, 'this_user@xyz.com', '2015-10-02 11:32:58.615743');

The rows were not exactly the same. Connecting to the source database directly, I tried this:

  remote_db=> select count(1) from users where id = 123;
   count
  -------
       1
  (1 row)

  remote_db=> select count(1) from users where email = 'this_user@xyz.com';
   count
  -------
       1
  (1 row)

To eliminate any risk of it being a weird locking issue, I restored a snapshot of the database into a new RDS instance but I got the same results. I then tried the following:

  remote_db=> create table users_copy_with_indexes (like users including defaults including constraints including indexes including storage including comments);
  CREATE TABLE
  remote_db=> insert into users_copy_with_indexes select * from users;
  ERROR:  duplicate key value violates unique constraint "users_copy_with_indexes_pkey"
  DETAIL:  Key (id)=(123) already exists.

However, when I created a copy without the indexes, I can see the duplicate rows:

  remote_db=> create table users_copy_without_indexes (like users);
  CREATE TABLE
  remote_db=> insert into users_copy_without_indexes select * from users;
  INSERT 0 523342
  remote_db=> select count(1) from users_copy_without_indexes where id = 123;
   count
  -------
       2
  (1 row)
  remote_db=> select count(1) from users_copy_without_indexes where email = 'this_user@xyz.com';
   count
  -------
       2
  (1 row)

Any suggestions for what to look for next? Is it table corruption?

Chaz

В списке pgsql-general по дате отправления:

Предыдущее
От: Rafal Pietrak
Дата:
Сообщение: Re: partial JOIN (was: ID column naming convention)
Следующее
От: Lele Gaifax
Дата:
Сообщение: Re: Using function returning multiple values in a select