Обсуждение: BUG #3724: Duplicate values added to table despite unique index

Поиск
Список
Период
Сортировка

BUG #3724: Duplicate values added to table despite unique index

От
"Mason Hale"
Дата:
The following bug has been logged online:

Bug reference:      3724
Logged by:          Mason Hale
Email address:      masonhale@gmail.com
PostgreSQL version: 8.2.5
Operating system:   Redhat Linux (kernel: Linux 2.6.18-8.1.15.el5PAE)
Description:        Duplicate values added to table despite unique index
Details:

I discovered this issue when an update statement was resulting in a
duplicate key violation error, even though the update did not change any of
the columns in the unique index (and neither did the one trigger on this
table).

Here is the table description:

prod_2=> \d topic_version_page
                  Table "bdu.topic_version_page"
       Column        |            Type             |   Modifiers
---------------------+-----------------------------+---------------
 topic_version_id    | integer                     | not null
 page_id             | integer                     | not null
 link_score          | double precision            |
 created_at          | timestamp without time zone | default now()
 updated_at          | timestamp without time zone | default now()
 is_entry_page       | boolean                     | default false
 linking_entry_count | integer                     | default 0
Indexes:
    "index_topic_version_page_on_topic_version_id_and_page_id" UNIQUE, btree
(topic_version_id, page_id)
    "index_topic_version_page_on_link_score" btree (link_score)
    "index_topic_version_page_on_topic_version_id_and_created_at" btree
(topic_version_id, created_at)
Foreign-key constraints:
    "topic_version_page_topic_version_id_fkey" FOREIGN KEY
(topic_version_id) REFERENCES topic_version(id) ON DELETE CASCADE
Triggers:
    topic_version_page_updated_at_trigger BEFORE UPDATE ON
topic_version_page FOR EACH ROW EXECUTE PROCEDURE
update_updated_at_timestamp()


Note that there is a unique index on (topic_version_id, page_id).

Now look at the result of this query:

prod_2=> select page_id, count(*) from topic_version_page where
topic_version_id = 263 group by 1 having count(*) > 1;
  page_id  | count
-----------+-------
 161335682 |     2
 194359108 |     2
(2 rows)

Here we have two rows that violate the unique index constraint.

Looking at the rows in more detail:

prod_2=> select * from topic_version_page where topic_version_id = 263 and
page_id in (161335682, 194359108);
 topic_version_id |  page_id  | link_score |         created_at         |
     updated_at         | is_entry_page | linking_entry_count
------------------+-----------+------------+----------------------------+---
-------------------------+---------------+---------------------
              263 | 161335682 |          0 | 2007-10-13 02:40:49.864219 |
2007-11-01 15:58:57.268593 | f             |                   5
              263 | 194359108 |          0 | 2007-10-25 13:34:20.654336 |
2007-10-25 13:34:20.654336 | f             |                   1
              263 | 194359108 |          0 | 2007-10-25 13:34:20.654336 |
2007-11-04 13:08:03.011292 | f             |                   2
              263 | 161335682 |          0 | 2007-10-13 02:40:49.864219 |
2007-11-04 13:08:03.011292 | f             |                   6
(4 rows)

We can see that each duplicate within a pair was created at the same time
(or at least within the same transaction), but that each pair was created at
a different time.

My expectation is that with the unique index in place this should not be
able to happen.

Other info that may be useful:

This database was created by taking a dump from a 8.2.4 database on 11/3 and
restoring into a new 8.2.5 database (on different hardware). The created_at
timestamps seem to indicate that these duplicates were created prior to the
dump being created.

However running the same query on the original 8.2.4 database returns zero
rows:

prod_1=> select page_id, count(*) from topic_version_page where
topic_version_id = 263 group by 1 having count(*) > 1;
 page_id | count
---------+-------
(0 rows)

prod_1=>

Finally, I restored the *same* dump file in another 8.2.5 database instance
and that database does not have the duplicate rows either.

From this it seems to me that on the following occurred:

Some subsequent update operation resulted in two identical rows being
created in the same table. This likely happened at '2007-11-04
13:08:03.011292' which is the updated_at timestamp on two of the duplicate
rows (and is after the restore).

This looks like a bug related to update operations and unique indexes, but I
could of course be wrong.

Thanks for looking into it.

Mason

Re: BUG #3724: Duplicate values added to table despite unique index

От
Tom Lane
Дата:
"Mason Hale" <masonhale@gmail.com> writes:
> However running the same query on the original 8.2.4 database returns zero
> rows:

> prod_1=> select page_id, count(*) from topic_version_page where
> topic_version_id = 263 group by 1 having count(*) > 1;
>  page_id | count
> ---------+-------
> (0 rows)

Is that still true if you do the query with enable_indexscan = off
and enable_bitmapscan = off?  If you see matching rows then, I'd
suspect a corrupt index in the 8.2.4 installation --- does REINDEXing
it succeed?

For that matter, do you still see dups if you prevent use of the index
in the 8.2.5 query?  Maybe it's that index that is corrupt.

            regards, tom lane

Re: BUG #3724: Duplicate values added to table despite unique index

От
"Mason Hale"
Дата:
On Nov 6, 2007 11:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Mason Hale" <masonhale@gmail.com> writes:
> > However running the same query on the original 8.2.4 database returns
> zero
> > rows:
>
> > prod_1=> select page_id, count(*) from topic_version_page where
> > topic_version_id = 263 group by 1 having count(*) > 1;
> >  page_id | count
> > ---------+-------
> > (0 rows)
>
> Is that still true if you do the query with enable_indexscan = off
> and enable_bitmapscan = off?  If you see matching rows then, I'd
> suspect a corrupt index in the 8.2.4 installation --- does REINDEXing
> it succeed?
>

With enable_indexscan = off and enable_bitmapscan = off on the
8.2.4instance, it still returns zero rows.

prod_1=> set enable_indexscan = off;
SET
prod_1=> set enable_bitmapscan = off;
SET
prod_1=> select page_id, count(*) from topic_version_page where
topic_version_id = 263 group by 1 having count(*) > 1;
 page_id | count
---------+-------
(0 rows)


> For that matter, do you still see dups if you prevent use of the index
> in the 8.2.5 query?  Maybe it's that index that is corrupt.

Unfortunately, I'm not able to test that at this point.
To get our production db (the 8.2.5 instance) back in operation I deleted
the extra duplicate rows, so that the update statement would complete.

If I need to reindex the table in 8.2.5 database I'll need to wait until
this evening, barring any emergencies. (It is a 20GB table)

Mason

Re: BUG #3724: Duplicate values added to table despite unique index

От
Tom Lane
Дата:
"Mason Hale" <masonhale@gmail.com> writes:
>> For that matter, do you still see dups if you prevent use of the index
>> in the 8.2.5 query?  Maybe it's that index that is corrupt.

> Unfortunately, I'm not able to test that at this point.
> To get our production db (the 8.2.5 instance) back in operation I deleted
> the extra duplicate rows, so that the update statement would complete.

Mph.  I'm afraid the evidence is mostly gone then, and we probably won't
be able to figure out what happened.  However, it would be worth
checking two things:

1. Can you confirm that the rows that got duplicated were in fact
present (in only one copy) in the 8.2.4 DB?

2. Can you check that there are still 1 (rather than 0) copies of the
rows in the 8.2.5 DB?  One possible theory about this is that what you
had was (two instances of) two index entries pointing at the same heap
row, in which case a DELETE that you thought removed only one copy would
have deleted both.

            regards, tom lane

Re: BUG #3724: Duplicate values added to table despite unique index

От
"Mason Hale"
Дата:
On Nov 6, 2007 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Mason Hale" <masonhale@gmail.com> writes:
> >> For that matter, do you still see dups if you prevent use of the index
> >> in the 8.2.5 query?  Maybe it's that index that is corrupt.
>
> > Unfortunately, I'm not able to test that at this point.
> > To get our production db (the 8.2.5 instance) back in operation I
> deleted
> > the extra duplicate rows, so that the update statement would complete.
>
> Mph.  I'm afraid the evidence is mostly gone then, and we probably won't
> be able to figure out what happened.


Sorry about that. But I had to get things back up and running.


> However, it would be worth
> checking two things:
>
> 1. Can you confirm that the rows that got duplicated were in fact
> present (in only one copy) in the 8.2.4 DB?


Yes, they are present:

prod_1=> select * from topic_version_page where topic_version_id = 263 and
page_id in (161335682, 194359108);
 topic_version_id |  page_id  | link_score |         created_at
|         updated_at         | is_entry_page | linking_entry_count

------------------+-----------+------------+----------------------------+----------------------------+---------------+---------------------
              263 | 161335682 |          0 | 2007-10-13 02:40:49.864219 |
2007-11-01 15:58:57.268593 | f             |                   5
              263 | 194359108 |          0 | 2007-10-25 13:34:20.654336 |
2007-11-04 01:01:50.512446 | f             |                   2
(2 rows)

prod_1=>


>
> 2. Can you check that there are still 1 (rather than 0) copies of the
> rows in the 8.2.5 DB?  One possible theory about this is that what you
> had was (two instances of) two index entries pointing at the same heap
> row, in which case a DELETE that you thought removed only one copy would
> have deleted both.
>

Yes, we have 1 of each row (I kept the most recently updated version of
each):

prod_2=> select * from topic_version_page where topic_version_id = 263 and
page_id in (161335682, 194359108);
 topic_version_id |  page_id  | link_score |         created_at
|         updated_at         | is_entry_page | linking_entry_count

------------------+-----------+------------+----------------------------+----------------------------+---------------+---------------------
              263 | 194359108 |          0 | 2007-10-25 13:34:20.654336 |
2007-11-04 13:08:03.011292 | f             |                   2
              263 | 161335682 |          0 | 2007-10-13 02:40:49.864219 |
2007-11-06 16:01:36.393953 | f             |                   7
(2 rows)

prod_2=>



Mason

Re: BUG #3724: Duplicate values added to table despite unique index

От
Tom Lane
Дата:
"Mason Hale" <masonhale@gmail.com> writes:
> On Nov 6, 2007 1:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Mph.  I'm afraid the evidence is mostly gone then, and we probably won't
>> be able to figure out what happened.

> Sorry about that. But I had to get things back up and running.

Understood, but it's unlikely we'll be able to find the cause without
a test case to poke at.

>> 2. Can you check that there are still 1 (rather than 0) copies of the
>> rows in the 8.2.5 DB?

> Yes, we have 1 of each row (I kept the most recently updated version of
> each):

Ah, I forgot that the rows were obviously not identical because of the
differing updated_at values.

Tell us more about the updating process --- is it likely that there
could be conflicting concurrent updates on a row?  What does your
app do in such cases?

            regards, tom lane

Re: BUG #3724: Duplicate values added to table despite unique index

От
"Mason Hale"
Дата:
> >> 2. Can you check that there are still 1 (rather than 0) copies of the
> >> rows in the 8.2.5 DB?
>
> > Yes, we have 1 of each row (I kept the most recently updated version of
> > each):
>
> Ah, I forgot that the rows were obviously not identical because of the
> differing updated_at values.
>
> Tell us more about the updating process --- is it likely that there
> could be conflicting concurrent updates on a row?  What does your
> app do in such cases?
>

We do some application-layer locking to prevent the same 'topic-version'
from being updated at the same time.
But there could be a bug somewhere that let's that happen, in which case,
concurrent updates of the same row could occur. So I guess I would say it is
unlikely, but possible.

If there is a concurrent update of the same row, I guess we're relying on
Postgres to handle that. If that results in a deadlock or any other error,
then we catch and log the error, mark the update job as failed (in another
db table) and try again. In most every case, the same request will complete
successfully on a second try. In this case, every update to the rows in
question was resulting in a unique index violation, which resulted in a
near-constant stream of errors being logged.