Обсуждение: Re: [BUGS] BUG #14874: Dublicate values in primary key

Поиск
Список
Период
Сортировка

Re: [BUGS] BUG #14874: Dublicate values in primary key

От
CONCHY Elisabeth
Дата:
Hello,
   Thank you for your answer on  PostgreSQL mailing list "pgsql-bugs".   I work with Henri Ky who is out of office
today,and I ditn't find 
 
the way to answer on the web site.   So, I write to you directly, but feel free to add this text on the site.
_______________________________________________________________________

>>    "You'll clearly need to fix the problem by hand, resolving which rows

>> to keep. Rows can be deleted by using the hidden ctid column."

   We can't just delete the rows, because the data are different. The 
id only is duplicate.  It looks as if the sequence was requested only once, and used twice 
for inserting new rows.
   The id has for default value :    

nextval('<sequence>')


and is defined as primary key by :
ALTER TABLE <table> ADD CONSTRAINT <table>_pk PRIMARY KEY
(id) USING INDEX TABLESPACE schema_idx;

Yesterday night, we found out that the problem affects not only one 
table, but 5.

We would like to understand how this is possible, whithout PosgreSQL 
detecting any error.

Can we be sure that couldn't happen again, if we upgrade the version (to 
9.4.12 for example) ?

-- 
Best Regards,
_____________________________________
Elisabeth Conchy
Inspectrice des Finances Publiques
DGFiP/SSI/SI-2B/NESSIE/SoLiPOD




-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14874: Dublicate values in primary key

От
Tom Lane
Дата:
CONCHY Elisabeth <elisabeth.conchy@dgfip.finances.gouv.fr> writes:
> Yesterday night, we found out that the problem affects not only one 
> table, but 5.
> We would like to understand how this is possible, whithout PosgreSQL 
> detecting any error.
> Can we be sure that couldn't happen again, if we upgrade the version (to 
> 9.4.12 for example) ?

Nobody's ever going to promise you that Postgres contains no bugs :-(.
What we *can* say confidently is that 9.4.4 contains several known
data-corruption-causing bugs that are fixed in the latest 9.4 minor
release.  So you really ought to upgrade.

Having said that ... the symptoms you are describing don't sound much
like a Postgres-induced bug.  I could believe a bug possibly having
caused the sequence generator to go backwards.  And I could believe
a bug having corrupted an index so that the index failed to notice
duplicate key insertions.  But it's harder to believe both of those
things happening at once, and even harder to believe that a heretofore
unnoticed bug would cause five different indexes to be corrupted at
the same time that a sequence generator went backwards.

What *does* fit that set of facts really well is an OS crash (either
a power failure or kernel crash), if Postgres's fsync calls were not
being honored properly so that some data file changes got to disk
ahead of the associated WAL log entries.  In that case, even after
WAL replay you would have an inconsistent database: there could be
rows in the tables with IDs past where the sequence generator thought
it was, and the indexes could be corrupt enough so that you didn't
get told about duplicate entries as the sequence generator got advanced
again.  (The "index corruption" might well simply amount to the indexes
not having been updated to match what was in the tables proper.)

So ... have there been any server crashes lately?

In any case, my advice is to take a hard look at whether your storage
stack honors fsync all the way down.  That won't get you out of the
immediate problem --- as Peter said, manual cleanup of the data and
then reindexing the indexes seems like your only way out of that.
But it might save you from a recurrence.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14874: Dublicate values in primary key

От
Peter Geoghegan
Дата:
On Fri, Oct 27, 2017 at 4:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So ... have there been any server crashes lately?
>
> In any case, my advice is to take a hard look at whether your storage
> stack honors fsync all the way down.  That won't get you out of the
> immediate problem --- as Peter said, manual cleanup of the data and
> then reindexing the indexes seems like your only way out of that.
> But it might save you from a recurrence.

I agree with your conclusion, but I still think that it's slightly
ambiguous. There didn't seem to be an issue with sequences going
backwards -- the actual complaint was about there being multiple
versions of the same logical row visible simultaneously (sequences
were mentioned in passing). This is probably a case of tuples within a
HOT chain experiencing some kind of "resurrection". Though that in
itself doesn't prove much of anything.

The latest example of a bug that made "tuple resurrection" possible is
the recent "freeze-the-dead" bug, but I can recall a couple of other
bugs in the history of 9.4 with the same symptom. Of course, storage
problems can also cause this symptom, and if you're using foreign
keys, chances are good that storage infelicities will result in
corruption with symptoms similar to the corruption caused by various
Multixact bugs (e.g., resurrected rows, VACUUM reporting apparent
MultiXactId wraparound, etc). That's why these symptoms don't tell us
much.

I don't want to make too much of this. It's clear that being on such
an old point release is needlessly risky, and it's also clear that the
OP has work to do to validate the reliability of storage as part of a
disaster recovery effort.

-- 
Peter Geoghegan


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14874: Dublicate values in primary key

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> I agree with your conclusion, but I still think that it's slightly
> ambiguous. There didn't seem to be an issue with sequences going
> backwards -- the actual complaint was about there being multiple
> versions of the same logical row visible simultaneously (sequences
> were mentioned in passing). This is probably a case of tuples within a
> HOT chain experiencing some kind of "resurrection". Though that in
> itself doesn't prove much of anything.

Yeah, I wondered about that, particularly in view of the multiple
tuple-freezing bugs we've found.  If the damage were in just one table
I'd believe it for sure ... but I'm not sure I believe it across 5
tables.

But the bottom line is the same: update to 9.4.latest (which is *not*
12 btw), and double-check storage reliability.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14874: Dublicate values in primary key

От
Henri KY
Дата:
> Peter Geoghegan <pg@bowt.ie> writes:
>> I agree with your conclusion, but I still think that it's slightly
>> ambiguous. There didn't seem to be an issue with sequences going
>> backwards -- the actual complaint was about there being multiple
>> versions of the same logical row visible simultaneously (sequences
>> were mentioned in passing). This is probably a case of tuples within a
>> HOT chain experiencing some kind of "resurrection". Though that in
>> itself doesn't prove much of anything.
> Yeah, I wondered about that, particularly in view of the multiple
> tuple-freezing bugs we've found.  If the damage were in just one table
> I'd believe it for sure ... but I'm not sure I believe it across 5
> tables.
>
> But the bottom line is the same: update to 9.4.latest (which is *not*
> 12 btw), and double-check storage reliability.
>
>             regards, tom lane
>
>

Before the issue, there was a data recovery operation (the data cluster 
was restored), but it seems that data in tablespaces were not cleaned up 
before the operation, do you think this could be the cause of the 
duplicate keys issue?

It seems that it is not for us to fix the issue by deleting manually 
duplicated data. Is it possible to restore the data cluster before the 
issue and re-apply WAL?

Concerning the storage, we use data disk bay, it seems that our hardware 
is quite reliable. But it seems that during the issue, no system error 
was found, thus we will reconfirm that the system engineer.

Concerning the upgrade, is it possible to upgrade directly from the 
minor version 9.4.4 to the latest minor version 9.4.14 without applying 
all the minor versions in between? I guess that to upgrade one minor 
version it only needs to stop PostgreSQL, then apply the binaries and 
start PostgreSQL. Could you also confirm this?

Best regards
Henri Ky



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14874: Dublicate values in primary key

От
Henri KY
Дата:
Hello again,

With the uncleaned files left in the tablespaces after database restore 
operation, this let us perform few futher tests.
We tried to duplicate manually data files used by one table and restart 
the cluster, the reindex of the table has failed with duplicate key error.
As a conclusion, PostgreSQL does not check data integrity during cluster 
restart, and if data files are altered, the database is corrupted 
without any notice.
It would be better that PostgreSQL would report error related to data 
integrity issue during database lifetime in the futur versions (this is 
not implemented in version 9.6).


Here below the tests we have perfomed in detail:

[postgres] $ ls -lh 
/u05/pgsql/9.4/data/mydb001/tbs_ges_dat/PG_9.4_201409291/17273
total 1,1G
-rw------- 1 postgres postgres 1,0G 31 oct.  13:48 17274
-rw------- 1 postgres postgres  11M 31 oct.  13:48 17274.1
-rw------- 1 postgres postgres 280K 31 oct.  13:46 17274_fsm

[postgres] mydb=# select count(*) from anomalie;  count
---------- 18000000

[postgres] mydb=# \d+                               Liste des relations Schéma |        Nom         |   Type   |
Propriétaire| Taille   | 
 
Description
------------+--------------------+----------+--------------+------------+------------- public | anomalie           |
table   | postgres     | 1034 MB    | public | anomalie_id_seq    | séquence | postgres     | 8192 bytes | public |
pg_buffercache    | vue      | postgres     | 0 bytes    | public | pg_stat_statements | vue      | postgres     | 0
bytes   |
 

[postgres] mydb=# reindex table anomalie;
REINDEX

[postgres] $ pg_ctl stop ...

[postgres] $ cd 
/u05/pgsql/9.4/data/mydb001/tbs_ges_dat/PG_9.4_201409291/17273
[postgres] $ cp 17274.1 17274.2
[postgres] $ cp 17274 17274.1

[postgres] $ pg_ctl start ...

[postgres] $ ls -lh 
/u05/pgsql/9.4/data/mydb001/tbs_ges_dat/PG_9.4_201409291/17273
total 2,1G
-rw------- 1 postgres postgres 1,0G 31 oct.  13:58 17274
-rw------- 1 postgres postgres 1,0G 31 oct.  13:59 17274.1
-rw------- 1 postgres postgres  11M 31 oct.  13:59 17274.2
-rw------- 1 postgres postgres 280K 31 oct.  13:46 17274_fsm

[postgres] mydb=# select count(*) from anomalie;  count
---------- 35825792

[postgres] mydb=# \d+                               Liste des relations Schéma |        Nom         |   Type   |
Propriétaire| Taille   | 
 
Description
--------+--------------------+----------+--------------+------------+------------- public | anomalie           |
table   | postgres     | 2058 MB    | public | anomalie_id_seq    | séquence | postgres     | 8192 bytes | public |
pg_buffercache    | vue      | postgres     | 0 bytes    | public | pg_stat_statements | vue      | postgres     | 0
bytes   |
 

[postgres] mydb=# reindex table anomalie;
ERROR:  could not create unique index "anomalie_pk"
DÉTAIL : Key (id_anomalie)=(1) is duplicated.


Regards
Henri



-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs