Обсуждение: BUG #8382: Duplicate primary key
The following bug has been logged on the website: Bug reference: 8382 Logged by: Curd Reinert Email address: curd.reinert@ppi.de PostgreSQL version: 9.0.5 Operating system: Windows Server 2013 Description: One of our customers encounters a problem because some tables contain multiple rows with the same primary key. I've noticed a similar issue has been reported in #3231 and #7760, but couldn't see how to proceed from there. The customer has about 1.000 installations of our software, each running with its own Postgres instance, the vast majority running without any problems. One system is causing most of the trouble, with the problem occuring about every one or two weeks. Our software uses about 70 tables. In two of these tables, multiple rows with the same primary key occure. These tables are used to lock resources between threads. One of them, called "cluster_ressource" has a constant set of rows which are updated frequently. The other, called "sperre" is empty when the system is idle. Rows are inserted to lock certain objects and removed when the locks are released. The operating system is Windows Server 2003. apparently in a VMWare virtual machine. The data folder is on D: which is a VMWare virtual disk. Compared to the other systems, this one has quite a lot of traffic and therefore a high probability of concurrent updates. According to our customer, there were no problems with the same system as long as it was running with an older version of our software which used Postgres 8.3. Of course there were also changes in our software. Once the problem is noticed, the customer fixes it by deleting both entries and inserting a new one. In one case, a REINDEX was necessary to get the system running again. Afterwards, the system is running smoothly for some time, approximately one or two weeks. Then duplicate rows appear again. I checked fsync, it has not been switched off. Our application is written in Java, using a plain JDBC driver to access the database. Of course we don't fiddle with the data model, indexes etc. during runtime. I do have a copy of the data directory with a duplicate row in cluster_ressource, in case anyone wants to dive into it. It seems that this not just one system. Most of their systems are nowadays running on Postgres 9.0, and the majority without any problems. However, the customer encountered similar problems with a handful of installations. Again the problems started after updating to Postgres 9.0, and again these systems have a higher workload than the average. Please let me know if you need any further information. Thank you very much in advance!
On Tue, Aug 13, 2013 at 2:50 PM, <curd.reinert@ppi.de> wrote: > The following bug has been logged on the website: > > Bug reference: 8382 > Logged by: Curd Reinert > Email address: curd.reinert@ppi.de > PostgreSQL version: 9.0.5 The current bug-fix release for Postgres 9 is 9.0.13. There was an extremely high profile security vulnerability fixed in 9.0.13 so I would strongly recommend updating. I see bugs that could cause this in both 9.0.7 and 9.0.11. However both are very rare and hard to believe you would see either of them happen repeatedly. Unless you're calling CREATE INDEX CONCURRENTLY *often* on this system? Do you know anything about the history of how often the system was rebooted -- either the host or the virtual machine? Also, have you run a memory checker on the system? -- greg
Hello Greg, gsstark@gmail.com schrieb am 13.08.2013 19:50:35: > The current bug-fix release for Postgres 9 is 9.0.13. There was an > extremely high profile security vulnerability fixed in 9.0.13 so I > would strongly recommend updating. Yes, I noticed that one. However, those installations are not exposed to the outside, they do have backups, the data inside the database is not very critical and there are about 1.000 installations to update. I don't think I can make our customer update all of them. Should be no problem for the one that is causing the trouble, though. > I see bugs that could cause this in both 9.0.7 and 9.0.11. Okay. I thought I checked the list of fixed bugs, but didn't notice them then. I can see them now you've pointed me to the relevant patch levels. I will tell our cutomer to update that installation to 9.0.13. > Unless you're calling CREATE INDEX CONCURRENTLY > *often* on this system? Never. As we use practically the same code for Postgres, DB2 and Oracle, we are stuck to plain old SQL and don't do anything but selects, inserts, updates and deletes. And I'm pretty sure our customer isn't touching the database, either. > Do you know anything about the history of how often the system was > rebooted -- either the host or the virtual machine? I can't say for sure, but it seems that the VM is rebooted at least every night. I have no idea about the host. > Also, have you run a memory checker on the system? I'm two levels remote from the machine, but I will pass it on. Thank you very much. Curd
On Tue, Aug 13, 2013 at 2:50 PM, <curd.reinert@ppi.de> wrote: > I do have a copy of the data directory with a duplicate row in > cluster_ressource, in case anyone wants to dive into it. Could you run the following? select ctid,xmin,xmax,primarykey from tab where primarykey = 'duplicatevalue' And also show the \d tab output? And can you load the pageinspect contrib module in this database? -- greg
gsstark@gmail.com schrieb am 14.08.2013 12:44:40: > select ctid,xmin,xmax,primarykey from tab where primarykey =3D=20 'duplicatevalue' traviclinkengine=3D# select ctid,xmin,xmax,* from cluster=5Fressource where= =20 clusterknoten=5Fid =3D 1 and ressource=5Fid =3D 10; ctid | xmin | xmax | clusterknoten=5Fid | ressource=5Fid | anzahl -------+-----------+------+------------------+--------------+-------- (1,9) | 308561268 | 0 | 1 | 10 | 0 (3,9) | 308561268 | 0 | 1 | 10 | 0 (2 Zeilen) > And also show the \d tab output? traviclinkengine=3D# \d cluster=5Fressource Tabelle =BBpublic.cluster=5Fressource=AB Spalte | Typ | Attribute ------------------+----------+----------- clusterknoten=5Fid | bigint | not null ressource=5Fid | bigint | not null anzahl | smallint | not null Indexe: "cluster=5Fressource=5Fpkey" PRIMARY KEY, btree (clusterknoten=5Fid,=20 ressource=5Fid) "cl=5Fres=5Fres=5Ffkind" btree (ressource=5Fid) Fremdschl=FCssel-Constraints: "cl=5Fres=5Fclkn=5Ffk" FOREIGN KEY (clusterknoten=5Fid) REFERENCES=20 clusterknoten(id) ON DELETE RESTRICT "cl=5Fres=5Fres=5Ffk" FOREIGN KEY (ressource=5Fid) REFERENCES ressource= (id) ON=20 DELETE RESTRICT > And can you load the pageinspect contrib module in this database? Yes. At least I think I've just done it. :-) Curd