Thread: BUG #4913: Row missing from primary key index
The following bug has been logged online:

Bug reference:      4913
Logged by:          Mathieu De Zutter
Email address:      mathieu@dezutter.org
PostgreSQL version: 8.3.7
Operating system:   Debian Lenny
Description:        Row missing from primary key index

Details:

I have a table log_event with a primary key on an integer 'id', called
log_event_pkey.

The table contains a duplicate for id = 15723018. The duplicate (note that
besides the id, all data differs) doesn't seem to be known by the index at
all. Example:

shs=# select id, event_timestamp from log_event where id = 15723018;
    id    |      event_timestamp
----------+----------------------------
 15723018 | 2009-05-09 13:47:33.441668
(1 row)

But with indexscan and bitmapscan off, I get:

shs=# select id, event_timestamp from log_event where id = 15723018;
    id    |      event_timestamp
----------+----------------------------
 15723018 | 2009-05-09 13:47:48.68558
 15723018 | 2009-05-09 13:47:33.441668
(2 rows)

Table definition:

shs=# \d log_event
                                   Table "public.log_event"
     Column      |            Type             |                       Modifiers
-----------------+-----------------------------+--------------------------------------------------------
 id              | bigint                      | not null default nextval('log_event_id_seq'::regclass)
 user_id         | integer                     |
 ip              | inet                        | not null
 action_id       | integer                     | not null
 object1_id      | integer                     |
 object2_id      | integer                     |
 event_timestamp | timestamp without time zone | not null
Indexes:
    "log_event_pkey" PRIMARY KEY, btree (id)
    "log_event_action_id_idx" btree (action_id)
    "log_event_timestamp_idx" btree (event_timestamp)
    "log_event_user_id_idx" btree (user_id)
Foreign-key constraints:
    "log_event_action_id_fkey" FOREIGN KEY (action_id) REFERENCES config.log_action(id)

In this table definition you can also see that 'id' is generated by a
sequence, which makes it even stranger to contain a dupe.

I have made a copy of the complete pg cluster, so I can debug without
interfering with my production db.
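A note for anyone reproducing this kind of check: the spot check above can be generalized from one id to the whole table. The sketch below is a hypothetical session against the same table; it disables the scan types that consult the index so the heap alone answers, then enumerates every id that violates the primary key. (It needs a live database, and the settings are session-local only.)

```sql
-- Session-local: make the planner read the heap, not the damaged index.
SET enable_indexscan = off;
SET enable_bitmapscan = off;

-- Every id with more than one heap row, i.e. every duplicate the
-- primary key index has lost track of.
SELECT id, count(*) AS copies
FROM log_event
GROUP BY id
HAVING count(*) > 1
ORDER BY id;

RESET enable_indexscan;
RESET enable_bitmapscan;
```

Comparing this list against `SELECT id FROM log_event WHERE id = ...` probes (with the settings back at their defaults) separates index-invisible rows from genuine duplicates.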
>>>>> "Mathieu" == "Mathieu De Zutter" <mathieu@dezutter.org> writes:

 Mathieu> I have a table log_event with a primary key on an integer
 Mathieu> 'id', called log_event_pkey.

 Mathieu> The table contains a duplicate for id = 15723018. The
 Mathieu> duplicate (note that besides the id, all data differs)
 Mathieu> doesn't seem to be known by the index at all.

[snip]

This was first reported on IRC and I spent a little time working with
the OP trying to dig up info to suggest a cause; here is the relevant
data (all provided by the OP at my request). I have not been able to
suggest a possible cause based on this.

shs=# select ctid,xmin,xmax,cmin,cmax,* from log_event where id = 15723018;
    ctid    | xmin  | xmax | cmin | cmax |    id    | user_id |      ip       | action_id | object1_id | object2_id |      event_timestamp
------------+-------+------+------+------+----------+---------+---------------+-----------+------------+------------+----------------------------
 (79844,13) | 16410 |    0 |    0 |    0 | 15723018 |       0 | 79.65.40.251  |         4 |      10145 |            | 2009-05-09 13:47:48.68558
 (79870,70) | 16479 |    0 |    0 |    0 | 15723018 |       1 | 80.200.87.228 |         1 |            |            | 2009-05-09 13:47:33.441668
(2 rows)

Notice that the two rows seem entirely independent (different xmin).
The OP stated that his app generally does single-row inserts (with some
exceptions not relevant here); however, we found a nearby row which
shares the xmin:

 (79870,16) | 16410 | 0 | 0 | 0 | 15722964 | 1 | 80.200.87.228 | 3 | 720 | | 2009-05-09 13:41:53.099589

(that row shows up correctly)

Using pg_filedump on the primary key index did not show any index
entries at all pointing to the (79844,13) row.

Hexdump of the 79844 page did not reveal anything out of the ordinary;
this is tuple (79844,13) starting at offset 1cc8:

00001cc0  00 00 00 00 00 00 00 00  1a 40 00 00 00 00 00 00  |.........@......|
00001cd0  00 00 00 00 01 00 e4 37  0d 00 07 00 03 09 18 5f  |.......7......._|
00001ce0  0a ea ef 00 00 00 00 00  00 00 00 00 0f 02 20 4f  |.............. O|
00001cf0  41 28 fb 00 04 00 00 00  a1 27 00 00 00 00 00 00  |A(.......'......|

(full page at http://pastebin.com/m28036405)

There are additional rows in the table which also show similar issues
(rows found by seqscan but missing in indexscan). These also have xids
which are "suspicious" (apparently not in the expected sequence or dups
of other xids). e.g.:

these rows don't show up in the index:

 (79854,50) | 16401 | 0 | 0 | 0 | 15722931 | 0 | 90.39.81.150   | 6 |       | | 2009-05-09 13:42:06.278105
 (79849,6)  | 16406 | 0 | 0 | 0 | 15722932 | 0 | 78.150.166.244 | 5 |   940 | | 2009-05-09 13:45:49.012712
 (79857,39) | 16408 | 0 | 0 | 0 | 15722933 | 0 | 64.17.2.164    | 2 | 22435 | | 2009-05-09 13:46:03.1664

but these do:

 (79870,13) | 16406 | 0 | 0 | 0 | 15722961 | 1 | 80.200.87.228 | 71 | 29005 | | 2009-05-09 13:41:50.190479
 (79870,14) | 16408 | 0 | 0 | 0 | 15722962 | 0 | 80.136.54.145 |  2 | 92765 | | 2009-05-09 13:41:50.439442

This seems to be associated with some odd jumps in xid, time and id;
this set of rows is ordered by id:

 (79835,21) | 16394 | 0 | 0 | 0 | 15722927 | 0 | 85.225.253.43  | 1 |       | | 2009-05-09 13:39:43.943032
 (79869,12) | 16396 | 0 | 0 | 0 | 15722928 | 0 | 66.235.124.132 | 4 |  1663 | | 2009-05-09 13:40:37.048477
 (79844,12) | 16398 | 0 | 0 | 0 | 15722929 | 0 | 64.17.2.164    | 2 | 22324 | | 2009-05-09 13:40:58.781981
 (79841,9)  | 16399 | 0 | 0 | 0 | 15722930 | 0 | 78.137.163.133 | 2 |  4950 | | 2009-05-09 13:41:32.126199
 (79854,50) | 16401 | 0 | 0 | 0 | 15722931 | 0 | 90.39.81.150   | 6 |       | | 2009-05-09 13:42:06.278105
 (79849,6)  | 16406 | 0 | 0 | 0 | 15722932 | 0 | 78.150.166.244 | 5 |   940 | | 2009-05-09 13:45:49.012712
 (79857,39) | 16408 | 0 | 0 | 0 | 15722933 | 0 | 64.17.2.164    | 2 | 22435 | | 2009-05-09 13:46:03.1664

(above 7 rows are not in index)

 (79870,3)  | 16393 | 0 | 0 | 0 | 15722951 | 0 | 77.202.22.228 | 56 |     0 | | 2009-05-09 13:39:48.460145
 (79870,4)  | 16394 | 0 | 0 | 0 | 15722952 | 0 | 77.202.22.228 |  3 | 37349 | | 2009-05-09 13:40:09.031336
 (79870,5)  | 16395 | 0 | 0 | 0 | 15722953 | 0 | 77.202.22.228 | 56 |     0 | | 2009-05-09 13:40:23.072695

(above 3 rows are in the index)

(full list at http://pastebin.com/m16600dc8 - that list is from a
seqscan, so includes rows missing from the index)

-- 
Andrew (irc:RhodiumToad)
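As a cross-check on the hexdump: the first 23 bytes of the tuple are just the fixed HeapTupleHeaderData fields, and they can be decoded by hand. Below is a sketch, assuming 8.3's on-disk header layout on a little-endian machine; the byte string is transcribed from the dump above (the tuple at page offset 0x1cc8).

```python
import struct

# First 23 bytes of the heap tuple at page offset 0x1cc8, transcribed
# from the hexdump in the report (page 79844, line pointer 13).
raw = bytes.fromhex(
    "1a400000" "00000000" "00000000"   # t_xmin, t_xmax, t_field3 (cid/xvac)
    "0100e437" "0d00"                  # t_ctid: block hi word, lo word, offset
    "0700" "0309" "18"                 # t_infomask2, t_infomask, t_hoff
)

xmin, xmax, field3 = struct.unpack_from("<III", raw, 0)
bi_hi, bi_lo, ip_posid = struct.unpack_from("<HHH", raw, 12)
infomask2, infomask, hoff = struct.unpack_from("<HHB", raw, 18)

block = (bi_hi << 16) | bi_lo        # BlockIdData stores hi/lo 16-bit halves
natts = infomask2 & 0x07FF           # HEAP_NATTS_MASK: attribute count

print(xmin, xmax, (block, ip_posid), natts, hex(infomask), hoff)
# -> 16410 0 (79844, 13) 7 0x903 24
```

The decoded values (xmin 16410, xmax 0, t_ctid pointing back at (79844,13), 7 attributes) agree with the SELECT output above, consistent with Andrew's observation that the heap tuple itself looks normal and the damage is confined to the index.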
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> This was first reported on IRC and I spent a little time working with
> the OP trying to dig up info to suggest a cause; here is the relevant
> data (all provided by the OP at my request). I have not been able to
> suggest a possible cause based on this.

> shs=# select ctid,xmin,xmax,cmin,cmax,* from log_event where id = 15723018;
>     ctid    | xmin  | xmax | cmin | cmax |    id    | user_id |      ip       | action_id | object1_id | object2_id |      event_timestamp
> ------------+-------+------+------+------+----------+---------+---------------+-----------+------------+------------+----------------------------
>  (79844,13) | 16410 |    0 |    0 |    0 | 15723018 |       0 | 79.65.40.251  |         4 |      10145 |            | 2009-05-09 13:47:48.68558
>  (79870,70) | 16479 |    0 |    0 |    0 | 15723018 |       1 | 80.200.87.228 |         1 |            |            | 2009-05-09 13:47:33.441668
> (2 rows)

> Notice that the two rows seem entirely independent (different xmin).
> The OP stated that his app generally does single-row inserts (with
> some exceptions not relevant here); however, we found a nearby row
> which shares the xmin:

> (79870,16) | 16410 | 0 | 0 | 0 | 15722964 | 1 | 80.200.87.228 | 3 | 720 | | 2009-05-09 13:41:53.099589

How is the timestamp column generated? I'm wondering what we can deduce
from the fact that the timestamps are all different. It's evidently not
now().

One thing that seems odd is that the xids are kinda small. Did the
system just recently have a wraparound event?

			regards, tom lane
Andrew Gierth wrote:

> >>>>> "Mathieu" == "Mathieu De Zutter" <mathieu@dezutter.org> writes:
>
> Mathieu> I have a table log_event with a primary key on an integer
> Mathieu> 'id', called log_event_pkey.
>
> Mathieu> The table contains a duplicate for id = 15723018. The
> Mathieu> duplicate (note that besides the id, all data differs)
> Mathieu> doesn't seem to be known by the index at all.
> [snip]
>
> This was first reported on IRC and I spent a little time working with
> the OP trying to dig up info to suggest a cause; here is the relevant
> data (all provided by the OP at my request). I have not been able to
> suggest a possible cause based on this.

How are the event_timestamps generated? If these come from the
transaction timestamp, it is even more weird that tuples seemingly
inserted by the same xmin have differing timestamps.

FWIW I think I've seen similar cases before.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> Notice that the two rows seem entirely independent (different
 >> xmin). The OP stated that his app generally does single-row
 >> inserts (with some exceptions not relevant here); however, we
 >> found a nearby row which shares the xmin:

 Tom> How is the timestamp column generated? I'm wondering what we
 Tom> can deduce from the fact that the timestamps are all different.
 Tom> It's evidently not now().

(answering this one since the OP has probably gone for the night)

My understanding is that it is now(), but the OP should be able to give
a definitive answer. (Yes, this does raise some questions about why it
appears to have gone backwards at some points.)

 Tom> One thing that seems odd is that the xids are kinda small. Did
 Tom> the system just recently have a wraparound event?

The system was recently dump/restored from a different box. The failing
rows are all new inserts since the restore.

-- 
Andrew (irc:RhodiumToad)
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Tom> One thing that seems odd is that the xids are kinda small. Did
> Tom> the system just recently have a wraparound event?

> The system was recently dump/restored from a different box. The
> failing rows are all new inserts since the restore.

So the table has been insert-only so far? I was just thinking that HOT
bugs seemed like a probable explanation, but that idea goes out the
window if there have been no UPDATEs.

			regards, tom lane
Mathieu De Zutter wrote:

> I have a table log_event with a primary key on an integer 'id', called
> log_event_pkey.
>
> The table contains a duplicate for id = 15723018. The duplicate (note
> that besides the id, all data differs) doesn't seem to be known by the
> index at all.

What do the INSERT lines look like? Is it a trigger, an insert called
directly by the application? How is the sequence involved -- lastval(),
nextval(), does the code just leave the column out for the default to
fire?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> The system was recently dump/restored from a different box. The
 >> failing rows are all new inserts since the restore.

 Tom> So the table has been insert-only so far? I was just thinking
 Tom> that HOT bugs seemed like a probable explanation, but that idea
 Tom> goes out the window if there have been no UPDATEs.

No UPDATEs (and there are no HOT flags set on any tuple I looked at).
There may have been DELETEs.

-- 
Andrew (irc:RhodiumToad)
On Fri, Jul 10, 2009 at 2:02 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> What do the INSERT lines look like? Is it a trigger, an insert called
> directly by the application? How is the sequence involved -- lastval(),
> nextval(), does the code just leave the column out for the default to
> fire?

* This is the INSERT query, called from PHP/Apache:

  INSERT INTO log_event (user_id, ip, action_id, object1_id, object2_id, event_timestamp)
  VALUES ($1, $2, $3, $4, $5, NOW())

  So the timestamp is generated by NOW(), and the pkey is generated by
  using the default value (nextval).

* There are no UPDATE queries.

* There are a few DELETE queries that are launched daily (called from PHP/cron):

  DELETE FROM log_event
  WHERE action_id = $1 AND event_timestamp < NOW() - '1 month'::interval

  (some events are not worthy of archiving individually, we just keep
  totals)

Regards,
Mathieu
Mathieu De Zutter <mathieu@dezutter.org> writes:
> On Fri, Jul 10, 2009 at 2:02 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
>> What do the INSERT lines look like?

> * This is the INSERT query, called from PHP/Apache
> INSERT INTO log_event (user_id, ip, action_id, object1_id, object2_id, event_timestamp)
> VALUES ($1, $2, $3, $4, $5, NOW())

> So the timestamp is generated by NOW(), and the pkey is generated by
> using the default value (nextval).

The timestamps are invariably generated by NOW()? That seems impossible
given that Andrew's report shows different timestamps for the same
insert XID.

But actually, you have to believe about six impossible things in order
to credit that Andrew's report represents the behavior of a
normally-functioning Postgres system. What I am thinking is that we are
looking at the results of multiple disconnected backends hacking on the
same database files --- ie, there's more than one active shared memory
block. That would explain the apparent re-use of XIDs (they're being
handed out separately in each shmem block) as well as disappearance of
index entries (a write from either shmem block would wipe out recent
changes in that page from the other one).

This could be checked by looking at the output of "ipcs -m" (run this
as root to be sure you get everything).

			regards, tom lane
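Tom's lost-write mechanism is easy to picture with a toy model. This is plain illustrative Python, not PostgreSQL code: each "backend" keeps its own cached copy of an index page because no shared memory coordinates them, and since pages are written back whole, the last writer silently erases the other's entry.

```python
# Toy model of two disconnected backends clobbering each other's
# whole-page writes. The "page" is just a list of index entries.
page_on_disk = {"entries": [15722900]}

def snapshot(page):
    # Each backend caches its own private copy of the page.
    return {"entries": list(page["entries"])}

b1 = snapshot(page_on_disk)
b2 = snapshot(page_on_disk)   # taken before b1 writes anything back

b1["entries"].append(15723018)   # backend 1 adds its index entry...
page_on_disk = snapshot(b1)      # ...and writes the whole page back

b2["entries"].append(15723050)   # backend 2 never saw b1's change,
page_on_disk = snapshot(b2)      # so its write-back wipes out 15723018

print(15723018 in page_on_disk["entries"])  # -> False
```

The heap row for 15723018 would still exist (each backend's heap insert landed on a different page), but its index entry is gone: exactly the seqscan-visible, indexscan-invisible symptom in the report.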
On Fri, Jul 10, 2009 at 5:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> This could be checked by looking at the output of "ipcs -m" (run this
> as root to be sure you get everything).

$ sudo ipcs -m

------ Shared Memory Segments --------
key        shmid    owner     perms  bytes     nattch  status
0x0052e2c1 1081344  postgres  600    30384128  21
0x00000000 425985   postgres  600    18440     34
0x00000000 458754   postgres  600    16384     34
0x00000000 491523   postgres  600    768       34
0x00000000 524292   postgres  600    12        34

Note that I stopped/started postgresql last night.

Regards,
Mathieu
Mathieu De Zutter <mathieu@dezutter.org> writes:
> $ sudo ipcs -m
> ------ Shared Memory Segments --------
> key        shmid    owner     perms  bytes     nattch  status
> 0x0052e2c1 1081344  postgres  600    30384128  21
> 0x00000000 425985   postgres  600    18440     34
> 0x00000000 458754   postgres  600    16384     34
> 0x00000000 491523   postgres  600    768       34
> 0x00000000 524292   postgres  600    12        34

Hmm, I wonder what those last four are? Surely they are too small to be
Postgres shmem segments, but if they're something else like X11 segments
then why do we only see ones belonging to user postgres? I think it's
probably irrelevant to the problem, but just wondering ...

> Note that I stopped/started postgresql last night.

The evidence may be gone then. I still think that disconnected backends
are the most plausible explanation, but you might have killed
everything off.

As far as anyone knows, it's impossible to get into the
disconnected-backends state unless (a) you manually remove the
postmaster.pid file that provides the interlock against it, or (b)
you're trying to run multiple copies of Postgres on different machines
that are sharing an NFS or SAN mount of the same database files. I
assume you'd have mentioned it if you were trying (b), but I wonder
whether you have an init script that takes it on itself to remove the
postmaster.pid file? It's generally considered irresponsible for an
init script to do that, but I'm not sure every distro has gotten the
word.

			regards, tom lane
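For what it's worth, the first-order test an init script should make before ever touching a pid file is a liveness probe on the PID recorded there. A minimal sketch (plain Python, Unix assumed; a temporary file stands in for the first line of $PGDATA/postmaster.pid, and the real postmaster's interlock does more than this, e.g. checking the shared memory segment too):

```python
import os
import tempfile

# Stand-in for $PGDATA/postmaster.pid: first line is the postmaster's
# PID. Here we use this process's own PID so the demo is self-contained.
with tempfile.NamedTemporaryFile("w+", delete=False) as f:
    f.write(f"{os.getpid()}\n")
    pidfile = f.name

with open(pidfile) as f:
    pid = int(f.readline())

try:
    os.kill(pid, 0)              # signal 0 probes existence, sends nothing
    stale = False
except ProcessLookupError:       # no such process: the file is stale
    stale = True
except PermissionError:          # process exists but belongs to someone else
    stale = False

os.unlink(pidfile)
print("stale" if stale else "alive")  # -> alive
```

If the probe says "alive", removing the pid file is exactly the mistake that permits a second postmaster, and with it the disconnected-backends scenario Tom describes.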
On Fri, Jul 10, 2009 at 6:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As far as anyone knows, it's impossible to get into the
> disconnected-backends state unless (a) you manually remove the
> postmaster.pid file that provides the interlock against it, or
> (b) you're trying to run multiple copies of Postgres on different
> machines that are sharing an NFS or SAN mount of the same database
> files.

(a) I can't remember doing that, but who knows, maybe I got frustrated
at some point in time and took a wrong shortcut.

(b) I've only got local RAID storage.

I've checked the Debian init script, and the only thing related to the
pid file I found was creating the underlying directory
/var/run/postgresql.

Regards,
Mathieu