Обсуждение: delete and select with IN clause issues
I'm having problem with a cleanup script that runs nightly. The script calls the following query: delete from visit where id not in (select distinct visit_id from page_view); This yields the following error: ERROR: update or delete on "visit" violates foreign key constraint "fk34afd255fbacabec" on "page_view" DETAIL: Key (id)=(38635629) is still referenced from table "page_view". Ok, that seems fine, but when I do a select instead of delete, I do not find the referenced id in my list: select id from visit where id not in (select distinct visit_id from page_view) and id = 38635629 ; id ---- (0 rows) Also, if I don't specify the id = bit at the end, I still don't find it in the output when I search through with less, so why is it trying to delete that row? Rewriting the query like so, yields the same problem: delete from visit where NOT EXISTS ( select * from page_view WHERE visit.id = page_view.visit_id); ERROR: update or delete on "visit" violates foreign key constraint "fk34afd255fbacabec" on "page_view" DETAIL: Key (id)=(38638264) is still referenced from table "page_view". The plan looks like this: explain analyze delete from visit where id not in (select distinct visit_id from page_view); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- SeqScan on visit (cost=165027.49..189106.89 rows=211976 width=6) (actual time=4789.595..5330.367 rows=150677 loops=1) Filter: (NOT (hashed subplan)) SubPlan -> Unique (cost=0.00..165017.77 rows=3889 width=8) (actual time=2.717..4388.928rows=273285 loops=1) -> Index Scan using page_view_visit_idx on page_view (cost=0.00..159625.41rows=2156946 width=8) (actual time=2.713..2914.944 rows=2249576 loops=1) Trigger for constraint fk34afd255fbacabec:time=7174.540 calls=150677 Total runtime: 32772.345 ms (7 rows) \d page_view Table "public.page_view" Column | Type | Modifiers ----------------+-----------------------------+--------------------- id | bigint | not nullvisit_id | bigint | not null uri | character varying(255) | params | text | stamp | timestamp without time zone | cindex | integer | not null default -1 tindex | integer | not null default -1 method | character varying(7) | not null source_address | character varying(16) | server_name | character varying(255) | Indexes: "page_view_pkey" PRIMARY KEY, btree (id) "page_view_stamp_idx" btree (stamp) "page_view_uri_idx" btree(uri) "page_view_visit_idx" btree (visit_id) Foreign-key constraints: "fk34afd255fbacabec" FOREIGN KEY (visit_id) REFERENCES visit(id) What kind of silliness am I forgetting? -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
On Thu, 2 Nov 2006, Jeff Frost wrote: > I'm having problem with a cleanup script that runs nightly. The script calls > the following query: > > delete from visit where id not in (select distinct visit_id from page_view); > > This yields the following error: > > ERROR: update or delete on "visit" violates foreign key constraint > "fk34afd255fbacabec" on "page_view" > DETAIL: Key (id)=(38635629) is still referenced from table "page_view". > > Ok, that seems fine, but when I do a select instead of delete, I do not find > the referenced id in my list: > > select id from visit where id not in (select distinct visit_id from > page_view) and id = 38635629 ; > > id > ---- > (0 rows) > > Also, if I don't specify the id = bit at the end, I still don't find it in > the output when I search through with less, so why is it trying to delete > that row? > > Rewriting the query like so, yields the same problem: > > delete from visit where NOT EXISTS ( select * from page_view WHERE visit.id = > page_view.visit_id); > > ERROR: update or delete on "visit" violates foreign key constraint > "fk34afd255fbacabec" on "page_view" > DETAIL: Key (id)=(38638264) is still referenced from table "page_view". > > The plan looks like this: > > explain analyze delete from visit where id not in (select distinct visit_id > from page_view); > > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------------------------------- > Seq Scan on visit (cost=165027.49..189106.89 rows=211976 width=6) (actual > time=4789.595..5330.367 rows=150677 loops=1) > Filter: (NOT (hashed subplan)) > SubPlan > -> Unique (cost=0.00..165017.77 rows=3889 width=8) (actual > time=2.717..4388.928 rows=273285 loops=1) > -> Index Scan using page_view_visit_idx on page_view > (cost=0.00..159625.41 rows=2156946 width=8) (actual time=2.713..2914.944 > rows=2249576 loops=1) > Trigger for constraint fk34afd255fbacabec: time=7174.540 calls=150677 > Total runtime: 32772.345 ms > (7 rows) > > \d page_view > Table "public.page_view" > Column | Type | Modifiers > ----------------+-----------------------------+--------------------- > id | bigint | not null > visit_id | bigint | not null > uri | character varying(255) | > params | text | > stamp | timestamp without time zone | > cindex | integer | not null default -1 > tindex | integer | not null default -1 > method | character varying(7) | not null > source_address | character varying(16) | > server_name | character varying(255) | > Indexes: > "page_view_pkey" PRIMARY KEY, btree (id) > "page_view_stamp_idx" btree (stamp) > "page_view_uri_idx" btree (uri) > "page_view_visit_idx" btree (visit_id) > Foreign-key constraints: > "fk34afd255fbacabec" FOREIGN KEY (visit_id) REFERENCES visit(id) > > What kind of silliness am I forgetting? > > Almost forgot: server_version ---------------- 8.1.4 (1 row) -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost <jeff@frostconsultingllc.com> writes: > delete from visit where id not in (select distinct visit_id from page_view); > This yields the following error: > ERROR: update or delete on "visit" violates foreign key constraint > "fk34afd255fbacabec" on "page_view" > DETAIL: Key (id)=(38635629) is still referenced from table "page_view". This seems pretty darn weird. I am wondering about corrupt indexes --- can you find the indicated key in either table if you set enable_indexscan and enable_bitmapscan to 0? Also, this is a long shot, but does visit by any chance have a cascading deletion self-reference? regards, tom lane
On Thu, 2 Nov 2006, Tom Lane wrote: > Jeff Frost <jeff@frostconsultingllc.com> writes: >> delete from visit where id not in (select distinct visit_id from page_view); > >> This yields the following error: > >> ERROR: update or delete on "visit" violates foreign key constraint >> "fk34afd255fbacabec" on "page_view" >> DETAIL: Key (id)=(38635629) is still referenced from table "page_view". > > This seems pretty darn weird. I am wondering about corrupt indexes --- > can you find the indicated key in either table if you set > enable_indexscan and enable_bitmapscan to 0? test_tracking=# begin; BEGIN test_tracking=# set enable_seqscan TO false; SET test_tracking=# set enable_bitmapscan to false; SET test_tracking=# delete from visit where id not in (select distinct visit_id from test_tracking(# page_view); DELETE 150660 test_tracking=# ROLLBACK ; So, it seems everything worked fine with those two set to false. with seqscan enabled, it fails: test_tracking=# begin; BEGIN test_tracking=# set enable_bitmapscan to false; SET test_tracking=# show enable_seqscan ; enable_seqscan ---------------- on (1 row) test_tracking=# delete from visit where id not in (select distinct visit_id from page_view ); ERROR: update or delete on "visit" violates foreign key constraint "fk34afd255fbacabec" on "page_view" DETAIL: Key (id)=(38710245) is still referenced from table "page_view". Looks like with just enable_seqscan disabled it works: test_tracking=# begin; BEGIN test_tracking=# set enable_seqscan to false; SET test_tracking=# show enable_bitmapscan ; enable_bitmapscan ------------------- on (1 row) test_tracking=# delete from visit where id not in (select distinct visit_id from page_view ); DELETE 150661 test_tracking=# ROLLBACK ; ROLLBACK However, this doesn't yield anything: select id from visit where id not in (select distinct visit_id from page_view) and id = 38710245 ; id ---- (0 rows) with them set to true or false. > > Also, this is a long shot, but does visit by any chance have a cascading > deletion self-reference? Nope, I guess I didn't \d visit..thought I did...it's below. I'm going to guess that there is index corruption and a reindex page_view will fix it. Do you want me to gather any information in case this is a reproducible bug before I issue the reindex? Table "public.visit" Column | Type | Modifiers -------------+-----------------------------+------------------------ id | bigint | not nullvisitor_id | bigint | not null campaign_id | bigint | session_id | charactervarying(32) | not null uri | character varying(2000) | referer | character varying(2000) | user_agent | character varying(2000) | remote_host | bigint | not null outcome | character(1) | not null stamp | timestamp without time zone | email_key | character varying(16) | bot | boolean | not null default false status | character(1) | not null Indexes: "visit_pkey" PRIMARY KEY, btree (id) "visit_un" UNIQUE, btree (session_id) "visit_bot_idx" btree (bot) "visit_remote_host_ix" btree (remote_host) "visit_stamp_ix" btree (stamp) "visit_visitor_ix" btree (visitor_id) Foreign-key constraints: "fk6b04d4bbfc2fa3d" FOREIGN KEY (remote_host) REFERENCES remote_host(id) "fk6b04d4be5dc468"FOREIGN KEY (campaign_id) REFERENCES campaign(id) -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost <jeff@frostconsultingllc.com> writes: > On Thu, 2 Nov 2006, Tom Lane wrote: >> This seems pretty darn weird. I am wondering about corrupt indexes --- >> can you find the indicated key in either table if you set >> enable_indexscan and enable_bitmapscan to 0? > test_tracking=# begin; > BEGIN > test_tracking=# set enable_seqscan TO false; > SET > test_tracking=# set enable_bitmapscan to false; > SET That's not actually the test case I asked for ... > Nope, I guess I didn't \d visit..thought I did...it's below. I'm going to > guess that there is index corruption and a reindex page_view will fix it. Do > you want me to gather any information in case this is a reproducible bug > before I issue the reindex? I'm thinking the same --- please save a physical copy of page_view_visit_idx before you reindex it, and if that fixes the problem, please send me copies of both the before and after states of the index. regards, tom lane
On Fri, 3 Nov 2006, Tom Lane wrote: > Jeff Frost <jeff@frostconsultingllc.com> writes: >> On Thu, 2 Nov 2006, Tom Lane wrote: >>> This seems pretty darn weird. I am wondering about corrupt indexes --- >>> can you find the indicated key in either table if you set >>> enable_indexscan and enable_bitmapscan to 0? > > That's not actually the test case I asked for ... hahah, I wondered why you wanted me to disable seqscan...turns out you didn't! I just misread it. :-) > >> Nope, I guess I didn't \d visit..thought I did...it's below. I'm going to >> guess that there is index corruption and a reindex page_view will fix it. Do >> you want me to gather any information in case this is a reproducible bug >> before I issue the reindex? > > I'm thinking the same --- please save a physical copy of > page_view_visit_idx before you reindex it, and if that fixes the > problem, please send me copies of both the before and after states > of the index. Well, interestingly, it seems to work now for some reason even though the cleanup script has failed three nights in a row and I haven't reindexed yet. Seems quite strange. I guess now I'll have to wait and see if it fails again tonight. BTW, just to make sure I get the right file to ship over if we have this again, it would be: /var/lib/pgsql/data/base/9366228/16204210 yes? select * from pg_stat_user_indexes where indexrelname = 'page_view_visit_idx'; relid | indexrelid | schemaname | relname | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch ---------+------------+------------+-----------+---------------------+----------+--------------+--------------- 9366257 | 16204210 | public | page_view | page_view_visit_idx | 5652735 | 1540722403 | 1524420263 (1 row) -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost <jeff@frostconsultingllc.com> writes: > BTW, just to make sure I get the right file to ship over if we have this > again, it would be: /var/lib/pgsql/data/base/9366228/16204210 yes? Not necessarily --- the filename is initially the same as the index OID, but it wouldn't be anymore after a REINDEX. Instead use what you get from select relfilenode from pg_class where relname = 'page_view_visit_idx'; and note you will need to recheck it after reindexing. Also, it's a good idea to do CHECKPOINT right before copying the physical file, to make sure you've got a consistent file image on-disk. If you want to be 100% certain, shut down the postmaster while copying, but unless the index file is pretty large I think that's not necessary. regards, tom lane
On Fri, 3 Nov 2006, Tom Lane wrote: > Jeff Frost <jeff@frostconsultingllc.com> writes: >> BTW, just to make sure I get the right file to ship over if we have this >> again, it would be: /var/lib/pgsql/data/base/9366228/16204210 yes? > > Not necessarily --- the filename is initially the same as the index OID, > but it wouldn't be anymore after a REINDEX. Instead use what you get from > > select relfilenode from pg_class where relname = 'page_view_visit_idx'; > > and note you will need to recheck it after reindexing. > > Also, it's a good idea to do CHECKPOINT right before copying the > physical file, to make sure you've got a consistent file image on-disk. > If you want to be 100% certain, shut down the postmaster while copying, > but unless the index file is pretty large I think that's not necessary. Well, I spoke to soon on the it all works front. So, it's been reindexed and appears to be working properly now. I guess I'll keep an eye on it for a while. I didn't get your query suggestion in time, so hopefully I grabbed the right binary file..though it did seem to disappear after the reindex, so I think it's likely the correct one. Definitely got the correct second one. You can grab them here: http://www.frostconsultingllc.com/fortom.tar.gz And thanks again for the help! -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost <jeff@frostconsultingllc.com> writes: > Well, I spoke to soon on the it all works front. So, it's been > reindexed and appears to be working properly now. I guess I'll keep > an eye on it for a while. I didn't get your query suggestion in time, > so hopefully I grabbed the right binary file..though it did seem to > disappear after the reindex, so I think it's likely the correct one. > Definitely got the correct second one. Well, I can't find anything wrong :-(. There are some differences in the list of contained keys, but they're all up near the end of the range, which is consistent with the assumption that the table is live and had some changes between your two dumps of the index. In particular, there's no difference in the entries for the troublesome key value: 38635629 24080 25 38635629 24080 26 38635629 24080 27 So I dunno what to make of it. If it happens again, we need to look more closely. regards, tom lane
On Fri, 3 Nov 2006, Tom Lane wrote: > Jeff Frost <jeff@frostconsultingllc.com> writes: >> Well, I spoke to soon on the it all works front. So, it's been >> reindexed and appears to be working properly now. I guess I'll keep >> an eye on it for a while. I didn't get your query suggestion in time, >> so hopefully I grabbed the right binary file..though it did seem to >> disappear after the reindex, so I think it's likely the correct one. >> Definitely got the correct second one. > > Well, I can't find anything wrong :-(. There are some differences in > the list of contained keys, but they're all up near the end of the > range, which is consistent with the assumption that the table is live > and had some changes between your two dumps of the index. In > particular, there's no difference in the entries for the troublesome > key value: > > 38635629 24080 25 > 38635629 24080 26 > 38635629 24080 27 > > So I dunno what to make of it. If it happens again, we need to look > more closely. Tom, I know we shouldn't have to REINDEX in the 8.1.x days. Do you have any idea what might have allowed this to happen? A while back this particular server was unable to send e-mail and so we weren't getting the vacuum verbose output. As a consequence the FSM settings were too low. That has been remedied, but I'm wondering if it's possible that the FSM settings being too low would allow the INDEX to somehow get damaged? -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
On Fri, 3 Nov 2006, Tom Lane wrote: > Well, I can't find anything wrong :-(. There are some differences in > the list of contained keys, but they're all up near the end of the > range, which is consistent with the assumption that the table is live > and had some changes between your two dumps of the index. In > particular, there's no difference in the entries for the troublesome > key value: > > 38635629 24080 25 > 38635629 24080 26 > 38635629 24080 27 > > So I dunno what to make of it. If it happens again, we need to look > more closely. Well, it's been working wonderfully since the REINDEX, so I don't know what to say. Any idea if having a too small max_fsm_pages could hose an index, because I know that happened not too long before we started seeing this problem. The fsm settings were increased prior to the problem occurring, but it's possible the index was already damaged? -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
On Tue, 7 Nov 2006, Jeff Frost wrote: > Well, it's been working wonderfully since the REINDEX, so I don't know what > to say. Any idea if having a too small max_fsm_pages could hose an index, > because I know that happened not too long before we started seeing this > problem. The fsm settings were increased prior to the problem occurring, but > it's possible the index was already damaged? Well, once again I speak too soon: ERROR: update or delete on "visit" violates foreign key constraint "fk34afd255fbacabec" on "page_view" DETAIL: Key (id)=(40276078) is still referenced from table "page_view". and again, it's in both tables, so it shouldn't be trying to remove it from visit: engage_tracking=# select id from visit where id = 40276078; id ---------- 40276078 (1 row) engage_tracking=# select visit_id from page_view where visit_id = 40276078; visit_id ---------- 40276078 Reminder: the delete query is: delete from visit where id not in (select distinct visit_id from page_view); and if I do this: select id from visit where id not in (select distinct visit_id from page_view); I get a result set without that id in it. And now after looking at the data, I realize what's going on. I believe I have a race condition in which the visit entry is created before the select begins, but the page_view entry has not yet been created. I then try to delete the visit entry after the page_view row is created and then my fkey stops me. I added a WHERE clause in my script that looks like this: DELETE FROM visit WHERE stamp < now() - INTERVAL '30 days' AND id NOT IN (SELECT DISTINCT visit_id FROM page_view); I think we can put this one to bed and sorry for wasting everyone's cycles. -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954