Обсуждение: High cost of ... where ... not in (select ...)
I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); QUERY PLAN ------------------------------------------------------------------------------------ Seq Scan on pcap_store (cost=4008.22..348521303.54 rows=106532 width=6) Filter: (NOT (subplan)) SubPlan -> Materialize (cost=4008.22..6765.98 rows=205475 width=4) -> Seq Scan on pcap_store_log (cost=0.00..3099.75 rows=205475 width=4) (5 rows) musecurity=# \d muapp.pcap_store Table "muapp.pcap_store" Column | Type | Modifiers -------------------+------------------------+------------------------------------------------------------------------- pcap_storeid | integer | not null default nextval('muapp.pcap_store_pcap_storeid_seq'::regclass) filename | character varying(255) | test_run_dutid | integer | default 0 userid | integer | not null default 0 analysis_recordid | bigint | io_xml | character varying(255) | Indexes: "pcap_store_pkey" PRIMARY KEY, btree (pcap_storeid) Foreign-key constraints: "pcap_store_analysis_recordid_fkey" FOREIGN KEY (analysis_recordid) REFERENCES muapp.analysis(recordid) ON DELETE CASCADE "pcap_store_test_run_dutid_fkey" FOREIGN KEY (test_run_dutid) REFERENCES muapp.test_run_dut(test_run_dutid) ON DELETE CASCADE "pcap_store_userid_fkey" FOREIGN KEY (userid) REFERENCES mucore."user"(recordid) ON DELETE CASCADE As you see, the sequence scan on pcap_store is killing me, even though there appears to be a perfectly good index. Is there a better way construct this query? Thanks, Aaron -- Aaron Turner http://synfin.net/ http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
Aaron Turner escribió: > I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): > > musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid > NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); What PG version is this? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herrera<alvherre@commandprompt.com> wrote: > Aaron Turner escribió: >> I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): >> >> musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid >> NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); > > What PG version is this? Doh, just realized I didn't reply back to list. It's version 8.3.3. Also, pcap_storeid is unique in pcap_store_log -- Aaron Turner http://synfin.net/ http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turner<synfinatic@gmail.com> wrote: > On Tue, Jun 16, 2009 at 2:37 PM, Alvaro > Herrera<alvherre@commandprompt.com> wrote: >> Aaron Turner escribió: >>> I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): >>> >>> musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid >>> NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); >> >> What PG version is this? > > Doh, just realized I didn't reply back to list. It's version 8.3.3. > > Also, pcap_storeid is unique in pcap_store_log Speaking as one who has dealt with this frustration more than once, you can typically get better performance with something like: DELETE FROM muapp.pcap_store AS x FROM muapp.pcap_store a LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL This is emphatically lame, but there you have it. It's first of all lame that we can't do a better job optimizing NOT-IN, at least when the expression within the subselect is known to be not-null, and it's secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN without a self-JOIN. </rant> ...Robert
On Tue, Jun 16, 2009 at 5:30 PM, Robert Haas<robertmhaas@gmail.com> wrote: > On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turner<synfinatic@gmail.com> wrote: >> On Tue, Jun 16, 2009 at 2:37 PM, Alvaro >> Herrera<alvherre@commandprompt.com> wrote: >>> Aaron Turner escribió: >>>> I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): >>>> >>>> musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid >>>> NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); >>> >>> What PG version is this? >> >> Doh, just realized I didn't reply back to list. It's version 8.3.3. >> >> Also, pcap_storeid is unique in pcap_store_log > > Speaking as one who has dealt with this frustration more than once, > you can typically get better performance with something like: > > DELETE FROM muapp.pcap_store AS x > FROM muapp.pcap_store a > LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid > WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL That's a syntax error on 8.3.3... I don't see anywhere in the docs where the delete command allows for multiple FROM statements. Perhaps you meant: DELETE FROM muapp.pcap_store AS x USING muapp.pcap_store AS a LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL; Is that right? > This is emphatically lame, but there you have it. It's first of all > lame that we can't do a better job optimizing NOT-IN, at least when > the expression within the subselect is known to be not-null, and it's > secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN > without a self-JOIN. Wow, glad I asked... I never would of figured that out. -- Aaron Turner http://synfin.net/ http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turner<synfinatic@gmail.com> wrote: > On Tue, Jun 16, 2009 at 5:30 PM, Robert Haas<robertmhaas@gmail.com> wrote: >> On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turner<synfinatic@gmail.com> wrote: >>> On Tue, Jun 16, 2009 at 2:37 PM, Alvaro >>> Herrera<alvherre@commandprompt.com> wrote: >>>> Aaron Turner escribió: >>>>> I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): >>>>> >>>>> musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid >>>>> NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); >>>> >>>> What PG version is this? >>> >>> Doh, just realized I didn't reply back to list. It's version 8.3.3. >>> >>> Also, pcap_storeid is unique in pcap_store_log >> >> Speaking as one who has dealt with this frustration more than once, >> you can typically get better performance with something like: >> >> DELETE FROM muapp.pcap_store AS x >> FROM muapp.pcap_store a >> LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid >> WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL > > That's a syntax error on 8.3.3... I don't see anywhere in the docs > where the delete command allows for multiple FROM statements. Perhaps > you meant: > > DELETE FROM muapp.pcap_store AS x > USING muapp.pcap_store AS a > LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = > b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND > b.pcap_storeid IS NULL; > > Is that right? Woops, yes, I think that's it. (but I don't guarantee that it won't blow up your entire universe, so test it carefully first) ...Robert
On Tue, Jun 16, 2009 at 6:36 PM, Robert Haas<robertmhaas@gmail.com> wrote: > On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turner<synfinatic@gmail.com> wrote: >> DELETE FROM muapp.pcap_store AS x >> USING muapp.pcap_store AS a >> LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = >> b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND >> b.pcap_storeid IS NULL; >> >> Is that right? > > Woops, yes, I think that's it. > > (but I don't guarantee that it won't blow up your entire universe, so > test it carefully first) Yeah, doing that now... taking a bit longer then I expected (took ~5min on rather slow hardware- everything is on a pair of 10K RAID1 drives), but the result seems correct. QUERY PLAN ------------------------------------------------------------------------------------------------ Hash Join (cost=19229.08..29478.99 rows=106492 width=6) Hash Cond: (x.pcap_storeid = a.pcap_storeid) -> Seq Scan on pcap_store x (cost=0.00..5617.84 rows=212984 width=10) -> Hash (cost=17533.93..17533.93 rows=106492 width=4) -> Hash Left Join (cost=6371.19..17533.93 rows=106492 width=4) Hash Cond: (a.pcap_storeid = b.pcap_storeid) Filter: (b.pcap_storeid IS NULL) -> Seq Scan on pcap_store a (cost=0.00..5617.84 rows=212984 width=4) -> Hash (cost=3099.75..3099.75 rows=205475 width=4) -> Seq Scan on pcap_store_log b (cost=0.00..3099.75 rows=205475 width=4) I know the costs are just relative, but I assumed cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy hardware. Honestly, not complaining, 5 minutes is acceptable for this query (it's a one time thing) just surprised is all. Thanks for the help! -- Aaron Turner http://synfin.net/ http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
On Tue, Jun 16, 2009 at 6:36 PM, Robert Haas<robertmhaas@gmail.com> wrote: > On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turner<synfinatic@gmail.com> wrote: >> DELETE FROM muapp.pcap_store AS x >> USING muapp.pcap_store AS a >> LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = >> b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND >> b.pcap_storeid IS NULL; >> >> Is that right? > > Woops, yes, I think that's it. > > (but I don't guarantee that it won't blow up your entire universe, so > test it carefully first) Yeah, doing that now... taking a bit longer then I expected (took ~5min on rather slow hardware- everything is on a pair of 10K RAID1 drives), but the result seems correct. QUERY PLAN ------------------------------------------------------------------------------------------------ Hash Join (cost=19229.08..29478.99 rows=106492 width=6) Hash Cond: (x.pcap_storeid = a.pcap_storeid) -> Seq Scan on pcap_store x (cost=0.00..5617.84 rows=212984 width=10) -> Hash (cost=17533.93..17533.93 rows=106492 width=4) -> Hash Left Join (cost=6371.19..17533.93 rows=106492 width=4) Hash Cond: (a.pcap_storeid = b.pcap_storeid) Filter: (b.pcap_storeid IS NULL) -> Seq Scan on pcap_store a (cost=0.00..5617.84 rows=212984 width=4) -> Hash (cost=3099.75..3099.75 rows=205475 width=4) -> Seq Scan on pcap_store_log b (cost=0.00..3099.75 rows=205475 width=4) I know the costs are just relative, but I assumed cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy hardware. Honestly, not complaining, 5 minutes is acceptable for this query (it's a one time thing) just surprised is all. Thanks for the help! -- Aaron Turner http://synfin.net/ http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
Aaron Turner <synfinatic@gmail.com> writes: > I know the costs are just relative, but I assumed > cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy > hardware. Very likely the bulk of the time is spent in the DELETE work proper, not in the query to find the rows to be deleted. In particular I wonder if you have an unindexed foreign key referencing this table ... regards, tom lane