Обсуждение: High cost of ... where ... not in (select ...)
I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze):
musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);
QUERY PLAN
------------------------------------------------------------------------------------
Seq Scan on pcap_store (cost=4008.22..348521303.54 rows=106532 width=6)
Filter: (NOT (subplan))
SubPlan
-> Materialize (cost=4008.22..6765.98 rows=205475 width=4)
-> Seq Scan on pcap_store_log (cost=0.00..3099.75
rows=205475 width=4)
(5 rows)
musecurity=# \d muapp.pcap_store
Table "muapp.pcap_store"
Column | Type |
Modifiers
-------------------+------------------------+-------------------------------------------------------------------------
pcap_storeid | integer | not null default
nextval('muapp.pcap_store_pcap_storeid_seq'::regclass)
filename | character varying(255) |
test_run_dutid | integer | default 0
userid | integer | not null default 0
analysis_recordid | bigint |
io_xml | character varying(255) |
Indexes:
"pcap_store_pkey" PRIMARY KEY, btree (pcap_storeid)
Foreign-key constraints:
"pcap_store_analysis_recordid_fkey" FOREIGN KEY
(analysis_recordid) REFERENCES muapp.analysis(recordid) ON DELETE
CASCADE
"pcap_store_test_run_dutid_fkey" FOREIGN KEY (test_run_dutid)
REFERENCES muapp.test_run_dut(test_run_dutid) ON DELETE CASCADE
"pcap_store_userid_fkey" FOREIGN KEY (userid) REFERENCES
mucore."user"(recordid) ON DELETE CASCADE
As you see, the sequence scan on pcap_store is killing me, even though
there appears to be a perfectly good index. Is there a better way
construct this query?
Thanks,
Aaron
--
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
Aaron Turner escribió: > I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): > > musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid > NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); What PG version is this? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Tue, Jun 16, 2009 at 2:37 PM, Alvaro Herrera<alvherre@commandprompt.com> wrote: > Aaron Turner escribió: >> I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): >> >> musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid >> NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); > > What PG version is this? Doh, just realized I didn't reply back to list. It's version 8.3.3. Also, pcap_storeid is unique in pcap_store_log -- Aaron Turner http://synfin.net/ http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. -- Benjamin Franklin
On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turner<synfinatic@gmail.com> wrote: > On Tue, Jun 16, 2009 at 2:37 PM, Alvaro > Herrera<alvherre@commandprompt.com> wrote: >> Aaron Turner escribió: >>> I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): >>> >>> musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid >>> NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); >> >> What PG version is this? > > Doh, just realized I didn't reply back to list. It's version 8.3.3. > > Also, pcap_storeid is unique in pcap_store_log Speaking as one who has dealt with this frustration more than once, you can typically get better performance with something like: DELETE FROM muapp.pcap_store AS x FROM muapp.pcap_store a LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL This is emphatically lame, but there you have it. It's first of all lame that we can't do a better job optimizing NOT-IN, at least when the expression within the subselect is known to be not-null, and it's secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN without a self-JOIN. </rant> ...Robert
On Tue, Jun 16, 2009 at 5:30 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turner<synfinatic@gmail.com> wrote:
>> On Tue, Jun 16, 2009 at 2:37 PM, Alvaro
>> Herrera<alvherre@commandprompt.com> wrote:
>>> Aaron Turner escribió:
>>>> I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze):
>>>>
>>>> musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid
>>>> NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log);
>>>
>>> What PG version is this?
>>
>> Doh, just realized I didn't reply back to list. It's version 8.3.3.
>>
>> Also, pcap_storeid is unique in pcap_store_log
>
> Speaking as one who has dealt with this frustration more than once,
> you can typically get better performance with something like:
>
> DELETE FROM muapp.pcap_store AS x
> FROM muapp.pcap_store a
> LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid
> WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL
That's a syntax error on 8.3.3... I don't see anywhere in the docs
where the delete command allows for multiple FROM statements. Perhaps
you meant:
DELETE FROM muapp.pcap_store AS x
USING muapp.pcap_store AS a
LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid =
b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND
b.pcap_storeid IS NULL;
Is that right?
> This is emphatically lame, but there you have it. It's first of all
> lame that we can't do a better job optimizing NOT-IN, at least when
> the expression within the subselect is known to be not-null, and it's
> secondly lame that the syntax of DELETE doesn't permit a LEFT JOIN
> without a self-JOIN.
Wow, glad I asked... I never would of figured that out.
--
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turner<synfinatic@gmail.com> wrote: > On Tue, Jun 16, 2009 at 5:30 PM, Robert Haas<robertmhaas@gmail.com> wrote: >> On Tue, Jun 16, 2009 at 7:39 PM, Aaron Turner<synfinatic@gmail.com> wrote: >>> On Tue, Jun 16, 2009 at 2:37 PM, Alvaro >>> Herrera<alvherre@commandprompt.com> wrote: >>>> Aaron Turner escribió: >>>>> I'm trying to figure out how to optimize this query (yes, I ran vacuum/analyze): >>>>> >>>>> musecurity=# explain DELETE FROM muapp.pcap_store WHERE pcap_storeid >>>>> NOT IN (SELECT pcap_storeid FROM muapp.pcap_store_log); >>>> >>>> What PG version is this? >>> >>> Doh, just realized I didn't reply back to list. It's version 8.3.3. >>> >>> Also, pcap_storeid is unique in pcap_store_log >> >> Speaking as one who has dealt with this frustration more than once, >> you can typically get better performance with something like: >> >> DELETE FROM muapp.pcap_store AS x >> FROM muapp.pcap_store a >> LEFT JOIN muapp.pcap_store_log b ON a.pcap_store_id = b.pcap_storeid >> WHERE x.pcap_storeid = a.pcap_storeid AND b.pcap_storeid IS NULL > > That's a syntax error on 8.3.3... I don't see anywhere in the docs > where the delete command allows for multiple FROM statements. Perhaps > you meant: > > DELETE FROM muapp.pcap_store AS x > USING muapp.pcap_store AS a > LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid = > b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND > b.pcap_storeid IS NULL; > > Is that right? Woops, yes, I think that's it. (but I don't guarantee that it won't blow up your entire universe, so test it carefully first) ...Robert
On Tue, Jun 16, 2009 at 6:36 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turner<synfinatic@gmail.com> wrote:
>> DELETE FROM muapp.pcap_store AS x
>> USING muapp.pcap_store AS a
>> LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid =
>> b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND
>> b.pcap_storeid IS NULL;
>>
>> Is that right?
>
> Woops, yes, I think that's it.
>
> (but I don't guarantee that it won't blow up your entire universe, so
> test it carefully first)
Yeah, doing that now... taking a bit longer then I expected (took
~5min on rather slow hardware- everything is on a pair of 10K RAID1
drives), but the result seems correct.
QUERY PLAN
------------------------------------------------------------------------------------------------
Hash Join (cost=19229.08..29478.99 rows=106492 width=6)
Hash Cond: (x.pcap_storeid = a.pcap_storeid)
-> Seq Scan on pcap_store x (cost=0.00..5617.84 rows=212984 width=10)
-> Hash (cost=17533.93..17533.93 rows=106492 width=4)
-> Hash Left Join (cost=6371.19..17533.93 rows=106492 width=4)
Hash Cond: (a.pcap_storeid = b.pcap_storeid)
Filter: (b.pcap_storeid IS NULL)
-> Seq Scan on pcap_store a (cost=0.00..5617.84
rows=212984 width=4)
-> Hash (cost=3099.75..3099.75 rows=205475 width=4)
-> Seq Scan on pcap_store_log b
(cost=0.00..3099.75 rows=205475 width=4)
I know the costs are just relative, but I assumed
cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy
hardware. Honestly, not complaining, 5 minutes is acceptable for this
query (it's a one time thing) just surprised is all.
Thanks for the help!
--
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
On Tue, Jun 16, 2009 at 6:36 PM, Robert Haas<robertmhaas@gmail.com> wrote:
> On Tue, Jun 16, 2009 at 9:23 PM, Aaron Turner<synfinatic@gmail.com> wrote:
>> DELETE FROM muapp.pcap_store AS x
>> USING muapp.pcap_store AS a
>> LEFT JOIN muapp.pcap_store_log b ON a.pcap_storeid =
>> b.pcap_storeid WHERE x.pcap_storeid = a.pcap_storeid AND
>> b.pcap_storeid IS NULL;
>>
>> Is that right?
>
> Woops, yes, I think that's it.
>
> (but I don't guarantee that it won't blow up your entire universe, so
> test it carefully first)
Yeah, doing that now... taking a bit longer then I expected (took
~5min on rather slow hardware- everything is on a pair of 10K RAID1
drives), but the result seems correct.
QUERY PLAN
------------------------------------------------------------------------------------------------
Hash Join (cost=19229.08..29478.99 rows=106492 width=6)
Hash Cond: (x.pcap_storeid = a.pcap_storeid)
-> Seq Scan on pcap_store x (cost=0.00..5617.84 rows=212984 width=10)
-> Hash (cost=17533.93..17533.93 rows=106492 width=4)
-> Hash Left Join (cost=6371.19..17533.93 rows=106492 width=4)
Hash Cond: (a.pcap_storeid = b.pcap_storeid)
Filter: (b.pcap_storeid IS NULL)
-> Seq Scan on pcap_store a (cost=0.00..5617.84
rows=212984 width=4)
-> Hash (cost=3099.75..3099.75 rows=205475 width=4)
-> Seq Scan on pcap_store_log b
(cost=0.00..3099.75 rows=205475 width=4)
I know the costs are just relative, but I assumed
cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy
hardware. Honestly, not complaining, 5 minutes is acceptable for this
query (it's a one time thing) just surprised is all.
Thanks for the help!
--
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
-- Benjamin Franklin
Aaron Turner <synfinatic@gmail.com> writes:
> I know the costs are just relative, but I assumed
> cost=19229.08..29478.99 isn't 5 minutes of effort even on crappy
> hardware.
Very likely the bulk of the time is spent in the DELETE work proper,
not in the query to find the rows to be deleted. In particular I wonder
if you have an unindexed foreign key referencing this table ...
regards, tom lane