Обсуждение: Long-running DELETE...WHERE...
Why would a delete involving a subselect run so much longer than the individual delete commands? My situation: table A has 200,000 rows. I've made up a temporary table which holds the single-valued primary key for 80,000 rows which I want to delete. DELETE FROM a WHERE id IN (select ID from tmp LIMIT 800); runs for several minutes. But if I do \o tmpfile \t SELECT 'DELETE FROM a WHERE id = ' || id || ';' from tmp limit 800; \o \i tmpfile this completes in about 15 seconds, or 1/50 of the time for the single-statement delete above. In trying to optimize this process, I disabled all the relational integrity triggers (foreign keys) involving the table, and then I dropped all the indexes EXCEPT that of the primary key. All the experiments were done within a single transaction using BEGIN. The database version is 7.1.3, and the table was vacuumed very recently.
jboes@nexcerpt.com (Jeff Boes) writes:
> My situation: table A has 200,000 rows. I've made up a temporary table
> which holds the single-valued primary key for 80,000 rows which I want
> to delete.
>   DELETE FROM a WHERE id IN (select ID from tmp LIMIT 800);
> runs for several minutes.
WHERE ... IN ... is notoriously inefficient.  I'd try
DELETE FROM aa WHERE id = tmp.id;
which is not standard SQL but should be able to produce a decent plan.
You might find that a VACUUM ANALYZE on both tables beforehand would be
a good idea, too; never-vacuumed temp tables have some default
statistics assumed that are a lot less than 80k rows.
        regards, tom lane
			
		Nice syntax. Could not find in doc. Do you have any ideas where I could find it? --- Tom Lane <tgl@sss.pgh.pa.us> wrote: > jboes@nexcerpt.com (Jeff Boes) writes: > > My situation: table A has 200,000 rows. I've made up a temporary > table > > which holds the single-valued primary key for 80,000 rows which I > want > > to delete. > > DELETE FROM a WHERE id IN (select ID from tmp LIMIT 800); > > runs for several minutes. > > WHERE ... IN ... is notoriously inefficient. I'd try > > DELETE FROM aa WHERE id = tmp.id; > > which is not standard SQL but should be able to produce a decent > plan. > > You might find that a VACUUM ANALYZE on both tables beforehand would > be > a good idea, too; never-vacuumed temp tables have some default > statistics assumed that are a lot less than 80k rows. > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly __________________________________________________ Do You Yahoo!? Send FREE video emails in Yahoo! Mail! http://promo.yahoo.com/videomail/
At 08:21 PM 1/14/02 -0500, Tom Lane wrote: >jboes@nexcerpt.com (Jeff Boes) writes: >> My situation: table A has 200,000 rows. I've made up a temporary table >> which holds the single-valued primary key for 80,000 rows which I want >> to delete. >> DELETE FROM a WHERE id IN (select ID from tmp LIMIT 800); >> runs for several minutes. > >WHERE ... IN ... is notoriously inefficient. I'd try > > DELETE FROM aa WHERE id = tmp.id; > >which is not standard SQL but should be able to produce a decent plan. Nice alternative. Is there an alternate format for this one: DELETE FROM teamwork WHERE emp NOT IN ( SELECT DISTINCT emp FROM timesheet WHERE lo_shift > (now()-'90days'::interval)) Frank
Jeff Boes wrote: > Why would a delete involving a subselect run so much longer than the > individual delete commands? Some SQL statements are faster than others. Try to rewrite your query to use EXISTS instead of IN and see if it makes any difference. I would recommend using EXPLAIN to find out what PostgreSQL is really doing and optimize from there. jochem