Обсуждение: Long-running DELETE...WHERE...

Поиск
Список
Период
Сортировка

Long-running DELETE...WHERE...

От
jboes@nexcerpt.com (Jeff Boes)
Дата:
Why would a delete involving a subselect run so much longer than the
individual delete commands?

My situation: table A has 200,000 rows. I've made up a temporary table
which holds the single-valued primary key for 80,000 rows which I want
to delete.
 DELETE FROM a WHERE id IN (select ID from tmp LIMIT 800);

runs for several minutes.  But if I do

\o tmpfile
\t
SELECT 'DELETE FROM a WHERE id = ' || id || ';' from tmp limit 800;
\o
\i tmpfile

this completes in about 15 seconds, or 1/50 of the time for the
single-statement delete above.

In trying to optimize this process, I disabled all the relational
integrity triggers (foreign keys) involving the table, and then I
dropped all the indexes EXCEPT that of the primary key.  All the
experiments were done within a single transaction using BEGIN. The
database version is 7.1.3, and the table was vacuumed very recently.


Re: Long-running DELETE...WHERE...

От
Tom Lane
Дата:
jboes@nexcerpt.com (Jeff Boes) writes:
> My situation: table A has 200,000 rows. I've made up a temporary table
> which holds the single-valued primary key for 80,000 rows which I want
> to delete.
>   DELETE FROM a WHERE id IN (select ID from tmp LIMIT 800);
> runs for several minutes.

WHERE ... IN ... is notoriously inefficient.  I'd try
DELETE FROM aa WHERE id = tmp.id;

which is not standard SQL but should be able to produce a decent plan.

You might find that a VACUUM ANALYZE on both tables beforehand would be
a good idea, too; never-vacuumed temp tables have some default
statistics assumed that are a lot less than 80k rows.
        regards, tom lane


Re: Long-running DELETE...WHERE...

От
Chester Carlton Young
Дата:
Nice syntax.  Could not find in doc.  Do you have any ideas where I
could find it?

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> jboes@nexcerpt.com (Jeff Boes) writes:
> > My situation: table A has 200,000 rows. I've made up a temporary
> table
> > which holds the single-valued primary key for 80,000 rows which I
> want
> > to delete.
> >   DELETE FROM a WHERE id IN (select ID from tmp LIMIT 800);
> > runs for several minutes.
> 
> WHERE ... IN ... is notoriously inefficient.  I'd try
> 
>     DELETE FROM aa WHERE id = tmp.id;
> 
> which is not standard SQL but should be able to produce a decent
> plan.
> 
> You might find that a VACUUM ANALYZE on both tables beforehand would
> be
> a good idea, too; never-vacuumed temp tables have some default
> statistics assumed that are a lot less than 80k rows.
> 
>             regards, tom lane
> 
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly


__________________________________________________
Do You Yahoo!?
Send FREE video emails in Yahoo! Mail!
http://promo.yahoo.com/videomail/


Re: Long-running DELETE...WHERE...

От
Frank Bax
Дата:
At 08:21 PM 1/14/02 -0500, Tom Lane wrote:
>jboes@nexcerpt.com (Jeff Boes) writes:
>> My situation: table A has 200,000 rows. I've made up a temporary table
>> which holds the single-valued primary key for 80,000 rows which I want
>> to delete.
>>   DELETE FROM a WHERE id IN (select ID from tmp LIMIT 800);
>> runs for several minutes.
>
>WHERE ... IN ... is notoriously inefficient.  I'd try
>
>    DELETE FROM aa WHERE id = tmp.id;
>
>which is not standard SQL but should be able to produce a decent plan.

Nice alternative.  Is there an alternate format for this one:

DELETE  FROM  teamwork  WHERE  emp  NOT IN   ( SELECT  DISTINCT  emp  FROM  timesheet      WHERE  lo_shift >
(now()-'90days'::interval))
 

Frank



Re: Long-running DELETE...WHERE...

От
Jochem van Dieten
Дата:
Jeff Boes wrote:

> Why would a delete involving a subselect run so much longer than the
> individual delete commands?


Some SQL statements are faster than others. Try to rewrite your query to 
use EXISTS instead of IN and see if it makes any difference. I would 
recommend using EXPLAIN to find out what PostgreSQL is really doing and 
optimize from there.

jochem