not in vs not exists - vastly diferent performance

Поиск
Список
Период
Сортировка
От Iain
Тема not in vs not exists - vastly diferent performance
Дата
Msg-id 013d01c3ca89$7307d7d0$7201a8c0@mst1x5r347kymb
обсуждение исходный текст
Ответы Re: not in vs not exists - vastly diferent performance  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-sql
Hi All,
 
I found this interesting and thought I'd offer it up for comment.
 
I have the following delete:
 
delete from tableB where id not in (select id from tableA);
 
There are about 100,000 rows in table A, and about 500,000 in table B. id is indexed on both tables. This is just a development DB, and I wanted to clean it up so I could put in RI constraints. Somewhere along the line, records have been deleted from tableA leaving orphans in tableB.
 
I launched the query yesterday afternoon, and it hadn't returned as of this morning, 15 hours later. Running top showed that CPU utilization was running close to 100%, and the disk was not busy at all. Anyway, I killed it and did some testing:
 
Analyse revealed this plan, and varying random_page_cost between 1 and 4 didn't affect it:
 
 Seq Scan on tableB  (cost=0.00..1003619849.56 rows=251513 width=6)
   Filter: (NOT (subplan))
   SubPlan
     ->  Seq Scan on tableA (cost=0.00..3738.64 rows=100664 width=4)
recoding the delete to use not exists as follows:
 
delete from tableB where not exists (select id from tableA where tableA.id = tableB.id);
 
Gave this plan:
 
Seq Scan on tableB  (cost=0.00..719522.41 rows=236131 width=6)
   Filter: (NOT (subplan))
   SubPlan
     ->  Index Scan using tableB_pk on tableA(cost=0.00..3.01 rows=2 width=4)
           Index Cond: ((id)::integer = ($0)::integer)
This deleted 1200 rows in about 2 seconds. Much better.
 
Anyway, I was a little surprised that "not in" chose to use a seq scan on the tableA in this case. I had imagined, given that statistics were up to date and the size of the table, that the plan would have been similar to that generated by not exists, or perhaps would have used a hash table based on the tableA ids.
 
Something to think about anyway.
Regards
Iain

В списке pgsql-sql по дате отправления:

Предыдущее
От: alvaro@audifarma.com.co
Дата:
Сообщение: MD5 encrypt
Следующее
От: "Chris Travers"
Дата:
Сообщение: Re: Distributed keys