not in vs not exists - vastly diferent performance

Поиск

Список

Период

Сортировка

От	Iain
Тема	not in vs not exists - vastly diferent performance
Дата	24 декабря 2003 г. 21:51:13
Msg-id	013d01c3ca89$7307d7d0$7201a8c0@mst1x5r347kymb обсуждение исходный текст
Ответы	Re: not in vs not exists - vastly diferent performance
Список	pgsql-sql

Дерево обсуждения

Hi All,

I found this interesting and thought I'd offer it up for comment.

I have the following delete:

delete from tableB where id not in (select id from tableA);

There are about 100,000 rows in table A, and about 500,000 in table B. id is indexed on both tables. This is just a development DB, and I wanted to clean it up so I could put in RI constraints. Somewhere along the line, records have been deleted from tableA leaving orphans in tableB.

I launched the query yesterday afternoon, and it hadn't returned as of this morning, 15 hours later. Running top showed that CPU utilization was running close to 100%, and the disk was not busy at all. Anyway, I killed it and did some testing:

Analyse revealed this plan, and varying random_page_cost between 1 and 4 didn't affect it:

Seq Scan on tableB (cost=0.00..1003619849.56 rows=251513 width=6)
   Filter: (NOT (subplan))
   SubPlan
     -> Seq Scan on tableA (cost=0.00..3738.64 rows=100664 width=4)

recoding the delete to use not exists as follows:

delete from tableB where not exists (select id from tableA where tableA.id = tableB.id);

Gave this plan:

Seq Scan on tableB (cost=0.00..719522.41 rows=236131 width=6)
   Filter: (NOT (subplan))
   SubPlan
     -> Index Scan using tableB_pk on tableA(cost=0.00..3.01 rows=2 width=4)
           Index Cond: ((id)::integer = ($0)::integer)

This deleted 1200 rows in about 2 seconds. Much better.

Anyway, I was a little surprised that "not in" chose to use a seq scan on the tableA in this case. I had imagined, given that statistics were up to date and the size of the table, that the plan would have been similar to that generated by not exists, or perhaps would have used a hash table based on the tableA ids.

Something to think about anyway.

Regards

Iain

В списке pgsql-sql по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

not in vs not exists - vastly diferent performance