Обсуждение: Why does delete from table where not exists (select 1 from ... LIMIT 1) perform badly?
Why does delete from table where not exists (select 1 from ... LIMIT 1) perform badly?
От
Palle Girgensohn
Дата:
Hi,
I've read about the reason for this before, but cannot find a reference
to it now.
How come the planner treats the
delete from table where not extists(select 1 from table2 where ... LIMIT 1)
so differently, and usually badly, when the LIMIT 1 is there. In older
version of postgresql, I remember that the effect was the opposite, a
limit 1 would actually perform substantially better. Hence we have old
code (and old habits), where the LIMIT 1 is still used.
Shouldn't the planner really understand that the intention is the same
in these two queries?
-- bad:
DELETE FROM iup_locked_gradings ilg
WHERE NOT EXISTS ( SELECT 1 FROM iup_locked_subject ils WHERE ils.locked_gradings_id = ilg.locked_gradings_id LIMIT 1
)
;
-- good:
DELETE FROM iup_locked_gradings ilg
WHERE NOT EXISTS ( SELECT 1 FROM iup_locked_subject ils WHERE ils.locked_gradings_id = ilg.locked_gradings_id )
;
pp=# begin; explain DELETE FROM iup_locked_gradings ilg WHERE NOT EXISTS
(SELECT 1 FROM iup_locked_subject ils WHERE ils.locked_gradings_id =
ilg.locked_gradings_id LIMIT 1);
BEGIN QUERY PLAN
-----------------------------------------------------------------------------------------------Delete
(cost=0.00..523542963.48rows=291737 width=6) -> Seq Scan on iup_locked_gradings ilg (cost=0.00..523542963.48
rows=291737 width=6) Filter: (NOT (SubPlan 1)) SubPlan 1 -> Limit (cost=0.00..897.27 rows=1
width=0) -> Seq Scan on iup_locked_subject ils
(cost=0.00..18842.76 rows=21 width=0) Filter: (locked_gradings_id = $0)
(7 rows)
pp=# begin; explain DELETE FROM iup_locked_gradings ilg WHERE NOT EXISTS
(SELECT 1 FROM iup_locked_subject ils WHERE ils.locked_gradings_id =
ilg.locked_gradings_id );
BEGIN QUERY PLAN
---------------------------------------------------------------------------------------------------Delete
(cost=31705.39..47934.47rows=553737 width=12) -> Hash Anti Join (cost=31705.39..47934.47 rows=553737 width=12)
Hash Cond: (ilg.locked_gradings_id = ils.locked_gradings_id) -> Seq Scan on iup_locked_gradings ilg
(cost=0.00..6677.44
rows=583474 width=10) -> Hash (cost=15776.83..15776.83 rows=1226373 width=10) -> Seq Scan on
iup_locked_subjectils
(cost=0.00..15776.83 rows=1226373 width=10)
(6 rows)
pp=#
chalmers=# \d iup_locked_gradings Table "public.iup_locked_gradings" Column
| Type |
Modifiers
----------------------+---------+----------------------------------------------------------------------------------locked_gradings_id
| integer | not null default
nextval('iup_locked_gradings_locked_gradings_id_seq'::regclass)type | integer |description |
text |name | text |original_gradings_id | integer |
Indexes: "iup_locked_gradings_pkey" PRIMARY KEY, btree (locked_gradings_id),
tablespace "opt"
Referenced by: TABLE "iup_locked_subject" CONSTRAINT
"iup_locked_subject_locked_gradings_id_fkey" FOREIGN KEY
(locked_gradings_id) REFERENCES iup_locked_gradings(locked_gradings_id)
ON UPDATE CASCADE ON DELETE SET NULL
Tablespace: "opt"
chalmers=# \d iup_locked_subject Table "public.iup_locked_subject" Column
| Type |
Modifiers
---------------------+---------+--------------------------------------------------------------------------------locked_subject_id
| integer | not null default
nextval('iup_locked_subject_locked_subject_id_seq'::regclass)name | text | not nulllink_url
| text |description | text |use_measures | boolean | not null default truelocked_gradings_id |
integer|original_subject_id | integer |use_fail_warning | boolean | not null default false
Indexes: "iup_locked_subject_pkey" PRIMARY KEY, btree (locked_subject_id),
tablespace "opt"
Foreign-key constraints: "iup_locked_subject_locked_gradings_id_fkey" FOREIGN KEY
(locked_gradings_id) REFERENCES iup_locked_gradings(locked_gradings_id)
ON UPDATE CASCADE ON DELETE SET NULL
Referenced by:
Tablespace: "opt"
Palle Girgensohn <girgen@pingpong.net> writes:
> How come the planner treats the
> delete from table where not extists(select 1 from table2 where ... LIMIT 1)
> so differently, and usually badly, when the LIMIT 1 is there.
Because it can't optimize it into an antijoin.
> In older
> version of postgresql, I remember that the effect was the opposite, a
> limit 1 would actually perform substantially better. Hence we have old
> code (and old habits), where the LIMIT 1 is still used.
Well, you're basically forcing it into the same type of plan you would
have gotten before antijoins were implemented (circa 8.4), so I don't
see that this is a regression. But I'd get rid of the LIMIT 1 if I were
you. It's been a *very* long time since that was a net benefit in an
EXISTS subquery, if indeed it ever was --- AFAIR, even the earliest PG
versions that understood about optimizing for fast-start plans would do
so in an EXISTS subquery, with or without any LIMIT.
regards, tom lane