UPDATE with subquery too slow
От | Eric Jain |
---|---|
Тема | UPDATE with subquery too slow |
Дата | |
Msg-id | 00ed01c3f552$ec054e80$c300000a@caliente обсуждение исходный текст |
Ответы |
Re: UPDATE with subquery too slow
(Kevin Brown <kevin@sysexperts.com>)
|
Список | pgsql-performance |
I can't get the following statement to complete with reasonable time. I've had it running for over ten hours without getting anywhere. I suspect (hope) there may be a better way to accomplish what I'm trying to do (set fields containing unique values to null): UPDATE requests SET session = NULL WHERE session IN ( SELECT session FROM requests GROUP BY session HAVING COUNT(*) = 1 ); Output of EXPLAIN: Nested Loop (cost=170350.16..305352.37 rows=33533 width=98) -> HashAggregate (cost=170350.16..170350.16 rows=200 width=8) -> Subquery Scan "IN_subquery" (cost=169728.12..170261.30 rows=35545 width=8) -> HashAggregate (cost=169728.12..169905.85 rows=35545 width=8) Filter: (count(*) = 1) -> Seq Scan on requests (cost=0.00..139207.75 rows=6104075 width=8) -> Index Scan using requests_session_idx on requests (cost=0.00..672.92 rows=168 width=106) Index Cond: (requests."session" = "outer"."session") If I drop the index on requests(session): Hash Join (cost=170350.66..340414.12 rows=33533 width=98) Hash Cond: ("outer"."session" = "inner"."session") -> Seq Scan on requests (cost=0.00..139207.75 rows=6104075 width=106) -> Hash (cost=170350.16..170350.16 rows=200 width=8) -> HashAggregate (cost=170350.16..170350.16 rows=200 width=8) -> Subquery Scan "IN_subquery" (cost=169728.12..170261.30 rows=35545 width=8) -> HashAggregate (cost=169728.12..169905.85 rows=35545 width=8) Filter: (count(*) = 1) -> Seq Scan on requests (cost=0.00..139207.75 rows=6104075 width=8) The subquery itself requires 5-10 min to run on its own, and may return several million rows. Using EXISTS rather than IN (I'm using 7.4-RC2, not sure if IN queries were already improved in this release): UPDATE requests SET session = NULL WHERE NOT EXISTS ( SELECT r.session FROM requests r WHERE r.session = session AND NOT r.id = id ); With and without index: Result (cost=227855.74..415334.22 rows=8075449 width=101) One-Time Filter: (NOT $0) InitPlan -> Seq Scan on requests r (cost=0.00..227855.74 rows=201 width=8) Filter: (("session" = "session") AND (id <> id)) -> Seq Scan on requests (cost=0.00..187478.49 rows=8075449 width=101) I've been running this for more than an hour so far, and no end in sight, either... Any ideas?
В списке pgsql-performance по дате отправления: