Re: enable_XXX options
От | Peterson, Bjorn |
---|---|
Тема | Re: enable_XXX options |
Дата | |
Msg-id | 42E7C401A4CBCD49829BB53DDD538BA402114414@chmpgexgmail3.ic.ncs.com обсуждение исходный текст |
Ответ на | enable_XXX options ("Peterson, Bjorn" <Bjorn.Peterson@pearson.com>) |
Ответы |
Re: enable_XXX options
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-performance |
-----Original Message----- >From: Tom Lane [mailto:tgl@sss.pgh.pa.us] >Sent: Friday, April 01, 2005 10:37 AM >To: Peterson, Bjorn >Cc: pgsql-performance@postgresql.org >Subject: Re: [PERFORM] enable_XXX options > >"Peterson, Bjorn" <Bjorn.Peterson@pearson.com> writes: >> I have a query in my application that takes an unreasonable amount of time >> to complete (>4.5 hours execution time). After reviewing the EXPLAIN and >> EXPLAIN ANALYZE output for that and similar queries, my colleagues and I >> determined that turning off the enable_nestloop option might help - we >> noticed dramatic speed improvements for that specific query after doing so >> (<2 minutes execution time). I was warned not to mess with the enable_XXX >> options in a production environment, but does anyone see any problem with >> turning off the enable_nestloop option right before executing my query and >> turning it back on afterwards? > >That's what it's there for ... but it would be useful to look into why >the planner gets it so wrong without that hint. Could we see EXPLAIN >ANALYZE both ways? > > regards, tom lane > Below is my query and the output of EXPLAIN - I was not able to run EXPLAIN ANALYZE, as the query never completes unless we turn enable_nestloop off: SELECT t.term_id AS term_id, a.user_id AS user_id, a.time_slot AS course_id, a.attendance_status AS status, SUM(CASE WHEN a.attendance_date>=t.start_date THEN 1 ELSE 0 END) AS cur_total, COUNT(a.attendance_date) AS ytd_total FROM "Attendance" a, "Terms" t, "Terms" ytd, "CoursesUsers" cu, "Courses" c, "CoursesOffered" co, "Schools" s WHERE a.attendance_type=1 AND a.attendance_status IN(3,4,1,2) AND a.attendance_date>=ytd.start_date AND a.attendance_date<=t.end_date AND a.attendance_date<=now() AND a.user_id=cu.user_id AND a.time_slot=cu.course_id AND cu.course_id=c.course_id AND co.course_offered_id=c.course_offered_id AND co.school_id=s.school_id AND s.district_id=2 AND ytd.term_id=t.top_term_id AND c.course_id IN (221,395,244,394,366,370,400,11,373,369,392,406,398,381,391,393,403,376,220, 846,440,935,910,431,428,904,905,222,201,453,913,1794,408,901,856,424,443,175 0,452,461,462,471,463,911,489,821,916,501,223) GROUP BY a.user_id, a.time_slot, t.term_id, a.attendance_status ORDER BY a.user_id, a.time_slot, t.term_id, a.attendance_status The Attendance table is the largest (about 2 million records), Terms has about 50 records, CoursesUsers has about 30,000 records, Courses has about 2000 records, CoursesOffered has about 1000 records, and Schools has 3 records. The purpose of this query is to retrieve the number of absences for each student/course/term combination - we need separate totals for year-to-date (from the start of the school year), and for absences only within the current term. Every field referenced in the WHERE clause has an appropriate single or multi-column index. We are using the standard PostgreSQL JDBC driver and the only parameter being set in this query is the district_id (s.district_id=2). We are running Postgres 8.0.1 on a Windows 2000 server. With enable_nestloop on (default): QUERY PLAN GroupAggregate (cost=4674.63..4677.13 rows=100 width=22) -> Sort (cost=4674.63..4674.88 rows=100 width=22) Sort Key: a.user_id -> Nested Loop (cost=276.69..4671.30 rows=100 width=22) Join Filter: (("outer".attendance_date <= "inner".end_date) AND ("outer".attendance_date >= "inner".start_date)) -> Hash Join (cost=273.30..4649.92 rows=20 width=14) Hash Cond: ("outer".school_id = "inner".school_id) -> Nested Loop (cost=272.26..4648.50 rows=25 width=18) -> Hash Join (cost=272.26..986.69 rows=836 width=16) Hash Cond: ("outer".course_offered_id = "inner".course_offered_id) -> Hash Join (cost=246.81..948.70 rows=836 width=16) Hash Cond: ("outer".course_id = "inner".course_id) -> Seq Scan on "CoursesUsers" cu (cost=0.00..545.02 rows=29702 width=8) -> Hash (cost=246.68..246.68 rows=49 width=8) -> Seq Scan on "Courses" c (cost=0.00..246.68 rows=49 width=8) Filter: ((course_id = 221) OR (course_id = 395) OR (course_id = 244) OR (course_id = 394) OR (course_id = 366) OR (course_id = 370) OR (course_id = 400) OR (course_id = 11) OR (course_id = 373) OR (course_i (..) -> Hash (cost=23.36..23.36 rows=836 width=8) -> Seq Scan on "CoursesOffered" co (cost=0.00..23.36 rows=836 width=8) -> Index Scan using "Attendance_pkey" on "Attendance" a (cost=0.00..4.37 rows=1 width=14) Index Cond: ((a.attendance_date <= now()) AND (a.attendance_type = 1) AND ("outer".course_id = a.time_slot) AND (a.user_id = "outer".user_id)) Filter: ((attendance_status = 3) OR (attendance_status = 4) OR (attendance_status = 1) OR (attendance_status = 2)) -> Hash (cost=1.04..1.04 rows=3 width=4) -> Seq Scan on "Schools" s (cost=0.00..1.04 rows=3 width=4) Filter: (district_id = 2) -> Materialize (cost=3.39..3.75 rows=36 width=16) -> Hash Join (cost=1.45..3.35 rows=36 width=16) Hash Cond: ("outer".top_term_id = "inner".term_id) -> Seq Scan on "Terms" t (cost=0.00..1.36 rows=36 width=16) -> Hash (cost=1.36..1.36 rows=36 width=8) -> Seq Scan on "Terms" ytd (cost=0.00..1.36 rows=36 width=8) ------------------------------------------------------------------------- After turning enable_nestloop off: QUERY PLAN GroupAggregate (cost=100078595.13..100078597.63 rows=100 width=22) -> Sort (cost=100078595.13..100078595.38 rows=100 width=22) Sort Key: a.user_id -> Nested Loop (cost=100078571.91..100078591.81 rows=100 width=22) Join Filter: (("inner".attendance_date <= "outer".end_date) AND ("inner".attendance_date >= "outer".start_date)) -> Hash Join (cost=1.45..3.35 rows=36 width=16) Hash Cond: ("outer".top_term_id = "inner".term_id) -> Seq Scan on "Terms" t (cost=0.00..1.36 rows=36 width=16) -> Hash (cost=1.36..1.36 rows=36 width=8) -> Seq Scan on "Terms" ytd (cost=0.00..1.36 rows=36 width=8) -> Materialize (cost=78570.46..78570.66 rows=20 width=14) -> Hash Join (cost=991.91..78570.44 rows=20 width=14) Hash Cond: ("outer".school_id = "inner".school_id) -> Hash Join (cost=990.87..78569.02 rows=25 width=18) Hash Cond: (("outer".time_slot = "inner".course_id) AND ("outer".user_id = "inner".user_id)) -> Seq Scan on "Attendance" a (cost=0.00..75599.26 rows=79148 width=14) Filter: ((attendance_type = 1) AND ((attendance_status = 3) OR (attendance_status = 4) OR (attendance_status = 1) OR (attendance_status = 2)) AND (attendance_date <= now())) -> Hash (cost=986.69..986.69 rows=836 width=16) -> Hash Join (cost=272.26..986.69 rows=836 width=16) Hash Cond: ("outer".course_offered_id = "inner".course_offered_id) -> Hash Join (cost=246.81..948.70 rows=836 width=16) Hash Cond: ("outer".course_id = "inner".course_id) -> Seq Scan on "CoursesUsers" cu (cost=0.00..545.02 rows=29702 width=8) -> Hash (cost=246.68..246.68 rows=49 width=8) -> Seq Scan on "Courses" c (cost=0.00..246.68 rows=49 width=8) Filter: ((course_id = 221) OR (course_id = 395) OR (course_id = 244) OR (course_id = 394) OR (course_id = 366) OR (course_id = 370) OR (course_id = 400) OR (course_id = 11) OR (course_id = 373) (..) -> Hash (cost=23.36..23.36 rows=836 width=8) -> Seq Scan on "CoursesOffered" co (cost=0.00..23.36 rows=836 width=8) -> Hash (cost=1.04..1.04 rows=3 width=4) -> Seq Scan on "Schools" s (cost=0.00..1.04 rows=3 width=4) Filter: (district_id = 2) **************************************************************************** This email may contain confidential material. If you were not an intended recipient, Please notify the sender and delete all copies. We may monitor email to and from our network. ****************************************************************************
В списке pgsql-performance по дате отправления: