Re: Troubleshooting query performance issues

Поиск

Список

Период

Сортировка

От	Jim Garrison
Тема	Re: Troubleshooting query performance issues
Дата	24 сентября 2013 г. 22:25:49
Msg-id	0C723FEB5B4E5642B25B451BA57E27303EE05BE6@S1P5DAG3C.EXCHPROD.USA.NET обсуждение исходный текст
Ответ на	Re: Troubleshooting query performance issues (Dave Cramer <pg@fastcrypt.com>)
Список	pgsql-jdbc

Дерево обсуждения

Sorry, I will repost there.

From: davecramer@gmail.com [mailto:davecramer@gmail.com] On Behalf Of Dave Cramer
Sent: Tuesday, September 24, 2013 12:14 PM
To: Jim Garrison
Cc: pgsql-jdbc@postgresql.org
Subject: Re: [JDBC] Troubleshooting query performance issues

This is a question for the performance list

Dave Cramer

dave.cramer(at)credativ(dot)ca
http://www.credativ.ca

On Tue, Sep 24, 2013 at 1:54 PM, Jim Garrison <jim.garrison@nwea.org> wrote:

I spent about a week optimizing a query in our performance-testing environment, which has hardware similar to production.

I was able to refactor the query and reduce the runtime from hours to about 40 seconds, through the use of CTEs and a couple of new indexes.

The database was rebuilt and refreshed with the very similar data from production, but now the query takes hours again.

In the query plan, it is clear that the row count estimates are WAY too low, even though the statistics are up to date. Here's a sample query plan:

CTE Scan on stef (cost=164.98..165.00 rows=1 width=38)
CTE terms
-> Nested Loop (cost=0.00..62.40 rows=1 width=12)
-> Index Scan using term_idx1 on term t (cost=0.00..52.35 rows=1 width=12)
Index Cond: (partner_id = 497)
Filter: (recalculate_district_averages_yn AND (NOT is_deleted_yn))
-> Index Scan using growth_measurement_window_fk1 on growth_measurement_window gw (cost=0.00..10.04 rows=1 width=4)
Index Cond: (term_id = t.term_id)
Filter: (test_window_complete_yn AND (NOT is_deleted_yn) AND ((growth_window_type)::text = 'DISTRICT'::text))
CTE stef
-> Nested Loop (cost=0.00..102.58 rows=1 width=29)
Join Filter: ((ssef.student_id = terf.student_id) AND (ssef.grade_id = terf.grade_id))
-> Nested Loop (cost=0.00..18.80 rows=3 width=28)
-> CTE Scan on terms t (cost=0.00..0.02 rows=1 width=8)
-> Index Scan using student_school_enrollment_fact_idx2 on student_school_enrollment_fact ssef (cost=0.00..18.74 rows=3 width=20)
Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id))
Filter: primary_yn
-> Index Scan using test_event_result_fact_idx3 on test_event_result_fact terf (cost=0.00..27.85 rows=4 width=25)
Index Cond: ((partner_id = t.partner_id) AND (term_id = t.term_id))
Filter: growth_event_yn

The estimates in the first CTE are correct, but in the second, the scan on student_school_enrollment_fact will return about 1.5 million rows, and the scan on test_event_result_fact actually returns about 1.1 million. The top level join should return about 900K rows.

What would cause PG to so badly estimate the row counts, and is this the cause of the slowness?

I've already regenerated the indexes and re-analyzed the tables involved.

What else can I do to find out why it's running so slowly?

--
Sent via pgsql-jdbc mailing list (pgsql-jdbc@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-jdbc

В списке pgsql-jdbc по дате отправления:

Предыдущее

От: Dave Cramer
Дата: 24 сентября 2013 г., 22:14:57
Сообщение: Re: Troubleshooting query performance issues

Следующее

От: Bosco Rama
Дата: 25 сентября 2013 г., 04:23:32
Сообщение: Maven repo version

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Troubleshooting query performance issues

Предыдущее

Следующее