Re: Slow query and wrong row estimates for CTE

Поиск

Список

Период

Сортировка

От	Justin Pryzby
Тема	Re: Slow query and wrong row estimates for CTE
Дата	15 февраля 2021 г. 22:32:41
Msg-id	20210215223240.GE28165@telsasoft.com обсуждение исходный текст
Ответ на	Slow query and wrong row estimates for CTE (Dane Foster <studdugie@gmail.com>)
Ответы	Re: Slow query and wrong row estimates for CTE
Список	pgsql-performance

Дерево обсуждения

On Mon, Feb 15, 2021 at 12:49:29PM -0500, Dane Foster wrote:
> PostgreSQL version: PostgreSQL 12.5 on x86_64-pc-linux-gnu, compiled by gcc
> (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit

> EXPLAIN (ANALYZE, BUFFERS)
> WITH max_spi AS (

Since v12, CTEs are usually inlined by default.
I suspect it doesn't help, but as an experiment you could try
WITH .. AS MATERIALIZED.

You could try instead: CREATE TEMPORARY TABLE + ANALYZE, which will use
statistics that "WITH" CTE's don't have (like the rowcount after GROUPing).

>  Aggregate  (cost=672426.02..672426.03 rows=1 width=24) (actual time=903359.923..903368.957 rows=1 loops=1)
>    Buffers: shared hit=6167172 read=4199539, temp read=99551 written=99678
>    I/O Timings: read=839121.853

This shows that most of time is spent in I/O (839s/903s)

>   ->  Nested Loop  (cost=672389.80..672425.91 rows=8 width=37) (actual time=36633.920..885232.956 rows=7034196
loops=1)
>         Buffers: shared hit=6167172 read=4199539, temp read=99551 written=99678
...
>                            ->  Hash Join  (cost=424676.58..671389.26 rows=1 width=24) (actual
time=25169.930..34121.825rows=833436 loops=3)

>                                  Hash Cond: ((attempt_scores_1.student_id = attempt_scores_2.student_id) AND
(attempt_scores_1.assignment_id= attempt_scores_2.assignment_id) AND (attempt_scores_1.score_value =
(max(attempt_scores_2.score_value))))

This shows that it estimated 1 row but got 833k, so the plan may be no good.
As another quick experiment, you could try SET enable_nestloop=off.

>         ->  Index Scan using for_upsert on attempt_scores (cost=0.43..36.42 rows=8 width=37) (actual
time=0.394..0.896rows=7 loops=938244)

>               Index Cond: (attempt_report_id = (max(attempt_scores_1.attempt_report_id)))
>               Buffers: shared hit=5976100 read=3869579
>               I/O Timings: read=820910.987

This shows where most of your I/O time is from.
I think you could maybe improve this by clustering the table on for_upsert and
analyzing.  Very possibly your "id" and "time" columns are all correlated.
They might already/automatically be correlated - you can check the correlation
stat:
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram

Without looking closely, an index might help: student_id,assignment_id
That'd avoid the sort, and maybe change the shape of the whole plan.

-- 
Justin

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Slow query and wrong row estimates for CTE