Re: Ye olde slow query

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: Ye olde slow query
Дата	11 марта 2014 г. 22:23:46
Msg-id	22920.1394576621@sss.pgh.pa.us обсуждение исходный текст
Ответ на	Ye olde slow query ("Murphy, Kevin" <MURPHYKE@email.chop.edu>)
Ответы	Re: Ye olde slow query
Список	pgsql-performance

Дерево обсуждения

"Murphy, Kevin" <MURPHYKE@email.chop.edu> writes:
> Synopsis: 8-table join with one "WHERE foo IN (...)" condition; works OK with fewer
> than 5 items in the IN list, but at N=5, the planner starts using a compound index
> for the first time that completely kills performance (5-6 minutes versus 0-12 seconds).
> I'm interested in learning what plays a role in this switch of plans (or the
> unanticipated relative slowness of the N=5 plan).  TIA for any wisdom; I've finally
> made a commitment to really delve into PG.  -Kevin

FWIW, I think the right question here is not "why is the slow query
slow?", but "why is the fast query fast?".  The planner is estimating
them both at nearly the same cost, and since that cost is quite high,
I'd say it's not too wrong about the slow query.  What it's wrong about
is the fast query; so you need to look at where its estimates are way
off base in that plan.

It looks like the trouble spot is this intermediate nested loop:

>                     ->  Nested Loop  (cost=4.32..283545.98 rows=80929 width=12) (actual time=163.609..571.237
rows=102loops=1) 
>                           Buffers: shared hit=419 read=63
>                           ->  Nested Loop  (cost=4.32..3426.09 rows=471 width=4) (actual time=93.595..112.404 rows=85
loops=1)
>                               ...
>                           ->  Index Scan using sample_result_variant_id on sample_result  (cost=0.00..593.01 rows=172
width=8)(actual time=5.147..5.397 rows=1 loops=85) 
>                                 Index Cond: (variant_id = variant_effect.variant_id)
>                                 Buffers: shared hit=400 read=42

which is creating the bulk of the estimated cost for the whole plan,
but execution is actually pretty cheap.  There seem to be two components
to the misestimation: one is that the sub-nested loop is producing about a
fifth as many rows as expected, and the other is that the probes into
sample_result are producing (on average) 1 row, not the 172 rows the
planner expects.  If you could get the latter estimate to be even within
one order of magnitude of reality, the planner would certainly see this
plan as way cheaper than the other.

So I'm wondering if the stats on sample_result and variant_effect are up
to date.  If they are, you might try increasing the stats targets for the
variant_id columns.

            regards, tom lane

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Ye olde slow query