Re: Ye olde slow query

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Ye olde slow query
Дата
Msg-id 22920.1394576621@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Ye olde slow query  ("Murphy, Kevin" <MURPHYKE@email.chop.edu>)
Ответы Re: Ye olde slow query
Список pgsql-performance
"Murphy, Kevin" <MURPHYKE@email.chop.edu> writes:
> Synopsis: 8-table join with one "WHERE foo IN (...)" condition; works OK with fewer
> than 5 items in the IN list, but at N=5, the planner starts using a compound index
> for the first time that completely kills performance (5-6 minutes versus 0-12 seconds).
> I'm interested in learning what plays a role in this switch of plans (or the
> unanticipated relative slowness of the N=5 plan).  TIA for any wisdom; I've finally
> made a commitment to really delve into PG.  -Kevin

FWIW, I think the right question here is not "why is the slow query
slow?", but "why is the fast query fast?".  The planner is estimating
them both at nearly the same cost, and since that cost is quite high,
I'd say it's not too wrong about the slow query.  What it's wrong about
is the fast query; so you need to look at where its estimates are way
off base in that plan.

It looks like the trouble spot is this intermediate nested loop:

>                     ->  Nested Loop  (cost=4.32..283545.98 rows=80929 width=12) (actual time=163.609..571.237
rows=102loops=1) 
>                           Buffers: shared hit=419 read=63
>                           ->  Nested Loop  (cost=4.32..3426.09 rows=471 width=4) (actual time=93.595..112.404 rows=85
loops=1)
>                               ...
>                           ->  Index Scan using sample_result_variant_id on sample_result  (cost=0.00..593.01 rows=172
width=8)(actual time=5.147..5.397 rows=1 loops=85) 
>                                 Index Cond: (variant_id = variant_effect.variant_id)
>                                 Buffers: shared hit=400 read=42

which is creating the bulk of the estimated cost for the whole plan,
but execution is actually pretty cheap.  There seem to be two components
to the misestimation: one is that the sub-nested loop is producing about a
fifth as many rows as expected, and the other is that the probes into
sample_result are producing (on average) 1 row, not the 172 rows the
planner expects.  If you could get the latter estimate to be even within
one order of magnitude of reality, the planner would certainly see this
plan as way cheaper than the other.

So I'm wondering if the stats on sample_result and variant_effect are up
to date.  If they are, you might try increasing the stats targets for the
variant_id columns.

            regards, tom lane


В списке pgsql-performance по дате отправления:

Предыдущее
От: "Murphy, Kevin"
Дата:
Сообщение: Ye olde slow query
Следующее
От: Evgeny Shishkin
Дата:
Сообщение: Re: Query taking long time