Re: query optimization

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: query optimization
Дата	25 ноября 2009 г. 16:27:35
Msg-id	603c8f070911250927v495f9317j730da3bd55f26870@mail.gmail.com обсуждение исходный текст
Ответ на	query optimization (Faheem Mitha <faheem@email.unc.edu>)
Ответы	Re: query optimization
Список	pgsql-performance

Дерево обсуждения

On Mon, Nov 23, 2009 at 5:47 PM, Faheem Mitha <faheem@email.unc.edu> wrote:
>
> Hi everybody,
>
> I've got two queries that needs optimizing. Actually, there are others, but
> these are pretty representative.
>
> You can see the queries and the corresponding plans at
>
> http://bulldog.duhs.duke.edu/~faheem/snpdb/opt.pdf
>
> or
>
> http://bulldog.duhs.duke.edu/~faheem/snpdb/opt.tex
>
> if you prefer text (latex file, effectively text in this case)
>
> The background to this is at
> http://bulldog.duhs.duke.edu/~faheem/snpdb/diag.pdf
>
> If more details are required, let me know and I can add them. I'd appreciate
> suggestions about how to make these queries go faster.
>
> Please CC this email address on any replies.

I've found that a good way to approach optimizing queries of this type
is to look at the EXPLAIN ANALYZE results and figure out which parts
of the query are slow.  Then simplify the rest of the query as much as
possible without eliminating the slowness.  Then try to figure out how
to optimize the simplified query: rewrite the logic, add indices,
change the schema, etc.  Lastly start adding the other bits back in.

It looks like the dedup_patient_anno CTE is part of your problem.  Try
pulling that piece out and optimizing it separately.  I wonder if that
could be rewritten to use SELECT DISTINCT ON (...) and whether that
would be any faster.  If not, you might want to look at some way of
pre-marking the non-duplicate rows so that you don't have to recompute
that each time.  Then you might be able to use the underlying table
directly in the next CTE, which will usually permit better
optimization, more use of indices, etc.  It seems pretty unfortunate
that dedup_patient_anno joins against geno and then patient_geno does
what appears to be the same join again.  Is there some way to
eliminate that?  If so it will probably help.

Once you've got those parts of the query as well-optimized as you can,
add the next pieces in and start hacking on those.

...Robert

В списке pgsql-performance по дате отправления:

Предыдущее

От: Grzegorz Jaśkiewicz
Дата: 25 ноября 2009 г., 16:05:18
Сообщение: Re: Query times change by orders of magnitude as DB ages

Следующее

От: Robert Haas
Дата: 25 ноября 2009 г., 18:02:00
Сообщение: Re: Query times change by orders of magnitude as DB ages

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: query optimization

Предыдущее

Следующее