Re: Query running a lot faster with enable_nestloop=false

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: Query running a lot faster with enable_nestloop=false
Дата	3 ноября 2011 г. 15:35:09
Msg-id	CA+TgmoY1Qhi6-8iz_y6Qr_Wnjcs_BXufHHOZPuBqij_4jewVHQ@mail.gmail.com обсуждение исходный текст
Ответ на	Query running a lot faster with enable_nestloop=false (Mohanaraj Gopala Krishnan <mohangk@gmail.com>)
Список	pgsql-performance

Дерево обсуждения

On Tue, Oct 25, 2011 at 5:09 AM, Mohanaraj Gopala Krishnan
<mohangk@gmail.com> wrote:
> I have a query that runs a lot slower (~5 minutes) when I run it with
> the default enable_nestloop=true and enable_nestloop=false (~10 secs).
> The actual query is available here http://pastie.org/2754424 . It is a
> reporting query with many joins as the database is mainly used for
> transaction processing.
>
> Explain analyse result for both cases:
>
> Machine A nestloop=true - http://explain.depesz.com/s/nkj0 (~5 minutes)
> Machine A nestloop=false - http://explain.depesz.com/s/wBM (~10 secs)

A good start might be to refactor this:

Seq Scan on retailer_categories retailer_category_leaf_nodes
(cost=0.00..18.52 rows=1 width=16) (actual time=0.016..0.392 rows=194
loops=1)
    Filter: ((tree_right - tree_left) = 1)

And this:

Seq Scan on product_categories product_category_leaf_nodes
(cost=0.00..148.22 rows=2 width=32) (actual time=0.031..1.109 rows=383
loops=1)
    Filter: ((tree_right - tree_left) = 1)

The query planner really has no idea what selectivity to assign to
that condition, and so it's basically guessing, and it's way off.  You
could probably improve the estimate a lot by adding a column that
stores the values of tree_right - tree_left and is updated manually or
by triggers as you insert and update data.  Then you could just check
tree_left_right_difference = 1, which should get a much more accurate
estimate, and hopefully therefore a better plan.

You've also got a fairly large estimation error here:

Index Scan using invoices_create_date_idx on invoices (cost=0.00..8.28
rows=1 width=4) (actual time=0.055..0.305 rows=109 loops=1)
    Index Cond: ((create_date >= '2011-09-15'::date) AND (create_date
<= '2011-09-15'::date))
    Filter: (status = 128)

Apparently, status 128 is considerably more common among rows in that
date range than it is overall.  Unfortunately, it's not so easy to fix
this kind of estimation error, unless you can refactor your schema to
avoid needing to filter on both create_date and status at the same
time.

It might be worth using temporary tables here - factor out sections of
the query that are referenced multiple times, like the join between
sales_order_items and invoices, and create a temporary table.  ANALYZE
it, and then use it to run the main query.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-performance по дате отправления:

Предыдущее

От: Rodrigo Gonzalez
Дата: 03 ноября 2011 г., 15:32:34
Сообщение: Re: function slower than the same code in an sql file

Следующее

От: Robert Haas
Дата: 03 ноября 2011 г., 15:40:37
Сообщение: Re: function slower than the same code in an sql file

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Query running a lot faster with enable_nestloop=false

Предыдущее

Следующее