Re: Performance issues

Поиск
Список
Период
Сортировка
От Vivekanand Joshi
Тема Re: Performance issues
Дата
Msg-id CANwLwati=vJNza17YAhHmPg8jfX+=g2VdKssPYb-6g2xEXVmsA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Performance issues  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-performance

Hey guys, thanks a lot.

This is really helping. I am learning a lot. BTW, I changed CTE into subquery and it improved the performance by miles. I am getting the result in less than 3 seconds, though I am using a 24 GB ram server. It is still a great turnaround time as compared to  previous execution time.

Now I will look into the bigger query. I read explain analyze and that helped a lot. I will be coming up with more questions tomorrow as bigger query still has got some problems.

On 16 Mar 2015 23:55, "Tomas Vondra" <tomas.vondra@2ndquadrant.com> wrote:
On 16.3.2015 18:49, Marc Mamin wrote:
>
>> Hi Team,
>>
>> This is the EXPLAIN ANALYZE for one of the view : S_V_D_CAMPAIGN_HIERARCHY:

FWIW, this is a somewhat more readable version of the plan:

    http://explain.depesz.com/s/nbB

In the future, please do two things:

(1) Attach the plan as a text file, because the mail clients tend to
    screw things up (wrapping long lines). Unless the plan is trivial,
    of course - but pgsql-performance usually deals with complex stuff.

(2) Put the plan on explain.depesz.com helps too, because it's
    considerably more readable (but always do 1, because resorces
    placed somewhere else tends to disappear, and the posts then make
    very little sense, which is bad when searching in the archives)

(3) Same for stuff pasted somewhere else - always attach it to the
    message. For example I'd like to give you more accurate advice, but
    I can't as http://pgsql.privatepaste.com/41207bea45 is unavailable.

>
>
>       >Rows Removed by Join Filter: 3577676116
>
>       That's quite a lot.
>       You're possibly missing a clause in a join, resulting in a cross join.
>       It is also helpful to put your result here:
>       http://explain.depesz.com/
>       regards,

IMHO this is merely a consequence of using the CTE, which produces 52997
rows and is scanned 67508x as the inner relation of a nested loop. That
gives you 3577721476 tuples in total, and only 45360 are kept (hence
3577676116 are removed).

This is a prime example of why CTEs are not just aliases for subqueries,
but may actually cause serious trouble.

There are other issues (e.g. the row count estimate of the CTE is
seriously off, most likely because of the HashAggregate in the outer
branch), but that's a secondary issue IMHO.

Vivekanand, try this (in the order of intrusiveness):

(1) Get rid of the CTE, and just replace it with subselect in the FROM
    part of the query, so instead of this:

    WITH valid_executions AS (...)
    SELECT ... FROM ... JOIN valid_executions ON (...)

    you'll have something like this:

    SELECT ... FROM ... JOIN (...) AS valid_executions ON (...)

    This way the subselect will optimized properly.


(2) Replace the CTE with a materialized view, or a temporary table.
    This has both advantages and disadvantages - the main advantage is
    that you can create indexes, collect statistics. Disadvantage is
    you have to refresh the MV, fill temporary table etc.

I expect (1) to improve the performance significantly, and (2) might
improve it even further by fixing the misestimates.


regards

--
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Bad cost estimate with FALSE filter condition
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: Bad cost estimate with FALSE filter condition