Re: parallel joins, and better parallel explain

Поиск
Список
Период
Сортировка
От Dilip Kumar
Тема Re: parallel joins, and better parallel explain
Дата
Msg-id CAFiTN-s7fViQ17d1yxn9BoCuH+z9WL86tg9td_DU3yGQVU-==g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: parallel joins, and better parallel explain  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Wed, Dec 17, 2015 at 11:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

> While looking at plans of Q5 and Q7, I have observed that Gather is
> pushed below another Gather node for which we don't have appropriate
> way of dealing.  I think that could be the reason why you are seeing
> the errors.

Ok

> Also, I think it would be good if you can once check the plan/execution
> time with max_parallel_degree=0 as that can give us base reference
> data without parallelism, also I am wondering if have you have changed
> any other parallel cost related parameter?

Oops, Earlier i had changed parallel_tuple_cost parameter to 0.01, now i have changed it to default value 0.1 and taken performance again, with  max_parallel_degree=0 and max_parallel_degree=4.

Note: Last time i used scale factor 1 for generating TPC-H data (./dbgen -v -s 1), but after using default value of parallel_tuple_cost, it was not selecting parallel join, so i have taken the results with scale factor 5 (./dbgen -v -s 5)

Below are the latest performance data.

1. TPC-H Q2:
    max_parallel_degree=0  
        Planning time: 2.321 ms
        Execution time: 829.817 ms
    
    max_parallel_degree=4
        Planning time: 2.530 ms
        Execution time: 803.428 ms
2. TPC-H Q5:
    max_parallel_degree=0  
        Planning time: 1.938 ms
        Execution time: 1062.419 ms

    max_parallel_degree=4
       Planning time: 2.950 ms
       Execution time: 487.461 ms

3. TPC-H Q7:
    max_parallel_degree=0  
       Planning time: 2.515 ms
       Execution time: 1651.763 ms

    max_parallel_degree=4
        Planning time: 2.379 ms
        Execution time: 2107.863 ms

Plans for max_parallel_degree=0 and max_parallel_degree=4 are attached in the mail with  file names are q*_base.out and q*_parallel.out respectively.

For Q3 its not selecting parallel plan.


Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

On Thu, Dec 17, 2015 at 11:03 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Dec 16, 2015 at 9:55 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
On Wed, Dec 16, 2015 at 6:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

>On Tue, Dec 15, 2015 at 7:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Mon, Dec 14, 2015 at 8:38 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

> In any case,
>I have done some more investigation of the patch and found that even
>without changing query planner related parameters, it seems to give
>bad plans (as in example below [1]).  I think here the costing of rework each

I have done some more testing using TPC-H benchmark (For some of the queries, specially for Parallel Hash Join), and Results summary is as below.


Planning Time(ms)
QueryBasePatch
TPC-H Q22.22.4
TPCH- Q30.670.71
TPCH- Q53.172.3
TPCH- Q72.432.4




Execution Time(ms)
QueryBasePatch
TPC-H Q22826766
TPCH- Q32347324271
TPCH- Q5213571432
TPCH- Q767791138

All Test files and Detail plan output is attached in mail
q2.sql, q3.sql, q.5.sql ans q7.sql are TPCH benchmark' 2nd, 3rd, 5th and 7th query
and Results with base and Parallel join are attached in q*_base.out and q*_parallel.out respectively.

Summary: With TPC-H queries where ever Hash Join is pushed under gather Node, significant improvement is visible,
with Q2, using 3 workers, time consumed is almost 1/3 of the base.


I Observed one problem, with Q5 and Q7, there some relation and snapshot references are leaked and i am getting below warning, havn't yet looked into the issue.


While looking at plans of Q5 and Q7, I have observed that Gather is
pushed below another Gather node for which we don't have appropriate
way of dealing.  I think that could be the reason why you are seeing
the errors.

Also, I think it would be good if you can once check the plan/execution
time with max_parallel_degree=0 as that can give us base reference
data without parallelism, also I am wondering if have you have changed
any other parallel cost related parameter?


With Regards,
Amit Kapila.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: extend pgbench expressions with functions
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pg_tables bug?