Обсуждение: Parallel sec scan in plpgsql

Поиск

Список

Период

Сортировка

Parallel sec scan in plpgsql

От

Alex Ignatov

Дата:

15 сентября 2016 г., 15:45:18

Hello!
Does parallel secscan works in plpgsql?

-- 
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Parallel sec scan in plpgsql

От

Ashutosh Bapat

Дата:

16 сентября 2016 г., 04:27:14

On Thu, Sep 15, 2016 at 9:15 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:
> Hello!
> Does parallel secscan works in plpgsql?
>

Parallel seq scan is a query optimization that will work independent
of the source of the query - i.e whether it comes directly from a
client or a procedural language like plpgsql. So, I guess, answer to
your question is yes. If you are expecting something else, more
context will help.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Parallel sec scan in plpgsql

От

Alex Ignatov

Дата:

16 сентября 2016 г., 13:27:53

No it doesn't.
Paralleling neither sql function nor plpgsql:
Here is example :

ipdr=> show max_worker_processes ; max_worker_processes
---------------------- 128
(1 row)
ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set force_parallel_mode=on;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET

ipdr=> create table test as select (random ()*1000)::int % 3 as a,
ipdr->                             (random ()*1000)::int % 5 as b,
ipdr->                             (random ()*1000)::int % 7 as c,
ipdr->                             (random ()*1000)::int % 11 as d,
ipdr->                             (random ()*1000)::int % 13 as e,
ipdr->                             (random ()*1000)::int % 17 as bytes
ipdr->                             from generate_series(1,10*1000*1000);
SELECT 10000000


ipdr=> create or replace function parallel_test_plpgsql() returns bigint as
ipdr-> $$
ipdr$> declare
ipdr$>    cnt int:=0;
ipdr$> begin
ipdr$>    select count(*) into cnt from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
ipdr$>    return cnt;
ipdr$> end;
ipdr$> $$ language plpgsql PARALLEL SAFE  STRICT;
CREATE FUNCTION

ipdr=>
ipdr=> create or replace function parallel_test_sql() returns bigint as
ipdr-> $$
ipdr$>    select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
ipdr$> $$ language sql PARALLEL SAFE STRICT;
CREATE FUNCTION

ipdr=> analyze test;
ANALYZE
ipdr=> explain (analyze,buffers) select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
                                                                   QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=87702.33..87702.34 rows=1 width=8) (actual time=723.792..723.792 rows=1 loops=1)   Buffers: shared
hit=65015  ->  Finalize HashAggregate  (cost=87364.49..87514.64 rows=15015 width=28) (actual time=720.496..722.589
rows=15015loops=1)         Group Key: test.a, test.b, test.c, test.d, test.e         Buffers: shared hit=65015
-> Gather  (cost=85149.78..85299.93 rows=165165 width=20) (actual time=502.607..665.039 rows=180180 loops=1)
  Workers Planned: 11               Workers Launched: 11               Buffers: shared hit=65015               ->
PartialHashAggregate  (cost=84149.78..84299.93 rows=15015 width=20) (actual time=497.106..501.170 rows=15015 loops=12)
                  Group Key: test.a, test.b, test.c, test.d, test.e                     Buffers: shared hit=63695
             ->  Parallel Seq Scan on test  (cost=0.00..72786.01 rows=909101 width=20) (actual time=0.018..166.556
rows=833333loops=12)                           Buffers: shared hit=63695 Planning time: 0.250 ms Execution time:
724.293ms
 
(16 rows)

ipdr=> explain (analyze,buffers) select  parallel_test_plpgsql();                                           QUERY PLAN
------------------------------------------------------------------------------------------------ Gather
(cost=1000.00..1000.26rows=1 width=8) (actual time=4088.952..4088.956 rows=1 loops=1)   Workers Planned: 1   Workers
Launched:1   Single Copy: true   Buffers: shared hit=64186   ->  Result  (cost=0.00..0.26 rows=1 width=8) (actual
time=4084.997..4084.999rows=1 loops=1)         Buffers: shared hit=64149 Planning time: 0.025 ms Execution time:
4100.026ms
 
(9 rows)

Log from auto_explain:
2016-09-16 16:05:11 MSK [28209]: [1-1] user=,db=,app=,client= LOG:  duration: 4082.517 ms  plan:        Query Text:
selectcount(*)          from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t        Aggregate
(cost=289035.43..289035.44rows=1 width=8)          ->  HashAggregate  (cost=288697.59..288847.74 rows=15015 width=28)
            Group Key: test.a, test.b, test.c, test.d, test.e                ->  Seq Scan on test
(cost=0.00..163696.15rows=10000115 width=20)
 


ipdr=> explain (analyze,buffers) select  parallel_test_sql();                                           QUERY PLAN
------------------------------------------------------------------------------------------------ Gather
(cost=1000.00..1000.26rows=1 width=8) (actual time=4256.830..4256.837 rows=1 loops=1)   Workers Planned: 1   Workers
Launched:1   Single Copy: true   Buffers: shared hit=64132   ->  Result  (cost=0.00..0.26 rows=1 width=8) (actual
time=4252.401..4252.403rows=1 loops=1)         Buffers: shared hit=64095 Planning time: 0.151 ms Execution time:
4267.959ms
 
(9 rows)

Log from auto_explain:
2016-09-16 16:22:03 MSK [731]: [1-1] user=,db=,app=,client= LOG:  duration: 4249.851 ms  plan:        Query Text:
   select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
 
        Aggregate  (cost=289035.43..289035.44 rows=1 width=8)          ->  HashAggregate  (cost=288697.59..288847.74
rows=15015width=28)                Group Key: test.a, test.b, test.c, test.d, test.e                ->  Seq Scan on
test (cost=0.00..163696.15 rows=10000115 width=20)
 



So as we can see parallel secscan doesn't works in plpgsql and sql functions.
Can somebody explains me where I was wrong?


Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


On 16.09.2016 07:27, Ashutosh Bapat wrote:
> On Thu, Sep 15, 2016 at 9:15 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:
>> Hello!
>> Does parallel secscan works in plpgsql?
>>
>
> Parallel seq scan is a query optimization that will work independent
> of the source of the query - i.e whether it comes directly from a
> client or a procedural language like plpgsql. So, I guess, answer to
> your question is yes. If you are expecting something else, more
> context will help.
>

Re: Parallel sec scan in plpgsql

От

Amit Kapila

Дата:

16 сентября 2016 г., 13:50:13

On Fri, Sep 16, 2016 at 6:57 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:
> No it doesn't.
> Paralleling neither sql function nor plpgsql:
> Here is example :
>
> ipdr=> show max_worker_processes ;
>  max_worker_processes
> ----------------------
>  128
> (1 row)
> ipdr=> set max_parallel_workers_per_gather to 128;
> SET
> ipdr=> set force_parallel_mode=on;
> SET
> ipdr=> set min_parallel_relation_size =0;
> SET
> ipdr=> set parallel_tuple_cost=0;
> SET
>

Can you try by setting force_parallel_mode = off;?  I think it is
sending the whole function execution to worker due to
force_parallel_mode.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel sec scan in plpgsql

От

Alex Ignatov

Дата:

16 сентября 2016 г., 15:18:26

On 16.09.2016 16:50, Amit Kapila wrote:
> On Fri, Sep 16, 2016 at 6:57 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:
>> No it doesn't.
>> Paralleling neither sql function nor plpgsql:
>> Here is example :
>>
>> ipdr=> show max_worker_processes ;
>>  max_worker_processes
>> ----------------------
>>  128
>> (1 row)
>> ipdr=> set max_parallel_workers_per_gather to 128;
>> SET
>> ipdr=> set force_parallel_mode=on;
>> SET
>> ipdr=> set min_parallel_relation_size =0;
>> SET
>> ipdr=> set parallel_tuple_cost=0;
>> SET
>>
>
> Can you try by setting force_parallel_mode = off;?  I think it is
> sending the whole function execution to worker due to
> force_parallel_mode.
>
>

No changes:

ipdr=> set max_parallel_workers_per_gather to 128;
SET
ipdr=> set min_parallel_relation_size =0;
SET
ipdr=> set parallel_tuple_cost=0;
SET
ipdr=> set force_parallel_mode = off;
SET
ipdr=> select name,setting from pg_settings where name in('max_parallel_workers_per_gather',
ipdr(>                                                    'min_parallel_relation_size',
ipdr(>                                                    'parallel_tuple_cost',
ipdr(>                                                    'force_parallel_mode');              name               |
setting
---------------------------------+--------- force_parallel_mode             | off max_parallel_workers_per_gather | 128
min_parallel_relation_size     | 0 parallel_tuple_cost             | 0
 
(4 rows)

ipdr=> explain (analyze,buffers) select count(*) from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t;
                                                            QUERY PLAN
 

-------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=87702.33..87702.34 rows=1 width=8) (actual time=709.643..709.643 rows=1 loops=1)   Buffers: shared
hit=65015  ->  Finalize HashAggregate  (cost=87364.49..87514.64 rows=15015 width=28) (actual time=706.382..708.456
rows=15015loops=1)         Group Key: test.a, test.b, test.c, test.d, test.e         Buffers: shared hit=65015
-> Gather  (cost=85149.78..85299.93 rows=165165 width=20) (actual time=478.626..645.209 rows=180180 loops=1)
  Workers Planned: 11               Workers Launched: 11               Buffers: shared hit=65015               ->
PartialHashAggregate  (cost=84149.78..84299.93 rows=15015 width=20) (actual time=473.890..478.309 rows=15015 loops=12)
                  Group Key: test.a, test.b, test.c, test.d, test.e                     Buffers: shared hit=63695
             ->  Parallel Seq Scan on test  (cost=0.00..72786.01 rows=909101 width=20) (actual time=0.021..163.120
rows=833333loops=12)                           Buffers: shared hit=63695 Planning time: 0.318 ms Execution time:
710.600ms
 
(16 rows)


ipdr=> explain (analyze,buffers) select  parallel_test_plpgsql();                                        QUERY PLAN
------------------------------------------------------------------------------------------ Result  (cost=0.00..0.26
rows=1width=8) (actual time=4003.719..4003.720 rows=1 loops=1)   Buffers: shared hit=63869 Planning time: 0.021 ms
Executiontime: 4003.769 ms
 
(4 rows)

auto_explain:
2016-09-16 18:02:29 MSK [29353]: [53-1] user=ipdr,db=ipdr,app=psql,client=[local] LOG:  duration: 4001.275 ms  plan:
   Query Text: select count(*)          from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t
Aggregate (cost=289035.43..289035.44 rows=1 width=8)          ->  HashAggregate  (cost=288697.59..288847.74 rows=15015
width=28)               Group Key: test.a, test.b, test.c, test.d, test.e                ->  Seq Scan on test
(cost=0.00..163696.15rows=10000115 width=20)
 
2016-09-16 18:02:29 MSK [29353]: [54-1] user=ipdr,db=ipdr,app=psql,client=[local] CONTEXT:  SQL statement "select
count(*)         from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t"        PL/pgSQL function
parallel_test_plpgsql()line 5 at SQL statement
 


ipdr=> explain (analyze,buffers) select  parallel_test_plpgsql();                                        QUERY PLAN
------------------------------------------------------------------------------------------ Result  (cost=0.00..0.26
rows=1width=8) (actual time=4497.820..4497.822 rows=1 loops=1)   Buffers: shared hit=63695 Planning time: 0.023 ms
Executiontime: 4497.872 ms
 
(4 rows)

auto_explain:
2016-09-16 18:03:23 MSK [29353]: [57-1] user=ipdr,db=ipdr,app=psql,client=[local] LOG:  duration: 4497.050 ms  plan:
   Query Text: select count(*)          from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t
Aggregate (cost=289035.43..289035.44 rows=1 width=8)          ->  HashAggregate  (cost=288697.59..288847.74 rows=15015
width=28)               Group Key: test.a, test.b, test.c, test.d, test.e                ->  Seq Scan on test
(cost=0.00..163696.15rows=10000115 width=20)
 
2016-09-16 18:03:23 MSK [29353]: [58-1] user=ipdr,db=ipdr,app=psql,client=[local] CONTEXT:  SQL statement "select
count(*)         from (select a,b,c,d,e,sum(bytes) from test group by a,b,c,d,e)t"        PL/pgSQL function
parallel_test_plpgsql()line 5 at SQL statement

Re: Parallel sec scan in plpgsql

От

Amit Kapila

Дата:

18 сентября 2016 г., 03:54:56

On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:
>
> On 16.09.2016 16:50, Amit Kapila wrote:
>>
>>
>> Can you try by setting force_parallel_mode = off;?  I think it is
>> sending the whole function execution to worker due to
>> force_parallel_mode.
>>
>>
>
> No changes:
>

Okay, it just skipped from my mind that we don't support parallel
queries for SQL statement execution (or statements executed via
exec_stmt_execsql) from plpgsql.  For detailed explanation of why that
is not feasible you can refer one of my earlier e-mails [1] on similar
topic.  I think if we can somehow get the results via Perform
statement, then it could be possible to use parallelism via plpgsql.

However, you can use it via SQL functions, an example is below:

set min_parallel_relation_size =0;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;

Load 'auto_explain';
set auto_explain.log_min_duration = 0;
set auto_explain.log_analyze = true;
set auto_explain.log_nested_statements = true;

create table test_plpgsql(c1 int, c2 char(1000));
insert into test_plpgsql values(generate_series(1,100000),'aaa');

create or replace function parallel_test_set_sql() returns
setof bigint as $$
select count(*) from test_plpgsql;
$$language sql PARALLEL SAFE STRICT STABLE;

Then execute function as: select * from parallel_test_set_sql();  You
can see below plan if auto_explain module is loaded.
       Finalize Aggregate  (cost=14806.85..14806.86 rows=1 width=8) (actual tim
e=1094.966..1094.967 rows=1 loops=1)         ->  Gather  (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.
216..1094.943 rows=3 loops=1)               Workers Planned: 2               Workers Launched: 2               ->
PartialAggregate  (cost=14806.83..14806.84 rows=1 width=8)

(actual time=177.867..177.868 rows=1 loops=3)                     ->  Parallel Seq Scan on test_plpgsql
(cost=0.00..14702.6
7 rows=41667 width=0) (actual time=0.384..142.565 rows=33333 loops=3)
CONTEXT:  SQL function "parallel_test_set_sql" statement 1
LOG:  duration: 2965.040 ms  plan:       Query Text: select * from parallel_test_set_sql();       Function Scan on
parallel_test_set_sql (cost=0.25..10.25 rows=1000 widt

h=8) (actual time=2538.620..2776.955 rows=1 loops=1)

In general, I think we should support the cases as required (or
written) by you from plpgsql or sql functions.  We need more work to
support such cases. There are probably two ways of supporting such
cases, we can build some intelligence in plpgsql execution such that
it can recognise such queries and allow to use parallelism or we need
to think of enabling parallelism for cases where we don't run the plan
to completion.  Most of the use cases from plpgsql or sql function
fall into later category as they don't generally run the plan to
completion.

[1] - https://www.postgresql.org/message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel sec scan in plpgsql

От

Robert Haas

Дата:

19 сентября 2016 г., 18:09:07

On Sat, Sep 17, 2016 at 11:54 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> In general, I think we should support the cases as required (or
> written) by you from plpgsql or sql functions.  We need more work to
> support such cases. There are probably two ways of supporting such
> cases, we can build some intelligence in plpgsql execution such that
> it can recognise such queries and allow to use parallelism or we need
> to think of enabling parallelism for cases where we don't run the plan
> to completion.  Most of the use cases from plpgsql or sql function
> fall into later category as they don't generally run the plan to
> completion.

I think there's certainly more work that could be done to teach
PL/pgsql about cases where the query will run to completion.  I didn't
work very hard to make sure we covered all of those; there are
probably several different cases where parallelism could be safely
enabled but currently isn't.  Also, I didn't do anything at all to
update the other PLs, and that would be good, too.

However, I think the chances of supporting parallel query for queries
not executed to completion any time in the near future are very poor.
Once the query is suspended, the user can do anything they like,
including stuff that's parallel-unsafe, and then we have no choice but
to error out, and that's not what we want to happen.  If we had
absolutely no parallel-unsafe operations - which would be quite a feat
- then we might be able to get there.  But even then you have the
problem that while the query is suspended, you are still nailing down
workers that are sitting around idle, waiting for the leader to resume
the query, and that's not very good either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Parallel sec scan in plpgsql

От

Alex Ignatov

Дата:

20 сентября 2016 г., 11:15:54

On 18.09.2016 06:54, Amit Kapila wrote:
> On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:
>>
>> On 16.09.2016 16:50, Amit Kapila wrote:
>>>
>>>
>>> Can you try by setting force_parallel_mode = off;?  I think it is
>>> sending the whole function execution to worker due to
>>> force_parallel_mode.
>>>
>>>
>>
>> No changes:
>>
>
> Okay, it just skipped from my mind that we don't support parallel
> queries for SQL statement execution (or statements executed via
> exec_stmt_execsql) from plpgsql.  For detailed explanation of why that
> is not feasible you can refer one of my earlier e-mails [1] on similar
> topic.  I think if we can somehow get the results via Perform
> statement, then it could be possible to use parallelism via plpgsql.
>
> However, you can use it via SQL functions, an example is below:
>
> set min_parallel_relation_size =0;
> set parallel_tuple_cost=0;
> set parallel_setup_cost=0;
>
> Load 'auto_explain';
> set auto_explain.log_min_duration = 0;
> set auto_explain.log_analyze = true;
> set auto_explain.log_nested_statements = true;
>
> create table test_plpgsql(c1 int, c2 char(1000));
> insert into test_plpgsql values(generate_series(1,100000),'aaa');
>
> create or replace function parallel_test_set_sql() returns
> setof bigint as $$
> select count(*) from test_plpgsql;
> $$language sql PARALLEL SAFE STRICT STABLE;
>
> Then execute function as: select * from parallel_test_set_sql();  You
> can see below plan if auto_explain module is loaded.
>
>         Finalize Aggregate  (cost=14806.85..14806.86 rows=1 width=8) (actual tim
> e=1094.966..1094.967 rows=1 loops=1)
>           ->  Gather  (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.
> 216..1094.943 rows=3 loops=1)
>                 Workers Planned: 2
>                 Workers Launched: 2
>                 ->  Partial Aggregate  (cost=14806.83..14806.84 rows=1 width=8)
> (actual time=177.867..177.868 rows=1 loops=3)
>                       ->  Parallel Seq Scan on test_plpgsql  (cost=0.00..14702.6
> 7 rows=41667 width=0) (actual time=0.384..142.565 rows=33333 loops=3)
> CONTEXT:  SQL function "parallel_test_set_sql" statement 1
> LOG:  duration: 2965.040 ms  plan:
>         Query Text: select * from parallel_test_set_sql();
>         Function Scan on parallel_test_set_sql  (cost=0.25..10.25 rows=1000 widt
> h=8) (actual time=2538.620..2776.955 rows=1 loops=1)
>
>
> In general, I think we should support the cases as required (or
> written) by you from plpgsql or sql functions.  We need more work to
> support such cases. There are probably two ways of supporting such
> cases, we can build some intelligence in plpgsql execution such that
> it can recognise such queries and allow to use parallelism or we need
> to think of enabling parallelism for cases where we don't run the plan
> to completion.  Most of the use cases from plpgsql or sql function
> fall into later category as they don't generally run the plan to
> completion.
>
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com
>

Thank you for you sugestion! That works.

But what  we can do with this function:
create or replace function parallel_test_sql(t int) returns setof bigint as
$$   select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1 group by a,b,c,d,e)t;
$$ language sql PARALLEL SAFE STRICT STABLE;

explain (analyze,buffers) select  * from  parallel_test_sql(2);

"Function Scan on parallel_test_sql  (cost=0.25..10.25 rows=1000 width=8) (actual time=2410.789..2410.790 rows=1
loops=1)"
"  Buffers: shared hit=63696"
"Planning time: 0.082 ms"
"Execution time: 2410.841 ms"

2016-09-20 14:09:04 MSK [13037]: [75-1] user=ipdr,db=ipdr,app=pgAdmin III - Query Tool,client=127.0.0.1 LOG:  duration:
2410.135ms  plan:        Query Text:           select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1
groupby a,b,c,d,e)t;
 
        Aggregate  (cost=230701.42..230701.43 rows=1 width=8)          ->  HashAggregate  (cost=230363.59..230513.74
rows=15015width=28)                Group Key: test.a, test.b, test.c, test.d, test.e                ->  Seq Scan on
test (cost=0.00..188696.44 rows=3333372 width=20)                      Filter: (a >= $1)
 


No parallelism again. Looks like that Filter: (a >= $1)  breaks parallelism



Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Parallel sec scan in plpgsql

От

Amit Kapila

Дата:

20 сентября 2016 г., 13:24:11

On Mon, Sep 19, 2016 at 11:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Sep 17, 2016 at 11:54 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> In general, I think we should support the cases as required (or
>> written) by you from plpgsql or sql functions.  We need more work to
>> support such cases. There are probably two ways of supporting such
>> cases, we can build some intelligence in plpgsql execution such that
>> it can recognise such queries and allow to use parallelism or we need
>> to think of enabling parallelism for cases where we don't run the plan
>> to completion.  Most of the use cases from plpgsql or sql function
>> fall into later category as they don't generally run the plan to
>> completion.
>
> I think there's certainly more work that could be done to teach
> PL/pgsql about cases where the query will run to completion.  I didn't
> work very hard to make sure we covered all of those; there are
> probably several different cases where parallelism could be safely
> enabled but currently isn't.  Also, I didn't do anything at all to
> update the other PLs, and that would be good, too.
>
> However, I think the chances of supporting parallel query for queries
> not executed to completion any time in the near future are very poor.
>

I think here point is that for any case where there is count of rows
to be selected, we disable parallelism.  There are many genuine cases
like select count(*) into cnt ... which will run to completion, but as
plpgsql passes row count to be 1 or 2, it doesn't enter parallel mode.
There are couple other cases like that.  Do you see a reason for not
enabling parallelism for such cases?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel sec scan in plpgsql

От

Tom Lane

Дата:

20 сентября 2016 г., 14:58:26

Amit Kapila <amit.kapila16@gmail.com> writes:
> On Mon, Sep 19, 2016 at 11:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> However, I think the chances of supporting parallel query for queries
>> not executed to completion any time in the near future are very poor.

> I think here point is that for any case where there is count of rows
> to be selected, we disable parallelism.  There are many genuine cases
> like select count(*) into cnt ... which will run to completion, but as
> plpgsql passes row count to be 1 or 2, it doesn't enter parallel mode.
> There are couple other cases like that.  Do you see a reason for not
> enabling parallelism for such cases?

The other problem that would have to be confronted here is nesting,
ie it would only be OK for a plpgsql function to invoke a parallel
query if it wasn't itself being executed by a parallel worker ---
or maybe even if it's being executed by the leader process but there's
an active Gather somewhere else in the calling query's plan tree.
(Not sure about the implementation's properties for that case.)
We'd have to decide whether we want plancache to track both parallel
and nonparallel plans for plpgsql queries.  Do-able no doubt but
pretty ugly.  Maybe it would be better to fix the rule against workers
invoking their own parallel queries.

However that's probably moot unless the not-executing-to-completion
issue can be solved.
        regards, tom lane

Re: Parallel sec scan in plpgsql

От

Robert Haas

Дата:

20 сентября 2016 г., 15:01:25

On Tue, Sep 20, 2016 at 9:24 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think here point is that for any case where there is count of rows
> to be selected, we disable parallelism.  There are many genuine cases
> like select count(*) into cnt ... which will run to completion, but as
> plpgsql passes row count to be 1 or 2, it doesn't enter parallel mode.
> There are couple other cases like that.  Do you see a reason for not
> enabling parallelism for such cases?

If we can somehow know that the rowcount which is passed is greater
than or equal to the actual number of rows which will be generated,
then it's fine to enable parallelism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Parallel sec scan in plpgsql

От

Robert Haas

Дата:

20 сентября 2016 г., 15:03:02

On Tue, Sep 20, 2016 at 10:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maybe it would be better to fix the rule against workers
> invoking their own parallel queries.

That rule does have the advantage of preventing us from having one
user backend launch N^2 workers.  I don't think it would be that much
work to fix it, but the results might be pretty exciting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Parallel sec scan in plpgsql

От

Amit Kapila

Дата:

22 сентября 2016 г., 12:36:50

On Tue, Sep 20, 2016 at 8:31 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 20, 2016 at 9:24 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I think here point is that for any case where there is count of rows
>> to be selected, we disable parallelism.  There are many genuine cases
>> like select count(*) into cnt ... which will run to completion, but as
>> plpgsql passes row count to be 1 or 2, it doesn't enter parallel mode.
>> There are couple other cases like that.  Do you see a reason for not
>> enabling parallelism for such cases?
>
> If we can somehow know that the rowcount which is passed is greater
> than or equal to the actual number of rows which will be generated,
> then it's fine to enable parallelism.
>

I think for certain cases like into clause, the rows passed will be
equal to actual number of rows, otherwise it will generate error.  So
we can pass that information in executor layer.  Another kind of cases
which are worth considering are when from plpgsql we fetch limited
rows at-a-time, but we fetch till end like the case of
exec_stmt_return_query().

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Parallel sec scan in plpgsql

От

Robert Haas

Дата:

22 сентября 2016 г., 14:02:25

On Thu, Sep 22, 2016 at 8:36 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think for certain cases like into clause, the rows passed will be
> equal to actual number of rows, otherwise it will generate error.  So
> we can pass that information in executor layer.  Another kind of cases
> which are worth considering are when from plpgsql we fetch limited
> rows at-a-time, but we fetch till end like the case of
> exec_stmt_return_query().

Yes, I think that those cases can be considered.  Some careful code
inspection will be needed to make sure the cases we want to enable are
safe, and some testing will be needed to make sure they behave
properly.  But it doesn't sound like an unsolvable problem.  I hope
someone who isn't me will decide to work on it.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Parallel sec scan in plpgsql

От

Amit Kapila

Дата:

23 сентября 2016 г., 03:34:21

On Thu, Sep 22, 2016 at 7:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 22, 2016 at 8:36 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I think for certain cases like into clause, the rows passed will be
>> equal to actual number of rows, otherwise it will generate error.  So
>> we can pass that information in executor layer.  Another kind of cases
>> which are worth considering are when from plpgsql we fetch limited
>> rows at-a-time, but we fetch till end like the case of
>> exec_stmt_return_query().
>
> Yes, I think that those cases can be considered.  Some careful code
> inspection will be needed to make sure the cases we want to enable are
> safe, and some testing will be needed to make sure they behave
> properly.  But it doesn't sound like an unsolvable problem.  I hope
> someone who isn't me will decide to work on it.  :-)
>

makes sense.  I think along with that we can also evaluate, if we can
enable parallel query from other pl languages. I think if we can
enable parallelism from all pl languages in 10.0, that will be a good
step forward.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Parallel sec scan in plpgsql