Discussion: Choosing parallel_degree


Choosing parallel_degree

From
David Rowley
Date:
Over in [1] James mentioned wanting to be able to have more
influence over the partial path's parallel_degree decision.  At the risk
of that discussion hijacking the parallel aggregate thread, I
thought I'd start this one for anyone who wants to discuss making
changes to that.

I've attached a simple C program which shows the parallel_degree which
will be chosen at the moment. For now it's based on the size of the
base relation. Perhaps that will need to be rethought later, perhaps
based on costs. But I just don't think it's something for 9.6.

Here's the output of the C program.

For 1 pages there will be 1 workers (rel size 0 MB, 0 GB)
For 3001 pages there will be 2 workers (rel size 23 MB, 0 GB)
For 9001 pages there will be 3 workers (rel size 70 MB, 0 GB)
For 27001 pages there will be 4 workers (rel size 210 MB, 0 GB)
For 81001 pages there will be 5 workers (rel size 632 MB, 0 GB)
For 243001 pages there will be 6 workers (rel size 1898 MB, 1 GB)
For 729001 pages there will be 7 workers (rel size 5695 MB, 5 GB)
For 2187001 pages there will be 8 workers (rel size 17085 MB, 16 GB)
For 6561001 pages there will be 9 workers (rel size 51257 MB, 50 GB)
For 19683001 pages there will be 10 workers (rel size 153773 MB, 150 GB)
For 59049001 pages there will be 11 workers (rel size 461320 MB, 450 GB)
For 177147001 pages there will be 12 workers (rel size 1383960 MB, 1351 GB)
For 531441001 pages there will be 13 workers (rel size 4151882 MB, 4054 GB)
For 1594323001 pages there will be 14 workers (rel size 12455648 MB, 12163 GB)

[1] http://www.postgresql.org/message-id/CANkGpBtUvzpdvF2=_iQ64UjmVrPYcS6d4i9-wepbUsq1sq+AWw@mail.gmail.com

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachments

Re: Choosing parallel_degree

From
James Sewell
Date:
Thanks David,

Eventually it would be great to take into account the cost of the function doing the agg (pg_proc.procost, which is a multiple of CPU units).

This would allow people to mark specific aggregations as needing more CPU power, therefore needing more workers per page (or should it be tuple in this case?).

In the meantime some way to manually influence this would be good. I just did some testing (on an 8VCPU machine) with a 139MB table, which gets 3 workers currently.

For a count(*) I get a time of 131.754 ms. If I increase this to 8 workers I get around 86.193 ms.

Obviously this doesn't mean much as YMMV - but it does show that the ability to manually adjust the scaling would be great, especially in data warehouse or reporting environments.

I did want to test with some really slow aggs, but even when I take out the small table test in create_parallel_paths I can't seem to get a parallel plan for a tiny table. Any idea on why this would be, David?


Cheers,

James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
 

Level 2, 50 Queen St, Melbourne VIC 3000

(+61) 3 8370 8000  W www.lisasoft.com  (+61) 3 8370 8099
 

On Tue, Mar 15, 2016 at 12:25 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
Over in [1] James mentioned about wanting more to be able to have more
influence over the partial path's parallel_degree decision.  At risk
of a discussion on that hijacking the parallel aggregate thread, I
thought I'd start this for anyone who would want to discuss making
changes to that.

I've attached a simple C program which shows the parallel_degree which
will be chosen at the moment. For now it's based on the size of the
base relation. Perhaps that will need to be rethought later, perhaps
based on costs. But I just don't think it's something for 9.6.

Here's the output of the C program.

For 1 pages there will be 1 workers (rel size 0 MB, 0 GB)
For 3001 pages there will be 2 workers (rel size 23 MB, 0 GB)
For 9001 pages there will be 3 workers (rel size 70 MB, 0 GB)
For 27001 pages there will be 4 workers (rel size 210 MB, 0 GB)
For 81001 pages there will be 5 workers (rel size 632 MB, 0 GB)
For 243001 pages there will be 6 workers (rel size 1898 MB, 1 GB)
For 729001 pages there will be 7 workers (rel size 5695 MB, 5 GB)
For 2187001 pages there will be 8 workers (rel size 17085 MB, 16 GB)
For 6561001 pages there will be 9 workers (rel size 51257 MB, 50 GB)
For 19683001 pages there will be 10 workers (rel size 153773 MB, 150 GB)
For 59049001 pages there will be 11 workers (rel size 461320 MB, 450 GB)
For 177147001 pages there will be 12 workers (rel size 1383960 MB, 1351 GB)
For 531441001 pages there will be 13 workers (rel size 4151882 MB, 4054 GB)
For 1594323001 pages there will be 14 workers (rel size 12455648 MB, 12163 GB)

[1] http://www.postgresql.org/message-id/CANkGpBtUvzpdvF2=_iQ64UjmVrPYcS6d4i9-wepbUsq1sq+AWw@mail.gmail.com

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Choosing parallel_degree

From
David Rowley
Date:
On 15 March 2016 at 15:24, James Sewell <james.sewell@lisasoft.com> wrote:
>
> I did want to test with some really slow aggs, but even when I take out the small table test in create_parallel_paths
> I can't seem to get a parallel plan for a tiny table. Any idea on why this would be David?

In the test program I attached to the previous email, if I change the
parallel_threshold = 1000; to be parallel_threshold = 1; then I get
the following output:

For 1 pages there will be 1 workers (rel size 0 MB, 0 GB)
For 4 pages there will be 2 workers (rel size 0 MB, 0 GB)

So I'm getting 2 workers for only 4 pages. I've not tested in
Postgres, but if you do this and: SET parallel_setup_cost = 0; then
I'd imagine it should generate a parallel plan.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Choosing parallel_degree

From
Robert Haas
Date:
On Mon, Mar 14, 2016 at 9:25 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> Over in [1] James mentioned about wanting more to be able to have more
> influence over the partial path's parallel_degree decision.  At risk
> of a discussion on that hijacking the parallel aggregate thread, I
> thought I'd start this for anyone who would want to discuss making
> changes to that.
>
> I've attached a simple C program which shows the parallel_degree which
> will be chosen at the moment. For now it's based on the size of the
> base relation. Perhaps that will need to be rethought later, perhaps
> based on costs. But I just don't think it's something for 9.6.

I thought about this a bit more.  There are a couple of easy things we
could do here.

The 1000-page threshold could be made into a GUC.

We could add a per-table reloption for parallel-degree that would
override the calculation.

Neither of those things is very smart, but they'd probably both help
some people.  If someone is able to produce a patch for either or both
of these things *quickly*, we could possibly try to squeeze it into
9.6 as a cleanup of work already done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 15/03/2016 21:12, Robert Haas wrote:
> On Mon, Mar 14, 2016 at 9:25 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> Over in [1] James mentioned about wanting more to be able to have more
>> influence over the partial path's parallel_degree decision.  At risk
>> of a discussion on that hijacking the parallel aggregate thread, I
>> thought I'd start this for anyone who would want to discuss making
>> changes to that.
>>
>> I've attached a simple C program which shows the parallel_degree which
>> will be chosen at the moment. For now it's based on the size of the
>> base relation. Perhaps that will need to be rethought later, perhaps
>> based on costs. But I just don't think it's something for 9.6.
>
> I thought about this a bit more.  There are a couple of easy things we
> could do here.
>
> The 1000-page threshold could be made into a GUC.
>
> We could add a per-table reloption for parallel-degree that would
> override the calculation.
>
> Neither of those things is very smart, but they'd probably both help
> some people.  If someone is able to produce a patch for either or both
> of these things *quickly*, we could possibly try to squeeze it into
> 9.6 as a cleanup of work already done.
>

I'm not too familiar with parallel planning, but I tried to implement
both in the attached patch. I didn't put much effort into the
parallel_threshold GUC documentation, because I didn't really see a good
way to explain it. I'd be happy to improve it if needed. Also, to make
this parameter easier to tune for users, perhaps we could divide the
default value by 3 and use it as is in the first iteration in
create_parallel_path()?

Also, global max_parallel_degree still needs to be at least 1 for the
per table value to be considered.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments

Re: Choosing parallel_degree

From
David Rowley
Date:
On 16 March 2016 at 13:26, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
> On 15/03/2016 21:12, Robert Haas wrote:
>> I thought about this a bit more.  There are a couple of easy things we
>> could do here.
>>
>> The 1000-page threshold could be made into a GUC.
>>
>> We could add a per-table reloption for parallel-degree that would
>> override the calculation.
>>
>> Neither of those things is very smart, but they'd probably both help
>> some people.  If someone is able to produce a patch for either or both
>> of these things *quickly*, we could possibly try to squeeze it into
>> 9.6 as a cleanup of work already done.
>>
>
> I'm not too familiar with parallel planning, but I tried to implement
> both in attached patch. I didn't put much effort into the
> parallel_threshold GUC documentation, because I didn't really see a good
> way to explain it. I'd e happy to improve it if needed. Also, to make
> this parameter easier to tune for users, perhaps we could divide the
> default value by 3 and use it as is in the first iteration in
> create_parallel_path() ?
>
> Also, global max_parallel_degree still needs to be at least 1 for the
> per table value to be considered.

Thanks for working on this. I've only skimmed the patch so far, but
will try to look more closely later.

This did get me wondering why we have the parallel_threshold at all,
and not just allow the parallel_setup_cost to make parallel plans look
less favourable for smaller relations. I assume that this is so that
we don't burden the planner with the overhead of generating parallel
paths for smaller relations?

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Choosing parallel_degree

From
Robert Haas
Date:
On Tue, Mar 15, 2016 at 8:45 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 16 March 2016 at 13:26, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>> On 15/03/2016 21:12, Robert Haas wrote:
>>> I thought about this a bit more.  There are a couple of easy things we
>>> could do here.
>>>
>>> The 1000-page threshold could be made into a GUC.
>>>
>>> We could add a per-table reloption for parallel-degree that would
>>> override the calculation.
>>>
>>> Neither of those things is very smart, but they'd probably both help
>>> some people.  If someone is able to produce a patch for either or both
>>> of these things *quickly*, we could possibly try to squeeze it into
>>> 9.6 as a cleanup of work already done.
>>>
>>
>> I'm not too familiar with parallel planning, but I tried to implement
>> both in attached patch. I didn't put much effort into the
>> parallel_threshold GUC documentation, because I didn't really see a good
>> way to explain it. I'd e happy to improve it if needed. Also, to make
>> this parameter easier to tune for users, perhaps we could divide the
>> default value by 3 and use it as is in the first iteration in
>> create_parallel_path() ?
>>
>> Also, global max_parallel_degree still needs to be at least 1 for the
>> per table value to be considered.
>
> Thanks for working on this. I've only skimmed the patch so far, but
> will try to look more closely later.
>
> This did get me wondering why we have the parallel_threshold at all,
> and not just allow the parallel_setup_cost to make parallel plans look
> less favourable for smaller relations. I assume that this is so that
> we don't burden the planner with the overhead of generating parallel
> paths for smaller relations?

Right.  And, also, we need some heuristic for judging how many workers
to deploy.  parallel_setup_cost is of no use in making that decision.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Choosing parallel_degree

From
James Sewell
Date:
On Wed, Mar 16, 2016 at 11:26 AM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:

I'm not too familiar with parallel planning, but I tried to implement
both in attached patch. I didn't put much effort into the
parallel_threshold GUC documentation, because I didn't really see a good
way to explain it. I'd e happy to improve it if needed. Also, to make
this parameter easier to tune for users, perhaps we could divide the
default value by 3 and use it as is in the first iteration in
create_parallel_path() ?

Also, global max_parallel_degree still needs to be at least 1 for the
per table value to be considered.


It all applies and works from my end.

Is the max_parallel_degree per table of much use here? It allows the max number of workers per table to be set - but it's still bound by the same formula (now from the GUC). So in reality it's only really useful for limiting the number of workers, not raising it.

Would there be a common use case for limiting parallelism on a subset of tables in a database you've explicitly set to have a higher amount of parallel operations via the GUC? I struggle to think of one.

I think in practice the reverse would be more common: you'd want to set certain tables to a starting point of a certain number of workers (and ramp up to more if the formula allowed it). You could set this to 0 to never use parallel agg on this table.

Another option is to allow access to the threshold multiplier (currently hard coded to 3) per table - but this might become pretty hard to explain succinctly in the documentation.

Cheers,
James

 



Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 16/03/2016 05:45, James Sewell wrote:
> On Wed, Mar 16, 2016 at 11:26 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>>wrote:
> 
> 
>     I'm not too familiar with parallel planning, but I tried to implement
>     both in attached patch. I didn't put much effort into the
>     parallel_threshold GUC documentation, because I didn't really see a good
>     way to explain it. I'd e happy to improve it if needed. Also, to make
>     this parameter easier to tune for users, perhaps we could divide the
>     default value by 3 and use it as is in the first iteration in
>     create_parallel_path() ?
> 
>     Also, global max_parallel_degree still needs to be at least 1 for the
>     per table value to be considered.
> 
> 
> All applies and works from my end.
> 

Thanks for testing!

> Is the max_parallel_degree per table of much use here? It allows the max
> number of workers per table to be set - but it's still bound by the same
> formula (now from the GUC). So in reality it's only really useful for
> limiting the number of workers, not raising it.
> 

You can set a global max_parallel_degree low, and raise it per table. If
you set up max_parallel_degree to 1, you can "activate" parallel workers
for only a subset of tables.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Choosing parallel_degree

From
Robert Haas
Date:
On Tue, Mar 15, 2016 at 8:26 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 15/03/2016 21:12, Robert Haas wrote:
>> On Mon, Mar 14, 2016 at 9:25 PM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>>> Over in [1] James mentioned about wanting more to be able to have more
>>> influence over the partial path's parallel_degree decision.  At risk
>>> of a discussion on that hijacking the parallel aggregate thread, I
>>> thought I'd start this for anyone who would want to discuss making
>>> changes to that.
>>
>> I thought about this a bit more.  There are a couple of easy things we
>> could do here.
>>
>> The 1000-page threshold could be made into a GUC.
>>
>> We could add a per-table reloption for parallel-degree that would
>> override the calculation.
>>
>> Neither of those things is very smart, but they'd probably both help
>> some people.  If someone is able to produce a patch for either or both
>> of these things *quickly*, we could possibly try to squeeze it into
>> 9.6 as a cleanup of work already done.
>
> I'm not too familiar with parallel planning, but I tried to implement
> both in attached patch. I didn't put much effort into the
> parallel_threshold GUC documentation, because I didn't really see a good
> way to explain it. I'd e happy to improve it if needed. Also, to make
> this parameter easier to tune for users, perhaps we could divide the
> default value by 3 and use it as is in the first iteration in
> create_parallel_path()

Hmm.  I'm not sure I like the parallel_threshold GUC after all.  That's
a little strange.  But maybe.

For the reloption, I was thinking it would be parallel_degree, not
max_parallel_degree.  max_parallel_degree would still control, so if the
parallel_degree for a given table was greater than max_parallel_degree,
you'd get max_parallel_degree instead.  But you could crank up the
parallel_degree for a small table to force more parallelism when
querying it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 16/03/2016 17:16, Robert Haas wrote:
> On Tue, Mar 15, 2016 at 8:26 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> wrote:
> 
>     On 15/03/2016 21:12, Robert Haas wrote:
>     > On Mon, Mar 14, 2016 at 9:25 PM, David Rowley
>     > <david.rowley@2ndquadrant.com <mailto:david.rowley@2ndquadrant.com>>
>     wrote:
>     >> Over in [1] James mentioned about wanting more to be able to have more
>     >> influence over the partial path's parallel_degree decision.  At risk
>     >> of a discussion on that hijacking the parallel aggregate thread, I
>     >> thought I'd start this for anyone who would want to discuss making
>     >> changes to that.
>     >>
>     >> I've attached a simple C program which shows the parallel_degree which
>     >> will be chosen at the moment. For now it's based on the size of the
>     >> base relation. Perhaps that will need to be rethought later, perhaps
>     >> based on costs. But I just don't think it's something for 9.6.
>     >
>     > I thought about this a bit more.  There are a couple of easy things we
>     > could do here.
>     >
>     > The 1000-page threshold could be made into a GUC.
>     >
>     > We could add a per-table reloption for parallel-degree that would
>     > override the calculation.
>     >
>     > Neither of those things is very smart, but they'd probably both help
>     > some people.  If someone is able to produce a patch for either or both
>     > of these things *quickly*, we could possibly try to squeeze it into
>     > 9.6 as a cleanup of work already done.
>     >
> 
>     I'm not too familiar with parallel planning, but I tried to implement
>     both in attached patch. I didn't put much effort into the
>     parallel_threshold GUC documentation, because I didn't really see a good
>     way to explain it. I'd e happy to improve it if needed. Also, to make
>     this parameter easier to tune for users, perhaps we could divide the
>     default value by 3 and use it as is in the first iteration in
>     create_parallel_path()
> 
> 
> Hmm.  I'm not sure I like the parallel_threshold GUC after all.  That's
> a little strange.  But maybe.
> 

FWIW I have two commits in my local branch, so I can remove this one easily.

> For the reloption, I was thinking it would be parallel_degree, not
> max_parallel_degree.  max_parallel_degree would still control, so if the
> parallel_degree for a given table was greater than max_parallel_degree,
> you'd get max_parallel_degree instead.  But you could crank up the
> parallel_degree for a small table to force more parallelism when
> querying it.
> 

Something like a "min_parallel_degree" then?


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Choosing parallel_degree

From
Robert Haas
Date:
On Wed, Mar 16, 2016 at 12:47 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> Something like a "min_parallel_degree" then ?

Why not just parallel_degree without any prefix?  As in, when scanning
this table in parallel, the reloption suggests using N workers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 16/03/2016 17:55, Robert Haas wrote:
> On Wed, Mar 16, 2016 at 12:47 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> Something like a "min_parallel_degree" then ?
>
> Why not just parallel_degree without any prefix?  As in, when scanning
> this table in parallel, the reloption suggests using N workers.
>

Agreed.

PFA v2 that implements that.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments

Re: Choosing parallel_degree

From
Robert Haas
Date:
On Wed, Mar 16, 2016 at 1:23 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 16/03/2016 17:55, Robert Haas wrote:
>> On Wed, Mar 16, 2016 at 12:47 PM, Julien Rouhaud
>> <julien.rouhaud@dalibo.com> wrote:
>>> Something like a "min_parallel_degree" then ?
>>
>> Why not just parallel_degree without any prefix?  As in, when scanning
>> this table in parallel, the reloption suggests using N workers.
>>
>
> Agreed.
>
> PFA v2 that implements that.

I think create_parallel_paths shouldn't actually run the loop if the
reloption is specified; it should just adopt the specified value (or
max_parallel_degree, whichever is less).  Right now, you have it doing
the work to compute the default value but then overriding it.

Also, I think parallel_degree should be down in the section that says
/* information about a base rel (not set for join rels!) */ and I
think it should be called something like rel_parallel_degree, to make
it more clear that it's a value set on the relation level.

Let's leave out the parallel_threshold stuff for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 16/03/2016 18:42, Robert Haas wrote:
> On Wed, Mar 16, 2016 at 1:23 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> On 16/03/2016 17:55, Robert Haas wrote:
>>> On Wed, Mar 16, 2016 at 12:47 PM, Julien Rouhaud
>>> <julien.rouhaud@dalibo.com> wrote:
>>>> Something like a "min_parallel_degree" then ?
>>>
>>> Why not just parallel_degree without any prefix?  As in, when scanning
>>> this table in parallel, the reloption suggests using N workers.
>>>
>>
>> Agreed.
>>
>> PFA v2 that implements that.
>
> I think create_parallel_paths shouldn't actually run the loop if the
> reloption is specified; it should just adopt the specified value (or
> max_parallel_degree, whichever is less).  Right now, you have it doing
> the work to compute the default value but then overriding it.
>

Oh ugly mistake. Fixed.

> Also, I think parallel_degree should be down in the section that says
> /* information about a base rel (not set for join rels!) */ and I
> think it should be called something like rel_parallel_degree, to make
> it more clear that it's a value set on the relation level.
>

You're right, fixed.

> Let's leave out the parallel_threshold stuff for now.
>

attached v3 drops the GUC part.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments

Re: Choosing parallel_degree

From
James Sewell
Date:

On Thu, Mar 17, 2016 at 5:05 AM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:

attached v3 drops the GUC part.

This looks good. I do think that some threshold control would be good in the long term - but you are right Robert, it just feels strange.

Maybe once the final formula is implemented in 9.7+ and this gets some real world use cases it can be revisited?

One thing I really, really like about the way the new patch works is that you can set parallel_degree on an inheritance parent, and that will then set the minimum workers for all associated children (when accessing from the parent).

Currently this patch will not fire on small tables even when parallel_degree is set; can we fix this by adding a check for ref->parallel_degree to the table size condition?

Cheers,
James 



Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 17/03/2016 02:07, James Sewell wrote:
> 
> On Thu, Mar 17, 2016 at 5:05 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> wrote:
> 
> 
>     attached v3 drops the GUC part.
> 
> 
> This looks good good. I do think that some threshold control would be
> good in the long term - but you are right Robert it just feels strange. 
> 
> Maybe once the final formula is implemented in 9.7+ and this gets some
> real world use cases it can be revisited?
> 
> One thing I really, really like about the way the new patch works is
> that you can set parallel_degree on an inheritance parent, then that
> will set the minimum workers for all associated children (when accessing
> from the parent).
> 
> Currently this patch will not fire on small tables even
> when parallel_degree is set, can we fix this by adding a check
> for ref->parallel_degree  to the table size condition?
> 

Actually, a parallel plan will be created, since in this case the
rel->reloptkind will be RELOPT_OTHER_MEMBER_REL, not RELOPT_BASEREL.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 17/03/2016 11:23, Julien Rouhaud wrote:
> On 17/03/2016 02:07, James Sewell wrote:
>>
>> On Thu, Mar 17, 2016 at 5:05 AM, Julien Rouhaud
>> <julien.rouhaud@dalibo.com <mailto:julien.rouhaud@dalibo.com>> wrote:
>>
>>
>>     attached v3 drops the GUC part.
>>
>>
>> This looks good good. I do think that some threshold control would be
>> good in the long term - but you are right Robert it just feels strange. 
>>
>> Maybe once the final formula is implemented in 9.7+ and this gets some
>> real world use cases it can be revisited?
>>
>> One thing I really, really like about the way the new patch works is
>> that you can set parallel_degree on an inheritance parent, then that
>> will set the minimum workers for all associated children (when accessing
>> from the parent).
>>
>> Currently this patch will not fire on small tables even
>> when parallel_degree is set, can we fix this by adding a check
>> for ref->parallel_degree  to the table size condition?
>>
> 
> Actually, a parallel plan will be created, since in this case the
> rel->reloptkind will be RELOPT_OTHER_MEMBER_REL, not RELOPT_BASEREL.
> 

And after re-reading your mail I see that it was what you meant, sorry :)

With the current threshold, you need a table bigger than 8 MB to be able
to force parallel workers. I'm not sure there'll be benefits for
multiple workers on a table smaller than 8 MB, since setting up all the
parallel stuff takes time.


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Choosing parallel_degree

From
David Rowley
Date:
On 18 March 2016 at 00:13, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
> With the current threshold, you need a table bigger than 8 MB to be able
> to force parallel workers. I'm not sure there'll be benefits for
> multiple workers on a table smaller than 8 MB, since setting up all the
> parallel stuff takes time.

It would be really nice if it were possible to drop the setting really
low, so that combined with a low parallel_setup_cost we could enable
parallel query on small tables in the regression test suite.


--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 17/03/2016 12:21, David Rowley wrote:
> On 18 March 2016 at 00:13, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>> With the current threshold, you need a table bigger than 8 MB to be able
>> to force parallel workers. I'm not sure there'll be benefits for
>> multiple workers on a table smaller than 8 MB, since setting up all the
>> parallel stuff takes time.
> 
> It would be really nice if it were possible to drop the setting really
> low, so that combined with a low parallel_setup_cost we could enable
> parallel query on small tables in the regression test suite.
> 
> 

Indeed. That could also be a use case for moving parallel_threshold to a
GUC, but not sure what'd be best.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Choosing parallel_degree

From
James Sewell
Date:
Hey,

I think there are definitely use cases for using parallel agg on a small table when the time for each agg operation is very high. PostGIS can be used to create many examples of low row count and table size but high CPU operations.

This does bring up an interesting point I don't quite understand though. If I run parallel agg on a table with 4 rows with 2 workers, will it run on two workers (2 rows each) or will the first one grab all 4 rows?

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
______________________________________
 

Level 2, 50 Queen St, Melbourne VIC 3000

(+61) 3 8370 8000  W www.lisasoft.com  (+61) 3 8370 8099
 

On Thu, Mar 17, 2016 at 10:30 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
On 17/03/2016 12:21, David Rowley wrote:
> On 18 March 2016 at 00:13, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>> With the current threshold, you need a table bigger than 8 MB to be able
>> to force parallel workers. I'm not sure there'll be benefits for
>> multiple workers on a table smaller than 8 MB, since setting up all the
>> parallel stuff takes time.
>
> It would be really nice if it were possible to drop the setting really
> low, so that combined with a low parallel_setup_cost we could enable
> parallel query on small tables in the regression test suite.
>
>

Indeed. That could also be a use case for moving parallel_threshold to a
GUC, but not sure what'd be best.



The contents of this email are confidential and may be subject to legal or professional privilege and copyright. No representation is made that this email is free of viruses or other defects. If you have received this communication in error, you may not copy or distribute any part of it or otherwise disclose its contents to anyone. Please advise the sender of your incorrect receipt of this correspondence.

Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 16/03/2016 18:42, Robert Haas wrote:
> On Wed, Mar 16, 2016 at 1:23 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> On 16/03/2016 17:55, Robert Haas wrote:
>>> On Wed, Mar 16, 2016 at 12:47 PM, Julien Rouhaud
>>> <julien.rouhaud@dalibo.com> wrote:
>>>> Something like a "min_parallel_degree" then ?
>>>
>>> Why not just parallel_degree without any prefix?  As in, when scanning
>>> this table in parallel, the reloption suggests using N workers.
>>>
>>
>> Agreed.
>>
>> PFA v2 that implements that.
> 
> I think create_parallel_paths shouldn't actually run the loop if the
> reloption is specified; it should just adopt the specified value (or
> max_parallel_degree, whichever is less).  Right now, you have it doing
> the work to compute the default value but then overriding it.
> 

After a second look at this part:
/*
 * Limit the degree of parallelism logarithmically based on the size of the
 * relation.  This probably needs to be a good deal more sophisticated,
 * but we need something here for now.
 */
while (rel->pages > parallel_threshold * 3 &&
       parallel_degree < max_parallel_degree)
{
    parallel_degree++;
    parallel_threshold *= 3;
    if (parallel_threshold >= PG_INT32_MAX / 3)
        break;
}

Shouldn't we also check "parallel_degree < max_worker_processes"?

There's no need to compute any further than that. I think the best fix
would be to add a CheckHook or AssignHook on the max_parallel_degree GUC to
make sure it's not more than max_worker_processes.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Choosing parallel_degree

From
Tom Lane
Date:
Julien Rouhaud <julien.rouhaud@dalibo.com> writes:
> Shouldn't we also check "parallel_degree < max_worker_processes"?

> There's no need to compute any further than that. I think the best fix
> would be to add a CheckHook or AssignHook on the max_parallel_degree GUC to
> make sure it's not more than max_worker_processes.

Please, let's not go there.  Interdependent checks on GUC values are far
harder to get right than you think.  It's far better to design the GUC
specifications so that it doesn't matter.

For an example whereof I speak, check the sordid history of commit
ee1e5662d8d83307 ("Auto-tune effective_cache size to be 4x shared
buffers"), which eventually got reverted after a huge amount of thrashing
trying to make it work consistently.  Admittedly, that was trying to make
the default value of GUC X depend on GUC Y, but I think checking whether
X <= Y would have many of the same problems.  The core issue is you don't
know which one's going to get set first.

In this particular case I think it'd be fine to document that the
effective amount of parallelism is Min(parallel_degree, max_worker_processes).
        regards, tom lane



Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 18/03/2016 00:56, Tom Lane wrote:
> Julien Rouhaud <julien.rouhaud@dalibo.com> writes:
>> Shouldn't we also check "parallel_degree < max_worker_processes"?
>
>> There's no need to compute any further than that. I think the best fix
>> would be to add a CheckHook or AssignHook on the max_parallel_degree GUC to
>> make sure it's not more than max_worker_processes.
>
> Please, let's not go there.  Interdependent checks on GUC values are far
> harder to get right than you think.  It's far better to design the GUC
> specifications so that it doesn't matter.
>
> For an example whereof I speak, check the sordid history of commit
> ee1e5662d8d83307 ("Auto-tune effective_cache size to be 4x shared
> buffers"), which eventually got reverted after a huge amount of thrashing
> trying to make it work consistently.  Admittedly, that was trying to make
> the default value of GUC X depend on GUC Y, but I think checking whether
> X <= Y would have many of the same problems.  The core issue is you don't
> know which one's going to get set first.
>

Oh, I wasn't aware of that, thanks for the pointer.

> In this particular case I think it'd be fine to document that the
> effective amount of parallelism is Min(parallel_degree, max_worker_processes).
>
>             regards, tom lane
>

I just saw that it's already documented that way. I attach a patch that
makes sure we don't try to compute a parallel_degree beyond this limit
(if you think it's worth it), and a missing description and "change
requires restart" for the max_worker_processes parameter in
postgresql.conf.sample.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments

Re: Choosing parallel_degree

From
David Rowley
Date:
On 18 March 2016 at 10:13, James Sewell <james.sewell@lisasoft.com> wrote:
> This does bring up an interesting point I don't quite understand though. If I run parallel agg on a table with 4 rows
with2 workers will it run on two workers (2 rows each) or will the first one grab all 4 rows?
 
It works on a per-page basis: each worker grabs the next page to be
scanned from a page counter that sits in shared memory; the worker
increments the page number, releases the lock on the counter, and
scans that page.

See heap_parallelscan_nextpage()

So the answer to your question is probably no.  At least not unless
the page only contained 2 rows.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Choosing parallel_degree

From
James Sewell
Date:
OK cool, thanks.

Can we remove the minimum size limit when the per-table degree setting is applied?

This would help for tables with 2-1000 pages combined with a high-CPU-cost aggregate.

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect

On Sun, Mar 20, 2016 at 11:23 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
On 18 March 2016 at 10:13, James Sewell <james.sewell@lisasoft.com> wrote:
> This does bring up an interesting point I don't quite understand though. If I run parallel agg on a table with 4 rows with 2 workers will it run on two workers (2 rows each) or will the first one grab all 4 rows?
It works on a per page basis, workers just each grab the next page to
be scanned from a page counter that sits in shared memory, the worker
just increments the page number, releases the lock on the counter and
scans that page.

See heap_parallelscan_nextpage()

So the answer to your question is probably no. At least not unless
the page only contained 2 rows.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 21/03/2016 05:18, James Sewell wrote:
> OK cool, thanks.
>
> Can we remove the minimum size limit when the per table degree setting
> is applied?
>
> This would help for tables with 2  - 1000 pages combined with a high CPU
> cost aggregate.
>

Attached v4 implements that. It also makes sure that the chosen
parallel_degree won't be more than the relation's estimated number of pages.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments

Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 21/03/2016 20:38, Julien Rouhaud wrote:
> On 21/03/2016 05:18, James Sewell wrote:
>> OK cool, thanks.
>>
>> Can we remove the minimum size limit when the per table degree setting
>> is applied?
>>
>> This would help for tables with 2  - 1000 pages combined with a high CPU
>> cost aggregate.
>>
>
> Attached v4 implements that. It also makes sure that the chosen
> parallel_degree won't be more than the relation's estimated number of pages.
>

And I just realized that it'd prevent forcing parallelism on
partitioned tables, so v5 attached removes the check on the estimated
number of pages.


--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments

Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 22/03/2016 07:58, Julien Rouhaud wrote:
> On 21/03/2016 20:38, Julien Rouhaud wrote:
>> On 21/03/2016 05:18, James Sewell wrote:
>>> OK cool, thanks.
>>>
>>> Can we remove the minimum size limit when the per table degree setting
>>> is applied?
>>>
>>> This would help for tables with 2  - 1000 pages combined with a high CPU
>>> cost aggregate.
>>>
>>
>> Attached v4 implements that. It also makes sure that the chosen
>> parallel_degree won't be more than the relation's estimated number of pages.
>>
> 
> And I just realize that it'd prevent from forcing parallelism on
> partitionned table, v5 attached removes the check on the estimated
> number of pages.
> 
> 

The feature freeze is now very close.  If this GUC is still wanted,
should I add this patch to the next commitfest?


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Choosing parallel_degree

From
Amit Kapila
Date:
On Sun, Apr 3, 2016 at 4:37 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>
> On 22/03/2016 07:58, Julien Rouhaud wrote:
> > On 21/03/2016 20:38, Julien Rouhaud wrote:
> >> On 21/03/2016 05:18, James Sewell wrote:
> >>> OK cool, thanks.
> >>>
> >>> Can we remove the minimum size limit when the per table degree setting
> >>> is applied?
> >>>
> >>> This would help for tables with 2  - 1000 pages combined with a high CPU
> >>> cost aggregate.
> >>>
> >>
> >> Attached v4 implements that. It also makes sure that the chosen
> >> parallel_degree won't be more than the relation's estimated number of pages.
> >>
> >
> > And I just realize that it'd prevent from forcing parallelism on
> > partitionned table, v5 attached removes the check on the estimated
> > number of pages.
> >
> >

Few comments:
1.
+      limited according to the <xref linkend="gux-max-parallel-degree">

A. typo.
   /gux-max-parallel-degree/guc-max-parallel-degree
   /worker/workers
B.
+     <para>
+      Number of workers wanted for this table. The number of worker will be
+      limited according to the <xref linkend="gux-max-parallel-degree">
+      parameter.
+     </para>

How about writing the above as:
Sets the degree of parallelism for an individual relation.  The requested number of workers will be limited by <xref linkend="guc-max-parallel-degree">

2.
+ {
+     {
+         "parallel_degree",
+         "Number of parallel processes per executor node wanted for this relation.",
+         RELOPT_KIND_HEAP,
+         AccessExclusiveLock
+     },
+     -1, 1, INT_MAX
+ },

I think here min and max values should be same as for max_parallel_degree (I have verified that for some of the other reloption parameters, min and max are same as their guc values); Is there a reason to keep them different?

3.
@@ -1291,7 +1300,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)

Comment on top of this function says:
/*
 * Option parser for anything that uses StdRdOptions (i.e. fillfactor and
 * autovacuum)
 */

I think it is better to include parallel_degree in above comment along with fillfactor and autovacuum.


4.
/*
+ * RelationGetMaxParallelDegree
+ *     Returns the relation's parallel_degree.  Note multiple eval of argument!
+ */
+#define RelationGetParallelDegree(relation, defaultmpd) \
+    ((relation)->rd_options ? \
+     ((StdRdOptions *) (relation)->rd_options)->parallel_degree : (defaultmpd))
+

There are minor in-consistencies in the above macro definition.

a. RelationGetMaxParallelDegree - This should be RelationGetParallelDegree.
b. defaultmpd - it is better to name it as defaultpd

 
>
>
> The feature freeze is now very close.  If this GUC is still wanted,
> should I add this patch to the next commitfest?
>

I am hoping that this will be committed to 9.6, but I think it is good to register it in next CF.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Choosing parallel_degree

From
Corey Huinker
Date:
On Mon, Apr 4, 2016 at 2:55 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Sun, Apr 3, 2016 at 4:37 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>
> On 22/03/2016 07:58, Julien Rouhaud wrote:
> > On 21/03/2016 20:38, Julien Rouhaud wrote:
> >> On 21/03/2016 05:18, James Sewell wrote:
> >>> OK cool, thanks.
> >>>
> >>> Can we remove the minimum size limit when the per table degree setting
> >>> is applied?
> >>>
> >>> This would help for tables with 2  - 1000 pages combined with a high CPU
> >>> cost aggregate.
> >>>
> >>
> >> Attached v4 implements that. It also makes sure that the chosen
> >> parallel_degree won't be more than the relation's estimated number of pages.
> >>
> >
> > And I just realize that it'd prevent from forcing parallelism on
> > partitionned table, v5 attached removes the check on the estimated
> > number of pages.
> >
> >

Few comments:
1.
+      limited according to the <xref linkend="gux-max-parallel-degree">

A. typo.
   /gux-max-parallel-degree/guc-max-parallel-degree
   /worker/workers
B.
+     <para>
+      Number of workers wanted for this table. The number of worker will be
+      limited according to the <xref linkend="gux-max-parallel-degree">
+      parameter.
+     </para>

How about writing the above as:
Sets the degree of parallelism for an individual relation.  The requested number of workers will be limited by <xref linkend="guc-max-parallel-degree">

2.
+ {
+     {
+         "parallel_degree",
+         "Number of parallel processes per executor node wanted for this relation.",
+         RELOPT_KIND_HEAP,
+         AccessExclusiveLock
+     },
+     -1, 1, INT_MAX
+ },

I think here min and max values should be same as for max_parallel_degree (I have verified that for some of the other reloption parameters, min and max are same as their guc values); Is there a reason to keep them different?

3.
@@ -1291,7 +1300,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)

Comment on top of this function says:
/*
 * Option parser for anything that uses StdRdOptions (i.e. fillfactor and
 * autovacuum)
 */

I think it is better to include parallel_degree in above comment along with fillfactor and autovacuum.


4.
/*
+ * RelationGetMaxParallelDegree
+ *     Returns the relation's parallel_degree.  Note multiple eval of argument!
+ */
+#define RelationGetParallelDegree(relation, defaultmpd) \
+    ((relation)->rd_options ? \
+     ((StdRdOptions *) (relation)->rd_options)->parallel_degree : (defaultmpd))
+

There are minor in-consistencies in the above macro definition.

a. RelationGetMaxParallelDegree - This should be RelationGetParallelDegree.
b. defaultmpd - it is better to name it as defaultpd

 
>
>
> The feature freeze is now very close.  If this GUC is still wanted,
> should I add this patch to the next commitfest?
>

I am hoping that this will be committed to 9.6, but I think it is good to register it in next CF.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


I'm late to the party on this thread, and most of the discussion seems to be about setting parallel levels based on tables, which I think is wise.

What I haven't seen is any talk about setting parallel degree relative to how many CPUs exist on the machine. Clearly we don't need it right away, but when we do, I'm happy to report that CPU discovery is as easy as 

(int)sysconf(_SC_NPROCESSORS_ONLN)

source: https://github.com/moat/pmpp/blob/distribute_in_c/src/num_cpus.c an extension I will be very happy to see declared obsolete.

But even that would probably be consulted only at system startup time, and used to dynamically compute whatever GUCs and system settings will be used until restart. 

Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 04/04/2016 08:55, Amit Kapila wrote:

Thanks for the review!

> Few comments:
> 1.
> +      limited according to the <xref linkend="gux-max-parallel-degree">
>
> A. typo.
>    /gux-max-parallel-degree/guc-max-parallel-degree
>    /worker/workers

Oops, fixed.

> B.
> +     <para>
> +      Number of workers wanted for this table. The number of worker will be
> +      limited according to the <xref linkend="gux-max-parallel-degree">
> +      parameter.
> +     </para>
>
> How about writing the above as:
> Sets the degree of parallelism for an individual relation.  The
> requested number of workers will be limited by <xref
> linkend="guc-max-parallel-degree">
>

That's clearly better, changed.

> 2.
> + {
> +     {
> +         "parallel_degree",
> +         "Number of parallel processes per executor node wanted for this relation.",
> +         RELOPT_KIND_HEAP,
> +         AccessExclusiveLock
> +     },
> +     -1, 1, INT_MAX
> + },
>
> I think here min and max values should be same as for
> max_parallel_degree (I have verified that for some of the other
> reloption parameters, min and max are same as their guc values); Is
> there a reason to keep them different?
>

No reason. I put 0 and MAX_BACKENDS, the same as the GUC values.

> 3.
> @@ -1291,7 +1300,9 @@ default_reloptions(Datum reloptions, bool
> validate, relopt_kind kind)
>
> Comment on top of this function says:
> /*
>  * Option parser for anything that uses StdRdOptions (i.e. fillfactor and
>  * autovacuum)
>  */
>
> I think it is better to include parallel_degree in above comment along
> with fillfactor and autovacuum.
>

Agreed. BTW the user_catalog_table option isn't listed either.

>
> 4.
> /*
> + * RelationGetMaxParallelDegree
> + *     Returns the relation's parallel_degree.  Note multiple eval of argument!
> + */
> +#define RelationGetParallelDegree(relation, defaultmpd) \
> +    ((relation)->rd_options ? \
> +     ((StdRdOptions *) (relation)->rd_options)->parallel_degree : (defaultmpd))
> +
>
> There are minor in-consistencies in the above macro definition.
>
> a. RelationGetMaxParallelDegree - This should be RelationGetParallelDegree.
> b. defaultmpd - it is better to name it as defaultpd
>

Yes, I forgot to update it when I renamed the option, fixed.

>
>>
>>
>> The feature freeze is now very close.  If this GUC is still wanted,
>> should I add this patch to the next commitfest?
>>
>
> I am hoping that this will be committed to 9.6, but I think it is good
> to register it in next CF.
>

So attached v6 fixes all the problems above.

I'll add it to the next commitfest.

>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com/>

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments

Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 04/04/2016 17:03, Julien Rouhaud wrote:
> On 04/04/2016 08:55, Amit Kapila wrote:
>
> Thanks for the review!
>
>> Few comments:
>> 1.
>> +      limited according to the <xref linkend="gux-max-parallel-degree">
>>
>> A. typo.
>>    /gux-max-parallel-degree/guc-max-parallel-degree
>>    /worker/workers
>
> Oops, fixed.
>

And I managed to not fix it, sorry :/ Thanks to Andreas who warned me.


--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments

Re: Choosing parallel_degree

From
Amit Kapila
Date:
On Mon, Apr 4, 2016 at 11:39 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>
> On 04/04/2016 17:03, Julien Rouhaud wrote:
> > On 04/04/2016 08:55, Amit Kapila wrote:
> >
> > Thanks for the review!
> >
> >> Few comments:
> >> 1.
> >> +      limited according to the <xref linkend="gux-max-parallel-degree">
> >>
> >> A. typo.
> >>    /gux-max-parallel-degree/guc-max-parallel-degree
> >>    /worker/workers
> >
> > Oops, fixed.
> >
>
> And I managed to not fix it, sorry :/ Thanks to Andreas who warned me.
>

Few more comments:

1.
@@ -909,6 +909,17 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
    </varlistentry>

    <varlistentry>
+    <term><literal>parallel_degree</> (<type>integer</>)</term>
+    <listitem>
+     <para>
+     Sets the degree of parallelism for an individual relation.  The requested
+     number of workers will be limited by <xref
+     linkend="guc-max-parallel-degree">.
+     </para>
+    </listitem>
+   </varlistentry>

All other parameters in this category are supported by the Alter Table command as well, so I think this parameter should also be supported by the Alter Table command (for both SET and RESET variants).

2.
+ "Number of parallel processes per executor node wanted for this relation.",

How about
Number of parallel processes that can be used for this relation per executor node.

3.
- if (rel->pages < parallel_threshold && rel->reloptkind == RELOPT_BASEREL)
+ if (rel->pages < parallel_threshold && rel->rel_parallel_degree == -1 &&
+     rel->reloptkind == RELOPT_BASEREL)

A. The second line should be indented to align with the start of the condition after the opening bracket '(', i.e. with rel->pages.  Refer to nearby multiline conditions, or run pgindent.
B. The comment above this condition needs slight adjustment as per new condition.

4.
+ int parallel_degree; /* max number of parallel worker */
 } StdRdOptions;

Typo in comments
/worker/workers


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 05/04/2016 06:19, Amit Kapila wrote:
>
> Few more comments:
> 
> 1.
> @@ -909,6 +909,17 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
>     </varlistentry>
>
>     <varlistentry>
> +    <term><literal>parallel_degree</> (<type>integer</>)</term>
> +    <listitem>
> +     <para>
> +     Sets the degree of parallelism for an individual relation.  The
> +     requested number of workers will be limited by <xref
> +     linkend="guc-max-parallel-degree">.
> +     </para>
> +    </listitem>
> +   </varlistentry>
> 
> All other parameters in this category are supported by the Alter Table
> command as well, so I think this parameter should also be supported by
> the Alter Table command (for both SET and RESET variants).
>

I don't quite understand.  With the patch you can use parallel_degree in
either CREATE or ALTER table (SET and RESET) statements. Considering
documentation, the list of storage parameters only appears in
create_table.sgml, alter_table.sgml pointing to it.

In alter_table.sgml, I didn't comment the lock level needed to modify
parallel_degree since it requires an access exclusive lock for now.
While thinking about it, I think it's safe to use a share update
exclusive lock but I may be wrong.  What do you think?

> 2.
> +"Number of parallel processes per executor node wanted for this relation.",
> 
> How about
> Number of parallel processes that can be used for this relation per
> executor node.
> 

I just rephrased what was used for the max_parallel_degree GUC, which is:

"Sets the maximum number of parallel processes per executor node."

I find your version better once again, but should we keep some
consistency between them or it's not important?

> 3.
> - if (rel->pages < parallel_threshold && rel->reloptkind == RELOPT_BASEREL)
> + if (rel->pages < parallel_threshold && rel->rel_parallel_degree == -1 &&
> +     rel->reloptkind == RELOPT_BASEREL)
> 
> A. The second line should be indented to align with the start of the
> condition after the opening bracket '(', i.e. with rel->pages.  Refer to
> nearby multiline conditions, or run pgindent.

I ran pgindent, fixed.

> B. The comment above this condition needs slight adjustment as per new
> condition.
> 

Also fixed.

> 4.
> + int parallel_degree; /* max number of parallel worker */
>  } StdRdOptions;
> 
> Typo in comments
> /worker/workers
>

fixed.

I'll send an updated patch when I know what to do about the first two
points.


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Choosing parallel_degree

From
Amit Kapila
Date:
On Tue, Apr 5, 2016 at 11:55 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>
> On 05/04/2016 06:19, Amit Kapila wrote:
> >
> > Few more comments:
> >
> > 1.
> > @@ -909,6 +909,17 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXI
> >     </varlistentry>
> >
> >     <varlistentry>
> > +    <term><literal>parallel_degree</> (<type>integer</>)</term>
> > +    <listitem>
> > +     <para>
> > +     Sets the degree of parallelism for an individual relation.  The
> > +     requested number of workers will be limited by <xref
> > +     linkend="guc-max-parallel-degree">.
> > +     </para>
> > +    </listitem>
> > +   </varlistentry>
> >
> > All other parameters in this category are supported by the Alter Table
> > command as well, so I think this parameter should also be supported by
> > the Alter Table command (for both SET and RESET variants).
> >
>
> I don't quite understand.  With the patch you can use parallel_degree in
> either CREATE or ALTER table (SET and RESET) statements.
>

No issues then.

>  Considering
> documentation, the list of storage parameters only appears in
> create_table.sgml, alter_table.sgml pointing to it.
>
> In alter_table.sgml, I didn't comment the lock level needed to modify
> parallel_degree since it requires an access exclusive lock for now.
> While thinking about it, I think it's safe to use a share update
> exclusive lock but I may be wrong.  What do you think?
>

We need to keep AccessExclusiveLock for operations which can impact a Select operation, which I think this operation does, so let's retain AccessExclusiveLock for now.  If somebody else thinks we shouldn't bother about Selects, then we can change it.

> > 2.
> > +"Number of parallel processes per executor node wanted for this relation.",
> >
> > How about
> > Number of parallel processes that can be used for this relation per
> > executor node.
> >
>
> I just rephrased what was used for the max_parallel_degree GUC, which is:
>
> "Sets the maximum number of parallel processes per executor node."
>
> I find your version better once again, but should we keep some
> consistency between them or it's not important?
>

I think consistency is good, but this is different from max_parallel_degree, so I would prefer to use something along the lines of what I have mentioned.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 06/04/2016 07:38, Amit Kapila wrote:
> On Tue, Apr 5, 2016 at 11:55 PM, Julien Rouhaud
>>
>> In alter_table.sgml, I didn't comment the lock level needed to modify
>> parallel_degree since it requires an access exclusive lock for now.
>> While thinking about it, I think it's safe to use a share update
>> exclusive lock but I may be wrong.  What do you think?
>>
>
> We require to keep AccessExclusiveLock for operations which can impact
> Select operation which I think this operation does, so lets
> retain AccessExclusiveLock for now.  If somebody else thinks, we should
> not bother about Selects, then we can change it.
>

Ok. Aren't there also some considerations about forcing replanning of
prepared statements using the table, for instance?

>>
>> I find your version better once again, but should we keep some
>> consistency between them or it's not important?
>>
>
> I think consistency is good, but this is different from
> max_parallel_degree, so I would prefer to use something on lines of what
> I have mentioned.
>

Agreed, changed in attached v8 (including fix for previous mail).

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments

Re: Choosing parallel_degree

From
Amit Kapila
Date:
On Wed, Apr 6, 2016 at 10:49 PM, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:
>
> On 06/04/2016 07:38, Amit Kapila wrote:
> > On Tue, Apr 5, 2016 at 11:55 PM, Julien Rouhaud
> >>
> >> In alter_table.sgml, I didn't comment the lock level needed to modify
> >> parallel_degree since it requires an access exclusive lock for now.
> >> While thinking about it, I think it's safe to use a share update
> >> exclusive lock but I may be wrong.  What do you think?
> >>
> >
> > We require to keep AccessExclusiveLock for operations which can impact
> > Select operation which I think this operation does, so lets
> > retain AccessExclusiveLock for now.  If somebody else thinks, we should
> > not bother about Selects, then we can change it.
> >
>
> Ok. Isn't there also some considerations about forcing replanning of
> prepared statements using the table for instance?
>

Not sure what you mean by that.
 
>
> >>
> >> I find your version better once again, but should we keep some
> >> consistency between them or it's not important?
> >>
> >
> > I think consistency is good, but this is different from
> > max_parallel_degree, so I would prefer to use something on lines of what
> > I have mentioned.
> >
>
> Agreed, changed in attached v8 (including fix for previous mail).
>

+ /*
+  * Limit the degree of parallelism logarithmically based on the size
+  * of the relation.  This probably needs to be a good deal more
+  * sophisticated, but we need something here for now.
+  */


I think your pgindent run has changed something unrelated. It is not preferred to change unrelated things in a patch.

Other than that, patch looks good and I have marked it as Ready For Committer.  Hope, we get this for 9.6.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Choosing parallel_degree

From
Robert Haas
Date:
On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Other than that, patch looks good and I have marked it as Ready For
> Committer.  Hope, we get this for 9.6.

Committed.  I think this is likely to make parallel query
significantly more usable in 9.6.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Choosing parallel_degree

From
Paul Ramsey
Date:
On Fri, Apr 8, 2016 at 8:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Other than that, patch looks good and I have marked it as Ready For
>> Committer.  Hope, we get this for 9.6.
>
> Committed.  I think this is likely to make parallel query
> significantly more usable in 9.6.

I'm kind of worried that it will make it yet less usable for PostGIS,
since approaches that ignore costs in favour of relpages will
dramatically under-resource our queries. I can spin a query for
multiple seconds on a table with less than 100K records, not even
trying very hard.

Functions have very unequal CPU costs, and we're talking here about
using CPUs more effectively, why are costs being given the see-no-evil
treatment? This is as true in core as it is in PostGIS, even if our
case is a couple orders of magnitude more extreme: a filter based on a
complex combination of regex queries will use an order of magnitude
more CPU than one that does a little math, why plan and execute them
like they are the same?

As it stands now, it seems like out of the box PostGIS users will
actually not see much benefit from parallelism unless they  manhandle
their configuration settings to force it.

ATB,

P



Re: Choosing parallel_degree

From
Simon Riggs
Date:
On 8 April 2016 at 17:00, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
On Fri, Apr 8, 2016 at 8:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Other than that, patch looks good and I have marked it as Ready For
>> Committer.  Hope, we get this for 9.6.
>
> Committed.  I think this is likely to make parallel query
> significantly more usable in 9.6.

I'm kind of worried that it will make it yet less usable for PostGIS,
since approaches that ignore costs in favour of relpages will
dramatically under-resource our queries. I can spin a query for
multiple seconds on a table with less than 100K records, not even
trying very hard.

Doesn't sound good.
 
Functions have very unequal CPU costs, and we're talking here about
using CPUs more effectively, why are costs being given the see-no-evil
treatment? This is as true in core as it is in PostGIS, even if our
case is a couple orders of magnitude more extreme: a filter based on a
complex combination of regex queries will use an order of magnitude
more CPU than one that does a little math, why plan and execute them
like they are the same?

Functions have user assignable costs.
 
As it stands now, it seems like out of the box PostGIS users will
actually not see much benefit from parallelism unless they  manhandle
their configuration settings to force it.

Does this concern apply to this patch, or to the general situation for 9.6.

Please suggest what you would like to see. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Choosing parallel_degree

From
Paul Ramsey
Date:
On Fri, Apr 8, 2016 at 9:06 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 8 April 2016 at 17:00, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
>>
>> On Fri, Apr 8, 2016 at 8:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila <amit.kapila16@gmail.com>
>> > wrote:
>> >> Other than that, patch looks good and I have marked it as Ready For
>> >> Committer.  Hope, we get this for 9.6.
>> >
>> > Committed.  I think this is likely to make parallel query
>> > significantly more usable in 9.6.
>>
>> I'm kind of worried that it will make it yet less usable for PostGIS,
>> since approaches that ignore costs in favour of relpages will
>> dramatically under-resource our queries. I can spin a query for
>> multiple seconds on a table with less than 100K records, not even
>> trying very hard.
>
> Doesn't sound good.

I admit, it's not a "usual" database thing, but it's right in the
meaty middle of use cases that parallelism can crushingly awesomely
defeat. It's also probably not too unusual for extension use cases,
where complex data are held in user defined types, whether they be
image fragments, music samples, genetic data, raster data or LIDAR
point clouds. PostGIS is just one voice of many in the Symphony of
Crazy Shit in the Database.

>> Functions have very unequal CPU costs, and we're talking here about
>> using CPUs more effectively, why are costs being given the see-no-evil
>> treatment? This is as true in core as it is in PostGIS, even if our
>> case is a couple orders of magnitude more extreme: a filter based on a
>> complex combination of regex queries will use an order of magnitude
>> more CPU than one that does a little math, why plan and execute them
>> like they are the same?
>
> Functions have user assignable costs.

We have done a relatively bad job of globally costing our functions
thus far, because it mostly didn't make any difference. In my testing
[1], I found that costing could push better plans for parallel
sequence scans and parallel aggregates, though at very extreme cost
values (1000 for sequence scans and 10000 for aggregates).

Obviously, if costs can make a difference for 9.6 and parallelism
we'll rigorously ensure we have good, useful costs. I've already
costed many functions in my parallel postgis test branch [2]. Perhaps
the avoidance of cost so far is based on the relatively nebulous
definition it has: about the only thing in the docs is "If the cost is
not specified, 1 unit is assumed for C-language and internal
functions, and 100 units for functions in all other languages. Larger
values cause the planner to try to avoid evaluating the function more
often than necessary."

So what about C functions then? Should a string comparison be 5 and a
multiplication 1? An image histogram 1000?

>> As it stands now, it seems like out of the box PostGIS users will
>> actually not see much benefit from parallelism unless they  manhandle
>> their configuration settings to force it.
>
> Does this concern apply to this patch, or to the general situation for 9.6.

Insofar as the patch is throttling how many parallel workers you get
based solely on your relsize, it does concern this patch, but it's a
general issue in both the extreme and not obviously related costings
needed to trip parallel sequence and parallel aggregate plans. The
parallel join seems to not take function/operator costs into account
at all [3], at least I couldn't plump up a high enough cost to trip it
without also adjusting the global parallel tuple cost configuration.

I've seen a number of asides to the effect that "yes, costs are
important, but we probably can't do anything about that for 9.6" in
parallel patch threads, including this one, so I'm getting concerned
that the core improvement we've been hoping for for years won't
actually address our use cases when it is first released. That may
just be the way it is, c'est la vie, but it would be unfortunate.

P

[1] http://blog.cleverelephant.ca/2016/03/parallel-postgis.html
[2] https://github.com/pramsey/postgis/tree/parallel
[3] http://blog.cleverelephant.ca/2016/03/parallel-postgis-joins.html



Re: Choosing parallel_degree

From
Robert Haas
Date:
On Fri, Apr 8, 2016 at 12:27 PM, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
> Insofar as the patch is throttling how many parallel workers you get
> based solely on your relsize, it does concern this patch, but it's a
> general issue in both the extreme and not obviously related costings
> needed to trip parallel sequence and parallel aggregate plans. The
> parallel join seems to not take function/operator costs into account
> at all [3], at least I couldn't plump up a high enough cost to trip it
> without also adjusting the global parallel tuple cost configuration.
>
> I've seen a number of asides to the effect that "yes, costs are
> important, but we probably can't do anything about that for 9.6" in
> parallel patch threads, including this one, so I'm getting concerned
> that the core improvement we've been hoping for for years won't
> actually address our use cases when it is first released. That may
> just be the way it is, c'est la vie, but it would be unfortunate.

So, I don't think that the patch makes anything worse for PostGIS.  At
most, it doesn't make anything better for PostGIS.  Without the patch,
the number of workers in a parallel plan is based strictly on the size
of the relation upon which the parallel plan performs a parallel
sequential scan.  With the patch, you can - if you wish - substitute
some other number for the one the planner comes up with.  But you
don't have to do that, so it can't be worse than before.  To the
contrary, I'm fairly confident that it is useful, if not everything
you might ever have wanted.

Now, I think what you really want is for the planner to be much smarter on
a broad level in determining when to use parallelism and how many
workers to pick on a per-query basis, and to take function and
operator costs into account in making that determination.  I agree
that's something that the planner should do, but it's definitely not
going to happen in 9.6.  It's going to require major new planner
infrastructure that will probably take years to build and get right,
unless Tom gets interested (he said hopefully).

There are basically two ways that a query can use too many workers.
First, at some point, when you add workers to a plan tree, the new and
larger number of workers don't produce tuples any faster in the
aggregate than a smaller number of workers would have done.  For
example, if a scan becomes I/O bound, adding more workers probably
won't help much.  Second, at some point, the workers you already have
are producing tuples as fast as the leader can consume them, and
producing them faster doesn't do you any good because the extra
workers will just spend time waiting for the leader to get around to
servicing them.

I have an idea about how to solve the first problem, which I've
started to mock up as a PoC.  What it does is track the portion of the
cost of each node that is believed to represent non-parallelizable
work, hereinafter serial_cost.  So if the query involves a lot of
CPU-intensive functions, then the ratio of total_cost / serial_cost
will be high, suggesting that many workers will be useful.  If that
ratio is low, it suggests that the query is mostly I/O-bound or
IPC-bound anyway, and adding workers isn't going to do much.  Details
are still a bit fuzzy, and I don't have this developed enough that
it's actually testable in any way yet, but that's my idea.

The second problem is substantially harder, from what I can see.  To
judge whether the leader can keep up, you need to know how much
CPU work it will have to do per gathered tuple.  In order to know
that, you'd need to know the costs of the plan nodes above the Gather
before knowing the cost of the Gather itself.  That's obviously a bit
challenging since our costing code doesn't work anything like that.
To some extent, this circularity is also an issue for the first
problem, since the costing of the lower nodes depends on how many
workers you have, but the number of workers you want at the point
where you Gather is dependent on the costing of those same lower
nodes.

I think there's probably a way through this mess - other people have
written good parallel optimizers before - but it is not like there is
some reasonably small tweak that will give you what you are hoping for
here.  Figuring out how all of this needs to work is a massive project
in its own right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Choosing parallel_degree

From
tushar
Date:
On 04/08/2016 08:53 PM, Robert Haas wrote:
> On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Other than that, patch looks good and I have marked it as Ready For
>> Committer.  Hope, we get this for 9.6.
>
> Committed.  I think this is likely to make parallel query
> significantly more usable in 9.6.

While testing, I observed couple of things -

Case 1 = Not accepting parallel seq scan when parallel_degree is set to 0

postgres=# create table fok2(n int) with (parallel_degree=0);
CREATE TABLE
postgres=# insert into fok2 values (generate_series(1,1000000)); analyze fok2; vacuum fok2;
INSERT 0 1000000
ANALYZE
VACUUM
postgres=# set max_parallel_degree =5;
SET
postgres=# explain analyze verbose   select * from fok2  where n<=10;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on public.fok2  (cost=0.00..16925.00 rows=100 width=4) (actual time=0.027..217.882 rows=10 loops=1)
   Output: n
   Filter: (fok2.n <= 10)
   Rows Removed by Filter: 999990
 Planning time: 0.084 ms
 Execution time: 217.935 ms
(6 rows)

I am assuming parallel_degree=0 is the same as not using it, i.e.
create table fok2(n int) with (parallel_degree=0);  = create table fok2(n int);

so in this case it should have accepted the parallel seq scan.

Case 2 = Total no# of workers are NOT matching with the workers information -

postgres=# alter table fok set (parallel_degree=10);
ALTER TABLE
postgres=# set max_parallel_degree =9;
SET
postgres=# explain analyze verbose   select * from fok  where n<=1;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..6823.89 rows=100 width=4) (actual time=0.621..107.755 rows=1 loops=1)
   Output: n
   Number of Workers: 9
   ->  Parallel Seq Scan on public.fok  (cost=0.00..5814.00 rows=11 width=4) (actual time=83.382..95.157 rows=0 loops=9)
         Output: n
         Filter: (fok.n <= 1)
         Rows Removed by Filter: 111111
         Worker 0: actual time=82.181..82.181 rows=0 loops=1
         Worker 1: actual time=97.236..97.236 rows=0 loops=1
         Worker 2: actual time=93.586..93.586 rows=0 loops=1
         Worker 3: actual time=94.159..94.159 rows=0 loops=1
         Worker 4: actual time=88.459..88.459 rows=0 loops=1
         Worker 5: actual time=90.245..90.245 rows=0 loops=1
         Worker 6: actual time=101.577..101.577 rows=0 loops=1
         Worker 7: actual time=102.955..102.955 rows=0 loops=1
 Planning time: 0.119 ms
 Execution time: 108.585 ms
(17 rows)

Expected = Expecting worker 8 information, also loops=10 (including the Master)

Case 3 = Getting error if we set the max value in max_parallel_degree as well as in parallel_degree.

postgres=# create table abd(n int) with (parallel_degree=262144);
ERROR:  value 262144 out of bounds for option "parallel_degree"
DETAIL:  Valid values are between "0" and "262143".

postgres=# create table abd(n int) with (parallel_degree=262143);
CREATE TABLE
postgres=# insert into abd values (generate_series(1,1000000)); analyze abd; vacuum abd;
INSERT 0 1000000
ANALYZE

postgres=# set max_parallel_degree =2624444;
ERROR:  2624444 is outside the valid range for parameter "max_parallel_degree" (0 .. 262143)

postgres=# set max_parallel_degree =262143;
SET

postgres=# explain analyze verbose select * from abd  where n<=1;
ERROR:  requested shared memory size overflows size_t

If we remove the analyze keyword then the query runs successfully.

Expected = Is it not better to throw the error at the time of setting max_parallel_degree, if not supported?

--
regards, tushar

Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 11/04/2016 15:56, tushar wrote:
> On 04/08/2016 08:53 PM, Robert Haas wrote:
>> On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>> Other than that, patch looks good and I have marked it as Ready For
>>> Committer.  Hope, we get this for 9.6.
>> Committed.  I think this is likely to make parallel query
>> significantly more usable in 9.6.
>>
> While testing ,I observed couple of things -
> 
> Case 1 =Not accepting parallel seq scan when parallel_degree is set to 0
> 
> postgres=# create table fok2(n int) with (parallel_degree=0);
> CREATE TABLE
> postgres=# insert into fok2 values (generate_series(1,1000000)); analyze
> fok2; vacuum fok2;
> INSERT 0 1000000
> ANALYZE
> VACUUM
> postgres=# set max_parallel_degree =5;
> SET
> postgres=# explain analyze verbose   select * from fok2  where n<=10;
>                                                   QUERY
> PLAN                                                 
> --------------------------------------------------------------------------------------------------------------
>  Seq Scan on public.fok2  (cost=0.00..16925.00 rows=100 width=4) (actual
> time=0.027..217.882 rows=10 loops=1)
>    Output: n
>    Filter: (fok2.n <= 10)
>    Rows Removed by Filter: 999990
>  Planning time: 0.084 ms
>  Execution time: 217.935 ms
> (6 rows)
> 
> I am assuming parallel_degree=0 is as same as not using it  , i.e
> create table fok2(n int) with (parallel_degree=0);  = create table
> fok2(n int);
> 
> so in this case it should have accepted the parallel seq .scan.
> 

No, setting it to 0 means to force not using parallel workers (but
considering the parallel path IIRC).

> Case 2=Total no# of workers are NOT matching with the workers information -
> 
> postgres=# alter table fok set (parallel_degree=10);
> ALTER TABLE
> postgres=# set max_parallel_degree =9;
> SET
> postgres=# explain analyze verbose   select * from fok  where n<=1;
>                                                        QUERY
> PLAN                                                       
>
-------------------------------------------------------------------------------------------------------------------------
>  Gather  (cost=1000.00..6823.89 rows=100 width=4) (actual
> time=0.621..107.755 rows=1 loops=1)
>    Output: n
> *   Number of Workers: 9*
>    ->  Parallel Seq Scan on public.fok  (cost=0.00..5814.00 rows=11
> width=4) (actual time=83.382..95.157 rows=0 loops=9)
>          Output: n
>          Filter: (fok.n <= 1)
>          Rows Removed by Filter: 111111
>          Worker 0: actual time=82.181..82.181 rows=0 loops=1
>          Worker 1: actual time=97.236..97.236 rows=0 loops=1
>          Worker 2: actual time=93.586..93.586 rows=0 loops=1
>          Worker 3: actual time=94.159..94.159 rows=0 loops=1
>          Worker 4: actual time=88.459..88.459 rows=0 loops=1
>          Worker 5: actual time=90.245..90.245 rows=0 loops=1
>          Worker 6: actual time=101.577..101.577 rows=0 loops=1
>          Worker 7: actual time=102.955..102.955 rows=0 loops=1
>  Planning time: 0.119 ms
>  Execution time: 108.585 ms
> (17 rows)
> 
> Expected = Expecting worker8 information , also loops=10 (including the
> Master)
> 

Even if you set a per-table parallel_degree higher than
max_parallel_degree, it'll be maxed at max_parallel_degree.

Then, the explain shows that the planner assumed it'll launch 9 workers,
but only 8 were available (or needed perhaps) at runtime.

> Case 3=Getting error if we set the max value in max_parallel_degree  as
> well in parallel_degree  .
> 
> postgres=# create table abd(n int) with (parallel_degree=262144);
> ERROR:  value 262144 out of bounds for option "parallel_degree"
> DETAIL:  Valid values are between "0" and "262143".
> 
> postgres=# create table abd(n int) with (parallel_degree=262143);
> CREATE TABLE
> postgres=# insert into abd values (generate_series(1,1000000)); analyze
> abd; vacuum abd;
> INSERT 0 1000000
> ANALYZE
> 
> postgres=# set max_parallel_degree =2624444;
> ERROR:  2624444 is outside the valid range for parameter
> "max_parallel_degree" (0 .. 262143)
> 
> postgres=# set max_parallel_degree =262143;
> SET
> postgres=#
> 
> postgres=# explain analyze verbose select * from abd  where n<=1;
> ERROR:  requested shared memory size overflows size_t
> 
> if we remove the analyze keyword then query running successfully.
> 
> Expected = Is it not better to throw the error at the time of setting
> max_parallel_degree, if not supported ?

+1


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Choosing parallel_degree

From
Robert Haas
Date:
On Mon, Apr 11, 2016 at 11:27 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 11/04/2016 15:56, tushar wrote:
>> On 04/08/2016 08:53 PM, Robert Haas wrote:
>>> On Fri, Apr 8, 2016 at 1:22 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> Other than that, patch looks good and I have marked it as Ready For
>>>> Committer.  Hope, we get this for 9.6.
>>> Committed.  I think this is likely to make parallel query
>>> significantly more usable in 9.6.
>>>
>> While testing ,I observed couple of things -
>>
>> Case 1 =Not accepting parallel seq scan when parallel_degree is set to 0
>>
>> postgres=# create table fok2(n int) with (parallel_degree=0);
>> CREATE TABLE
>> postgres=# insert into fok2 values (generate_series(1,1000000)); analyze
>> fok2; vacuum fok2;
>> INSERT 0 1000000
>> ANALYZE
>> VACUUM
>> postgres=# set max_parallel_degree =5;
>> SET
>> postgres=# explain analyze verbose   select * from fok2  where n<=10;
>>                                                   QUERY
>> PLAN
>> --------------------------------------------------------------------------------------------------------------
>>  Seq Scan on public.fok2  (cost=0.00..16925.00 rows=100 width=4) (actual
>> time=0.027..217.882 rows=10 loops=1)
>>    Output: n
>>    Filter: (fok2.n <= 10)
>>    Rows Removed by Filter: 999990
>>  Planning time: 0.084 ms
>>  Execution time: 217.935 ms
>> (6 rows)
>>
>> I am assuming parallel_degree=0 is as same as not using it  , i.e
>> create table fok2(n int) with (parallel_degree=0);  = create table
>> fok2(n int);
>>
>> so in this case it should have accepted the parallel seq .scan.
>>
>
> No, setting it to 0 means to force not using parallel workers (but
> considering the parallel path IIRC).

I'm not sure what the parenthesized bit means, because you can't use
parallelism without workers.  But I think I should have made the docs
more clear that 0 = don't parallelize scans of this table while
committing this.  Maybe we should go add a sentence about that.

> Even if you set a per-table parallel_degree higher than
> max_parallel_degree, it'll be maxed at max_parallel_degree.
>
> Then, the explain shows that the planner assumed it'll launch 9 workers,
> but only 8 were available (or needed perhaps) at runtime.

We should probably add the number of workers actually obtained to the
EXPLAIN ANALYZE output.  That's been requested before.

>> postgres=# set max_parallel_degree =2624444;
>> ERROR:  2624444 is outside the valid range for parameter
>> "max_parallel_degree" (0 .. 262143)
>>
>> postgres=# set max_parallel_degree =262143;
>> SET
>> postgres=#
>>
>> postgres=# explain analyze verbose select * from abd  where n<=1;
>> ERROR:  requested shared memory size overflows size_t
>>
>> if we remove the analyze keyword then query running successfully.
>>
>> Expected = Is it not better to throw the error at the time of setting
>> max_parallel_degree, if not supported ?
>
> +1

It surprises me that that request overflowed size_t.  I guess we
should look into why that's happening.  Did you test this on a 32-bit
system?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 11/04/2016 17:44, Robert Haas wrote:
> On Mon, Apr 11, 2016 at 11:27 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> On 11/04/2016 15:56, tushar wrote:
>>>
>>> I am assuming parallel_degree=0 is as same as not using it  , i.e
>>> create table fok2(n int) with (parallel_degree=0);  = create table
>>> fok2(n int);
>>>
>>> so in this case it should have accepted the parallel seq .scan.
>>>
>>
>> No, setting it to 0 means to force not using parallel workers (but
>> considering the parallel path IIRC).
> 
> I'm not sure what the parenthesized bit means, because you can't use
> parallelism without workers.

Obvious mistake, sorry.

>  But I think I should have made the docs
> more clear that 0 = don't parallelize scans of this table while
> committing this.  Maybe we should go add a sentence about that.
> 

What about

-      the setting of <xref linkend="guc-max-parallel-degree">.
+      the setting of <xref linkend="guc-max-parallel-degree">.  Setting
+      this parameter to 0 will disable parallelism for that table.

>> Even if you set a per-table parallel_degree higher than
>> max_parallel_degree, it'll be maxed at max_parallel_degree.
>>
>> Then, the explain shows that the planner assumed it'll launch 9 workers,
>> but only 8 were available (or needed perhaps) at runtime.
> 
> We should probably add the number of workers actually obtained to the
> EXPLAIN ANALYZE output.  That's been requested before.
> 

If it's not too late for 9.6, it would be very great.

>>> postgres=# set max_parallel_degree =2624444;
>>> ERROR:  2624444 is outside the valid range for parameter
>>> "max_parallel_degree" (0 .. 262143)
>>>
>>> postgres=# set max_parallel_degree =262143;
>>> SET
>>> postgres=#
>>>
>>> postgres=# explain analyze verbose select * from abd  where n<=1;
>>> ERROR:  requested shared memory size overflows size_t
>>>
>>> if we remove the analyze keyword then query running successfully.
>>>
>>> Expected = Is it not better to throw the error at the time of setting
>>> max_parallel_degree, if not supported ?
>>
>> +1
> 
> It surprises me that that request overflowed size_t.  I guess we
> should look into why that's happening.  Did you test this on a 32-bit
> system?
> 

I can reproduce the same issue on a 64-bit system. Setting
max_parallel_degree to 32768 or above raises this error:

ERROR:  could not resize shared memory segment "/PostgreSQL.44279285" to
18446744072113360072 bytes: Invalid argument

On a 32-bit system, the following assert fails:

TRAP: FailedAssertion("!(offset < total_bytes)", File: "shm_toc.c",
Line: 192)

After some gdb, it looks like the overflow comes from

/* Estimate space for tuple queues. */
shm_toc_estimate_chunk(&pcxt->estimator,
                       PARALLEL_TUPLE_QUEUE_SIZE * pcxt->nworkers);

372        shm_toc_estimate_chunk(&pcxt->estimator,
(gdb) p pcxt->estimator
$2 = {space_for_chunks = 3671712, number_of_keys = 3}
(gdb) n
374        shm_toc_estimate_keys(&pcxt->estimator, 1);
(gdb) p pcxt->estimator
$3 = {space_for_chunks = 18446744071565739680, number_of_keys = 3}


The following change fixes the issue:

diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 572a77b..0a5210e 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -370,7 +370,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate, int nworkers)
 	/* Estimate space for tuple queues. */
 	shm_toc_estimate_chunk(&pcxt->estimator,
-						   PARALLEL_TUPLE_QUEUE_SIZE * pcxt->nworkers);
+						   (Size) PARALLEL_TUPLE_QUEUE_SIZE * pcxt->nworkers);
 	shm_toc_estimate_keys(&pcxt->estimator, 1);

But the query still fails with "ERROR:  out of shared memory".

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org



Re: Choosing parallel_degree

From
Simon Riggs
Date:
On 8 April 2016 at 17:27, Paul Ramsey <pramsey@cleverelephant.ca> wrote:
 
PostGIS is just one voice...

We're listening.
 
>> Functions have very unequal CPU costs, and we're talking here about
>> using CPUs more effectively, why are costs being given the see-no-evil
>> treatment? This is as true in core as it is in PostGIS, even if our
>> case is a couple orders of magnitude more extreme: a filter based on a
>> complex combination of regex queries will use an order of magnitude
>> more CPU than one that does a little math, why plan and execute them
>> like they are the same?
>
> Functions have user assignable costs.

We have done a relatively bad job of globally costing our functions
thus far, because it mostly didn't make any difference. In my testing
[1], I found that costing could push better plans for parallel
sequence scans and parallel aggregates, though at very extreme cost
values (1000 for sequence scans and 10000 for aggregates).

Obviously, if costs can make a difference for 9.6 and parallelism
we'll rigorously ensure we have good, useful costs. I've already
costed many functions in my parallel postgis test branch [2]. Perhaps
the avoidance of cost so far is based on the relatively nebulous
definition it has: about the only thing in the docs is "If the cost is
not specified, 1 unit is assumed for C-language and internal
functions, and 100 units for functions in all other languages. Larger
values cause the planner to try to avoid evaluating the function more
often than necessary."

So what about C functions then? Should a string comparison be 5 and a
multiplication 1? An image histogram 1000?

We don't have a clear methodology for how to do this.

It's a single parameter to allow you to achieve the plans that work optimally. Hopefully that is simple enough for everyone to use and yet flexible enough to make a difference.

If it's not what you need, show us and it may make the case for change.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Choosing parallel_degree

From
Simon Riggs
Date:
On 8 April 2016 at 17:49, Robert Haas <robertmhaas@gmail.com> wrote:
 
With the patch, you can - if you wish - substitute
some other number for the one the planner comes up with.

I saw you're using AccessExclusiveLock, the reason being it affects SELECTs.

That is supposed to apply when things might change the answer from a SELECT, whereas this affects only the default for a plan.

Can I change this to a lower setting? I would have done this before applying the patch, but you beat me to it. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Choosing parallel_degree

From
Amit Kapila
Date:
On Tue, Apr 12, 2016 at 3:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 8 April 2016 at 17:49, Robert Haas <robertmhaas@gmail.com> wrote:
 
With the patch, you can - if you wish - substitute
some other number for the one the planner comes up with.

I saw you're using AccessExclusiveLock, the reason being it affects SELECTs.

That is supposed to apply when things might change the answer from a SELECT, whereas this affects only the default for a plan.


By this theory, shouldn't any other parameter that just affects the plan, like n_distinct_inherited, require a lower lock level?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Choosing parallel_degree

From
tushar
Date:
On 04/11/2016 09:14 PM, Robert Haas wrote:
>>> postgres=# explain analyze verbose select * from abd  where n<=1;
>>> ERROR:  requested shared memory size overflows size_t
>>>
>>> if we remove the analyze keyword then query running successfully.
>>>
>>> Expected = Is it not better to throw the error at the time of setting
>>> max_parallel_degree, if not supported ?
>>
>> +1
> It surprises me that that request overflowed size_t.  I guess we
> should look into why that's happening.  Did you test this on a 32-bit
> system?
No, I tested on a 64-bit machine.

-- 
regards,tushar




Re: Choosing parallel_degree

From
Simon Riggs
Date:
On 12 April 2016 at 07:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Apr 12, 2016 at 3:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 8 April 2016 at 17:49, Robert Haas <robertmhaas@gmail.com> wrote:
 
With the patch, you can - if you wish - substitute
some other number for the one the planner comes up with.

I saw you're using AccessExclusiveLock, the reason being it affects SELECTs.

That is supposed to apply when things might change the answer from a SELECT, whereas this affects only the default for a plan.


By this theory, shouldn't any other parameter that just affects the plan, like n_distinct_inherited, require a lower lock level?

It should, yes, and I'm as surprised to see it isn't as you are.

On that thread, Fabrizio was asked by Robert to provide or document an analysis of why each setting was OK to change; nine days later he had not done so or replied, so I committed a reduced version of the patch that matched the existing tests and code comments.

I guess we could have salvaged some more from it, but we didn't and there's never enough time. 

If the RMT allows, that can be changed, or it can wait.
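The mechanism under discussion can be modelled, very roughly, as each reloption carrying the weakest lock level that is safe for it. The struct and enum below are a hypothetical simplification for illustration; the real definitions live in src/backend/access/common/reloptions.c and differ in detail:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of per-reloption lock levels: each option that
 * ALTER TABLE ... SET (...) can change records the weakest lock that
 * is safe for it.  Options that only influence planner defaults are
 * candidates for something weaker than AccessExclusiveLock. */
typedef enum
{
    ShareUpdateExclusiveLock,
    AccessExclusiveLock
} LockLevel;

typedef struct
{
    const char *name;
    LockLevel   lockmode;
} RelOptEntry;

static const RelOptEntry rel_options[] =
{
    {"parallel_degree", AccessExclusiveLock},        /* as first committed */
    {"autovacuum_enabled", ShareUpdateExclusiveLock} /* maintenance only   */
};

static LockLevel
reloption_lock(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(rel_options) / sizeof(rel_options[0]); i++)
        if (strcmp(rel_options[i].name, name) == 0)
            return rel_options[i].lockmode;
    return AccessExclusiveLock;     /* conservative default */
}
```

The argument in the thread is that parallel_degree belongs in the weaker-lock bucket, since changing it can only alter the default chosen for future plans, not the answer a running SELECT returns.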

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Choosing parallel_degree

From
tushar
Date:
On 04/11/2016 08:57 PM, Julien Rouhaud wrote:
>> >Expected = Expecting worker8 information , also loops=10 (including the
>> >Master)
>> >
> Even if you set a per-table parallel_degree higher than
> max_parallel_degree, it'll be maxed at max_parallel_degree.
>
> Then, the explain shows that the planner assumed it'll launch 9 workers,
> but only 8 were available (or needed perhaps) at runtime.
>
Right, if we increase the max_worker_processes value in the postgresql.conf
file
then we can see the worker information in the explain plan.

If the parallel_degree value is higher than max_parallel_degree, i.e.

parallel_degree = 20,  max_parallel_degree = 10  => [ the select query
accepts 10 workers ]

but in general, where the table doesn't have parallel_degree set and
max_parallel_degree is
set to 10, the select query shows only 2 workers.
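That 2-worker default follows from the size-based heuristic David's test program at the top of the thread illustrates: the worker count grows by one each time the relation's page count triples past a base threshold, and any explicit per-table parallel_degree is then clamped to max_parallel_degree. The sketch below reproduces the numbers from that table; the 1000-page base and the clamping helper are inferred from this thread, not the exact backend code:

```c
/* Guess the number of workers from relation size: one worker for
 * anything under ~3000 pages, plus one more each time the page count
 * triples, matching the table posted upthread
 * (3001 pages -> 2 workers, 9001 -> 3, 27001 -> 4, ...). */
static int
guess_parallel_workers(unsigned long rel_pages)
{
    int           workers = 1;
    unsigned long threshold = 1000 * 3;

    while (rel_pages >= threshold)
    {
        workers++;
        threshold *= 3;
    }
    return workers;
}

/* A per-table parallel_degree setting is then capped by the
 * max_parallel_degree GUC, as Julien described above. */
static int
clamp_parallel_degree(int table_degree, int max_parallel_degree)
{
    return table_degree < max_parallel_degree ? table_degree
                                              : max_parallel_degree;
}
```

So a modestly sized table lands on 2 workers unless its parallel_degree reloption overrides the guess, and even then the GUC wins when the reloption asks for more.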

-- 
regards,tushar




Re: Choosing parallel_degree

From
Robert Haas
Date:
On Mon, Apr 11, 2016 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 8 April 2016 at 17:49, Robert Haas <robertmhaas@gmail.com> wrote:
>> With the patch, you can - if you wish - substitute
>> some other number for the one the planner comes up with.
>
> I saw you're using AccessExclusiveLock, the reason being it affects SELECTs.
>
> That is supposed to apply when things might change the answer from a SELECT,
> whereas this affects only the default for a plan.
>
> Can I change this to a lower setting? I would have done this before applying
> the patch, but you beat me to it.

I don't have a problem with reducing the lock level there, if we're
convinced that it's safe.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Choosing parallel_degree

From
Simon Riggs
Date:
On 12 April 2016 at 13:53, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Apr 11, 2016 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 8 April 2016 at 17:49, Robert Haas <robertmhaas@gmail.com> wrote:
>> With the patch, you can - if you wish - substitute
>> some other number for the one the planner comes up with.
>
> I saw you're using AccessExclusiveLock, the reason being it affects SELECTs.
>
> That is supposed to apply when things might change the answer from a SELECT,
> whereas this affects only the default for a plan.
>
> Can I change this to a lower setting? I would have done this before applying
> the patch, but you beat me to it.

I don't have a problem with reducing the lock level there, if we're
convinced that it's safe.

I'll run up a patch, with appropriate comments. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 11/04/2016 22:53, Julien Rouhaud wrote:
> On 11/04/2016 17:44, Robert Haas wrote:
>>
>> We should probably add the number of workers actually obtained to the
>> EXPLAIN ANALYZE output.  That's been requested before.
>>
>
> If it's not too late for 9.6, it would be very great.
>

Just in case I attach a patch to implement it. I'll add it to the next
commitfest.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments

Re: Choosing parallel_degree

From
Robert Haas
Date:
On Tue, Apr 12, 2016 at 6:31 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 11/04/2016 22:53, Julien Rouhaud wrote:
>> On 11/04/2016 17:44, Robert Haas wrote:
>>> We should probably add the number of workers actually obtained to the
>>> EXPLAIN ANALYZE output.  That's been requested before.
>>
>> If it's not too late for 9.6, it would be very great.
>
> Just in case I attach a patch to implement it. I'll add it to the next
> commitfest.

I think we should go with "Workers Planned" and "Workers Launched",
capitalized exactly that way, and lose "Number Of".

I would be inclined to view this as a reasonable 9.6 cleanup of
parallel query, but other people may wish to construe things more
strictly than I would.
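In text-format EXPLAIN ANALYZE, the agreed spelling would come out as two indented lines under the Gather node. A hypothetical helper showing just the formatting decision (the real code hooks into explain.c's property machinery; this is illustration only):

```c
#include <stdio.h>

/* Hypothetical sketch of the agreed capitalization of the two counters
 * as they would appear in text-format EXPLAIN ANALYZE output under a
 * Gather node; not the actual explain.c implementation. */
static int
format_worker_counts(char *buf, size_t buflen, int planned, int launched)
{
    return snprintf(buf, buflen,
                    "  Workers Planned: %d\n  Workers Launched: %d\n",
                    planned, launched);
}
```

Showing both counters makes the earlier confusion upthread (9 planned, 8 launched) visible directly in the plan output.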

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Choosing parallel_degree

From
Julien Rouhaud
Date:
On 13/04/2016 19:17, Robert Haas wrote:
> On Tue, Apr 12, 2016 at 6:31 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> On 11/04/2016 22:53, Julien Rouhaud wrote:
>>> On 11/04/2016 17:44, Robert Haas wrote:
>>>> We should probably add the number of workers actually obtained to the
>>>> EXPLAIN ANALYZE output.  That's been requested before.
>>>
>>> If it's not too late for 9.6, it would be very great.
>>
>> Just in case I attach a patch to implement it. I'll add it to the next
>> commitfest.
>
> I think we should go with "Workers Planned" and "Workers Launched",
> capitalized exactly that way, and lose "Number Of".
>

Fixed

> I would be inclined to view this as a reasonable 9.6 cleanup of
> parallel query, but other people may wish to construe things more
> strictly than I would.
>

FWIW, I also see it as a reasonable cleanup.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Attachments

Re: Choosing parallel_degree

From
Amit Kapila
Date:
On Wed, Apr 13, 2016 at 10:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
>
> I would be inclined to view this as a reasonable 9.6 cleanup of
> parallel query, but other people may wish to construe things more
> strictly than I would.
>

+1.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Choosing parallel_degree

From
Robert Haas
Date:
On Wed, Apr 13, 2016 at 2:21 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
>> I think we should go with "Workers Planned" and "Workers Launched",
>> capitalized exactly that way, and lose "Number Of".
>
> Fixed
>
>> I would be inclined to view this as a reasonable 9.6 cleanup of
>> parallel query, but other people may wish to construe things more
>> strictly than I would.
>
> FWIW, I also see it as a reasonable cleanup.

Hearing no dissent, committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Choosing parallel_degree

From
Simon Riggs
Date:
On 12 April 2016 at 14:11, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 12 April 2016 at 13:53, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Mon, Apr 11, 2016 at 5:45 PM, Simon Riggs <simon@2ndquadrant.com>
>> wrote:
>> > On 8 April 2016 at 17:49, Robert Haas <robertmhaas@gmail.com> wrote:
>> >> With the patch, you can - if you wish - substitute
>> >> some other number for the one the planner comes up with.
>> >
>> > I saw you're using AccessExclusiveLock, the reason being it affects
>> > SELECTs.
>> >
>> > That is supposed to apply when things might change the answer from a
>> > SELECT,
>> > whereas this affects only the default for a plan.
>> >
>> > Can I change this to a lower setting? I would have done this before
>> > applying
>> > the patch, but you beat me to it.
>>
>> I don't have a problem with reducing the lock level there, if we're
>> convinced that it's safe.
>
>
> I'll run up a patch, with appropriate comments.

Attached

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: Choosing parallel_degree

From
Robert Haas
Date:
On Thu, Sep 1, 2016 at 9:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> > Can I change this to a lower setting? I would have done this before
>>> > applying
>>> > the patch, but you beat me to it.
>>>
>>> I don't have a problem with reducing the lock level there, if we're
>>> convinced that it's safe.
>>
>>
>> I'll run up a patch, with appropriate comments.
>
> Attached

This should really be posted on a new thread, since it changes a bunch
of reloptions, not only parallel_workers.  I can't immediately think
of a reason why the changes wouldn't be safe, but I've failed to fully
apprehend all of the possible dangers multiple times previously, so we
should try to give everyone who might have ideas about this topic a
chance to chime in with anything we might be missing.

I do think this comment is confusing:

+ *        This value is not locked by the transaction, so this value may
+ *        be changed while a SELECT that has used these values for planning
+ *        is still executing.

I don't know what it means for "this value" to be locked, or not
locked, by the transaction.  Basically, I have no idea what this is
trying to explain.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Choosing parallel_degree

From
Simon Riggs
Date:
On 14 September 2016 at 14:48, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 1, 2016 at 9:39 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> > Can I change this to a lower setting? I would have done this before
>>>> > applying
>>>> > the patch, but you beat me to it.
>>>>
>>>> I don't have a problem with reducing the lock level there, if we're
>>>> convinced that it's safe.
>>>
>>>
>>> I'll run up a patch, with appropriate comments.
>>
>> Attached
>
> This should really be posted on a new thread, since it changes a bunch
> of reloptions, not only parallel_workers.  I can't immediately think
> of a reason why the changes wouldn't be safe, but I've failed to fully
> apprehend all of the possible dangers multiple times previously, so we
> should try to give everyone who might have ideas about this topic a
> chance to chime in with anything we might be missing.

OK. Will post on new thread.

> I do think this comment is confusing:
>
> + *        This value is not locked by the transaction, so this value may
> + *        be changed while a SELECT that has used these values for planning
> + *        is still executing.
>
> I don't know what it means for "this value" to be locked, or not
> locked, by the transaction.  Basically, I have no idea what this is
> trying to explain.

You're quoting that without its context from the line above, which is
"get_tablespace_io_concurrency".

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Choosing parallel_degree

From
Robert Haas
Date:
On Wed, Sep 14, 2016 at 3:54 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I do think this comment is confusing:
>>
>> + *        This value is not locked by the transaction, so this value may
>> + *        be changed while a SELECT that has used these values for planning
>> + *        is still executing.
>>
>> I don't know what it means for "this value" to be locked, or not
>> locked, by the transaction.  Basically, I have no idea what this is
>> trying to explain.
>
> You're quoting that without context from the line above, which is
> "get_tablespace_io_concurrency"

Sure, but it doesn't make any sense to talk about
tablespace_io_concurrency being locked by a transaction.  At least not
that I can see.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company