Обсуждение: Rename max_parallel_degree?

Поиск

Список

Период

Сортировка

Rename max_parallel_degree?

От

Bruce Momjian

Дата:

24 апреля 2016 г., 06:59:04

Why is the parallelism variable called "max_parallel_degree"?  Is that a
descriptive name?  What does "degree" mean?  Why is it not called
"max_parallel_workers"?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

24 апреля 2016 г., 08:51:07

On Sun, Apr 24, 2016 at 9:28 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
> Why is the parallelism variable called "max_parallel_degree"? Is that a
> descriptive name? What does "degree" mean?

It is to denote amount of parallelism at node level.

>
> Why is it not called
> "max_parallel_workers"?
>

Degree of Parallelism is a term used in many of the other popular databases for the similar purpose, so I think that is another reason to prefer max_parallel_degree.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

24 апреля 2016 г., 09:01:17

On Sat, Apr 23, 2016 at 8:58 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Why is the parallelism variable called "max_parallel_degree"?  Is that a
> descriptive name?  What does "degree" mean?  Why is it not called
> "max_parallel_workers"?

I also think that "max_parallel_workers" works better.

While certain other systems have a concept of a "maximum parallel
degree", it is not the same. Those other systems disable parallelism
altogether when "max parallel degree" is 1, whereas Postgres uses 1
parallel worker along with a leader process. And so, parallelism
*will* still be used on Postgres. That's a pretty significant
difference.

-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

"Joshua D. Drake"

Дата:

24 апреля 2016 г., 20:01:34

On 04/23/2016 11:00 PM, Peter Geoghegan wrote:
> On Sat, Apr 23, 2016 at 8:58 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Why is the parallelism variable called "max_parallel_degree"?  Is that a
>> descriptive name?  What does "degree" mean?  Why is it not called
>> "max_parallel_workers"?
>
> I also think that "max_parallel_workers" works better.

+1.

JD


-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

24 апреля 2016 г., 21:08:46

On Sat, Apr 23, 2016 at 11:58 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Why is the parallelism variable called "max_parallel_degree"?  Is that a
> descriptive name?  What does "degree" mean?  Why is it not called
> "max_parallel_workers"?

Because "degree of parallelism" is standard terminology, I guess.

Also, consider that we have the related but actually sorta different
GUC max_worker_processes.  I think max_parallel_workers to control the
per-query behavior and max_worker_processes to control the global
system behavior would be confusing - those names are very close
together.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

24 апреля 2016 г., 21:23:56

Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Apr 23, 2016 at 11:58 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Why is the parallelism variable called "max_parallel_degree"?  Is that a
>> descriptive name?  What does "degree" mean?  Why is it not called
>> "max_parallel_workers"?

> Because "degree of parallelism" is standard terminology, I guess.

FWIW, I agree with Bruce that using "degree" here is a poor choice.
It's an unnecessary dependence on technical terminology that many people
will not be familiar with.

> Also, consider that we have the related but actually sorta different
> GUC max_worker_processes.  I think max_parallel_workers to control the
> per-query behavior and max_worker_processes to control the global
> system behavior would be confusing - those names are very close
> together.

Well, that just says that we'd better reconsider *both* of those names.
Frankly, neither of them is well chosen, and the fact that they currently
sound unrelated is a bug not a feature.  What about something like
"max_overall_worker_processes" and "max_session_worker_processes"?
        regards, tom lane

Re: Rename max_parallel_degree?

От

Magnus Hagander

Дата:

24 апреля 2016 г., 21:35:03

On Sun, Apr 24, 2016 at 8:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Apr 23, 2016 at 11:58 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> Why is the parallelism variable called "max_parallel_degree"? Is that a
>> descriptive name? What does "degree" mean? Why is it not called
>> "max_parallel_workers"?

> Because "degree of parallelism" is standard terminology, I guess.

FWIW, I agree with Bruce that using "degree" here is a poor choice.
It's an unnecessary dependence on technical terminology that many people
will not be familiar with.

FWIW, SQL Server calls it "degree of parallelism" as well (https://technet.microsoft.com/en-us/library/ms188611(v=sql.105).aspx). And their configuration option is "max degree of parallelism": https://technet.microsoft.com/en-us/library/ms181007(v=sql.105).aspx.

Oracle also call it "degree of parallelism" (https://docs.oracle.com/cd/E18283_01/server.112/e17110/initparams176.htm).

So it's certainly not a made-up term. And I'd go as far as to say that most people coming from other databases would be familiar with it. It may not be a standard, but clearly it's very close to being that.

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

24 апреля 2016 г., 21:43:05

Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Apr 24, 2016 at 8:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, I agree with Bruce that using "degree" here is a poor choice.
>> It's an unnecessary dependence on technical terminology that many people
>> will not be familiar with.

> FWIW, SQL Server calls it "degree of parallelism" as well (
> https://technet.microsoft.com/en-us/library/ms188611(v=sql.105).aspx). And
> their configuration option is "max degree of parallelism":
> https://technet.microsoft.com/en-us/library/ms181007(v=sql.105).aspx.

Yes, but both they and Oracle appear to consider "degree" to mean the
total number of processors used, not the number of secondary jobs in
addition to the main one.  The only thing worse than employing obscure
technical terminology is employing it incorrectly: that way, you get to
confuse both the users who know what it means and those who don't.

The fact that we couldn't get this right seems to me to be sufficient
evidence that we should stay away from the term "degree".
        regards, tom lane

Re: Rename max_parallel_degree?

От

Bruce Momjian

Дата:

25 апреля 2016 г., 03:26:06

On Sun, Apr 24, 2016 at 02:23:43PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Also, consider that we have the related but actually sorta different
> > GUC max_worker_processes.  I think max_parallel_workers to control the
> > per-query behavior and max_worker_processes to control the global
> > system behavior would be confusing - those names are very close
> > together.
> 
> Well, that just says that we'd better reconsider *both* of those names.
> Frankly, neither of them is well chosen, and the fact that they currently
> sound unrelated is a bug not a feature.  What about something like
> "max_overall_worker_processes" and "max_session_worker_processes"?

I don't think "overall" works.  I am think
"max_cluster_worker_processes" and "max_session_worker_processes" works.
I guess you could also use "max_server_worker_processes", but that is
referring to the database server, not the OS-level server.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

25 апреля 2016 г., 05:44:38

On Sun, Apr 24, 2016 at 2:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> FWIW, I agree with Bruce that using "degree" here is a poor choice.
> It's an unnecessary dependence on technical terminology that many people
> will not be familiar with.

And many others will.  Some made-up term that is entirely
PostgreSQL-specific is not going to be better.

> Well, that just says that we'd better reconsider *both* of those names.
> Frankly, neither of them is well chosen, and the fact that they currently
> sound unrelated is a bug not a feature.  What about something like
> "max_overall_worker_processes" and "max_session_worker_processes"?

The first one is fine except for the IMHO-fatal defect that
max_worker_processes has been around since 9.4, and I think it's far
too late to rename it.  Should we rename max_connections to
max_overall_connections at the same time?

The second one is wrong for at least two reasons: it's not a
per-session maximum, and it's not a maximum of all worker processes,
just parallel workers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

25 апреля 2016 г., 06:01:48

On Sun, Apr 24, 2016 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Sun, Apr 24, 2016 at 8:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> FWIW, I agree with Bruce that using "degree" here is a poor choice.
>>> It's an unnecessary dependence on technical terminology that many people
>>> will not be familiar with.
>
>> FWIW, SQL Server calls it "degree of parallelism" as well (
>> https://technet.microsoft.com/en-us/library/ms188611(v=sql.105).aspx). And
>> their configuration option is "max degree of parallelism":
>> https://technet.microsoft.com/en-us/library/ms181007(v=sql.105).aspx.
>
> Yes, but both they and Oracle appear to consider "degree" to mean the
> total number of processors used, not the number of secondary jobs in
> addition to the main one.  The only thing worse than employing obscure
> technical terminology is employing it incorrectly: that way, you get to
> confuse both the users who know what it means and those who don't.

This is not so clear-cut as you are making it out to be.  For example,
see http://www.akadia.com/services/ora_parallel_processing.html - viz
"The number of parallel slave processes associated with an operation
is called its degree of parallelism", which is pretty close to what
the parameter currently called max_parallel_degree actually does.

See also http://searchitchannel.techtarget.com/feature/Using-parallel-SQL-to-improve-Oracle-database-performance
- "The Degree of Parallelism (DOP) defines the number of parallel
streams of execution that will be created. In the simplest case, this
translates to the number of parallel slave processes enlisted to
support your SQL's execution. However, the number of parallel
processes is more often twice the DOP. This is because each step in a
nontrivial execution plan needs to feed data into the subsequent step,
so two sets of processes are required to maintain the parallel stream
of processing."

Other sources disagree, of course, as is often the case with Oracle.  Wahoo.

Of course, we could make this value 1-based rather than 0-based, as
Peter Geoghegan suggested a while back.  But as I think I said at the
time, I think that's more misleading than helpful.  The leader
participates in the parallel plan, but typically does far less of the
work beneath the Gather node than the other nodes involved in the
query, often almost none.  In short, the leader is special.
Pretending that it's just another process involved in the parallel
group isn't doing anyone a favor.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Magnus Hagander

Дата:

25 апреля 2016 г., 13:18:02

On Mon, Apr 25, 2016 at 5:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Sun, Apr 24, 2016 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Sun, Apr 24, 2016 at 8:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> FWIW, I agree with Bruce that using "degree" here is a poor choice.
>>> It's an unnecessary dependence on technical terminology that many people
>>> will not be familiar with.
>
>> FWIW, SQL Server calls it "degree of parallelism" as well (
>> https://technet.microsoft.com/en-us/library/ms188611(v=sql.105).aspx). And
>> their configuration option is "max degree of parallelism":
>> https://technet.microsoft.com/en-us/library/ms181007(v=sql.105).aspx.
>
> Yes, but both they and Oracle appear to consider "degree" to mean the
> total number of processors used, not the number of secondary jobs in
> addition to the main one. The only thing worse than employing obscure
> technical terminology is employing it incorrectly: that way, you get to
> confuse both the users who know what it means and those who don't.

This is not so clear-cut as you are making it out to be. For example,
see http://www.akadia.com/services/ora_parallel_processing.html - viz
"The number of parallel slave processes associated with an operation
is called its degree of parallelism", which is pretty close to what
the parameter currently called max_parallel_degree actually does.

So maybe something like session_parallel_degree, to add another color to the bikeshed?

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: Rename max_parallel_degree?

От

Geoff Winkless

Дата:

25 апреля 2016 г., 15:18:40

On 25 April 2016 at 03:44, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Apr 24, 2016 at 2:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> FWIW, I agree with Bruce that using "degree" here is a poor choice.
>> It's an unnecessary dependence on technical terminology that many people
>> will not be familiar with.
>
> And many others will.  Some made-up term that is entirely
> PostgreSQL-specific is not going to be better.

Just to add my 2c, "degree" implies some sort of graded scale. So
setting it to (say) 10 would be "maximum", setting to 0 would be
"none" and setting it to anything in between would be relative to the
maximum.

eg in Vol26 "Encyclopedia of Computer Science and Technology" (the
first compsci reference that appeared for a google search) there are
three levels of granularity of degrees of parallelism.

https://books.google.co.uk/books?id=z4KECsT59NwC&pg=PA41&lpg=PA41&dq=degree+of+parallelism

Frankly it seems that the SQL crowd stole the computer science term
and applied it incorrectly.

Having a configuration option "_workers" makes much more sense to me.
It absolutely describes what it does without needing to refer to a
manual, and it removes ambiguity.

Geoff

Re: Rename max_parallel_degree?

От

Bruce Momjian

Дата:

25 апреля 2016 г., 17:04:05

On Mon, Apr 25, 2016 at 12:17:56PM +0200, Magnus Hagander wrote:
> 
> 
> On Mon, Apr 25, 2016 at 5:01 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> 
>     On Sun, Apr 24, 2016 at 2:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>     > Magnus Hagander <magnus@hagander.net> writes:
>     >> On Sun, Apr 24, 2016 at 8:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>     >>> FWIW, I agree with Bruce that using "degree" here is a poor choice.
>     >>> It's an unnecessary dependence on technical terminology that many
>     people
>     >>> will not be familiar with.
>     >
>     >> FWIW, SQL Server calls it "degree of parallelism" as well (
>     >> https://technet.microsoft.com/en-us/library/ms188611(v=sql.105).aspx).
>     And
>     >> their configuration option is "max degree of parallelism":
>     >> https://technet.microsoft.com/en-us/library/ms181007(v=sql.105).aspx.
>     >
>     > Yes, but both they and Oracle appear to consider "degree" to mean the
>     > total number of processors used, not the number of secondary jobs in
>     > addition to the main one.  The only thing worse than employing obscure
>     > technical terminology is employing it incorrectly: that way, you get to
>     > confuse both the users who know what it means and those who don't.
> 
>     This is not so clear-cut as you are making it out to be.  For example,
>     see http://www.akadia.com/services/ora_parallel_processing.html - viz
>     "The number of parallel slave processes associated with an operation
>     is called its degree of parallelism", which is pretty close to what
>     the parameter currently called max_parallel_degree actually does.
> 
> 
> 
> So maybe something like  session_parallel_degree, to add another color to the
> bikeshed?

I think Robert said it is per-executor node, not per session, similar to
work_mem.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

Re: Rename max_parallel_degree?

От

Bruce Momjian

Дата:

25 апреля 2016 г., 17:08:29

On Sun, Apr 24, 2016 at 11:21:01AM +0530, Amit Kapila wrote:
> On Sun, Apr 24, 2016 at 9:28 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > Why is the parallelism variable called "max_parallel_degree"?  Is that a
> > descriptive name?  What does "degree" mean?
> 
> It is to denote amount of parallelism at node level.                                          ----------

Sorry, it was Amit who said the parallelism is per-node, meaning calling
the variable "session" would not work.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

Re: Rename max_parallel_degree?

От

"Joshua D. Drake"

Дата:

25 апреля 2016 г., 18:27:31

On 04/25/2016 07:08 AM, Bruce Momjian wrote:
> On Sun, Apr 24, 2016 at 11:21:01AM +0530, Amit Kapila wrote:
>> On Sun, Apr 24, 2016 at 9:28 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>>
>>> Why is the parallelism variable called "max_parallel_degree"?  Is that a
>>> descriptive name?  What does "degree" mean?
>>
>> It is to denote amount of parallelism at node level.
>                                             ----------
>
> Sorry, it was Amit who said the parallelism is per-node, meaning calling
> the variable "session" would not work.

max_parallel_nodes

jD


>


-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

25 апреля 2016 г., 19:04:29

On Mon, Apr 25, 2016 at 11:27 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
> max_parallel_nodes

I hope you are trolling me.  It does not bound the maximum number of
parallel nodes, but rather the maximum number of workers per parallel
node.  In most cases, a query is only going to have one parallel node,
but sometimes it could have more than one.  In all of the actual cases
tested by me thus far, the workers for one node terminate before we
try to launch workers for another node, but it is theoretically
possible to have a query plan where this isn't the case.  It is
getting a bit irritating to me to people who seem not even to have
read the existing documentation for this GUC, let alone tried it out
and gotten familiar with what it does, are convinced they know how it
should be changed.

On the underlying issue, the max_parallel_degree naming is tied into
the parallel_degree reloption and the parallel_degree field that is
part of each Path.  If you want to rename any of those, you are going
to need to rename them all, I think.  So we could do
max_parallel_degree -> max_parallel_workers_per_executor_node and
parallel_degree -> target_number_of_parallel_workers, but that is
rather longwinded and it's very difficult to shorten those very much
without losing clarity.  I grant that there is some learning curve in
getting familiar with new terms of art, like "parallel degree", but in
the long run it's better to ask people to learn a few new terms than
to trying to repeat the whole description of what the thing is every
time you refer to it.  We've got existing terms of art like
"multixact" or even "wraparound" that are far, far more obscure than
this, and those are entirely PostgreSQL-specific things without a hint
of any parallel in other products.

The concept of parallel degree is not going away, and its reach
extends well beyond the GUC.  We can rename it to some other term, but
this one is pretty standard.  Trying to avoid having a term for it is
not going to work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

"Joshua D. Drake"

Дата:

25 апреля 2016 г., 21:38:17

On 04/25/2016 09:04 AM, Robert Haas wrote:
> On Mon, Apr 25, 2016 at 11:27 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
>> max_parallel_nodes
>
> I hope you are trolling me.

Actually I wasn't. It is usually a little more obvious when I troll, 
subtlety is not exactly my strong suit ;)

The only reason I suggested that name was because of this comment:

"I think Robert said it is per-executor node, not per session, similar 
to work_mem." (from Bruce)

If it doesn't make sense, I apologize for the noise.

Sincerely,

JD

-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.

Re: Rename max_parallel_degree?

От

Alvaro Herrera

Дата:

25 апреля 2016 г., 22:08:02

Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > On Sun, Apr 24, 2016 at 8:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> FWIW, I agree with Bruce that using "degree" here is a poor choice.
> >> It's an unnecessary dependence on technical terminology that many people
> >> will not be familiar with.
> 
> > FWIW, SQL Server calls it "degree of parallelism" as well (
> > https://technet.microsoft.com/en-us/library/ms188611(v=sql.105).aspx). And
> > their configuration option is "max degree of parallelism":
> > https://technet.microsoft.com/en-us/library/ms181007(v=sql.105).aspx.
> 
> Yes, but both they and Oracle appear to consider "degree" to mean the
> total number of processors used, not the number of secondary jobs in
> addition to the main one.  The only thing worse than employing obscure
> technical terminology is employing it incorrectly:

What about calling it something even simpler, such as "max_parallelism"?
This avoids such cargo cult, and there's no implication that it's
per-query.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

25 апреля 2016 г., 23:10:41

On Mon, Apr 25, 2016 at 2:58 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Tom Lane wrote:
>> Magnus Hagander <magnus@hagander.net> writes:
>> > On Sun, Apr 24, 2016 at 8:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> FWIW, I agree with Bruce that using "degree" here is a poor choice.
>> >> It's an unnecessary dependence on technical terminology that many people
>> >> will not be familiar with.
>>
>> > FWIW, SQL Server calls it "degree of parallelism" as well (
>> > https://technet.microsoft.com/en-us/library/ms188611(v=sql.105).aspx). And
>> > their configuration option is "max degree of parallelism":
>> > https://technet.microsoft.com/en-us/library/ms181007(v=sql.105).aspx.
>>
>> Yes, but both they and Oracle appear to consider "degree" to mean the
>> total number of processors used, not the number of secondary jobs in
>> addition to the main one.  The only thing worse than employing obscure
>> technical terminology is employing it incorrectly:
>
> What about calling it something even simpler, such as "max_parallelism"?
> This avoids such cargo cult, and there's no implication that it's
> per-query.

So what would we call the "parallel_degree" member of the Path data
structure, and the "parallel_degree" reloption?  I don't think
renaming either of those to "parallelism" is going to be an
improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Gavin Flower

Дата:

25 апреля 2016 г., 23:20:56

On 26/04/16 06:38, Joshua D. Drake wrote:
> On 04/25/2016 09:04 AM, Robert Haas wrote:
>> On Mon, Apr 25, 2016 at 11:27 AM, Joshua D. Drake 
>> <jd@commandprompt.com> wrote:
>>> max_parallel_nodes
>>
>> I hope you are trolling me.
>
> Actually I wasn't. It is usually a little more obvious when I troll, 
> subtlety is not exactly my strong suit ;)
>
[...]

I've been reading the pg mailings lists for over ten years...

Just when I've noticed tempers flaring, and a flame war starting - it is 
already over!  In essence, I've never seen any genuine trolling or 
flamewars - people are just too friendly & cooperative.  Though I have 
seen strong disagreements, people are very careful to keep the tone from 
getting too heated, yet still manage to show their passion.

Cheers,
Gavin

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

25 апреля 2016 г., 23:24:55

Robert Haas <robertmhaas@gmail.com> writes:
>> What about calling it something even simpler, such as "max_parallelism"?
>> This avoids such cargo cult, and there's no implication that it's
>> per-query.

> So what would we call the "parallel_degree" member of the Path data
> structure, and the "parallel_degree" reloption?  I don't think
> renaming either of those to "parallelism" is going to be an
> improvement.

I think we should rename all of these to something based on the concept of
"number of worker processes", and adjust the code if necessary to match.
I think the "degree" terminology is fundamentally tainted by the question
of whether or not it counts the leader, and that we will have bugs (or
indeed may have them today) caused by getting that wrong.  Your arguments
for not changing it seem to me not to address that point; you've merely
focused on the question of whether we have the replacement terminology
right.  If we don't, let's make it so, but the current situation is not
good.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

25 апреля 2016 г., 23:32:53

On Mon, Apr 25, 2016 at 1:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think we should rename all of these to something based on the concept of
> "number of worker processes", and adjust the code if necessary to match.
> I think the "degree" terminology is fundamentally tainted by the question
> of whether or not it counts the leader, and that we will have bugs (or
> indeed may have them today) caused by getting that wrong.

FWIW, my concern was always limited to that. I don't actually mind if
we use the "degree" terminology, as long as our usage is consistent
with that of other major systems. Since the GUC's behavior isn't going
to change now, the terminology should change. I'm fine with that. I'm
not particularly concerned with the specifics of some new terminology,
as long as it's consistent with the idea of auxiliary worker processes
that cooperate with a leader process.

-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

25 апреля 2016 г., 23:45:27

On Mon, Apr 25, 2016 at 4:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>>> What about calling it something even simpler, such as "max_parallelism"?
>>> This avoids such cargo cult, and there's no implication that it's
>>> per-query.
>
>> So what would we call the "parallel_degree" member of the Path data
>> structure, and the "parallel_degree" reloption?  I don't think
>> renaming either of those to "parallelism" is going to be an
>> improvement.
>
> I think we should rename all of these to something based on the concept of
> "number of worker processes", and adjust the code if necessary to match.

"worker processes" are a more general term than "parallel workers".  I
think if you conflate those two things, which I've tried quite hard to
keep separate in code and comments, there will be no end of confusion.
All parallel workers are background workers, aka worker processes, but
the reverse is false.

Also, when you think about the parallel degree of a path or plan, it
is the number of workers for which that path or plan has been costed,
which may or may not match the number we actually get.  I think of the
parallel_degree as the DESIRED number of workers for a path or plan
more than the actual number.  I think if we call it "number of
parallel workers" or something like that we had better think about
whether we're implying something other than that.  Perhaps the meaning
of the existing term "parallel degree" isn't as fully explicated as
would be desirable, but changing it to something else that definitely
doesn't mean what this was intended to mean won't be better.

> I think the "degree" terminology is fundamentally tainted by the question
> of whether or not it counts the leader, and that we will have bugs (or
> indeed may have them today) caused by getting that wrong.

This theory does not seem very plausible to me.  I don't really see
how that could happen, although perhaps I'm blinded by being too close
to the feature.  Also, you haven't presented entirely convincing
evidence that other people are all united in the way they view this
and that that way is different than PostgreSQL; and I've submitted
some contrary evidence.  Even if Oracle for example does do it
differently than what I've done here, slavishly following Oracle has
never been a prerequisite for regarding a PostgreSQL feature as
well-designed.  I think it is far more likely that going and
offsetting the value of parallel_degree by 1 everywhere, as Peter has
proposed, is going to introduce subtle bugs.

BTW, I am told by some of my colleagues that hinting /* PARALLEL 0 */
in Oracle hints no-parallelism, and that hinting /* PARALLEL 1 */
hints the use of one worker in addition to the main process.  I
haven't tried to verify this myself.

> Your arguments
> for not changing it seem to me not to address that point; you've merely
> focused on the question of whether we have the replacement terminology
> right.  If we don't, let's make it so, but the current situation is not
> good.

I don't agree that the current situation isn't good, but I'm willing
to change it anyway if we can come up with something that's actually
better.  In the absence of that, it makes no sense to change anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

26 апреля 2016 г., 00:16:02

On Mon, Apr 25, 2016 at 1:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think the "degree" terminology is fundamentally tainted by the question
>> of whether or not it counts the leader, and that we will have bugs (or
>> indeed may have them today) caused by getting that wrong.
>
> This theory does not seem very plausible to me.  I don't really see
> how that could happen, although perhaps I'm blinded by being too close
> to the feature.  Also, you haven't presented entirely convincing
> evidence that other people are all united in the way they view this
> and that that way is different than PostgreSQL; and I've submitted
> some contrary evidence.

SQL Server definitely disables parallel query when max degree of
parallelism = 1. The default, 0, is "auto". I think that a DoP of 1 in
Oracle also disables parallelism.

> Even if Oracle for example does do it
> differently than what I've done here, slavishly following Oracle has
> never been a prerequisite for regarding a PostgreSQL feature as
> well-designed.  I think it is far more likely that going and
> offsetting the value of parallel_degree by 1 everywhere, as Peter has
> proposed, is going to introduce subtle bugs.

That was one approach that I mentioned as a theoretically sound way of
maintaining the "degree" terminology. This was several months back.
I'm not arguing for that.

I'm also not arguing for slavishly following Oracle or SQL Server.
Rather, my position is that as long as we're going to use their
terminology, we should also offer something roughly compatible with
their behavior. I think that not using their terminology is also a
reasonable solution.

I'm not sure about anyone else, but my complaint is entirely about the
baggage that the term "degree of parallelism" happens to have, and how
effectively that has been managed.

-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

Alvaro Herrera

Дата:

26 апреля 2016 г., 18:22:29

Robert Haas wrote:
> On Mon, Apr 25, 2016 at 2:58 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:

> > What about calling it something even simpler, such as "max_parallelism"?
> > This avoids such cargo cult, and there's no implication that it's
> > per-query.
> 
> So what would we call the "parallel_degree" member of the Path data
> structure, and the "parallel_degree" reloption?  I don't think
> renaming either of those to "parallelism" is going to be an
> improvement.

I think we should define the UI first, *then* decide what to call the
internal variable names.  In most cases we're able to call the variables
the same as the user-visible names, but not always and there's no rule
that it must be so.  Having source code variable names determine what
the user visible name is seems to me like putting the cart before the
horse.

I think the word "degree" is largely seen as a bad idea: it would become
a somewhat better idea only if we change how it works so that it matches
what other DBMSs do, but you oppose that.  Hence my proposal to get rid
of that word in the UI.  (My first thought yesterday was to look for
synonyms for the "degree" word, so I got as far as "amount of
parallelism" when I realized that such accompanying words add no value
and so we might as well not have any word there.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

26 апреля 2016 г., 18:29:08

On Tue, Apr 26, 2016 at 11:22 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> On Mon, Apr 25, 2016 at 2:58 PM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>
>> > What about calling it something even simpler, such as "max_parallelism"?
>> > This avoids such cargo cult, and there's no implication that it's
>> > per-query.
>>
>> So what would we call the "parallel_degree" member of the Path data
>> structure, and the "parallel_degree" reloption?  I don't think
>> renaming either of those to "parallelism" is going to be an
>> improvement.
>
> I think we should define the UI first, *then* decide what to call the
> internal variable names.  In most cases we're able to call the variables
> the same as the user-visible names, but not always and there's no rule
> that it must be so.  Having source code variable names determine what
> the user visible name is seems to me like putting the cart before the
> horse.
>
> I think the word "degree" is largely seen as a bad idea: it would become
> a somewhat better idea only if we change how it works so that it matches
> what other DBMSs do, but you oppose that.  Hence my proposal to get rid
> of that word in the UI.  (My first thought yesterday was to look for
> synonyms for the "degree" word, so I got as far as "amount of
> parallelism" when I realized that such accompanying words add no value
> and so we might as well not have any word there.)

Well I agree with that up to a point, but I think ALTER TABLE foo SET
(parallelism = 4) is not a model of clarity.  "parallelism" or
"parallel" is not obviously an integer quality.  I guess we could
s/parallel_degree/parallel_workers/g.  I find that terminology less
elegant than "parallel degree", but I can live with it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

26 апреля 2016 г., 18:35:00

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Apr 26, 2016 at 11:22 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> I think the word "degree" is largely seen as a bad idea: it would become
>> a somewhat better idea only if we change how it works so that it matches
>> what other DBMSs do, but you oppose that.  Hence my proposal to get rid
>> of that word in the UI.

> Well I agree with that up to a point, but I think ALTER TABLE foo SET
> (parallelism = 4) is not a model of clarity.  "parallelism" or
> "parallel" is not obviously an integer quality.  I guess we could
> s/parallel_degree/parallel_workers/g.  I find that terminology less
> elegant than "parallel degree", but I can live with it.

Shouldn't it be "max_parallel_workers", at least in some contexts?
Otherwise, I'd read it as a promise that exactly that many workers
will be used.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

26 апреля 2016 г., 18:41:44

On Tue, Apr 26, 2016 at 11:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Apr 26, 2016 at 11:22 AM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>> I think the word "degree" is largely seen as a bad idea: it would become
>>> a somewhat better idea only if we change how it works so that it matches
>>> what other DBMSs do, but you oppose that.  Hence my proposal to get rid
>>> of that word in the UI.
>
>> Well I agree with that up to a point, but I think ALTER TABLE foo SET
>> (parallelism = 4) is not a model of clarity.  "parallelism" or
>> "parallel" is not obviously an integer quality.  I guess we could
>> s/parallel_degree/parallel_workers/g.  I find that terminology less
>> elegant than "parallel degree", but I can live with it.
>
> Shouldn't it be "max_parallel_workers", at least in some contexts?
> Otherwise, I'd read it as a promise that exactly that many workers
> will be used.

Yeah, I guess it would be parallel_degree -> parallel_workers and
max_parallel_degree -> max_parallel_workers.  I still think
max_parallel_workers is confusingly similar to max_worker_processes,
but nothing's going to make everyone completely happy here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

26 апреля 2016 г., 18:44:46

Robert Haas <robertmhaas@gmail.com> writes:
> I still think
> max_parallel_workers is confusingly similar to max_worker_processes,
> but nothing's going to make everyone completely happy here.

Well, what was suggested upthread was to change all of these to follow
the pattern max_foo_workers or max_foo_worker_processes, where foo would
(hopefully) clarify the scope in which the limitation applies.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

26 апреля 2016 г., 18:49:06

On Tue, Apr 26, 2016 at 11:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I still think
>> max_parallel_workers is confusingly similar to max_worker_processes,
>> but nothing's going to make everyone completely happy here.
>
> Well, what was suggested upthread was to change all of these to follow
> the pattern max_foo_workers or max_foo_worker_processes, where foo would
> (hopefully) clarify the scope in which the limitation applies.

Well, I don't like max_node_parallel_degree much.  We don't call it
max_node_work_mem.  And node is not exactly a term that's going to be
more familiar to the average PostgreSQL user than parallel degree is
to (apparently) the average PostgreSQL developer.  I think at some
point adding noise words hurts more than it helps, and you've just got
to ask people to RTFM if they really want to understand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

"Daniel Verite"

Дата:

27 апреля 2016 г., 20:05:40

Robert Haas wrote:

> Of course, we could make this value 1-based rather than 0-based, as
> Peter Geoghegan suggested a while back.  But as I think I said at the
> time, I think that's more misleading than helpful.  The leader
> participates in the parallel plan, but typically does far less of the
> work beneath the Gather node than the other nodes involved in the
> query, often almost none.  In short, the leader is special.
> Pretending that it's just another process involved in the parallel
> group isn't doing anyone a favor.

FWIW, that's not how it looks from the outside (top or vmstat).
I'm ignorant about how parallel tasks are assigned in the planner,
but when trying various values for max_parallel_degree and running
simple aggregates on large tables on a single 4 core CPU doing
nothing else, I'm only ever seeing max_parallel_degree+1 processes
indiscriminately at work, often in the same state (R running or
D waiting for disk).

Also when looking at exec times, for a CPU-bound sample query, I get
for instance the results below, when increasing parallelism one step
at a time, on a 4-core CPU.
I've checked with EXPLAIN that the planner allocates each time
a number of workers that's exactly equal to max_parallel_degree.
("Workers Planned" under the Gather node).
mp_degree | exec_time | speedup (against degree=0)
-----------+-----------+--------- 0 | 10850.835 |    1.00 1 |  5833.208 |    1.86 2 |  3990.144 |    2.72 3 |  3165.606
|   3.43 4 |  3315.153 |    3.27 5 |  3333.931 |    3.25 6 |  3354.995 |    3.23 

If the leader didn't do much work here, how would degree=1 produce
such a speedup (1.86) ?

There's also the fact that allowing 4 workers does not help compared
to 3, even though there are 4 cores here. Again, doesn't it make sense
only if the leader is as important as the workers in terms of CPU
usage?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

28 апреля 2016 г., 17:41:29

On Wed, Apr 27, 2016 at 1:05 PM, Daniel Verite <daniel@manitou-mail.org> wrote:
>         Robert Haas wrote:
>> Of course, we could make this value 1-based rather than 0-based, as
>> Peter Geoghegan suggested a while back.  But as I think I said at the
>> time, I think that's more misleading than helpful.  The leader
>> participates in the parallel plan, but typically does far less of the
>> work beneath the Gather node than the other nodes involved in the
>> query, often almost none.  In short, the leader is special.
>> Pretending that it's just another process involved in the parallel
>> group isn't doing anyone a favor.
>
> FWIW, that's not how it looks from the outside (top or vmstat).
> I'm ignorant about how parallel tasks are assigned in the planner,
> but when trying various values for max_parallel_degree and running
> simple aggregates on large tables on a single 4 core CPU doing
> nothing else, I'm only ever seeing max_parallel_degree+1 processes
> indiscriminately at work, often in the same state (R running or
> D waiting for disk).

Right, but they're probably not doing the SAME work.  You can look at
EXPLAIN (ANALYZE, VERBOSE, BUFFERS) to see.  Of course, all the work
above the Gather node is being done by the leader, but the stuff below
the Gather node often has a bit of participation from the leader, but
is mostly the workers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

David Rowley

Дата:

30 апреля 2016 г., 11:54:20

On 29 April 2016 at 02:41, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Apr 27, 2016 at 1:05 PM, Daniel Verite <daniel@manitou-mail.org> wrote:
>>         Robert Haas wrote:
>>> Of course, we could make this value 1-based rather than 0-based, as
>>> Peter Geoghegan suggested a while back.  But as I think I said at the
>>> time, I think that's more misleading than helpful.  The leader
>>> participates in the parallel plan, but typically does far less of the
>>> work beneath the Gather node than the other nodes involved in the
>>> query, often almost none.  In short, the leader is special.
>>> Pretending that it's just another process involved in the parallel
>>> group isn't doing anyone a favor.
>>
>> FWIW, that's not how it looks from the outside (top or vmstat).
>> I'm ignorant about how parallel tasks are assigned in the planner,
>> but when trying various values for max_parallel_degree and running
>> simple aggregates on large tables on a single 4 core CPU doing
>> nothing else, I'm only ever seeing max_parallel_degree+1 processes
>> indiscriminately at work, often in the same state (R running or
>> D waiting for disk).
>
> Right, but they're probably not doing the SAME work.  You can look at
> EXPLAIN (ANALYZE, VERBOSE, BUFFERS) to see.  Of course, all the work
> above the Gather node is being done by the leader, but the stuff below
> the Gather node often has a bit of participation from the leader, but
> is mostly the workers.

Robert, I'd imagine that most of your tests to make you think what you
do would have come from testing parallel seq scan, where perhaps
Daniel's comes from testing something like parallel aggregates, or at
least something that gives the workers a decent amount of work per
tuple returned.

With the setting;

# set max_parallel_degree = 8;

Given a table like;

# create table t1 (num int not null);

Populated with;
# insert into t1 select generate_Series(1,10000000);

Given the query;

# explain (analyze, verbose) select count(*) from t1;

if we think about what'll happen here, each worker will go off and
grab all of the tuples it can and aggregate each one, meanwhile the
main process would otherwise be quite idle waiting for the workers to
come back with their partially aggregated results, so instead, to keep
itself busy, goes off and helps them out. We can see this is true in
the explain analyze verbose output;
                                                                  QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------Finalize
Aggregate (cost=80508.54..80508.55 rows=1 width=8) (actual
 
time=605.019..605.019 rows=1 loops=1)  Output: pg_catalog.count(*)  ->  Gather  (cost=80508.13..80508.53 rows=4
width=8)(actual
 
time=604.799..605.011 rows=5 loops=1)        Output: (count(*))        Workers Planned: 4        Workers Launched: 4
   ->  Partial Aggregate  (cost=79508.13..79508.13 rows=1
 
width=8) (actual time=585.099..585.100 rows=1 loops=5)              Output: count(*)              Worker 0: actual
time=579.736..579.736rows=1 loops=1              Worker 1: actual time=580.669..580.669 rows=1 loops=1
Worker2: actual time=580.512..580.513 rows=1 loops=1              Worker 3: actual time=580.649..580.649 rows=1 loops=1
            ->  Parallel Seq Scan on public.t1
 
(cost=0.00..72456.10 rows=2820810 width=0) (actual time=2.310..404.978
rows=2000000 loops=5)                    Output: num                    Worker 0: actual time=2.231..397.251
rows=1892702loops=1                    Worker 1: actual time=3.107..403.436 rows=1983602 loops=1
Worker2: actual time=3.030..403.082 rows=1952188 loops=1                    Worker 3: actual time=3.135..404.756
rows=2039650loops=1
 

If we look at total the number of rows that each of the workers
managed to chew through, and subtract from the total rows in t1;

# select 10000000 - (1892702 + 1983602 + 1952188 + 2039650);?column?
---------- 2131858
(1 row)

So the main process managed to get through 2131858 rows while waiting
for the helpers finishing their work. Daniel looks right.

Another example, this time no aggregation;

postgres=# explain (analyze,verbose) select * from t1 where num > 9500000;
             QUERY PLAN
 

------------------------------------------------------------------------------------------------------------------------------------Gather
(cost=1000.00..126718.15 rows=502200 width=4) (actual
 
time=454.071..694.967 rows=500000 loops=1)  Output: num  Workers Planned: 4  Workers Launched: 4  ->  Parallel Seq Scan
onpublic.t1  (cost=0.00..76753.65
 
rows=125550 width=4) (actual time=430.240..451.990 rows=100000
loops=5)        Output: num        Filter: (t1.num > 9500000)        Rows Removed by Filter: 1900000        Worker 0:
actualtime=421.220..449.077 rows=130128 loops=1        Worker 1: actual time=428.549..456.059 rows=125430 loops=1
Worker 2: actual time=421.372..447.751 rows=113904 loops=1        Worker 3: actual time=427.140..454.118 rows=130402
loops=1

# select 500000 - (130128 + 125430 + 113904 + 130402);?column?
----------     136
(1 row)

The main process only managed to get time for 136 rows! Robert is right.

So I'd say this very much depends on how busy the main process is
pulling rows from each worker.

It would also be quite nice if we could see at a glance how much the
main process did, without having to go subtracting what all the
workers managed to do.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

02 мая 2016 г., 19:53:01

On Sat, Apr 30, 2016 at 4:54 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> Right, but they're probably not doing the SAME work.  You can look at
>> EXPLAIN (ANALYZE, VERBOSE, BUFFERS) to see.  Of course, all the work
>> above the Gather node is being done by the leader, but the stuff below
>> the Gather node often has a bit of participation from the leader, but
>> is mostly the workers.
>
> Robert, I'd imagine that most of your tests to make you think what you
> do would have come from testing parallel seq scan, where perhaps
> Daniel's comes from testing something like parallel aggregates, or at
> least something that gives the workers a decent amount of work per
> tuple returned.

Sure, if you're doing parallel aggregate with a small number of output
groups, the work distribution will be much more nearly equal.  I
wasn't trying to dispute that.  There are many cases where it doesn't
work out that way, though.

> So I'd say this very much depends on how busy the main process is
> pulling rows from each worker.

Absolutely.

> It would also be quite nice if we could see at a glance how much the
> main process did, without having to go subtracting what all the
> workers managed to do.

I agree, but that looked considerably more invasive, because of the
way that the worker results get aggregated into the leader's result.
There may be a clever way to avoid that, but I didn't see it.  Or we
may want to do it anyway despite it requiring more invasive changes.
I just wasn't in a hurry to rush into all that while trying to get
parallel query v1 out the door.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

02 мая 2016 г., 21:17:34

On Tue, Apr 26, 2016 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Apr 26, 2016 at 11:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I still think
>>> max_parallel_workers is confusingly similar to max_worker_processes,
>>> but nothing's going to make everyone completely happy here.
>>
>> Well, what was suggested upthread was to change all of these to follow
>> the pattern max_foo_workers or max_foo_worker_processes, where foo would
>> (hopefully) clarify the scope in which the limitation applies.
>
> Well, I don't like max_node_parallel_degree much.  We don't call it
> max_node_work_mem.  And node is not exactly a term that's going to be
> more familiar to the average PostgreSQL user than parallel degree is
> to (apparently) the average PostgreSQL developer.  I think at some
> point adding noise words hurts more than it helps, and you've just got
> to ask people to RTFM if they really want to understand.

If we're going to change this before beta1, we need to do it soon.
IMHO, the only reasonably sane idea that's been proposed thus far is
to rename as follows:

max_parallel_degree -> max_parallel_workers
parallel_degree -> parallel_workers

I would prefer to keep it as "degree".  It's a reasonable term of art,
and it also improves grep-ability.  But I'm willing to go do the above
renaming if there is a clear consensus behind it.  Alternatively, I'm
willing to make it 1-based rather than 0-based if there is a clear
consensus on that option, though unsurprisingly I prefer it the way it
is now.  Do we have such a consensus?

To summarize the positions as I understand them:

Magnus seems OK with the way things are.
Peter wants to change either the fact that it is 0-based or the fact
that it is called degree, but is OK with either.
Tom doesn't like "degree" and also thinks anything called degree
should 1-based, but it sounds like he's more interested in changing
the first thing than the second one
Bruce and JD seemed to like degree -> workers.
JD also suggested another option that nobody else endorsed.
Alvaro suggested another option that nobody else endorsed.

Does anyone else want to vote?  Does anyone who previously cast a vote
wish to change it or clarify their position?  I think I am reading
that degree -> workers will please most people (just not me).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

02 мая 2016 г., 21:44:41

On Sun, Apr 24, 2016 at 8:01 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Of course, we could make this value 1-based rather than 0-based, as
Peter Geoghegan suggested a while back. But as I think I said at the
time, I think that's more misleading than helpful. The leader
participates in the parallel plan, but typically does far less of the
work beneath the Gather node than the other nodes involved in the
query, often almost none. In short, the leader is special.
Pretending that it's just another process involved in the parallel
group isn't doing anyone a favor.

Does this apply to the extent that a value of 1 is likely worse than 0 since the leader is now tasked with accumulating but there is only one process actually working to provide the leader data?

I'm inclined to accept max_parallel_workers where a value of 0 means no parallelism and the non-zero counts indicate the number of workers in addition to the required leader.

Though that does suggest "additional" as a valid option. Something like "max_additional_workers". Just how overloaded is the term "worker". If worker is understood to mean "a process which implements execution of [part of] a query plan" the word additional leaves no ambiguity as to where the leader is accounted for.

It does significantly reduce grep-ability though :(

max_additional_parallel_workers...

David J.

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

02 мая 2016 г., 22:14:59

On Mon, May 2, 2016 at 2:44 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Does this apply to the extent that a value of 1 is likely worse than 0 since
> the leader is now tasked with accumulating but there is only one process
> actually working to provide the leader data?

I don't know what that means, but it doesn't work like that.  If the
worker can't generate data fast enough, the leader will also run the
parallel portion of the plan.  So 1 is unlikely to be worse than 0; in
fact it's often a lot better.

> I'm inclined to accept max_parallel_workers where a value of 0 means no
> parallelism and the non-zero counts indicate the number of workers in
> addition to the required leader.

That's how it works now.

> Though that does suggest "additional" as a valid option.  Something like
> "max_additional_workers".  Just how overloaded is the term "worker".  If
> worker is understood to mean "a process which implements execution of [part
> of] a query plan" the word additional leaves no ambiguity as to where the
> leader is accounted for.
>
> It does significantly reduce grep-ability though :(
>
> max_additional_parallel_workers...

I don't think that it's likely to be very clear what "additional"
refers to in this context.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Christoph Berg

Дата:

02 мая 2016 г., 22:18:43

Re: Robert Haas 2016-05-02 <CA+TgmobRmK649eDYvF6dgnQJNJVJvZffDz674wD+GWqCcb=YjQ@mail.gmail.com>
> max_parallel_degree -> max_parallel_workers
> parallel_degree -> parallel_workers
> 
> I would prefer to keep it as "degree".  It's a reasonable term of art,
> and it also improves grep-ability.  But I'm willing to go do the above
> renaming if there is a clear consensus behind it.  Alternatively, I'm
> willing to make it 1-based rather than 0-based if there is a clear
> consensus on that option, though unsurprisingly I prefer it the way it
> is now.  Do we have such a consensus?

Fwiw the one thing I remember from when I read first about the feature
was a big "wtf if I set that to 1, I'll actually get 2 processes?". So
+1 on doing *something* about it.

Christoph

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

02 мая 2016 г., 22:22:57

On Mon, May 2, 2016 at 3:18 PM, Christoph Berg <myon@debian.org> wrote:
> Re: Robert Haas 2016-05-02 <CA+TgmobRmK649eDYvF6dgnQJNJVJvZffDz674wD+GWqCcb=YjQ@mail.gmail.com>
>> max_parallel_degree -> max_parallel_workers
>> parallel_degree -> parallel_workers
>>
>> I would prefer to keep it as "degree".  It's a reasonable term of art,
>> and it also improves grep-ability.  But I'm willing to go do the above
>> renaming if there is a clear consensus behind it.  Alternatively, I'm
>> willing to make it 1-based rather than 0-based if there is a clear
>> consensus on that option, though unsurprisingly I prefer it the way it
>> is now.  Do we have such a consensus?
>
> Fwiw the one thing I remember from when I read first about the feature
> was a big "wtf if I set that to 1, I'll actually get 2 processes?". So
> +1 on doing *something* about it.

To be clear, you'll get 1 new process in addition to the 1 that is
already running.  But vote noted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

02 мая 2016 г., 22:49:19

On Mon, May 2, 2016 at 12:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, May 2, 2016 at 2:44 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Does this apply to the extent that a value of 1 is likely worse than 0 since
> the leader is now tasked with accumulating but there is only one process
> actually working to provide the leader data?

I don't know what that means, but it doesn't work like that. If the
worker can't generate data fast enough, the leader will also run the
parallel portion of the plan. So 1 is unlikely to be worse than 0; in
fact it's often a lot better.

> I'm inclined to accept max_parallel_workers where a value of 0 means no
> parallelism and the non-zero counts indicate the number of workers in
> addition to the required leader.

That's how it works now.

Ok, that's basically what I was trying to figure out. Whether the leader will take on the role of "worker" at lower levels of "parallelism" since if it was dedicated to only aggregating data from other workers it would be a net loss if only one other worker existed.

> Though that does suggest "additional" as a valid option. Something like
> "max_additional_workers". Just how overloaded is the term "worker". If
> worker is understood to mean "a process which implements execution of [part
> of] a query plan" the word additional leaves no ambiguity as to where the
> leader is accounted for.
>
> It does significantly reduce grep-ability though :(
>
> max_additional_parallel_workers...

I don't think that it's likely to be very clear what "additional"
refers to in this context.

I'm not sure how "what" could be interpreted as anything other than "parallel worker"...which I presumed is recognized by the user otherwise "max_parallel_workers" itself exhibits the same problem.

I could see how code usage could be annoying, having to always "+1" the value to get total number of potential workers, but the UI, for me, removes any question of of whether the leader is counted in the set of "parallel workers".

David J.

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

04 мая 2016 г., 06:12:41

On Mon, May 2, 2016 at 11:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Apr 26, 2016 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Tue, Apr 26, 2016 at 11:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Robert Haas <robertmhaas@gmail.com> writes:
>
> To summarize the positions as I understand them:
>
> Magnus seems OK with the way things are.
> Peter wants to change either the fact that it is 0-based or the fact
> that it is called degree, but is OK with either.
> Tom doesn't like "degree" and also thinks anything called degree
> should 1-based, but it sounds like he's more interested in changing
> the first thing than the second one
> Bruce and JD seemed to like degree -> workers.
> JD also suggested another option that nobody else endorsed.
> Alvaro suggested another option that nobody else endorsed.
>
> Does anyone else want to vote?

I think the way it is currently in HEAD seems easy to correlate how the feature works, but may be it appears to me that way because I am involved from long time with this implementation. I also think one can easily confused among max_parallel_workers and max_worker_processes, so if we want to change, my vote goes with changing the default of max_parallel_degree to 1 (as suggested by Peter G.).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

David Rowley

Дата:

04 мая 2016 г., 06:42:46

On 4 May 2016 at 15:12, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, May 2, 2016 at 11:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Tue, Apr 26, 2016 at 11:49 AM, Robert Haas <robertmhaas@gmail.com>
>> wrote:
>> > On Tue, Apr 26, 2016 at 11:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> Robert Haas <robertmhaas@gmail.com> writes:
>>
>> To summarize the positions as I understand them:
>>
>> Magnus seems OK with the way things are.
>> Peter wants to change either the fact that it is 0-based or the fact
>> that it is called degree, but is OK with either.
>> Tom doesn't like "degree" and also thinks anything called degree
>> should 1-based, but it sounds like he's more interested in changing
>> the first thing than the second one
>> Bruce and JD seemed to like degree -> workers.
>> JD also suggested another option that nobody else endorsed.
>> Alvaro suggested another option that nobody else endorsed.
>>
>> Does anyone else want to vote?
>>
>
> I think the way it is currently in HEAD seems easy to correlate how the
> feature works, but may be it appears to me that way because I am involved
> from long time with this implementation.   I also think one can easily
> confused among max_parallel_workers and max_worker_processes, so if we want
> to change, my vote goes with changing the default of max_parallel_degree to
> 1 (as suggested by Peter G.).

I'd like to put my +1 against making the current GUCs with their
current names 1-based, rather than 0-based. Doing anything else like
giving them new names seems like reinventing the wheel.

My reasoning is that the only gripe that I understand against the
current names is that the "degree" term appears not to be aligned with
what other databases do.

I think that actual rows / (degree+1) might get confusing for people,
such as in the following EXPLAIN ANALYZE output.
        Workers Launched: 2        ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1
width=8) (actual time=584.297..584.297 rows=1 loops=3)              ->  Parallel Seq Scan on a  (cost=0.00..85914.57
rows=4166657 width=0) (actual time=1.566..347.091 rows=3333333
loops=3)

The above would make more sense with max_parallel_degree=3.

I also think that the parallel query, at its best will have the
workers working hard for their tuples. In such cases the main process
will be helping out much more, and the more it helps the more a
1-based degree makes sense. Also I can't stretch my imagination enough
to imagine how any other database can handle worker tuples any
differently than us... Surely their main process/thread must marshal
worker's tuples the same as what we do? And if they use a 1-based
degree for that, then surely we can too.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

06 мая 2016 г., 21:52:36

On Tue, May 3, 2016 at 11:42 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>>> Magnus seems OK with the way things are.
>>> Peter wants to change either the fact that it is 0-based or the fact
>>> that it is called degree, but is OK with either.
>>> Tom doesn't like "degree" and also thinks anything called degree
>>> should 1-based, but it sounds like he's more interested in changing
>>> the first thing than the second one
>>> Bruce and JD seemed to like degree -> workers.
>>> JD also suggested another option that nobody else endorsed.
>>> Alvaro suggested another option that nobody else endorsed.
>>>
>>> Does anyone else want to vote?
>>
>> I think the way it is currently in HEAD seems easy to correlate how the
>> feature works, but may be it appears to me that way because I am involved
>> from long time with this implementation.   I also think one can easily
>> confused among max_parallel_workers and max_worker_processes, so if we want
>> to change, my vote goes with changing the default of max_parallel_degree to
>> 1 (as suggested by Peter G.).
>
> I'd like to put my +1 against making the current GUCs with their
> current names 1-based, rather than 0-based. Doing anything else like
> giving them new names seems like reinventing the wheel.
>
> My reasoning is that the only gripe that I understand against the
> current names is that the "degree" term appears not to be aligned with
> what other databases do.
>
> I think that actual rows / (degree+1) might get confusing for people,
> such as in the following EXPLAIN ANALYZE output.
>
>          Workers Launched: 2
>          ->  Partial Aggregate  (cost=96331.21..96331.22 rows=1
> width=8) (actual time=584.297..584.297 rows=1 loops=3)
>                ->  Parallel Seq Scan on a  (cost=0.00..85914.57
> rows=4166657 width=0) (actual time=1.566..347.091 rows=3333333
> loops=3)
>
> The above would make more sense with max_parallel_degree=3.
>
> I also think that the parallel query, at its best will have the
> workers working hard for their tuples. In such cases the main process
> will be helping out much more, and the more it helps the more a
> 1-based degree makes sense. Also I can't stretch my imagination enough
> to imagine how any other database can handle worker tuples any
> differently than us... Surely their main process/thread must marshal
> worker's tuples the same as what we do? And if they use a 1-based
> degree for that, then surely we can too.

OK, my reading of this thread is that there is a consensus is to
redefine max_parallel_degree=1 as "no parallelism" and
max_parallel_degree>1 as "parallelism using a leader plus N-1
workers", and along with that, to keep the names unchanged.  However,
I don't think I can get that done before beta1, at least not without a
serious risk of breaking stuff.  I can look at this post-beta1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Noah Misch

Дата:

29 мая 2016 г., 08:33:09

On Fri, May 06, 2016 at 02:52:30PM -0400, Robert Haas wrote:
> OK, my reading of this thread is that there is a consensus is to
> redefine max_parallel_degree=1 as "no parallelism" and
> max_parallel_degree>1 as "parallelism using a leader plus N-1
> workers", and along with that, to keep the names unchanged.  However,
> I don't think I can get that done before beta1, at least not without a
> serious risk of breaking stuff.  I can look at this post-beta1.

[This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

31 мая 2016 г., 19:16:00

On Sun, May 29, 2016 at 1:33 AM, Noah Misch <noah@leadboat.com> wrote:
> On Fri, May 06, 2016 at 02:52:30PM -0400, Robert Haas wrote:
>> OK, my reading of this thread is that there is a consensus is to
>> redefine max_parallel_degree=1 as "no parallelism" and
>> max_parallel_degree>1 as "parallelism using a leader plus N-1
>> workers", and along with that, to keep the names unchanged.  However,
>> I don't think I can get that done before beta1, at least not without a
>> serious risk of breaking stuff.  I can look at this post-beta1.
>
> [This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.

Here is a patch.  Note that I still don't agree with this change, but
I'm bowing to the will of the group.

I think that some of the people who were in favor of this change
should review this patch, including especially the language I wrote
for the documentation.  If that happens, and the reviews are positive,
then I will commit this.  If that does not happen, then I will
interpret that to mean that there isn't actually all that much
interest in changing this after all and will accordingly recommend
that this open item be removed without further action.

Here is a test which shows how it works:

rhaas=# set max_parallel_degree = 100;
SET
rhaas=# alter table pgbench_accounts set (parallel_degree = 10);
ALTER TABLE
rhaas=# explain (analyze) select count(*) from pgbench_accounts;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=177436.04..177436.05 rows=1 width=8)
(actual time=383.244..383.244 rows=1 loops=1)
   ->  Gather  (cost=177435.00..177436.01 rows=10 width=8) (actual
time=383.040..383.237 rows=9 loops=1)
         Workers Planned: 9
         Workers Launched: 8
         ->  Partial Aggregate  (cost=176435.00..176435.01 rows=1
width=8) (actual time=375.569..375.569 rows=1 loops=9)
               ->  Parallel Seq Scan on pgbench_accounts
(cost=0.00..173935.00 rows=1000000 width=0) (actual
time=0.103..260.352 rows=1111111 loops=9)
 Planning time: 0.135 ms
 Execution time: 384.387 ms
(8 rows)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

parallel-degree-1based.patch

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

31 мая 2016 г., 19:41:19

On 05/31/2016 09:15 AM, Robert Haas wrote:
> On Sun, May 29, 2016 at 1:33 AM, Noah Misch <noah@leadboat.com> wrote:
>> On Fri, May 06, 2016 at 02:52:30PM -0400, Robert Haas wrote:
>>> OK, my reading of this thread is that there is a consensus is to
>>> redefine max_parallel_degree=1 as "no parallelism" and
>>> max_parallel_degree>1 as "parallelism using a leader plus N-1
>>> workers", and along with that, to keep the names unchanged.  However,
>>> I don't think I can get that done before beta1, at least not without a
>>> serious risk of breaking stuff.  I can look at this post-beta1.
>>
>> [This is a generic notification.]
>>
>> The above-described topic is currently a PostgreSQL 9.6 open item.  Robert,
>> since you committed the patch believed to have created it, you own this open
>> item.  If some other commit is more relevant or if this does not belong as a
>> 9.6 open item, please let us know.  Otherwise, please observe the policy on
>> open item ownership[1] and send a status update within 72 hours of this
>> message.  Include a date for your subsequent status update.  Testers may
>> discover new open items at any time, and I want to plan to get them all fixed
>> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
>> efforts toward speedy resolution.  Thanks.
> 
> Here is a patch.  Note that I still don't agree with this change, but
> I'm bowing to the will of the group.
> 
> I think that some of the people who were in favor of this change
> should review this patch, including especially the language I wrote
> for the documentation.  If that happens, and the reviews are positive,
> then I will commit this.  If that does not happen, then I will
> interpret that to mean that there isn't actually all that much
> interest in changing this after all and will accordingly recommend
> that this open item be removed without further action.
> 
> Here is a test which shows how it works:
> 
> rhaas=# set max_parallel_degree = 100;
> SET
> rhaas=# alter table pgbench_accounts set (parallel_degree = 10);
> ALTER TABLE
> rhaas=# explain (analyze) select count(*) from pgbench_accounts;
> 
> QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=177436.04..177436.05 rows=1 width=8)
> (actual time=383.244..383.244 rows=1 loops=1)
>    ->  Gather  (cost=177435.00..177436.01 rows=10 width=8) (actual
> time=383.040..383.237 rows=9 loops=1)
>          Workers Planned: 9
>          Workers Launched: 8


I realize there's a lot of water under the bridge here, but I think
we're going to get 1000 questions on -general of the type:  "I asked for
8 parallel workers, why did I only get 7?".  I believe we will regret
this change.

So, one vote from me to revert.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 20:03:40

Josh berkus <josh@agliodbs.com> writes:
> I realize there's a lot of water under the bridge here, but I think
> we're going to get 1000 questions on -general of the type:  "I asked for
> 8 parallel workers, why did I only get 7?".  I believe we will regret
> this change.
> So, one vote from me to revert.

Well, that gets back to the question of whether average users will
understand the "degree" terminology.  For the record, while I do not
like the current behavior either, this was not the solution I favored.
I thought we should rename the GUC and keep it as meaning the number
of additional worker processes.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

31 мая 2016 г., 20:10:40

On 05/31/2016 10:03 AM, Tom Lane wrote:
> Josh berkus <josh@agliodbs.com> writes:
>> I realize there's a lot of water under the bridge here, but I think
>> we're going to get 1000 questions on -general of the type:  "I asked for
>> 8 parallel workers, why did I only get 7?".  I believe we will regret
>> this change.
>> So, one vote from me to revert.
> 
> Well, that gets back to the question of whether average users will
> understand the "degree" terminology.  For the record, while I do not
> like the current behavior either, this was not the solution I favored.
> I thought we should rename the GUC and keep it as meaning the number
> of additional worker processes.

I will happily bet anyone a nice dinner in Ottawa that most users will
not understand it.

Compare this:

"max_parallel is the maximum number of parallel workers which will work
on each stage of the query which is parallizable.  If you set it to 4,
you get up to 4 workers."

with this:

"max_parallel_degree is the amount of parallelism in the query, with the
understanding that the original parent process counts as 1, which means
that if you set it to 1 you get no parallelism, and if you want 4
parallel workers you need to set it to 5."

Which one of those is going to require more explanations on -general and
-novice?  Bets?

Let's not be complicated for the sake of being complicated.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

"Joshua D. Drake"

Дата:

31 мая 2016 г., 20:14:28

On 05/31/2016 10:10 AM, Josh berkus wrote:

> Compare this:
>
> "max_parallel is the maximum number of parallel workers which will work
> on each stage of the query which is parallizable.  If you set it to 4,
> you get up to 4 workers."
>
> with this:
>
> "max_parallel_degree is the amount of parallelism in the query, with the
> understanding that the original parent process counts as 1, which means
> that if you set it to 1 you get no parallelism, and if you want 4
> parallel workers you need to set it to 5."
>
> Which one of those is going to require more explanations on -general and
> -novice?  Bets?
>
> Let's not be complicated for the sake of being complicated.
>

+1

-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

31 мая 2016 г., 20:17:00

On Tue, May 31, 2016 at 10:10 AM, Josh berkus <josh@agliodbs.com> wrote:
> "max_parallel_degree is the amount of parallelism in the query, with the
> understanding that the original parent process counts as 1, which means
> that if you set it to 1 you get no parallelism, and if you want 4
> parallel workers you need to set it to 5."
>
> Which one of those is going to require more explanations on -general and
> -novice?  Bets?
>
> Let's not be complicated for the sake of being complicated.

But the distinction between parallel workers and backends that can
participate in parallel query does need to be user-visible. Worker
processes are a commodity (i.e. the user must consider
max_worker_processes).

-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

31 мая 2016 г., 20:23:21

On 05/31/2016 10:16 AM, Peter Geoghegan wrote:
> On Tue, May 31, 2016 at 10:10 AM, Josh berkus <josh@agliodbs.com> wrote:
>> "max_parallel_degree is the amount of parallelism in the query, with the
>> understanding that the original parent process counts as 1, which means
>> that if you set it to 1 you get no parallelism, and if you want 4
>> parallel workers you need to set it to 5."
>>
>> Which one of those is going to require more explanations on -general and
>> -novice?  Bets?
>>
>> Let's not be complicated for the sake of being complicated.
> 
> But the distinction between parallel workers and backends that can
> participate in parallel query does need to be user-visible. Worker
> processes are a commodity (i.e. the user must consider
> max_worker_processes).

It's still WAY simpler to understand "max_parallel is the number of
parallel workers I requested".

Any system where you set it to 2 and get only 1 worker on an idle system
is going to cause endless queries on the mailing lists.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 20:30:49

Josh berkus <josh@agliodbs.com> writes:
> On 05/31/2016 10:16 AM, Peter Geoghegan wrote:
>> But the distinction between parallel workers and backends that can
>> participate in parallel query does need to be user-visible. Worker
>> processes are a commodity (i.e. the user must consider
>> max_worker_processes).

> It's still WAY simpler to understand "max_parallel is the number of
> parallel workers I requested".

> Any system where you set it to 2 and get only 1 worker on an idle system
> is going to cause endless queries on the mailing lists.

I really think that a GUC named "max_parallel_workers", which in fact
limits the number of workers and not something else, is the way to go.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

31 мая 2016 г., 20:38:30

On Tue, May 31, 2016 at 10:23 AM, Josh berkus <josh@agliodbs.com> wrote:
> It's still WAY simpler to understand "max_parallel is the number of
> parallel workers I requested".

(Sorry Josh, somehow hit reply, not reply-all)

Yes, it is. But as long as parallel workers are not really that
distinct to the leader-as-worker when executing a parallel query, then
you have another consideration. Which is that you need to care about
how many cores your query uses first and foremost, and not the number
of parallel workers used. I don't think that having only one worker
will cause too much confusion, because users will trust that we won't
allow something that simply makes no sense to happen.

In my parallel create index patch, the leader participates as a worker
to scan and sort runs. It's identical to a worker, practically
speaking, at least until time comes to merge those runs. Similarly,
parallel aggregate does not really have much for the leader process to
do other than act as a worker.

-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

"Joshua D. Drake"

Дата:

31 мая 2016 г., 20:41:18

On 05/31/2016 10:30 AM, Tom Lane wrote:
> Josh berkus <josh@agliodbs.com> writes:
>> On 05/31/2016 10:16 AM, Peter Geoghegan wrote:
>>> But the distinction between parallel workers and backends that can
>>> participate in parallel query does need to be user-visible. Worker
>>> processes are a commodity (i.e. the user must consider
>>> max_worker_processes).
>
>> It's still WAY simpler to understand "max_parallel is the number of
>> parallel workers I requested".
>
>> Any system where you set it to 2 and get only 1 worker on an idle system
>> is going to cause endless queries on the mailing lists.
>
> I really think that a GUC named "max_parallel_workers", which in fact
> limits the number of workers and not something else, is the way to go.
>

I agree with Tom here. If we are being pedantic for the sake of being 
pedantic we are just causing frustration. The term max_parallel_workers 
is simple, easy to understand and accurate enough.

Sincerely,

JD


-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

31 мая 2016 г., 20:46:44

On 05/31/2016 10:38 AM, Peter Geoghegan wrote:
> On Tue, May 31, 2016 at 10:23 AM, Josh berkus <josh@agliodbs.com> wrote:
>> It's still WAY simpler to understand "max_parallel is the number of
>> parallel workers I requested".
> 
> (Sorry Josh, somehow hit reply, not reply-all)
> 
> Yes, it is. But as long as parallel workers are not really that
> distinct to the leader-as-worker when executing a parallel query, then
> you have another consideration. Which is that you need to care about
> how many cores your query uses first and foremost, and not the number
> of parallel workers used. I don't think that having only one worker
> will cause too much confusion, because users will trust that we won't
> allow something that simply makes no sense to happen.
> 
> In my parallel create index patch, the leader participates as a worker
> to scan and sort runs. It's identical to a worker, practically
> speaking, at least until time comes to merge those runs. Similarly,
> parallel aggregate does not really have much for the leader process to
> do other than act as a worker.

In parallel seq scan and join, do the "masters" behave as workers as well?


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

31 мая 2016 г., 20:51:33

On Tue, May 31, 2016 at 10:46 AM, Josh berkus <josh@agliodbs.com> wrote:
> In parallel seq scan and join, do the "masters" behave as workers as well?

It depends. They will if they can. If the parallel seq scan leader
isn't getting enough work to do from workers (enough tuples to process
from the shared memory queue), it will start acting as a worker fairly quickly.
With parallel aggregate, and some other cases, that will always happen.

Even when the leader is consuming input from workers, that's still perhaps
pegging one CPU core. So, it doesn't really invalidate what I said about
the number of cores being the primary consideration.

-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 21:00:47

I wrote:
> I really think that a GUC named "max_parallel_workers", which in fact
> limits the number of workers and not something else, is the way to go.

To be concrete, I suggest comparing the attached documentation patch
with Robert's.  Which one is more understandable?

(I have not bothered preparing code changes to go with this, but am
willing to do so.)

            regards, tom lane

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 72b5920..fccde28 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*************** include_dir 'conf.d'
*** 1998,2019 ****
         </listitem>
        </varlistentry>

!       <varlistentry id="guc-max-parallel-degree" xreflabel="max_parallel_degree">
!        <term><varname>max_parallel_degree</varname> (<type>integer</type>)
         <indexterm>
!         <primary><varname>max_parallel_degree</> configuration parameter</primary>
         </indexterm>
         </term>
         <listitem>
          <para>
!          Sets the maximum number of workers that can be started for an
!          individual parallel operation.  Parallel workers are taken from the
!          pool of processes established by
!          <xref linkend="guc-max-worker-processes">.  Note that the requested
!          number of workers may not actually be available at runtime.  If this
!          occurs, the plan will run with fewer workers than expected, which may
!          be inefficient.  The default value is 2.  Setting this value to 0
!          disables parallel query execution.
          </para>
         </listitem>
        </varlistentry>
--- 1998,2020 ----
         </listitem>
        </varlistentry>

!       <varlistentry id="guc-max-parallel-workers" xreflabel="max_parallel_workers">
!        <term><varname>max_parallel_workers</varname> (<type>integer</type>)
         <indexterm>
!         <primary><varname>max_parallel_workers</> configuration parameter</primary>
         </indexterm>
         </term>
         <listitem>
          <para>
!          Sets the maximum number of worker processes that can be used to
!          assist an individual parallelizable operation.  Parallel workers are
!          taken from the cluster-wide pool of processes established by
!          <xref linkend="guc-max-worker-processes">.  Depending on the size of
!          the pool and the number of other parallel queries active, the
!          requested number of workers may not actually be available at runtime.
!          If this occurs, the plan will run with fewer workers than expected,
!          which may be inefficient.  The default value is 2.  Setting this
!          value to 0 disables parallel query execution.
          </para>
         </listitem>
        </varlistentry>
diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index d1807ed..d73cf39 100644
*** a/doc/src/sgml/ref/create_table.sgml
--- b/doc/src/sgml/ref/create_table.sgml
*************** CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY
*** 909,922 ****
     </varlistentry>

     <varlistentry>
!     <term><literal>parallel_degree</> (<type>integer</>)</term>
      <listitem>
       <para>
!       The parallel degree for a table is the number of workers that should
!       be used to assist a parallel scan of that table.  If not set, the
!       system will determine a value based on the relation size.  The actual
!       number of workers chosen by the planner may be less, for example due to
!       the setting of <xref linkend="guc-max-parallel-degree">.
       </para>
      </listitem>
     </varlistentry>
--- 909,923 ----
     </varlistentry>

     <varlistentry>
!     <term><literal>parallel_workers</> (<type>integer</>)</term>
      <listitem>
       <para>
!       This setting limits the number of parallel worker processes that should
!       be used to assist a parallel scan of this table.  If not set, the
!       planner will select a value based on the relation size.  The actual
!       number of workers chosen by the planner may be less, for example
!       because it is also constrained by
!       <xref linkend="guc-max-parallel-workers">; it will not be more.
       </para>
      </listitem>
     </varlistentry>
diff --git a/doc/src/sgml/release-9.6.sgml b/doc/src/sgml/release-9.6.sgml
index 38e1967..3d78b00 100644
*** a/doc/src/sgml/release-9.6.sgml
--- b/doc/src/sgml/release-9.6.sgml
***************
*** 153,159 ****
         <para>
          Use of parallel query execution can be controlled through the new
          configuration parameters
!         <xref linkend="guc-max-parallel-degree">,
          <xref linkend="guc-force-parallel-mode">,
          <xref linkend="guc-parallel-setup-cost">, and
          <xref linkend="guc-parallel-tuple-cost">.
--- 153,159 ----
         <para>
          Use of parallel query execution can be controlled through the new
          configuration parameters
!         <xref linkend="guc-max-parallel-workers">,
          <xref linkend="guc-force-parallel-mode">,
          <xref linkend="guc-parallel-setup-cost">, and
          <xref linkend="guc-parallel-tuple-cost">.

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

31 мая 2016 г., 21:03:03

On 05/31/2016 10:51 AM, Peter Geoghegan wrote:
> On Tue, May 31, 2016 at 10:46 AM, Josh berkus <josh@agliodbs.com> wrote:
>> In parallel seq scan and join, do the "masters" behave as workers as well?
> 
> It depends. They will if they can. If the parallel seq scan leader
> isn't getting enough work to do from workers (enough tuples to process
> from the shared memory queue), it will start acting as a worker fairly quickly.
> With parallel aggregate, and some other cases, that will always happen.
> 
> Even when the leader is consuming input from workers, that's still perhaps
> pegging one CPU core. So, it doesn't really invalidate what I said about
> the number of cores being the primary consideration.
> 

I get where you're coming from, but I think Haas's query plan output is
going to show us the confusion we're going to get.  So we need to either
change the parameter, the explain output, or brace ourselves for endless
repeated questions.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Logic behind parallel default? WAS: Rename max_parallel_degree?

От

Josh berkus

Дата:

31 мая 2016 г., 21:06:04

On 05/31/2016 11:00 AM, Tom Lane wrote:
> !          If this occurs, the plan will run with fewer workers than expected,
> !          which may be inefficient.  The default value is 2.  Setting this
> !          value to 0 disables parallel query execution.

Is there a thread on how we determined this default of 2?  I can't find
one under likely search terms.

I'm concerned about the effect of overallocating parallel workers on
systems which are already running out of cores (e.g. AWS instances), and
run with default settings.  Possibly max_parallel_workers takes care of
this, which is why I want to understand the logic here.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

31 мая 2016 г., 21:07:12

On Tuesday, May 31, 2016, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Josh berkus <josh@agliodbs.com> writes:
> On 05/31/2016 10:16 AM, Peter Geoghegan wrote:
>> But the distinction between parallel workers and backends that can
>> participate in parallel query does need to be user-visible. Worker
>> processes are a commodity (i.e. the user must consider
>> max_worker_processes).

> It's still WAY simpler to understand "max_parallel is the number of
> parallel workers I requested".

> Any system where you set it to 2 and get only 1 worker on an idle system
> is going to cause endless queries on the mailing lists.

I really think that a GUC named "max_parallel_workers", which in fact
limits the number of workers and not something else, is the way to go.

What is your opinion on the internal name for this? Leave it as "degree"? Skimming the patch the number of times that "parallel_degree - 1" is used would concern me.

I would also probably reword "Parallel workers are taken from the pool...": changing parallel with the word additional. "Additional workers are taken from the pool...".

I'm tending to think we are using the word "parallel" too often and that selective removal would be better. "Sets the maximum number of workers (i.e., processes) that can be used to perform an operation." An introductory section on parallelism can lay out the framework and the concept of parallelism. Sentences like the one above can assume that the reader understands that they work in parallel if the operation allows and there are workers available. The same observation can be applied to "leader". When more than one process is involved a leader is chosen. This is general background to cover upfront. Elsewhere just talk of workers and let the leader be implied. No need to keep repeating "including the leader". In many ways leader is an implementation detail the user doesn't care about. The only impact is that 2 is not twice as good as 1 due to overhead - whatever form that overhead takes.

The change from two to three for the default sneaked in there...

David J.

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 21:09:10

Peter Geoghegan <pg@heroku.com> writes:
> Even when the leader is consuming input from workers, that's still perhaps
> pegging one CPU core. So, it doesn't really invalidate what I said about
> the number of cores being the primary consideration.

Agreed, but if we think that people need to be thinking in those terms,
maybe the parameter should be "max_parallel_cores".

The alternate docs patch I just posted tries to deal with this by
describing max_parallel_workers as being the max number of worker
processes used to "assist" a parallel query.  That was terminology
already being used in one place, but not consistently.  If we use it
consistently, I think it would be sufficient to remind people that
they need to figure on one more core for the leader.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

31 мая 2016 г., 21:09:14

On Tue, May 31, 2016 at 11:02 AM, Josh berkus <josh@agliodbs.com> wrote:
> I get where you're coming from, but I think Haas's query plan output is
> going to show us the confusion we're going to get.  So we need to either
> change the parameter, the explain output, or brace ourselves for endless
> repeated questions.

I get where you're coming from, too -- I think our positions are very close.

The only reason I favor defining parallel_degree = 1, rather than
doing what Tom proposes to do with that patch, is that we might as
well use the prevailing terminology used by SQL Server and Oracle (as
long as we match those semantics). Also, I think that number of cores
used is a more important consideration for users than the number of
workers used.

Users will definitely be confused about workers used vs. cores used,
but I don't think that any proposal fixes that.

-- 
Peter Geoghegan

Re: Logic behind parallel default? WAS: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 21:10:54

Josh berkus <josh@agliodbs.com> writes:
> Is there a thread on how we determined this default of 2?  I can't find
> one under likely search terms.

The 9.6 open-items list cites

https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5bau5@alap3.anarazel.de
        regards, tom lane

Re: Logic behind parallel default? WAS: Rename max_parallel_degree?

От

"Joshua D. Drake"

Дата:

31 мая 2016 г., 21:12:30

On 05/31/2016 11:05 AM, Josh berkus wrote:
> On 05/31/2016 11:00 AM, Tom Lane wrote:
>> !          If this occurs, the plan will run with fewer workers than expected,
>> !          which may be inefficient.  The default value is 2.  Setting this
>> !          value to 0 disables parallel query execution.
>
> Is there a thread on how we determined this default of 2?  I can't find
> one under likely search terms.
>
> I'm concerned about the effect of overallocating parallel workers on
> systems which are already running out of cores (e.g. AWS instances), and
> run with default settings.  Possibly max_parallel_workers takes care of
> this, which is why I want to understand the logic here.
>

I don't remember the thread but I seem to recall that the default of 2 
is explicitly during Beta, RC etc... so that we can see what happens. 
Robert could speak better to that.

JD

-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

31 мая 2016 г., 21:13:24

On Tue, May 31, 2016 at 11:09 AM, Peter Geoghegan <pg@heroku.com> wrote:
> The only reason I favor defining parallel_degree = 1

I meant "redefining max_parallel_degree =1 to effectively disable
parallel query", of course.


-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

31 мая 2016 г., 21:13:39

On Tuesday, May 31, 2016, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:
> I really think that a GUC named "max_parallel_workers", which in fact
> limits the number of workers and not something else, is the way to go.

To be concrete, I suggest comparing the attached documentation patch
with Robert's. Which one is more understandable?

(I have not bothered preparing code changes to go with this, but am
willing to do so.)

If going this route I'd still rather add the word "assisting" or "additional" directly into the guc name so the need to read the docs to determine inclusive or exclusive of the leader is alleviated.

Robert that it would be confusing but it cannot be worse than this...

David J.

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 21:14:14

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tuesday, May 31, 2016, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I really think that a GUC named "max_parallel_workers", which in fact
>> limits the number of workers and not something else, is the way to go.

> What is your opinion on the internal name for this?  Leave it as "degree"?

If that proposal is accepted, I am willing to go through the code and
rename everything internally to match (and, in fact, would insist on
that happening).  But I think the current debate is primarily around
how we present this to users, so talking about the documentation seems
sufficient for the moment.

> The change from two to three for the default sneaked in there...

Well, that would be the same effective value under Robert's patch.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 21:17:15

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Tuesday, May 31, 2016, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I really think that a GUC named "max_parallel_workers", which in fact
>>> limits the number of workers and not something else, is the way to go.

> If going this route I'd still rather add the word "assisting"
> or "additional" directly into the guc name so the need to read the docs to
> determine inclusive or exclusive of the leader is alleviated.

Dunno, "max_assisting_parallel_workers" seems awfully wordy and not
remarkably clearer.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Peter Eisentraut

Дата:

31 мая 2016 г., 21:17:38

On 5/31/16 2:02 PM, Josh berkus wrote:
> I get where you're coming from, but I think Haas's query plan output is
> going to show us the confusion we're going to get.  So we need to either
> change the parameter, the explain output, or brace ourselves for endless
> repeated questions.

Changing the explain output doesn't sound so bad to me.

The users' problem is that the parameter setting ought to match the 
EXPLAIN output.

The developers' problem is that the EXPLAIN output actually corresponds 
to leader + (N-1) workers internally.

I think we can hope that developers are going to be less confused about 
that than users.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

31 мая 2016 г., 21:17:40

On Tuesday, May 31, 2016, Peter Geoghegan <pg@heroku.com> wrote:

On Tue, May 31, 2016 at 11:02 AM, Josh berkus <josh@agliodbs.com> wrote:
> I get where you're coming from, but I think Haas's query plan output is
> going to show us the confusion we're going to get. So we need to either
> change the parameter, the explain output, or brace ourselves for endless
> repeated questions.

I get where you're coming from, too -- I think our positions are very close.

The only reason I favor defining parallel_degree = 1, rather than
doing what Tom proposes to do with that patch, is that we might as
well use the prevailing terminology used by SQL Server and Oracle (as
long as we match those semantics). Also, I think that number of cores
used is a more important consideration for users than the number of
workers used.

Users will definitely be confused about workers used vs. cores used,
but I don't think that any proposal fixes that.

Ideally each worker gets its own core - though obviously physical limitations apply. If that's not the case then, yes, at least one user is confused...

David J.

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

31 мая 2016 г., 21:18:18

On Tue, May 31, 2016 at 11:17 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> Changing the explain output doesn't sound so bad to me.
>
> The users' problem is that the parameter setting ought to match the EXPLAIN
> output.
>
> The developers' problem is that the EXPLAIN output actually corresponds to
> leader + (N-1) workers internally.
>
> I think we can hope that developers are going to be less confused about that
> than users.

+1.


-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

31 мая 2016 г., 21:22:16

On 05/31/2016 11:17 AM, Peter Eisentraut wrote:
> On 5/31/16 2:02 PM, Josh berkus wrote:
>> I get where you're coming from, but I think Haas's query plan output is
>> going to show us the confusion we're going to get.  So we need to either
>> change the parameter, the explain output, or brace ourselves for endless
>> repeated questions.
> 
> Changing the explain output doesn't sound so bad to me.
> 
> The users' problem is that the parameter setting ought to match the
> EXPLAIN output.
> 
> The developers' problem is that the EXPLAIN output actually corresponds
> to leader + (N-1) workers internally.
> 
> I think we can hope that developers are going to be less confused about
> that than users.

Makes sense.

One more consistency question: what's the effect of running out of
max_parallel_workers?

That is, say max_parallel_workers is set to 10, and 8 are already
allocated.  If I ask for max_parallel_X = 4, how many cores to I use?

Presumably the leader isn't counted towards max_parallel_workers?

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

31 мая 2016 г., 21:27:20

On Tue, May 31, 2016 at 11:22 AM, Josh berkus <josh@agliodbs.com> wrote:
>> I think we can hope that developers are going to be less confused about
>> that than users.
>
> Makes sense.

Maybe EXPLAIN doesn't have to use the term parallel worker at all. It
can instead use a slightly broader terminology, possibly including the
term "core".

> One more consistency question: what's the effect of running out of
> max_parallel_workers?
>
> That is, say max_parallel_workers is set to 10, and 8 are already
> allocated.  If I ask for max_parallel_X = 4, how many cores to I use?

Well, it depends on the planner, of course. But when constrained only
by the availability of worker processes, then your example could use 3
cores -- the 2 remaining parallel workers, plus the leader itself.

> Presumably the leader isn't counted towards max_parallel_workers?

Exactly.

-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

31 мая 2016 г., 21:29:16

On Tue, May 31, 2016 at 2:22 PM, Josh berkus <josh@agliodbs.com> wrote:

On 05/31/2016 11:17 AM, Peter Eisentraut wrote:
> On 5/31/16 2:02 PM, Josh berkus wrote:
>> I get where you're coming from, but I think Haas's query plan output is
>> going to show us the confusion we're going to get. So we need to either
>> change the parameter, the explain output, or brace ourselves for endless
>> repeated questions.
>
> Changing the explain output doesn't sound so bad to me.
>
> The users' problem is that the parameter setting ought to match the
> EXPLAIN output.
>
> The developers' problem is that the EXPLAIN output actually corresponds
> to leader + (N-1) workers internally.
>
> I think we can hope that developers are going to be less confused about
> that than users.

Makes sense.

One more consistency question: what's the effect of running out of
max_parallel_workers?

That is, say max_parallel_workers is set to 10, and 8 are already
allocated. If I ask for max_parallel_X = 4, how many cores to I use?

Presumably the leader isn't counted towards max_parallel_workers?

You'd have three O/S processes - the one dedicated to your session and you'd pick up two additional processes from the worker pool to assist.

How the O/S assigns those to cores is outside PostgreSQL's jurisdiction.

David J.

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 21:30:10

Josh berkus <josh@agliodbs.com> writes:
> One more consistency question: what's the effect of running out of
> max_parallel_workers?

ITYM max_worker_processes (ie, the cluster-wide pool size)?

> That is, say max_parallel_workers is set to 10, and 8 are already
> allocated.  If I ask for max_parallel_X = 4, how many cores to I use?

One of my reasons for liking max_parallel_workers is that you can sensibly
compare it to max_worker_processes to figure out how many workers you're
likely to get.  If you think in terms of "degree" it takes some additional
mental arithmetic to understand what will happen.

> Presumably the leader isn't counted towards max_parallel_workers?

Not under what I'm proposing.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

31 мая 2016 г., 21:37:37

On 05/31/2016 11:27 AM, Peter Geoghegan wrote:
> On Tue, May 31, 2016 at 11:22 AM, Josh berkus <josh@agliodbs.com> wrote:
>>> I think we can hope that developers are going to be less confused about
>>> that than users.
>>
>> Makes sense.
>
> Maybe EXPLAIN doesn't have to use the term parallel worker at all. It
> can instead use a slightly broader terminology, possibly including the
> term "core".
>
>> One more consistency question: what's the effect of running out of
>> max_parallel_workers?
>>
>> That is, say max_parallel_workers is set to 10, and 8 are already
>> allocated.  If I ask for max_parallel_X = 4, how many cores to I use?
>
> Well, it depends on the planner, of course. But when constrained only
> by the availability of worker processes, then your example could use 3
> cores -- the 2 remaining parallel workers, plus the leader itself.
>
>> Presumably the leader isn't counted towards max_parallel_workers?
^^^^ (oops, I meant max_worker_processes above)

So, there's six things we'd like to make consistent to limit user confusion:

1. max_parallel_X and number of cores used

2. max_parallel_X and EXPLAIN output

3. max_parallel_X and gatekeeping via max_worker_processes

4. max_parallel_X and parallelism of operations (i.e. 2 == 2 parallel
scanners)

5. settings in other similar databases (does someone have specific
citations for this)?

6. consistency with other GUCs (0 or -1 to disable settings)

We can't actually make all five things consistent, as some of them are
contradictory; for example, (1) and (3) cannot be reconciled.  So we
need to evaluate which things are more likely to cause confusion.

My general evaluation would be to make the GUC be the number of
additional workers used, not the total number (including the leader).
From my perspective, that makes (2), (3) and (6) consistent, and (4)
cannot be made consistent because different types of parallel nodes
behave different ways (i.e, some are parallel with only 1 additional
worker and some are not).

However, I'm resigned to the fact that user confusion is inevitable
whichever way we choose, and could be persuaded the other way.

--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

31 мая 2016 г., 21:38:36

On 05/31/2016 11:29 AM, Tom Lane wrote:
> Josh berkus <josh@agliodbs.com> writes:
>> One more consistency question: what's the effect of running out of
>> max_parallel_workers?
> 
> ITYM max_worker_processes (ie, the cluster-wide pool size)?

Yes.  Sorry for contributing to the confusion.  Too many
similar-sounding parameter names.


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Logic behind parallel default? WAS: Rename max_parallel_degree?

От

Josh berkus

Дата:

31 мая 2016 г., 21:46:04

On 05/31/2016 11:10 AM, Tom Lane wrote:
> Josh berkus <josh@agliodbs.com> writes:
>> Is there a thread on how we determined this default of 2?  I can't find
>> one under likely search terms.
>
> The 9.6 open-items list cites
>
> https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5bau5@alap3.anarazel.de

Looks like we didn't decide for the release, just the beta.

I can see two ways to go for the final release:

1. Ship with max_parallel_X = 2 (or similar) and a very low
max_worker_processes (e.g. 4).

2. Ship with max_parallel_X = 1 (or 0, depending), and with a generous
max_worker_processes (e.g. 16).

Argument in favor of (1): we want parallelism to work out of the gate
for users running on low-concurrency systems.  These settings would let
some parallelism happen immediately, without overwhelming a 4-to-8-core
system/vm.  Tuning for the user would then be fairly easy, as we could
just tell them "set max_worker_processes to half the number of cores you
have".

Argument in favor of (2): parallelism is potentially risky for .0, and
as a result we want it disabled unless users choose to enable it.
Also, defaulting to off lets users make more use of the parallel_degree
table attribute to just enable parallelism on select tables.

Thoughts?

--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 21:47:00

Josh berkus <josh@agliodbs.com> writes:
> On 05/31/2016 11:29 AM, Tom Lane wrote:
>> Josh berkus <josh@agliodbs.com> writes:
>>> One more consistency question: what's the effect of running out of
>>> max_parallel_workers?

>> ITYM max_worker_processes (ie, the cluster-wide pool size)?

> Yes.  Sorry for contributing to the confusion.  Too many
> similar-sounding parameter names.

Yeah, I was thinking the same thing while preparing my docs patch.
At the risk of opening another can of worms, what about renaming
max_worker_processes as well?  It would be a good thing if that
had "cluster" in it somewhere, or something that indicates it's a
system-wide value not a per-session value.  "max_workers_per_cluster"
would answer, though I'm not in love with it particularly.
        regards, tom lane

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

31 мая 2016 г., 21:52:03

On Tue, May 31, 2016 at 2:37 PM, Josh berkus <josh@agliodbs.com> wrote:

On 05/31/2016 11:27 AM, Peter Geoghegan wrote:
> On Tue, May 31, 2016 at 11:22 AM, Josh berkus <josh@agliodbs.com> wrote:
>>> I think we can hope that developers are going to be less confused about
>>> that than users.
>>
>> Makes sense.
>
> Maybe EXPLAIN doesn't have to use the term parallel worker at all. It
> can instead use a slightly broader terminology, possibly including the
> term "core".

2. max_parallel_X and EXPLAIN output

This is helped greatly if we can say:

Leader: xxx

Worker 0: yyy

Worker 1: zzz

David Rowley's response on April 30th provides an example explain output should anyone want to an easy source to look at.

https://www.postgresql.org/message-id/CAKJS1f8_3y6XHp2xP5YOvPetXMuCmiWP6zWR_eteRQy2fCn9mQ@mail.gmail.com

Getting across the point that session process are not worker processes seems doable. The implementation detail that they do indeed perform work can be made evident in the explain but otherwise left an implementation detail.

David J.

Re: Logic behind parallel default? WAS: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 21:57:50

Josh berkus <josh@agliodbs.com> writes:
> On 05/31/2016 11:10 AM, Tom Lane wrote:
>> The 9.6 open-items list cites
>> https://www.postgresql.org/message-id/flat/20160420174631.3qjjhpwsvvx5bau5@alap3.anarazel.de

> Looks like we didn't decide for the release, just the beta.

Indeed.  I think it's premature to have this discussion.  The plan
was to evaluate near the end of beta, when we (hopefully) have a
better feeling for how buggy parallel query is likely to be.

> Also, defaulting to off lets users make more use of the parallel_degree
> table attribute to just enable parallelism on select tables.

Well, that's an interesting point.  The current coding is that
parallel_degree is an upper limit on per-table workers, and
max_parallel_degree also limits it.  So if you want parallel scans only on
a small set of tables, parallel_degree is not an especially convenient way
to get to that.  Whether we measure it in workers or cores doesn't change
this conclusion.

It might be worth reconsidering what per-table knobs we should provide
exactly, but that's orthogonal to the main point under discussion.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 22:19:29

I wrote:
> At the risk of opening another can of worms, what about renaming
> max_worker_processes as well?  It would be a good thing if that
> had "cluster" in it somewhere, or something that indicates it's a
> system-wide value not a per-session value.  "max_workers_per_cluster"
> would answer, though I'm not in love with it particularly.

Actually, after a bit more thought, maybe "max_background_workers" would
be a good name?  That matches its existing documentation more closely:
        Sets the maximum number of background processes that the system        can support.  This parameter can only be
setat server start.  The        default is 8.
 

However, that would still leave us with max_background_workers as the
cluster-wide limit and max_parallel_workers as the per-query-node limit.
That naming isn't doing all that much to clarify the distinction.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

31 мая 2016 г., 22:36:03

On Tue, May 31, 2016 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> At the risk of opening another can of worms, what about renaming
>> max_worker_processes as well?  It would be a good thing if that
>> had "cluster" in it somewhere, or something that indicates it's a
>> system-wide value not a per-session value.  "max_workers_per_cluster"
>> would answer, though I'm not in love with it particularly.
>
> Actually, after a bit more thought, maybe "max_background_workers" would
> be a good name?  That matches its existing documentation more closely:
>
>          Sets the maximum number of background processes that the system
>          can support.  This parameter can only be set at server start.  The
>          default is 8.
>
> However, that would still leave us with max_background_workers as the
> cluster-wide limit and max_parallel_workers as the per-query-node limit.
> That naming isn't doing all that much to clarify the distinction.

It sure isn't.  Also, I think that we might actually want to add an
additional GUC to prevent the parallel query system from consuming the
entire pool of processes established by max_worker_processes.  If
you're doing anything else with worker processes on your system, you
might well want to say, well, it's OK to have up to 20 worker
processes, but at most 10 of those can be used for parallel queries,
so that the other 10 are guaranteed to be available for whatever other
stuff I'm running that uses the background process facility.  It's
worth remembering that the background worker stuff was originally
invented by Alvaro to allow users to run daemons, not for parallel
query.  So I think in the long run we should have three limits:

1. Cluster-wide limit on number of worker processes for all purposes
(currently, max_worker_processes).

2. Cluster-wide limit on number of worker processes for parallelism
(don't have this yet).

3. Per-operation limit on number of worker processes for parallelism
(currently, max_parallel_degree).

Whatever we rename, there needs to be enough semantic space between #1
and #3 to allow for the possibility - I think the very likely
possibility - that we will eventually also want #2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

31 мая 2016 г., 22:37:58

On Tue, May 31, 2016 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:
> At the risk of opening another can of worms, what about renaming
> max_worker_processes as well? It would be a good thing if that
> had "cluster" in it somewhere, or something that indicates it's a
> system-wide value not a per-session value. "max_workers_per_cluster"
> would answer, though I'm not in love with it particularly.

Actually, after a bit more thought, maybe "max_background_workers" would
be a good name? That matches its existing documentation more closely:

Sets the maximum number of background processes that the system
can support. This parameter can only be set at server start. The
default is 8.

However, that would still leave us with max_background_workers as the
cluster-wide limit and max_parallel_workers as the per-query-node limit.
That naming isn't doing all that much to clarify the distinction.

Node execution is serial, right?

I know work_mem has a provision about possibly using multiples of the maximum in a single query...

The per-node dynamic does seem important to at least well-document so that when users see 3 workers on one explain node and 2 on another they aren't left scratching their heads. We don't reserve a set number of workers per-statement but rather they are retrieved as needed during query execution.

I think I see the same amount of added value in tacking on 'per_cluster" as you do in adding "additional" to get "max_additional_parallel_workers" - though further refinement of trying to actively NOT consider a leader a worker would support the omission of addition[al]...

If we left max_worker_processes as defining only the maximum allowed non-parallel workers and added a new "max_cluster_workers" that would cap the combination of both "max_worker_processes" and "max_parallel_workers"...

David J.

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

31 мая 2016 г., 22:43:27

On Tue, May 31, 2016 at 3:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, May 31, 2016 at 3:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> At the risk of opening another can of worms, what about renaming
>> max_worker_processes as well? It would be a good thing if that
>> had "cluster" in it somewhere, or something that indicates it's a
>> system-wide value not a per-session value. "max_workers_per_cluster"
>> would answer, though I'm not in love with it particularly.
>
> Actually, after a bit more thought, maybe "max_background_workers" would
> be a good name? That matches its existing documentation more closely:
>
> Sets the maximum number of background processes that the system
> can support. This parameter can only be set at server start. The
> default is 8.
>
> However, that would still leave us with max_background_workers as the
> cluster-wide limit and max_parallel_workers as the per-query-node limit.
> That naming isn't doing all that much to clarify the distinction.

It sure isn't. Also, I think that we might actually want to add an
additional GUC to prevent the parallel query system from consuming the
entire pool of processes established by max_worker_processes.

My mind started going here too...though the question is whether you need a reserved pool or whether we apply some algorithm if (max_parallel + max_other > max_cluster). That algorithm could be configurable (i.e., target ratio 20:10 where 20+10 == max_cluster).

David J.

Re: Rename max_parallel_degree?

От

Alvaro Herrera

Дата:

31 мая 2016 г., 22:56:06

Robert Haas wrote:

> Also, I think that we might actually want to add an
> additional GUC to prevent the parallel query system from consuming the
> entire pool of processes established by max_worker_processes.  If
> you're doing anything else with worker processes on your system, you
> might well want to say, well, it's OK to have up to 20 worker
> processes, but at most 10 of those can be used for parallel queries,
> so that the other 10 are guaranteed to be available for whatever other
> stuff I'm running that uses the background process facility.  It's
> worth remembering that the background worker stuff was originally
> invented by Alvaro to allow users to run daemons, not for parallel
> query.

Agreed -- things like pglogical and BDR rely on background workers to do
their jobs.  Many other users of bgworkers have popped up, so I think
it'd be a bad idea if parallel queries are able to monopolize all the
available slots.

> So I think in the long run we should have three limits:
> 
> 1. Cluster-wide limit on number of worker processes for all purposes
> (currently, max_worker_processes).
> 
> 2. Cluster-wide limit on number of worker processes for parallelism
> (don't have this yet).
> 
> 3. Per-operation limit on number of worker processes for parallelism
> (currently, max_parallel_degree).
> 
> Whatever we rename, there needs to be enough semantic space between #1
> and #3 to allow for the possibility - I think the very likely
> possibility - that we will eventually also want #2.

max_background_workers sounds fine to me for #1, and I propose to add #2
in 9.6 rather than wait.  max_total_parallel_query_workers ?  I already
presented my proposal for #3 which, as you noted, nobody endorsed.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

31 мая 2016 г., 22:57:56

On Tue, May 31, 2016 at 12:55 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Agreed -- things like pglogical and BDR rely on background workers to do
> their jobs.  Many other users of bgworkers have popped up, so I think
> it'd be a bad idea if parallel queries are able to monopolize all the
> available slots.

That never occurred to me. I'd say it could be a serious problem, that
should be fixed promptly.

-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

31 мая 2016 г., 23:04:55

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Robert Haas wrote:
>> So I think in the long run we should have three limits:
>> 
>> 1. Cluster-wide limit on number of worker processes for all purposes
>> (currently, max_worker_processes).
>> 
>> 2. Cluster-wide limit on number of worker processes for parallelism
>> (don't have this yet).
>> 
>> 3. Per-operation limit on number of worker processes for parallelism
>> (currently, max_parallel_degree).
>> 
>> Whatever we rename, there needs to be enough semantic space between #1
>> and #3 to allow for the possibility - I think the very likely
>> possibility - that we will eventually also want #2.

> max_background_workers sounds fine to me for #1, and I propose to add #2
> in 9.6 rather than wait.

+1 to both points.

> max_total_parallel_query_workers ?

The name should be closely related to what we use for #3.  I could go for
max_total_parallel_workers for #2 and max_parallel_workers for #3.
Or maybe max_parallel_workers_total?
        regards, tom lane

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

31 мая 2016 г., 23:06:13

On 05/31/2016 01:04 PM, Tom Lane wrote:
> The name should be closely related to what we use for #3.  I could go for
> max_total_parallel_workers for #2 and max_parallel_workers for #3.
> Or maybe max_parallel_workers_total?

How about parallel_worker_pool?

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

31 мая 2016 г., 23:12:46

On Tue, May 31, 2016 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Robert Haas wrote:
>>> So I think in the long run we should have three limits:
>>>
>>> 1. Cluster-wide limit on number of worker processes for all purposes
>>> (currently, max_worker_processes).
>>>
>>> 2. Cluster-wide limit on number of worker processes for parallelism
>>> (don't have this yet).
>>>
>>> 3. Per-operation limit on number of worker processes for parallelism
>>> (currently, max_parallel_degree).
>>>
>>> Whatever we rename, there needs to be enough semantic space between #1
>>> and #3 to allow for the possibility - I think the very likely
>>> possibility - that we will eventually also want #2.
>
>> max_background_workers sounds fine to me for #1, and I propose to add #2
>> in 9.6 rather than wait.
>
> +1 to both points.
>
>> max_total_parallel_query_workers ?
>
> The name should be closely related to what we use for #3.  I could go for
> max_total_parallel_workers for #2 and max_parallel_workers for #3.
> Or maybe max_parallel_workers_total?

I just want to point out that if we change #1, we're breaking
postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
I'd just leave it alone.

I would propose to call #2 max_parallel_workers and #3
max_parallel_degree, which I think is clear as daylight, but since I
invented all of these names, I guess I'm biased.

In terms of forward-compatibility, one thing to note is that we
currently have only parallel QUERY, but in the future we will no doubt
also have parallel operations that are not queries, like VACUUM and
CLUSTER and ALTER TABLE.  If we put the word "query" into the name of
a GUC, we're committing to the idea that it only applies to queries,
and that there will be some other limit for non-queries.  If we don't
put the word query in there, we are not necessarily committing to the
reverse, but we're at least leaning in that direction.  So far I've
largely dodged having to make a decision about this, because talking
about the degree of parallelism for CLUSTER makes just as much sense
as talking about the degree of parallelism for a query, but we could
also decide to have e.g. maintenance_parallel_degree instead of
max_parallel_degree for such operations.  But as we commit to names
it's something to be aware of.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Alvaro Herrera

Дата:

31 мая 2016 г., 23:27:56

Robert Haas wrote:

> I just want to point out that if we change #1, we're breaking
> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
> I'd just leave it alone.

We can add the old name as a synonym in guc.c to maintain compatibility.

> I would propose to call #2 max_parallel_workers and #3
> max_parallel_degree, which I think is clear as daylight, but since I
> invented all of these names, I guess I'm biased.

You are biased :-)

> In terms of forward-compatibility, one thing to note is that we
> currently have only parallel QUERY, but in the future we will no doubt
> also have parallel operations that are not queries, like VACUUM and
> CLUSTER and ALTER TABLE.  If we put the word "query" into the name of
> a GUC, we're committing to the idea that it only applies to queries,
> and that there will be some other limit for non-queries.  If we don't
> put the word query in there, we are not necessarily committing to the
> reverse, but we're at least leaning in that direction.  So far I've
> largely dodged having to make a decision about this, because talking
> about the degree of parallelism for CLUSTER makes just as much sense
> as talking about the degree of parallelism for a query, but we could
> also decide to have e.g. maintenance_parallel_degree instead of
> max_parallel_degree for such operations.  But as we commit to names
> it's something to be aware of.

This is a very good point.

I think parallel maintenance commands are going to require different
tuning than different queries, and I'd rather have separate parameters
for those things rather than force the same parameter being changed over
and over depending on what is the session going to execute next.

Also, I think your proposed change from "max" to "maintenance" does not
make much sense; I'd rather have max_maintenance_parallel_degree.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

31 мая 2016 г., 23:36:35

On Tue, May 31, 2016 at 1:27 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> This is a very good point.
>
> I think parallel maintenance commands are going to require different
> tuning than different queries, and I'd rather have separate parameters
> for those things rather than force the same parameter being changed over
> and over depending on what is the session going to execute next.

FWIW, as things stand my parallel CREATE INDEX patch does have a cost
model which is pretty close to the one for parallel sequential scan.
The cost model cares about max_parallel_degree. However, it also
introduces a parallel_degree index storage parameter, which overrides
the cost model to make it indicate the number of parallel workers to
use (presumably, somebody will change master to make parallel_degree
in terms of "participant processes" rather than worker processes soon,
at which point I'll follow their lead).

The storage parameter sticks around, of course, so a REINDEX will
reuse it without asking. (I think CLUSTER should do the same with the
index storage parameter.)

What's novel about this is that for utility statements, you can
override the cost model, and so disregard max_parallel_degree if you
so choose. My guess is that DBAs will frequently want to do so.

I'm not attached to any of this, but that's what I've come up with to
date. Possibly the index storage parameter concept has baggage
elsewhere, which comes up when we later go to parallelize index scans.

-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

31 мая 2016 г., 23:42:12

On Tue, May 31, 2016 at 4:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, May 31, 2016 at 4:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Robert Haas wrote:
>>> So I think in the long run we should have three limits:
>>>
>>> 1. Cluster-wide limit on number of worker processes for all purposes
>>> (currently, max_worker_processes).
>>>
>>> 2. Cluster-wide limit on number of worker processes for parallelism
>>> (don't have this yet).
>>>
>>> 3. Per-operation limit on number of worker processes for parallelism
>>> (currently, max_parallel_degree).
>>>
>>> Whatever we rename, there needs to be enough semantic space between #1
>>> and #3 to allow for the possibility - I think the very likely
>>> possibility - that we will eventually also want #2.
>
>> max_background_workers sounds fine to me for #1, and I propose to add #2
>> in 9.6 rather than wait.
>
> +1 to both points.
>
>> max_total_parallel_query_workers ?
>
> The name should be closely related to what we use for #3. I could go for
> max_total_parallel_workers for #2 and max_parallel_workers for #3.
> Or maybe max_parallel_workers_total?

I just want to point out that if we change #1, we're breaking
postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
I'd just leave it alone.

We shall assume that the DBA has set the value of max_worker_processes according to hardware specifications without regard to what exactly could be parallelized.

I would propose to call #2 max_parallel_workers and #3
max_parallel_degree, which I think is clear as daylight, but since I
invented all of these names, I guess I'm biased.

So we have operation, session, and cluster scope yet no way to know which is which without memorizing the documentation.

While we cannot enforce this I'd say any of these that are intended to be changed by the user should have functions published as their official API. The name of the function can be made to be user friendly. For all the others pick a name with something closer to the 64 character limit that is as expressive as possible so that seeing the name in postgresql.conf is all person really needs to see to know that they are changing the right thing.

I say you expectations are off if you want to encode these differences by using workers and degree at the same time. Lets go for a part-time DBA vocabulary here. I get the merit of degree, and by itself it might even be OK, but in our usage degree is theory while workers are actual and I'd say people are likely to relate better to the later.

David J.

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

01 июня 2016 г., 00:58:29

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Robert Haas wrote:
>> I just want to point out that if we change #1, we're breaking
>> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
>> I'd just leave it alone.

> We can add the old name as a synonym in guc.c to maintain compatibility.

I doubt this is much of an issue at this point; max_worker_processes has
only been there a release or so, and surely there are very few people
explicitly setting it, given its limited use-case up to now.  It will be
really hard to change it after 9.6, but I think we could still get away
with that today.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Peter Eisentraut

Дата:

01 июня 2016 г., 03:51:47

On 5/31/16 4:04 PM, Tom Lane wrote:
> The name should be closely related to what we use for #3.  I could go for
> max_total_parallel_workers for #2 and max_parallel_workers for #3.
> Or maybe max_parallel_workers_total?

Most cluster-wide settings like this are named max_something 
(max_connections, max_wal_senders, max_replication_slots), whereas 
things that apply on a lower level are named max_something_per_something 
(max_files_per_process, max_locks_per_transations).

So let's leave max_worker_processes mostly alone and not add any 
_total_, _absolute_, _altogether_. ;-)

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

01 июня 2016 г., 04:49:05

On Tue, May 31, 2016 at 5:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Robert Haas wrote:
>>> I just want to point out that if we change #1, we're breaking
>>> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
>>> I'd just leave it alone.
>
>> We can add the old name as a synonym in guc.c to maintain compatibility.
>
> I doubt this is much of an issue at this point; max_worker_processes has
> only been there a release or so, and surely there are very few people
> explicitly setting it, given its limited use-case up to now.  It will be
> really hard to change it after 9.6, but I think we could still get away
> with that today.

max_worker_processes was added in 9.4, so it's been there for two
releases, but it probably is true that few people have set it.
Nevertheless, I don't think there's much evidence that it is a bad
enough name that we really must change it.

...Robert

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

01 июня 2016 г., 04:57:51

On Tue, May 31, 2016 at 8:51 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 5/31/16 4:04 PM, Tom Lane wrote:
>> The name should be closely related to what we use for #3.  I could go for
>> max_total_parallel_workers for #2 and max_parallel_workers for #3.
>> Or maybe max_parallel_workers_total?
>
> Most cluster-wide settings like this are named max_something
> (max_connections, max_wal_senders, max_replication_slots), whereas things
> that apply on a lower level are named max_something_per_something
> (max_files_per_process, max_locks_per_transations).
>
> So let's leave max_worker_processes mostly alone and not add any _total_,
> _absolute_, _altogether_. ;-)

That's interesting, because it suggests that max_parallel_degree might
end up being called something that doesn't begin with "max".  Which is
an interesting line of thought.  Now, it does function as a maximum of
sorts, but that doesn't necessarily imply that it has to have max in
the name.  By way of analogy, work_mem is not called max_work_mem, yet
everybody still understands that the actual memory used might be less
than the configured value.

Now, this case is a little trickier.  If we called it simply
parallel_degree rather than max_parallel_degree, then it would have
the same name as the reloption.  But the reloption sets an exact
value, and the GUC sets a cap, which is a significant difference.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

01 июня 2016 г., 05:03:05

Robert Haas <robertmhaas@gmail.com> writes:
> Now, this case is a little trickier.  If we called it simply
> parallel_degree rather than max_parallel_degree, then it would have
> the same name as the reloption.  But the reloption sets an exact
> value, and the GUC sets a cap, which is a significant difference.

The reloption does not set an exact value, according to the code:
       /*        * Use the table parallel_degree, but don't go further than        * max_parallel_degree.        */
 parallel_degree = Min(rel->rel_parallel_degree, max_parallel_degree);
 

although the existing documentation for it is so vague that you
couldn't tell from that.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

01 июня 2016 г., 05:11:07

On Tue, May 31, 2016 at 10:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Now, this case is a little trickier.  If we called it simply
>> parallel_degree rather than max_parallel_degree, then it would have
>> the same name as the reloption.  But the reloption sets an exact
>> value, and the GUC sets a cap, which is a significant difference.
>
> The reloption does not set an exact value, according to the code:
>
>         /*
>          * Use the table parallel_degree, but don't go further than
>          * max_parallel_degree.
>          */
>         parallel_degree = Min(rel->rel_parallel_degree, max_parallel_degree);
>
> although the existing documentation for it is so vague that you
> couldn't tell from that.

True, max_parallel_degree is an overriding limit.  But the point is
that, without the reloption, you can't get lots of workers on a small
table.  The reloption lets you do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

01 июня 2016 г., 05:13:38

Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 31, 2016 at 10:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The reloption does not set an exact value, according to the code:

> True, max_parallel_degree is an overriding limit.  But the point is
> that, without the reloption, you can't get lots of workers on a small
> table.  The reloption lets you do that.

Color me skeptical that that's actually a good idea ...
        regards, tom lane

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

01 июня 2016 г., 05:15:10

On Tue, May 31, 2016 at 10:13 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, May 31, 2016 at 10:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The reloption does not set an exact value, according to the code:
>
>> True, max_parallel_degree is an overriding limit.  But the point is
>> that, without the reloption, you can't get lots of workers on a small
>> table.  The reloption lets you do that.
>
> Color me skeptical that that's actually a good idea ...

I'll color you as not having read the threads where multiple users asked for it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

01 июня 2016 г., 13:05:53

On Tue, May 31, 2016 at 11:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > I really think that a GUC named "max_parallel_workers", which in fact
> > limits the number of workers and not something else, is the way to go.
>
> To be concrete, I suggest comparing the attached documentation patch
> with Robert's. Which one is more understandable?
>

Your explanation is clear, however the name max_parallel_workers makes it sound like that parallelising an operation is all about workers. Yes it depends a lot on the number of workers allocated for parallel operation, but that is not everything. I think calling it max_parallelism as suggested by Alvaro upthread suits better than max_parallel_workers.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

01 июня 2016 г., 17:11:15

Amit Kapila <amit.kapila16@gmail.com> writes:
> Your explanation is clear, however the name max_parallel_workers makes it
> sound like that parallelising an operation is all about workers.  Yes it
> depends a lot on the number of workers allocated for parallel operation,
> but that is not everything.  I think calling it max_parallelism as
> suggested by Alvaro upthread suits better than max_parallel_workers.

I don't think that's a good direction at all.  This entire discussion is
caused by the fact that it's not very clear what "max_parallel_degree"
measures.  Fixing that problem by renaming the variable to something that
doesn't even pretend to tell you what it's counting is not an improvement.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Jim Nasby

Дата:

01 июня 2016 г., 18:27:32

On 5/31/16 8:48 PM, Robert Haas wrote:
> On Tue, May 31, 2016 at 5:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>> Robert Haas wrote:
>>>> I just want to point out that if we change #1, we're breaking
>>>> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
>>>> I'd just leave it alone.
>>
>>> We can add the old name as a synonym in guc.c to maintain compatibility.
>>
>> I doubt this is much of an issue at this point; max_worker_processes has
>> only been there a release or so, and surely there are very few people
>> explicitly setting it, given its limited use-case up to now.  It will be
>> really hard to change it after 9.6, but I think we could still get away
>> with that today.
>
> max_worker_processes was added in 9.4, so it's been there for two
> releases, but it probably is true that few people have set it.
> Nevertheless, I don't think there's much evidence that it is a bad
> enough name that we really must change it.

ISTM that all the confusion about parallel query would go away if the 
setting was max_parallel_assistants instead of _workers. It's exactly 
how parallel query works: there are helpers that *assist* the backend in 
executing the query.

The big downside to "assistants" is it breaks all lexical connection to 
max_worker_processes. So what if we change that to 
max_assistant_processes? I think "assistant" and "worker" are close 
enough in meaning for "stand alone" uses of BG workers so as not to be 
confusing, and I don't see any options for parallelism that are any clearer.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: Rename max_parallel_degree?

От

Petr Jelinek

Дата:

01 июня 2016 г., 18:45:30

On 01/06/16 17:27, Jim Nasby wrote:
> On 5/31/16 8:48 PM, Robert Haas wrote:
>> On Tue, May 31, 2016 at 5:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>>> Robert Haas wrote:
>>>>> I just want to point out that if we change #1, we're breaking
>>>>> postgresql.conf compatibility for, IMHO, not a whole lot of benefit.
>>>>> I'd just leave it alone.
>>>
>>>> We can add the old name as a synonym in guc.c to maintain
>>>> compatibility.
>>>
>>> I doubt this is much of an issue at this point; max_worker_processes has
>>> only been there a release or so, and surely there are very few people
>>> explicitly setting it, given its limited use-case up to now.  It will be
>>> really hard to change it after 9.6, but I think we could still get away
>>> with that today.
>>
>> max_worker_processes was added in 9.4, so it's been there for two
>> releases, but it probably is true that few people have set it.
>> Nevertheless, I don't think there's much evidence that it is a bad
>> enough name that we really must change it.
>
> ISTM that all the confusion about parallel query would go away if the
> setting was max_parallel_assistants instead of _workers. It's exactly
> how parallel query works: there are helpers that *assist* the backend in
> executing the query.
>
> The big downside to "assistants" is it breaks all lexical connection to
> max_worker_processes. So what if we change that to
> max_assistant_processes? I think "assistant" and "worker" are close
> enough in meaning for "stand alone" uses of BG workers so as not to be
> confusing, and I don't see any options for parallelism that are any
> clearer.

That GUC also controls worker processes that are started by extensions, 
not just ones that parallel query starts. This is btw one thing I don't 
like at all about how the current limits work, the parallel query will 
fight for workers with extensions because they share the same limit.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

01 июня 2016 г., 18:55:24

On Wed, Jun 1, 2016 at 11:45 AM, Petr Jelinek <petr@2ndquadrant.com> wrote:

That GUC also controls worker processes that are started by extensions, not just ones that parallel query starts. This is btw one thing I don't like at all about how the current limits work, the parallel query will fight for workers with extensions because they share the same limit.

Given that this models reality the GUC is doing its job. Now, maybe we need additional knobs to give the end-user the ability to influence how those fights will turn out.

But as far as a high-level setting goes max_worker_processes seems to fit the bill - and apparently fits within our existing cluster options naming convention.

Parallel query uses workers to assist in query execution.

Background tasks use workers during execution.

Others.....

David J.

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

02 июня 2016 г., 00:21:27

On Wed, Jun 1, 2016 at 10:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Kapila <amit.kapila16@gmail.com> writes:
>> Your explanation is clear, however the name max_parallel_workers makes it
>> sound like that parallelising an operation is all about workers.  Yes it
>> depends a lot on the number of workers allocated for parallel operation,
>> but that is not everything.  I think calling it max_parallelism as
>> suggested by Alvaro upthread suits better than max_parallel_workers.
>
> I don't think that's a good direction at all.  This entire discussion is
> caused by the fact that it's not very clear what "max_parallel_degree"
> measures.  Fixing that problem by renaming the variable to something that
> doesn't even pretend to tell you what it's counting is not an improvement.

I agree with that, but I also think you're holding the name of this
GUC to a ridiculously high standard.  It's not like you can look at
"work_mem" and know what it measures without reading the fine manual.
If you lined up ten people in a room all of whom were competent
database professionals and none of whom knew anything about PostgreSQL
and asked them to guess what a setting called work_mem does and what a
setting called max_parallel_degree does, I will wager you $5 that
they'd do better on the second one.  Likewise, I bet the guesses for
max_parallel_degree would be closer to the mark than the guesses for
maintenance_work_mem or replacement_sort_tuples or commit_siblings or
bgwriter_lru_multiplier.

I've largely given up hope of coming up with an alternative that can
attract more than one vote and that is also at least mildly accurate,
but one idea is max_parallel_workers_per_gather_node.  That will be
totally clear.  Three possible problems, none of them fatal, are:

- I have plans to eventually fix it so that the workers are shared
across the whole plan rather than just the plan node.  In most cases
that won't matter, but there are corner cases where it does.  Now when
that happens, we can rename this to max_parallel_workers_per_query,
breaking backward compatibility.
- It will force us to use a different GUC for utility commands.
- It's a bit long-winded.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

02 июня 2016 г., 00:30:05

Robert Haas <robertmhaas@gmail.com> writes:
> I've largely given up hope of coming up with an alternative that can
> attract more than one vote and that is also at least mildly accurate,
> but one idea is max_parallel_workers_per_gather_node.  That will be
> totally clear.

Given the reference to Gather nodes, couldn't you drop the word
"parallel"?  "node" might not be necessary either.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

02 июня 2016 г., 00:30:48

On 06/01/2016 02:21 PM, Robert Haas wrote:
> If you lined up ten people in a room all of whom were competent
> database professionals and none of whom knew anything about PostgreSQL
> and asked them to guess what a setting called work_mem does and what a
> setting called max_parallel_degree does, I will wager you $5 that
> they'd do better on the second one.  Likewise, I bet the guesses for
> max_parallel_degree would be closer to the mark than the guesses for
> maintenance_work_mem or replacement_sort_tuples or commit_siblings or
> bgwriter_lru_multiplier.

Incidentally, the reason I didn't jump into this thread until the
patches showed up is that I don't think it actually matters what the
parameters are named.  They're going to require documentation
regardless, parallism just isn't something people grok instinctively.

I care about how the parameters *work*, and whether that's consistent
across our various resource management settings.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

02 июня 2016 г., 14:58:23

On Wed, Jun 1, 2016 at 5:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I've largely given up hope of coming up with an alternative that can
>> attract more than one vote and that is also at least mildly accurate,
>> but one idea is max_parallel_workers_per_gather_node.  That will be
>> totally clear.
>
> Given the reference to Gather nodes, couldn't you drop the word
> "parallel"?  "node" might not be necessary either.

Well, I think we could drop node, if you like.  I think parallel
wouldn't be good to drop, though, because it sounds like we want a
global limit on parallel workers also, and that can't be just
max_workers.  So I think we should keep parallel in there for all of
them, and have max_parallel_workers and
max_parallel_workers_per_gather(_node).  The reloption and the Path
struct field can be parallel_workers rather than parallel_degree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

02 июня 2016 г., 16:59:51

Robert Haas <robertmhaas@gmail.com> writes:
> Well, I think we could drop node, if you like.  I think parallel
> wouldn't be good to drop, though, because it sounds like we want a
> global limit on parallel workers also, and that can't be just
> max_workers.  So I think we should keep parallel in there for all of
> them, and have max_parallel_workers and
> max_parallel_workers_per_gather(_node).  The reloption and the Path
> struct field can be parallel_workers rather than parallel_degree.

WFM.
        regards, tom lane

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

02 июня 2016 г., 17:10:44

On Thu, Jun 2, 2016 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:
> Well, I think we could drop node, if you like. I think parallel
> wouldn't be good to drop, though, because it sounds like we want a
> global limit on parallel workers also, and that can't be just
> max_workers. So I think we should keep parallel in there for all of
> them, and have max_parallel_workers and
> max_parallel_workers_per_gather(_node). The reloption and the Path
> struct field can be parallel_workers rather than parallel_degree.

WFM.

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

02 июня 2016 г., 18:33:09

On 06/02/2016 04:58 AM, Robert Haas wrote:

> Well, I think we could drop node, if you like.  I think parallel
> wouldn't be good to drop, though, because it sounds like we want a
> global limit on parallel workers also, and that can't be just
> max_workers.  So I think we should keep parallel in there for all of
> them, and have max_parallel_workers and
> max_parallel_workers_per_gather(_node).  The reloption and the Path
> struct field can be parallel_workers rather than parallel_degree.

So does that mean we'll rename it if you manage to implement a parameter
which controls the number of workers for the whole statement?


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

02 июня 2016 г., 18:53:44

Josh berkus <josh@agliodbs.com> writes:
> On 06/02/2016 04:58 AM, Robert Haas wrote:
>> Well, I think we could drop node, if you like.  I think parallel
>> wouldn't be good to drop, though, because it sounds like we want a
>> global limit on parallel workers also, and that can't be just
>> max_workers.  So I think we should keep parallel in there for all of
>> them, and have max_parallel_workers and
>> max_parallel_workers_per_gather(_node).  The reloption and the Path
>> struct field can be parallel_workers rather than parallel_degree.

> So does that mean we'll rename it if you manage to implement a parameter
> which controls the number of workers for the whole statement?

That would fit in as something like max_parallel_workers_per_statement.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

02 июня 2016 г., 22:52:31

On 06/02/2016 08:53 AM, Tom Lane wrote:
> Josh berkus <josh@agliodbs.com> writes:
>> On 06/02/2016 04:58 AM, Robert Haas wrote:
>>> Well, I think we could drop node, if you like.  I think parallel
>>> wouldn't be good to drop, though, because it sounds like we want a
>>> global limit on parallel workers also, and that can't be just
>>> max_workers.  So I think we should keep parallel in there for all of
>>> them, and have max_parallel_workers and
>>> max_parallel_workers_per_gather(_node).  The reloption and the Path
>>> struct field can be parallel_workers rather than parallel_degree.
> 
>> So does that mean we'll rename it if you manage to implement a parameter
>> which controls the number of workers for the whole statement?
> 
> That would fit in as something like max_parallel_workers_per_statement.

ETOOMANYKNOBS

I'm trying to think of some way we can reasonably automate this for
users ...

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

02 июня 2016 г., 23:09:02

On Thu, Jun 2, 2016 at 3:52 PM, Josh berkus <josh@agliodbs.com> wrote:

On 06/02/2016 08:53 AM, Tom Lane wrote:
> Josh berkus <josh@agliodbs.com> writes:
>> On 06/02/2016 04:58 AM, Robert Haas wrote:
>>> Well, I think we could drop node, if you like. I think parallel
>>> wouldn't be good to drop, though, because it sounds like we want a
>>> global limit on parallel workers also, and that can't be just
>>> max_workers. So I think we should keep parallel in there for all of
>>> them, and have max_parallel_workers and
>>> max_parallel_workers_per_gather(_node). The reloption and the Path
>>> struct field can be parallel_workers rather than parallel_degree.
>
>> So does that mean we'll rename it if you manage to implement a parameter
>> which controls the number of workers for the whole statement?
>
> That would fit in as something like max_parallel_workers_per_statement.

ETOOMANYKNOBS

I'm trying to think of some way we can reasonably automate this for
users ...

Are you referring to right now or if we move the goal posts to making this a per-statement reservation?

Oh, and how does one measure 0.718... of a knob?

David J.

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

02 июня 2016 г., 23:35:43

On 06/02/2016 01:08 PM, David G. Johnston wrote:
> On Thu, Jun 2, 2016 at 3:52 PM, Josh berkus <josh@agliodbs.com
> <mailto:josh@agliodbs.com>>wrote:
>
>     On 06/02/2016 08:53 AM, Tom Lane wrote:
>     > Josh berkus <josh@agliodbs.com <mailto:josh@agliodbs.com>> writes:
>     >> On 06/02/2016 04:58 AM, Robert Haas wrote:
>     >>> Well, I think we could drop node, if you like.  I think parallel
>     >>> wouldn't be good to drop, though, because it sounds like we want a
>     >>> global limit on parallel workers also, and that can't be just
>     >>> max_workers.  So I think we should keep parallel in there for all of
>     >>> them, and have max_parallel_workers and
>     >>> max_parallel_workers_per_gather(_node).  The reloption and the Path
>     >>> struct field can be parallel_workers rather than parallel_degree.
>     >
>     >> So does that mean we'll rename it if you manage to implement a parameter
>     >> which controls the number of workers for the whole statement?
>     >
>     > That would fit in as something like max_parallel_workers_per_statement.
>
>     ETOOMANYKNOBS
>
>     I'm trying to think of some way we can reasonably automate this for
>     users ...
>
>
> Are you referring to right now or if we move the goal posts to making
> this a per-statement reservation?

I was assuming that we would have *both* per-operation and per-statement
limits.  I can see reasons for having both, I can see why power users
would want both, but it's going to be overwhelming to casual users.


--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

02 июня 2016 г., 23:42:50

On Thu, Jun 2, 2016 at 4:35 PM, Josh berkus <josh@agliodbs.com> wrote:

On 06/02/2016 01:08 PM, David G. Johnston wrote:
> On Thu, Jun 2, 2016 at 3:52 PM, Josh berkus <josh@agliodbs.com
> <mailto:josh@agliodbs.com>>wrote:
>
> On 06/02/2016 08:53 AM, Tom Lane wrote:
> > Josh berkus <josh@agliodbs.com <mailto:josh@agliodbs.com>> writes:
> >> On 06/02/2016 04:58 AM, Robert Haas wrote:
> >>> Well, I think we could drop node, if you like. I think parallel
> >>> wouldn't be good to drop, though, because it sounds like we want a
> >>> global limit on parallel workers also, and that can't be just
> >>> max_workers. So I think we should keep parallel in there for all of
> >>> them, and have max_parallel_workers and
> >>> max_parallel_workers_per_gather(_node). The reloption and the Path
> >>> struct field can be parallel_workers rather than parallel_degree.
> >
> >> So does that mean we'll rename it if you manage to implement a parameter
> >> which controls the number of workers for the whole statement?
> >
> > That would fit in as something like max_parallel_workers_per_statement.
>
> ETOOMANYKNOBS
>
> I'm trying to think of some way we can reasonably automate this for
> users ...
>
>
> Are you referring to right now or if we move the goal posts to making
> this a per-statement reservation?

I was assuming that we would have *both* per-operation and per-statement
limits. I can see reasons for having both, I can see why power users
would want both, but it's going to be overwhelming to casual users.

Got that. The only problem on that front with the current setup is that right now we are saying: "at most use 3 of the 8 available processes": i.e., we tie ourselves to a fixed number when I think a better algorithm would be: "on/off/auto - default auto" and we detect at runtime whatever values we feel are most appropriate based upon the machine we are running on. If the user doesn't like our choices they can specify their own values. But the only specified values in the configurations would be those placed there automatically by the user. If value isn't specified but is required it gets set at startup and can be seen in pg_settings.

David J.

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

03 июня 2016 г., 04:03:02

On 06/02/2016 01:42 PM, David G. Johnston wrote:
>     Are you referring to right now or if we move the goal posts to making
>     > this a per-statement reservation?
>
>     I was assuming that we would have *both* per-operation and per-statement
>     limits.  I can see reasons for having both, I can see why power users
>     would want both, but it's going to be overwhelming to casual users.
>
>
>  Got that.  The only problem on that front with the current setup is
> that right now we are saying: "at most use 3 of the 8 available
> processes": i.e., we tie ourselves to a fixed number when I think a
> better algorithm would be: "on/off/auto - default auto" and we detect at
> runtime whatever values we feel are most appropriate based upon the
> machine we are running on.  If the user doesn't like our choices they
> can specify their own values.  But the only specified values in the
> configurations would be those placed there automatically by the user.
> If value isn't specified but is required it gets set at startup and can
> be seen in pg_settings.
>

Well, the hard part here is that the appropriate value is based on the
level of concurrency in the database as a whole.  For example, if it's a
website database with 200 active connections on a 32-core machine
already, you want zero parallelism.  Whereas if it's the only current
statement on a 16-core VM, you want like 8.

That sounds like a heuristic, except that the number of concurrent
statements can change in milleseconds.  So we'd really want to base this
on some sort of moving average, set conservatively.  This will be some
interesting code, and will probably need to be revised several times
before we get it right.  Particularly since this would involve scanning
some of the global catalogs we've been trying to move activity off of.

--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Petr Jelinek

Дата:

03 июня 2016 г., 07:21:16

On 01/06/16 17:55, David G. Johnston wrote:
> On Wed, Jun 1, 2016 at 11:45 AM, Petr Jelinek <petr@2ndquadrant.com
> <mailto:petr@2ndquadrant.com>>wrote:
>
>     That GUC also controls worker processes that are started by
>     extensions, not just ones that parallel query starts. This is btw
>     one thing I don't like at all about how the current limits work, the
>     parallel query will fight for workers with extensions because they
>     share the same limit.
>
>
> Given that this models reality the GUC is doing its job.  Now, maybe we
> need additional knobs to give the end-user the ability to influence how
> those fights will turn out.

Agreed, my point is that I think we do need additional knob.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

Re: Rename max_parallel_degree?

От

Peter Eisentraut

Дата:

03 июня 2016 г., 07:33:09

On 6/3/16 12:21 AM, Petr Jelinek wrote:
> On 01/06/16 17:55, David G. Johnston wrote:
>> On Wed, Jun 1, 2016 at 11:45 AM, Petr Jelinek <petr@2ndquadrant.com
>> <mailto:petr@2ndquadrant.com>>wrote:
>>
>>     That GUC also controls worker processes that are started by
>>     extensions, not just ones that parallel query starts. This is btw
>>     one thing I don't like at all about how the current limits work, the
>>     parallel query will fight for workers with extensions because they
>>     share the same limit.
>>
>>
>> Given that this models reality the GUC is doing its job.  Now, maybe we
>> need additional knobs to give the end-user the ability to influence how
>> those fights will turn out.
>
> Agreed, my point is that I think we do need additional knob.

We need one knob to control how many process slots to create at server 
start, and then a bunch of sliders to control how to allocate those 
between regular connections, superuser connections, replication, 
autovacuum, parallel workers, background workers (by tag/label/group), 
and so on.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

Albe Laurenz

Дата:

03 июня 2016 г., 09:43:46

Robert Haas wrote:
> On Wed, Jun 1, 2016 at 5:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> I've largely given up hope of coming up with an alternative that can
> >> attract more than one vote and that is also at least mildly accurate,
> >> but one idea is max_parallel_workers_per_gather_node.  That will be
> >> totally clear.
> >
> > Given the reference to Gather nodes, couldn't you drop the word
> > "parallel"?  "node" might not be necessary either.
> 
> Well, I think we could drop node, if you like.  I think parallel
> wouldn't be good to drop, though, because it sounds like we want a
> global limit on parallel workers also, and that can't be just
> max_workers.  So I think we should keep parallel in there for all of
> them, and have max_parallel_workers and
> max_parallel_workers_per_gather(_node).  The reloption and the Path
> struct field can be parallel_workers rather than parallel_degree.

I believe that it will be impossible to find a name that makes
the meaning clear to everybody.  Those who do not read the documentation
will always find a way to misunderstand it.

These suggestions have the added disadvantage that it is hard
to remember them.  I see myself going, "I have to change the maximum
for parallel workers, what was the name again?", and having to resort
to the manual (or SHOW ALL) each time.

I suggest to follow the precedent of "work_mem", stick with
something simple like "max_parallel_workers" and accept the risk
of not being totally self-explanatory.

Yours,
Laurenz Albe

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

03 июня 2016 г., 15:20:33

On Thu, Jun 2, 2016 at 4:35 PM, Josh berkus <josh@agliodbs.com> wrote:
> I was assuming that we would have *both* per-operation and per-statement
> limits.  I can see reasons for having both, I can see why power users
> would want both, but it's going to be overwhelming to casual users.

I don't think so.  I think the fact that this is per-gather-node
rather than per-statement right now is basically a defect.  Once we
have a per-statement limit, I see no value in having the
per-gather-node setting.  So, yes, at that point, I would push to
rename the GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

"David G. Johnston"

Дата:

03 июня 2016 г., 15:30:20

On Fri, Jun 3, 2016 at 8:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 2, 2016 at 4:35 PM, Josh berkus <josh@agliodbs.com> wrote:
> I was assuming that we would have *both* per-operation and per-statement
> limits. I can see reasons for having both, I can see why power users
> would want both, but it's going to be overwhelming to casual users.

I don't think so. I think the fact that this is per-gather-node
rather than per-statement right now is basically a defect. Once we
have a per-statement limit, I see no value in having the
per-gather-node setting. So, yes, at that point, I would push to
rename the GUC.

How big is the hazard of future-naming this and documenting the present limitation? Is the casual user reading explains even going to be aware of that particular implementation detail?

David J.

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

03 июня 2016 г., 15:34:58

On Fri, Jun 3, 2016 at 8:30 AM, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> How big is the hazard of future-naming this and documenting the present
> limitation?  Is the casual user reading explains even going to be aware of
> that particular implementation detail?

Well, see, the nice thing about max_parallel_degree is that it is very
slightly ambiguous, just enough to paper over this.  But I'm going to
oppose naming the GUC inaccurately based on a hoped-for future
development project.

Another way to paper over the difference that would be to call this
max_parallel_workers_per_operation.  Then we can document that an
operation is a Gather node, and in the future we could document that
it's now a statement.  However, if we pick a name like this, then
we're sort of implying that this GUC will control DDL, too.

I think we should just go with max_parallel_workers for a limit on
total parallel workers within max_work_processes, and
max_parallel_workers_per_gather for a per-Gather limit.  It's slightly
annoying that we may end up renaming the latter GUC, but not as
annoying as spending another three weeks debating this and missing
beta2.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

03 июня 2016 г., 16:39:59

Robert Haas <robertmhaas@gmail.com> writes:
> I think we should just go with max_parallel_workers for a limit on
> total parallel workers within max_work_processes, and
> max_parallel_workers_per_gather for a per-Gather limit.  It's slightly
> annoying that we may end up renaming the latter GUC, but not as
> annoying as spending another three weeks debating this and missing
> beta2.

+1.  I'm not as convinced as you are that we'll replace the GUC later,
but in any case this is an accurate description of the current
semantics.  And I'm really *not* in favor of fudging the name with
the intent of changing the GUC's semantics later --- that would fail
all sorts of compatibility expectations.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Josh berkus

Дата:

03 июня 2016 г., 17:14:50

On 06/02/2016 09:33 PM, Peter Eisentraut wrote:
> On 6/3/16 12:21 AM, Petr Jelinek wrote:
>> On 01/06/16 17:55, David G. Johnston wrote:
>>> On Wed, Jun 1, 2016 at 11:45 AM, Petr Jelinek <petr@2ndquadrant.com
>>> <mailto:petr@2ndquadrant.com>>wrote:
>>>
>>>     That GUC also controls worker processes that are started by
>>>     extensions, not just ones that parallel query starts. This is btw
>>>     one thing I don't like at all about how the current limits work, the
>>>     parallel query will fight for workers with extensions because they
>>>     share the same limit.
>>>
>>>
>>> Given that this models reality the GUC is doing its job.  Now, maybe we
>>> need additional knobs to give the end-user the ability to influence how
>>> those fights will turn out.
>>
>> Agreed, my point is that I think we do need additional knob.
>
> We need one knob to control how many process slots to create at server
> start, and then a bunch of sliders to control how to allocate those
> between regular connections, superuser connections, replication,
> autovacuum, parallel workers, background workers (by tag/label/group),
> and so on.

Now that's crazy talk.  I mean, next thing you'll be saying that we need
the ability to monitor this, or even change it at runtime.  Where does
the madness end?  ;-)

Seriously, you have a point here; it's maybe time to stop tackling
process management per server piecemeal.  Question is, who wants to work
on this?

--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

08 июня 2016 г., 06:01:14

On Fri, Jun 3, 2016 at 9:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I think we should just go with max_parallel_workers for a limit on
>> total parallel workers within max_work_processes, and
>> max_parallel_workers_per_gather for a per-Gather limit.  It's slightly
>> annoying that we may end up renaming the latter GUC, but not as
>> annoying as spending another three weeks debating this and missing
>> beta2.
>
> +1.  I'm not as convinced as you are that we'll replace the GUC later,
> but in any case this is an accurate description of the current
> semantics.  And I'm really *not* in favor of fudging the name with
> the intent of changing the GUC's semantics later --- that would fail
> all sorts of compatibility expectations.

Here's a patch change max_parallel_degree to
max_parallel_workers_per_gather, and also changing parallel_degree to
parallel_workers.  I haven't tackled adding a separate
max_parallel_workers, at least not yet.  Are people OK with this?

Note that there is a dump/restore hazard if people have set the
parallel_degree reloption on a beta1 install, or used ALTER { USER |
DATABASE } .. SET parallel_degree.  Can everybody live with that?
Should I bump catversion when applying this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

parallel-degree-to-workers-v1.patch

Re: Rename max_parallel_degree?

От

Josh Berkus

Дата:

08 июня 2016 г., 15:56:45

On 06/07/2016 11:01 PM, Robert Haas wrote:
> On Fri, Jun 3, 2016 at 9:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I think we should just go with max_parallel_workers for a limit on
>>> total parallel workers within max_work_processes, and
>>> max_parallel_workers_per_gather for a per-Gather limit.  It's slightly
>>> annoying that we may end up renaming the latter GUC, but not as
>>> annoying as spending another three weeks debating this and missing
>>> beta2.
>>
>> +1.  I'm not as convinced as you are that we'll replace the GUC later,
>> but in any case this is an accurate description of the current
>> semantics.  And I'm really *not* in favor of fudging the name with
>> the intent of changing the GUC's semantics later --- that would fail
>> all sorts of compatibility expectations.
> 
> Here's a patch change max_parallel_degree to
> max_parallel_workers_per_gather, and also changing parallel_degree to
> parallel_workers.  I haven't tackled adding a separate
> max_parallel_workers, at least not yet.  Are people OK with this?

+1

> 
> Note that there is a dump/restore hazard if people have set the
> parallel_degree reloption on a beta1 install, or used ALTER { USER |
> DATABASE } .. SET parallel_degree.  Can everybody live with that?
> Should I bump catversion when applying this?

IMHO, we just need to call it out in the beta2 announcement.


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

08 июня 2016 г., 17:18:24

Josh Berkus <josh@agliodbs.com> writes:
> On 06/07/2016 11:01 PM, Robert Haas wrote:
>> Here's a patch change max_parallel_degree to
>> max_parallel_workers_per_gather, and also changing parallel_degree to
>> parallel_workers.  I haven't tackled adding a separate
>> max_parallel_workers, at least not yet.  Are people OK with this?

> +1

I've not read the patch in detail, but this sketch of what to do
sounds fine.

>> Note that there is a dump/restore hazard if people have set the
>> parallel_degree reloption on a beta1 install, or used ALTER { USER |
>> DATABASE } .. SET parallel_degree.  Can everybody live with that?
>> Should I bump catversion when applying this?

> IMHO, we just need to call it out in the beta2 announcement.

catversion is not relevant to GUC changes.  It's not really necessary,
because you'd get a clean, easily diagnosed and repaired failure during
postmaster startup anyway.  The point of bumping catversion is to prevent
a postmaster starting when the incompatibility is subtler or harder to
debug than that.
        regards, tom lane

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

08 июня 2016 г., 17:29:17

On Wed, Jun 8, 2016 at 10:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Note that there is a dump/restore hazard if people have set the
>>> parallel_degree reloption on a beta1 install, or used ALTER { USER |
>>> DATABASE } .. SET parallel_degree.  Can everybody live with that?
>>> Should I bump catversion when applying this?
>
>> IMHO, we just need to call it out in the beta2 announcement.
>
> catversion is not relevant to GUC changes.  It's not really necessary,
> because you'd get a clean, easily diagnosed and repaired failure during
> postmaster startup anyway.  The point of bumping catversion is to prevent
> a postmaster starting when the incompatibility is subtler or harder to
> debug than that.

The reloption is also getting renamed here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

08 июня 2016 г., 17:41:33

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jun 8, 2016 at 10:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> catversion is not relevant to GUC changes.  It's not really necessary,
>> because you'd get a clean, easily diagnosed and repaired failure during
>> postmaster startup anyway.  The point of bumping catversion is to prevent
>> a postmaster starting when the incompatibility is subtler or harder to
>> debug than that.

> The reloption is also getting renamed here.

Hmm.  I forget what the behavior is if we see an unrecognized reloption
already stored in the catalogs, but it might be worth checking.  If
that's something that's painful to get out of, maybe a catversion bump
would be appropriate.

(In practice this affects nobody, because there was already a catversion
bump since beta1.)
        regards, tom lane

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

08 июня 2016 г., 17:47:48

On Wed, Jun 8, 2016 at 10:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jun 8, 2016 at 10:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> catversion is not relevant to GUC changes.  It's not really necessary,
>>> because you'd get a clean, easily diagnosed and repaired failure during
>>> postmaster startup anyway.  The point of bumping catversion is to prevent
>>> a postmaster starting when the incompatibility is subtler or harder to
>>> debug than that.
>
>> The reloption is also getting renamed here.
>
> Hmm.  I forget what the behavior is if we see an unrecognized reloption
> already stored in the catalogs, but it might be worth checking.  If
> that's something that's painful to get out of, maybe a catversion bump
> would be appropriate.

rhaas=# create table tgl (a int);
CREATE TABLE
rhaas=# alter table tgl set (parallel_degree = 3);
ALTER TABLE
rhaas=# select reloptions from pg_class where relname = 'tgl';    reloptions
---------------------{parallel_degree=3}
(1 row)

rhaas=# update pg_class set reloptions = '{beats_me=3}' where relname = 'tgl';
UPDATE 1
rhaas=# alter table tgl reset (beats_me);
ALTER TABLE
rhaas=# select reloptions from pg_class where relname = 'tgl';reloptions
------------

(1 row)

So it's not hard to get rid of.  pg_dump does dump the unrecognized
option, which will fail at restore time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

09 июня 2016 г., 17:04:41

On Wed, Jun 8, 2016 at 10:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> On 06/07/2016 11:01 PM, Robert Haas wrote:
>>> Here's a patch change max_parallel_degree to
>>> max_parallel_workers_per_gather, and also changing parallel_degree to
>>> parallel_workers.  I haven't tackled adding a separate
>>> max_parallel_workers, at least not yet.  Are people OK with this?
>
>> +1
>
> I've not read the patch in detail, but this sketch of what to do
> sounds fine.

OK, I pushed this after re-reviewing it and fixing a number of
oversights.  There remains only the task of adding max_parallel_degree
as a system-wide limit (as opposed to max_parallel_degree now
max_parallel_workers_per_gather which is a per-Gather limit), which
I'm going to argue should be a new open item and not necessarily one
that I have to own myself.  I would like to take care of it, but I
will not put it ahead of fixing actual defects and I will not promise
to have it done in time for 9.6.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

09 июня 2016 г., 21:05:20

On Thu, Jun 9, 2016 at 7:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> OK, I pushed this after re-reviewing it and fixing a number of
> oversights.  There remains only the task of adding max_parallel_degree
> as a system-wide limit (as opposed to max_parallel_degree now
> max_parallel_workers_per_gather which is a per-Gather limit), which
> I'm going to argue should be a new open item and not necessarily one
> that I have to own myself.  I would like to take care of it, but I
> will not put it ahead of fixing actual defects and I will not promise
> to have it done in time for 9.6.

I am in favor of having something similar to
max_parallel_workers_per_gather for utility statements like CREATE
INDEX. That will need a cost model, at least where the DBA isn't
explicit about the number of workers to use.

-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

09 июня 2016 г., 23:09:02

On Thu, Jun 9, 2016 at 2:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Jun 9, 2016 at 7:04 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> OK, I pushed this after re-reviewing it and fixing a number of
>> oversights.  There remains only the task of adding max_parallel_degree
>> as a system-wide limit (as opposed to max_parallel_degree now
>> max_parallel_workers_per_gather which is a per-Gather limit), which
>> I'm going to argue should be a new open item and not necessarily one
>> that I have to own myself.  I would like to take care of it, but I
>> will not put it ahead of fixing actual defects and I will not promise
>> to have it done in time for 9.6.
>
> I am in favor of having something similar to
> max_parallel_workers_per_gather for utility statements like CREATE
> INDEX. That will need a cost model, at least where the DBA isn't
> explicit about the number of workers to use.

We may well need that, but I think it should be discussed in
conjunction with the patches that add parallelism for those utility
statements, rather than discussing it on a thread for a 9.6 open item.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Peter Geoghegan

Дата:

10 июня 2016 г., 01:10:00

On Thu, Jun 9, 2016 at 1:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I am in favor of having something similar to
>> max_parallel_workers_per_gather for utility statements like CREATE
>> INDEX. That will need a cost model, at least where the DBA isn't
>> explicit about the number of workers to use.
>
> We may well need that, but I think it should be discussed in
> conjunction with the patches that add parallelism for those utility
> statements, rather than discussing it on a thread for a 9.6 open item.

Of course.

I don't think it needs to be scoped to utility statements. It's just
clear that it's not appropriate to use max_parallel_workers_per_gather
within utility statements, even though something like that will be
needed.

-- 
Peter Geoghegan

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

10 июня 2016 г., 01:11:12

On Thu, Jun 9, 2016 at 6:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Jun 9, 2016 at 1:08 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> I am in favor of having something similar to
>>> max_parallel_workers_per_gather for utility statements like CREATE
>>> INDEX. That will need a cost model, at least where the DBA isn't
>>> explicit about the number of workers to use.
>>
>> We may well need that, but I think it should be discussed in
>> conjunction with the patches that add parallelism for those utility
>> statements, rather than discussing it on a thread for a 9.6 open item.
>
> Of course.
>
> I don't think it needs to be scoped to utility statements. It's just
> clear that it's not appropriate to use max_parallel_workers_per_gather
> within utility statements, even though something like that will be
> needed.

True.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

12 июня 2016 г., 00:37:41

On 09/06/2016 16:04, Robert Haas wrote:
>
> OK, I pushed this after re-reviewing it and fixing a number of
> oversights.  There remains only the task of adding max_parallel_degree
> as a system-wide limit (as opposed to max_parallel_degree now
> max_parallel_workers_per_gather which is a per-Gather limit), which
> I'm going to argue should be a new open item and not necessarily one
> that I have to own myself.  I would like to take care of it, but I
> will not put it ahead of fixing actual defects and I will not promise
> to have it done in time for 9.6.
>

PFA a patch to fix this open item.  I used the GUC name provided in the
9.6 open item page (max_parallel_workers), with a default value of 4.
Value 0 is another way to disable parallel query.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вложения

global_max_parallel_workers.diff

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

12 июня 2016 г., 01:24:21

On 11/06/2016 23:37, Julien Rouhaud wrote:
> On 09/06/2016 16:04, Robert Haas wrote:
>>
>> OK, I pushed this after re-reviewing it and fixing a number of
>> oversights.  There remains only the task of adding max_parallel_degree
>> as a system-wide limit (as opposed to max_parallel_degree now
>> max_parallel_workers_per_gather which is a per-Gather limit), which
>> I'm going to argue should be a new open item and not necessarily one
>> that I have to own myself.  I would like to take care of it, but I
>> will not put it ahead of fixing actual defects and I will not promise
>> to have it done in time for 9.6.
>>
>
> PFA a patch to fix this open item.  I used the GUC name provided in the
> 9.6 open item page (max_parallel_workers), with a default value of 4.
> Value 0 is another way to disable parallel query.
>

Sorry I just realized I made a stupid mistake, I didn't check in all
slots to get the number of active parallel workers. Fixed in attached v2.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вложения

global_max_parallel_workers-v2.diff

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

13 июня 2016 г., 04:16:52

On Sat, Jun 11, 2016 at 6:24 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 11/06/2016 23:37, Julien Rouhaud wrote:
>> On 09/06/2016 16:04, Robert Haas wrote:
>>> OK, I pushed this after re-reviewing it and fixing a number of
>>> oversights.  There remains only the task of adding max_parallel_degree
>>> as a system-wide limit (as opposed to max_parallel_degree now
>>> max_parallel_workers_per_gather which is a per-Gather limit), which
>>> I'm going to argue should be a new open item and not necessarily one
>>> that I have to own myself.  I would like to take care of it, but I
>>> will not put it ahead of fixing actual defects and I will not promise
>>> to have it done in time for 9.6.
>>>
>>
>> PFA a patch to fix this open item.  I used the GUC name provided in the
>> 9.6 open item page (max_parallel_workers), with a default value of 4.
>> Value 0 is another way to disable parallel query.
>>
>
> Sorry I just realized I made a stupid mistake, I didn't check in all
> slots to get the number of active parallel workers. Fixed in attached v2.

I think instead of adding a "bool parallel" argument to
RegisterDynamicBackgroundWorker, we should just define a new constant
for bgw_flags, say BGW_IS_PARALLEL_WORKER.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

13 июня 2016 г., 12:42:31

On 13/06/2016 03:16, Robert Haas wrote:
> On Sat, Jun 11, 2016 at 6:24 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> On 11/06/2016 23:37, Julien Rouhaud wrote:
>>> On 09/06/2016 16:04, Robert Haas wrote:
>>>> There remains only the task of adding max_parallel_degree
>>>> as a system-wide limit
>>>>
>>>
>>> PFA a patch to fix this open item.  I used the GUC name provided in the
>>> 9.6 open item page (max_parallel_workers), with a default value of 4.
>>> Value 0 is another way to disable parallel query.
>>>
>>
>> Sorry I just realized I made a stupid mistake, I didn't check in all
>> slots to get the number of active parallel workers. Fixed in attached v2.
>
> I think instead of adding a "bool parallel" argument to
> RegisterDynamicBackgroundWorker, we should just define a new constant
> for bgw_flags, say BGW_IS_PARALLEL_WORKER.
>

Agreed, and fixed in attached v3.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вложения

global_max_parallel_workers-v3.diff

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

14 июня 2016 г., 05:09:35

On Mon, Jun 13, 2016 at 5:42 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> Agreed, and fixed in attached v3.

I don't entirely like the new logic in
RegisterDynamicBackgroundWorker.  I wonder if we can't drive this off
of a couple of counters, instead of having the process registering the
background worker iterate over every slot.  Suppose we add two
counters to BackgroundWorkerArray, parallel_register_count and
parallel_terminate_count.  Whenever a backend successfully registers a
parallel worker, it increments parallel_register_count.  Whenever the
postmaster marks a parallel wokrer slot as no longer in use, it
increments parallel_terminate_count.  Then, the number of active
parallel workers is just parallel_register_count -
parallel_terminate_count.  (We can't have the postmaster and the
backends share the same counter, because then it would need locking,
and the postmaster can't try to take spinlocks - can't even use
atomics, because those might be emulated using spinlocks.)

Of course, it would be nice if we could make these counters 64-bit
integers, but we can't, because we don't rely on 64-bit reads and
writes to be atomic on all platforms.  So instead they'll have to be
uint32.  That means they could wrap (if you really work at it) but
subtraction will still return the right answer, so it's OK.  If we
want to allow the number of parallel workers started to be available
for statistical purposes, we can keep to uint32 values for that
(parallel_register_count_lo and parallel_register_count_hi, for
example), and increment the second one whenever the first one rolls
over to zero.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

14 июня 2016 г., 14:10:04

On 14/06/2016 04:09, Robert Haas wrote:
> On Mon, Jun 13, 2016 at 5:42 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> Agreed, and fixed in attached v3.
> 
> I don't entirely like the new logic in
> RegisterDynamicBackgroundWorker.

I'm not that happy with it too. We can avoid iterating over every slots
if the feature isn't activated though (max_parallel_workers >=
max_worker_processes).

> I wonder if we can't drive this off
> of a couple of counters, instead of having the process registering the
> background worker iterate over every slot.  Suppose we add two
> counters to BackgroundWorkerArray, parallel_register_count and
> parallel_terminate_count.  Whenever a backend successfully registers a
> parallel worker, it increments parallel_register_count.  Whenever the
> postmaster marks a parallel wokrer slot as no longer in use, it
> increments parallel_terminate_count.  Then, the number of active
> parallel workers is just parallel_register_count -
> parallel_terminate_count.  (We can't have the postmaster and the
> backends share the same counter, because then it would need locking,
> and the postmaster can't try to take spinlocks - can't even use
> atomics, because those might be emulated using spinlocks.)
> 

I wanted to maintain counters at first, but it seemed more invasive, and
I thought that the max_parallel_worker would be ueful in environnements
where there're lots of parallel workers and dynamic workers used, so
finding a free slot would require iterating over most of the slots most
of the time anyway.  I'm of course also ok with maintaining counters.

> If we want to allow the number of parallel workers started to be available
> for statistical purposes, we can keep to uint32 values for that
> (parallel_register_count_lo and parallel_register_count_hi, for
> example), and increment the second one whenever the first one rolls
> over to zero.
> 

I didn't think about monitoring. I'm not sure if this counter would be
really helpful without also having the number of time a parallel worker
couldn't be launched (and I'd really like to have this one).

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Rename max_parallel_degree?

От

Tom Lane

Дата:

14 июня 2016 г., 19:16:32

Robert Haas <robertmhaas@gmail.com> writes:
> Of course, it would be nice if we could make these counters 64-bit
> integers, but we can't, because we don't rely on 64-bit reads and
> writes to be atomic on all platforms.  So instead they'll have to be
> uint32.  That means they could wrap (if you really work at it) but
> subtraction will still return the right answer, so it's OK.

OK ...

> If we
> want to allow the number of parallel workers started to be available
> for statistical purposes, we can keep to uint32 values for that
> (parallel_register_count_lo and parallel_register_count_hi, for
> example), and increment the second one whenever the first one rolls
> over to zero.

And that's going to be atomic how exactly?
        regards, tom lane

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

14 июня 2016 г., 20:07:30

On Tue, Jun 14, 2016 at 12:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Of course, it would be nice if we could make these counters 64-bit
>> integers, but we can't, because we don't rely on 64-bit reads and
>> writes to be atomic on all platforms.  So instead they'll have to be
>> uint32.  That means they could wrap (if you really work at it) but
>> subtraction will still return the right answer, so it's OK.
>
> OK ...
>
>> If we
>> want to allow the number of parallel workers started to be available
>> for statistical purposes, we can keep to uint32 values for that
>> (parallel_register_count_lo and parallel_register_count_hi, for
>> example), and increment the second one whenever the first one rolls
>> over to zero.
>
> And that's going to be atomic how exactly?

The only process that can look at that structure without taking a lock
is the postmaster.  And the postmaster would only examine
parallel_register_count_lo.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

15 июня 2016 г., 18:50:06

On Tue, Jun 14, 2016 at 7:10 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
>> I don't entirely like the new logic in
>> RegisterDynamicBackgroundWorker.
>
> I'm not that happy with it too. We can avoid iterating over every slots
> if the feature isn't activated though (max_parallel_workers >=
> max_worker_processes).
>
>> I wonder if we can't drive this off
>> of a couple of counters, instead of having the process registering the
>> background worker iterate over every slot.  Suppose we add two
>> counters to BackgroundWorkerArray, parallel_register_count and
>> parallel_terminate_count.  Whenever a backend successfully registers a
>> parallel worker, it increments parallel_register_count.  Whenever the
>> postmaster marks a parallel wokrer slot as no longer in use, it
>> increments parallel_terminate_count.  Then, the number of active
>> parallel workers is just parallel_register_count -
>> parallel_terminate_count.  (We can't have the postmaster and the
>> backends share the same counter, because then it would need locking,
>> and the postmaster can't try to take spinlocks - can't even use
>> atomics, because those might be emulated using spinlocks.)
>>
>
> I wanted to maintain counters at first, but it seemed more invasive, and
> I thought that the max_parallel_worker would be ueful in environnements
> where there're lots of parallel workers and dynamic workers used, so
> finding a free slot would require iterating over most of the slots most
> of the time anyway.  I'm of course also ok with maintaining counters.

I think we should go that way.  Some day we might try to make the
process of finding a free slot more efficient than it is today; I'd
rather not double down on linear search.

Are you going to update this patch?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

15 июня 2016 г., 19:14:49

On 15/06/2016 17:49, Robert Haas wrote:
> On Tue, Jun 14, 2016 at 7:10 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>>> I don't entirely like the new logic in
>>> RegisterDynamicBackgroundWorker.
>>
>>> I wonder if we can't drive this off
>>> of a couple of counters, instead of having the process registering the
>>> background worker iterate over every slot. [...]
>>>
> 
> I think we should go that way.  Some day we might try to make the
> process of finding a free slot more efficient than it is today; I'd
> rather not double down on linear search.
> 

Ok.

> Are you going to update this patch?
> 

Sure, I'll post a new patch ASAP.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

15 июня 2016 г., 21:13:15

On 15/06/2016 18:14, Julien Rouhaud wrote:
> On 15/06/2016 17:49, Robert Haas wrote:
>> On Tue, Jun 14, 2016 at 7:10 AM, Julien Rouhaud
>> <julien.rouhaud@dalibo.com> wrote:
>>>> I don't entirely like the new logic in
>>>> RegisterDynamicBackgroundWorker.
>>>
>>>> I wonder if we can't drive this off
>>>> of a couple of counters, instead of having the process registering the
>>>> background worker iterate over every slot. [...]
>>>>
>>
>> I think we should go that way.  Some day we might try to make the
>> process of finding a free slot more efficient than it is today; I'd
>> rather not double down on linear search.
>>
>
> Ok.
>
>> Are you going to update this patch?
>>
>
> Sure, I'll post a new patch ASAP.
>

Attached v4 implements the design you suggested, I hope everything's ok.

I didn't do anything for the statistics part, because I'm not sure
having the counters separately is useful (instead of just having the
current number of parallel workers), and if it's useful I'd find strange
to have these counters get reset at restart instead of being stored and
accumulated as other stats, and that's look too much for 9.6 material.
I'd be happy to work on this though.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вложения

global_max_parallel_workers-v4.diff

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

25 июня 2016 г., 10:33:50

On Wed, Jun 15, 2016 at 11:43 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
>
> Attached v4 implements the design you suggested, I hope everything's ok.
>

Few review comments:

1.
+ if (parallel && (BackgroundWorkerData->parallel_register_count -
+
BackgroundWorkerData->parallel_terminate_count) >=
+
max_parallel_workers)
+ return false;

I think this check should be done after acquiring
BackgroundWorkerLock, otherwise some other backend can simultaneously
increment parallel_register_count.

2.

+/*
+ * This flag is used internally for parallel queries, to keep track of the
+ * number of active
parallel workers and make sure we never launch more than
+ * max_parallel_workers parallel workers at
the same time.  Third part
+ * background workers should not use this flag.
+ */
+#define
BGWORKER_IS_PARALLEL_WORKER 0x0004
+

"Third part", do yo want to say Third party?

3.
static bool
SanityCheckBackgroundWorker(BackgroundWorker *worker, int elevel)
{
..
}

Isn't it better to have a check in above function such that if
bgw_flags is BGWORKER_IS_PARALLEL_WORKER and max_parallel_workers is
zero, then ereport?  Also, consider if it is better to have some other
checks related to BGWORKER_IS_PARALLEL_WORKER, like if caller sets
BGWORKER_IS_PARALLEL_WORKER, then it must have database connection and
shared memory access.

4.
+      <varlistentry id="guc-max-parallel-workers"
xreflabel="max_parallel_workers">
+       <term><varname>max_parallel_workers</varname> (<type>integer</type>)
+       <indexterm>
+        <primary><varname>max_parallel_workers</> configuration
parameter</primary>
+       </indexterm>
+       </term>
+       <listitem>
+        <para>
+         Sets the maximum number of workers that can be launched at the same
+         time for the whole server.  This parameter allows the administrator to
+         reserve background worker slots for for third part dynamic background
+         workers.  The default value is 4.  Setting this value to 0 disables
+         parallel query execution.
+        </para>
+       </listitem>
+      </varlistentry>

How about phrasing it as:
Sets the maximum number of workers that the system can support for
parallel queries.  The default value is 4.  Setting this value to 0
disables parallel query execution.

5.
<primary><varname>max_parallel_workers_per_gather</> configuration
parameter</primary>      </indexterm>      </term>      <listitem>       <para>        Sets the maximum number of
workersthat can be started by a single        <literal>Gather</literal> node.  Parallel workers are taken from the
 pool of processes established by        <xref linkend="guc-max-worker-processes">.

I think it is better to change above in documentation to indicate that
"pool of processes established by guc-max-parallel-workers".

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

25 июня 2016 г., 11:57:19

On 25/06/2016 09:33, Amit Kapila wrote:
> On Wed, Jun 15, 2016 at 11:43 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>>
>> Attached v4 implements the design you suggested, I hope everything's ok.
>>
>
> Few review comments:
>

Thanks for the review.

> 1.
> + if (parallel && (BackgroundWorkerData->parallel_register_count -
> +
> BackgroundWorkerData->parallel_terminate_count) >=
> +
> max_parallel_workers)
> + return false;
>
> I think this check should be done after acquiring
> BackgroundWorkerLock, otherwise some other backend can simultaneously
> increment parallel_register_count.
>

You're right, fixed.

> 2.
>
> +/*
> + * This flag is used internally for parallel queries, to keep track of the
> + * number of active
> parallel workers and make sure we never launch more than
> + * max_parallel_workers parallel workers at
> the same time.  Third part
> + * background workers should not use this flag.
> + */
> +#define
> BGWORKER_IS_PARALLEL_WORKER 0x0004
> +
>
> "Third part", do yo want to say Third party?
>

Yes, sorry. Fixed

> 3.
> static bool
> SanityCheckBackgroundWorker(BackgroundWorker *worker, int elevel)
> {
> ..
> }
>
> Isn't it better to have a check in above function such that if
> bgw_flags is BGWORKER_IS_PARALLEL_WORKER and max_parallel_workers is
> zero, then ereport?  Also, consider if it is better to have some other
> checks related to BGWORKER_IS_PARALLEL_WORKER, like if caller sets
> BGWORKER_IS_PARALLEL_WORKER, then it must have database connection and
> shared memory access.
>

I added these checks. I don't see another check to add right now.

> 4.
> +      <varlistentry id="guc-max-parallel-workers"
> xreflabel="max_parallel_workers">
> +       <term><varname>max_parallel_workers</varname> (<type>integer</type>)
> +       <indexterm>
> +        <primary><varname>max_parallel_workers</> configuration
> parameter</primary>
> +       </indexterm>
> +       </term>
> +       <listitem>
> +        <para>
> +         Sets the maximum number of workers that can be launched at the same
> +         time for the whole server.  This parameter allows the administrator to
> +         reserve background worker slots for for third part dynamic background
> +         workers.  The default value is 4.  Setting this value to 0 disables
> +         parallel query execution.
> +        </para>
> +       </listitem>
> +      </varlistentry>
>
> How about phrasing it as:
> Sets the maximum number of workers that the system can support for
> parallel queries.  The default value is 4.  Setting this value to 0
> disables parallel query execution.
>

It's better thanks.  Should we document somewhere the link between this
parameter and custom dynamic background workers or is it pretty
self-explanatory?

> 5.
> <primary><varname>max_parallel_workers_per_gather</> configuration
> parameter</primary>
>        </indexterm>
>        </term>
>        <listitem>
>         <para>
>          Sets the maximum number of workers that can be started by a single
>          <literal>Gather</literal> node.  Parallel workers are taken from the
>          pool of processes established by
>          <xref linkend="guc-max-worker-processes">.
>
> I think it is better to change above in documentation to indicate that
> "pool of processes established by guc-max-parallel-workers".
>

The real limit is the minimum of these two values, I think it's
important to be explicit here, since this pool is shared for parallelism
and custom bgworkers.

What about "pool of processes established by guc-max-worker-processes,
limited by guc-max-parallel-workers" (used in attached v5 patch)

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вложения

global_max_parallel_workers-v5.diff

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

26 июня 2016 г., 09:37:31

On Sat, Jun 25, 2016 at 2:27 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 25/06/2016 09:33, Amit Kapila wrote:
>> On Wed, Jun 15, 2016 at 11:43 PM, Julien Rouhaud
>> <julien.rouhaud@dalibo.com> wrote:
>>>
>>> Attached v4 implements the design you suggested, I hope everything's ok.
>>>
>>
>> Few review comments:
>>
>
> Thanks for the review.
>
>
>
>> 4.
>> +      <varlistentry id="guc-max-parallel-workers"
>> xreflabel="max_parallel_workers">
>> +       <term><varname>max_parallel_workers</varname> (<type>integer</type>)
>> +       <indexterm>
>> +        <primary><varname>max_parallel_workers</> configuration
>> parameter</primary>
>> +       </indexterm>
>> +       </term>
>> +       <listitem>
>> +        <para>
>> +         Sets the maximum number of workers that can be launched at the same
>> +         time for the whole server.  This parameter allows the administrator to
>> +         reserve background worker slots for for third part dynamic background
>> +         workers.  The default value is 4.  Setting this value to 0 disables
>> +         parallel query execution.
>> +        </para>
>> +       </listitem>
>> +      </varlistentry>
>>
>> How about phrasing it as:
>> Sets the maximum number of workers that the system can support for
>> parallel queries.  The default value is 4.  Setting this value to 0
>> disables parallel query execution.
>>
>
> It's better thanks.  Should we document somewhere the link between this
> parameter and custom dynamic background workers or is it pretty
> self-explanatory?
>

How about if add an additiona line like:
Parallel workers are taken from the pool of processes established by
guc-max-worker-processes.

I think one might feel some duplication of text between this and what
we have for max_parallel_workers_per_gather, but it seems genuine to
me.


@@ -370,6 +379,8 @@ ForgetBackgroundWorker(slist_mutable_iter *cur) Assert(rw->rw_shmem_slot <
max_worker_processes); slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot]; slot->in_use =
false;
+ if (slot->parallel)
+ BackgroundWorkerData->parallel_terminate_count++;

I think operations on parallel_terminate_count are not safe.
ForgetBackgroundWorker() and RegisterDynamicBackgroundWorker() can try
to read write at same time.  It seems you need to use atomic
operations to ensure safety.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

26 июня 2016 г., 13:27:21

On 26/06/2016 08:37, Amit Kapila wrote:
> On Sat, Jun 25, 2016 at 2:27 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>>>
>>
>> It's better thanks.  Should we document somewhere the link between this
>> parameter and custom dynamic background workers or is it pretty
>> self-explanatory?
>>
> 
> How about if add an additiona line like:
> Parallel workers are taken from the pool of processes established by
> guc-max-worker-processes.
> 
> I think one might feel some duplication of text between this and what
> we have for max_parallel_workers_per_gather, but it seems genuine to
> me.
> 

Ok, I'll add it.

> 
> @@ -370,6 +379,8 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
>   Assert(rw->rw_shmem_slot <
> max_worker_processes);
>   slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
>   slot->in_use =
> false;
> + if (slot->parallel)
> + BackgroundWorkerData->parallel_terminate_count++;
> 
> I think operations on parallel_terminate_count are not safe.
> ForgetBackgroundWorker() and RegisterDynamicBackgroundWorker() can try
> to read write at same time.  It seems you need to use atomic
> operations to ensure safety.
> 
> 

But operations on parallel_terminate_count are done by the postmaster,
and per Robert's previous email postmaster can't use atomics:

> We can't have the postmaster and the
> backends share the same counter, because then it would need locking,
> and the postmaster can't try to take spinlocks - can't even use
> atomics, because those might be emulated using spinlocks.

Do we support platforms on which 32bits operations are not atomic?

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

27 июня 2016 г., 07:51:54

On Sun, Jun 26, 2016 at 3:57 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 26/06/2016 08:37, Amit Kapila wrote:
>>
>> @@ -370,6 +379,8 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
>>   Assert(rw->rw_shmem_slot <
>> max_worker_processes);
>>   slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
>>   slot->in_use =
>> false;
>> + if (slot->parallel)
>> + BackgroundWorkerData->parallel_terminate_count++;
>>
>> I think operations on parallel_terminate_count are not safe.
>> ForgetBackgroundWorker() and RegisterDynamicBackgroundWorker() can try
>> to read write at same time.  It seems you need to use atomic
>> operations to ensure safety.
>>
>>
>
> But operations on parallel_terminate_count are done by the postmaster,
> and per Robert's previous email postmaster can't use atomics:
>

I think as the parallel_terminate_count is only modified by postmaster
and read by other processes, such an operation will be considered
atomic.  We don't need to specifically make it atomic unless multiple
processes are allowed to modify such a counter.  Although, I think we
need to use some barriers in code to make it safe.

1.
@@ -272,6 +279,8 @@ BackgroundWorkerStateChange(void) pg_memory_barrier();

slot->pid = 0; slot->in_use = false;
+ if (slot->parallel)
+
BackgroundWorkerData->parallel_terminate_count++;

I think the check of slot->parallel and increment to
parallel_terminate_count should be done before marking slot->in_use to
false.  Consider the case if slot->in_use is marked as false and then
before next line execution, one of the backends registers dynamic
worker (non-parallel worker), so that
backend can see this slot as free and use it.  It will also mark the
parallel flag of slot as false.  Now when postmaster will check the
slot->parallel flag, it will find it false and then skip the increment
to parallel_terminate_count.

2.
+ if (parallel && (BackgroundWorkerData->parallel_register_count -
+
BackgroundWorkerData->parallel_terminate_count) >=
+
max_parallel_workers)
+ {
+ LWLockRelease(BackgroundWorkerLock);
+ return
false;
+ }
+

I think we need a read barrier here, so that this check doesn't get
reordered with the for loop below it.   Also, see if you find the code
more readable by moving the after && part of check to next line.

3.
typedef struct BackgroundWorkerArray{ int total_slots;
+ uint32
parallel_register_count;
+ uint32 parallel_terminate_count; BackgroundWorkerSlot
slot[FLEXIBLE_ARRAY_MEMBER];} BackgroundWorkerArray;

See, if you can add comments on top of this structure to explain the
usage/meaning of newly added parallel_* counters?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

27 июня 2016 г., 08:18:33

On Mon, Jun 27, 2016 at 10:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Jun 26, 2016 at 3:57 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> On 26/06/2016 08:37, Amit Kapila wrote:
>>>
>>> @@ -370,6 +379,8 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
>>>   Assert(rw->rw_shmem_slot <
>>> max_worker_processes);
>>>   slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
>>>   slot->in_use =
>>> false;
>>> + if (slot->parallel)
>>> + BackgroundWorkerData->parallel_terminate_count++;
>>>
>>> I think operations on parallel_terminate_count are not safe.
>>> ForgetBackgroundWorker() and RegisterDynamicBackgroundWorker() can try
>>> to read write at same time.  It seems you need to use atomic
>>> operations to ensure safety.
>>>
>>>
>>
>> But operations on parallel_terminate_count are done by the postmaster,
>> and per Robert's previous email postmaster can't use atomics:
>>
>
> I think as the parallel_terminate_count is only modified by postmaster
> and read by other processes, such an operation will be considered
> atomic.  We don't need to specifically make it atomic unless multiple
> processes are allowed to modify such a counter.  Although, I think we
> need to use some barriers in code to make it safe.
>
> 1.
> @@ -272,6 +279,8 @@ BackgroundWorkerStateChange(void)
>   pg_memory_barrier();
>
> slot->pid = 0;
>   slot->in_use = false;
> + if (slot->parallel)
> +
> BackgroundWorkerData->parallel_terminate_count++;
>
> I think the check of slot->parallel and increment to
> parallel_terminate_count should be done before marking slot->in_use to
> false.  Consider the case if slot->in_use is marked as false and then
> before next line execution, one of the backends registers dynamic
> worker (non-parallel worker), so that
> backend can see this slot as free and use it.  It will also mark the
> parallel flag of slot as false.  Now when postmaster will check the
> slot->parallel flag, it will find it false and then skip the increment
> to parallel_terminate_count.
>

If you agree with above theory, then you need to use write barrier as
well after incrementing the parallel_terminate_count to avoid
re-ordering with slot->in_use flag.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

27 июня 2016 г., 20:05:12

On 27/06/2016 07:18, Amit Kapila wrote:
> On Mon, Jun 27, 2016 at 10:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> I think as the parallel_terminate_count is only modified by postmaster
>> and read by other processes, such an operation will be considered
>> atomic.  We don't need to specifically make it atomic unless multiple
>> processes are allowed to modify such a counter.  Although, I think we
>> need to use some barriers in code to make it safe.
>>
>> 1.
>> @@ -272,6 +279,8 @@ BackgroundWorkerStateChange(void)
>>   pg_memory_barrier();
>>
>> slot->pid = 0;
>>   slot->in_use = false;
>> + if (slot->parallel)
>> +
>> BackgroundWorkerData->parallel_terminate_count++;
>>
>> I think the check of slot->parallel and increment to
>> parallel_terminate_count should be done before marking slot->in_use to
>> false.  Consider the case if slot->in_use is marked as false and then
>> before next line execution, one of the backends registers dynamic
>> worker (non-parallel worker), so that
>> backend can see this slot as free and use it.  It will also mark the
>> parallel flag of slot as false.  Now when postmaster will check the
>> slot->parallel flag, it will find it false and then skip the increment
>> to parallel_terminate_count.
>>
> 
> If you agree with above theory, then you need to use write barrier as
> well after incrementing the parallel_terminate_count to avoid
> re-ordering with slot->in_use flag.
> 

I totally agree, I didn't thought about this corner case.

There's already a pg_memory_barrier() call in
BackgroundWorkerStateChange(), to avoid reordering the notify_pid load.
Couldn't we use it to also make sure the parallel_terminate_count
increment happens before the slot->in_use store?  I guess that a write
barrier will be needed in ForgetBacgroundWorker().

>> 2.
>> + if (parallel && (BackgroundWorkerData->parallel_register_count -
>> +
>> BackgroundWorkerData->parallel_terminate_count) >=
>> +
>> max_parallel_workers)
>> + {
>> + LWLockRelease(BackgroundWorkerLock);
>> + return
>> false;
>> + }
>>+
>>
>> I think we need a read barrier here, so that this check doesn't get
>> reordered with the for loop below it.

You mean between the end of this block and the for loop? (Sorry, I'm not
at all familiar with the pg_barrier mechanism).

>>  Also, see if you find the code
>> more readable by moving the after && part of check to next line.

I think I'll just pgindent the file.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

28 июня 2016 г., 05:44:18

On Mon, Jun 27, 2016 at 10:35 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 27/06/2016 07:18, Amit Kapila wrote:
>> On Mon, Jun 27, 2016 at 10:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>
>>> I think as the parallel_terminate_count is only modified by postmaster
>>> and read by other processes, such an operation will be considered
>>> atomic.  We don't need to specifically make it atomic unless multiple
>>> processes are allowed to modify such a counter.  Although, I think we
>>> need to use some barriers in code to make it safe.
>>>
>>> 1.
>>> @@ -272,6 +279,8 @@ BackgroundWorkerStateChange(void)
>>>   pg_memory_barrier();
>>>
>>> slot->pid = 0;
>>>   slot->in_use = false;
>>> + if (slot->parallel)
>>> +
>>> BackgroundWorkerData->parallel_terminate_count++;
>>>
>>> I think the check of slot->parallel and increment to
>>> parallel_terminate_count should be done before marking slot->in_use to
>>> false.  Consider the case if slot->in_use is marked as false and then
>>> before next line execution, one of the backends registers dynamic
>>> worker (non-parallel worker), so that
>>> backend can see this slot as free and use it.  It will also mark the
>>> parallel flag of slot as false.  Now when postmaster will check the
>>> slot->parallel flag, it will find it false and then skip the increment
>>> to parallel_terminate_count.
>>>
>>
>> If you agree with above theory, then you need to use write barrier as
>> well after incrementing the parallel_terminate_count to avoid
>> re-ordering with slot->in_use flag.
>>
>
> I totally agree, I didn't thought about this corner case.
>
> There's already a pg_memory_barrier() call in
> BackgroundWorkerStateChange(), to avoid reordering the notify_pid load.
> Couldn't we use it to also make sure the parallel_terminate_count
> increment happens before the slot->in_use store?
>

Yes, that is enough, as memory barrier ensures that both loads and
stores are completed before any loads and stores that are after
barrier.

>  I guess that a write
> barrier will be needed in ForgetBacgroundWorker().
>

Yes.

>>> 2.
>>> + if (parallel && (BackgroundWorkerData->parallel_register_count -
>>> +
>>> BackgroundWorkerData->parallel_terminate_count) >=
>>> +
>>> max_parallel_workers)
>>> + {
>>> + LWLockRelease(BackgroundWorkerLock);
>>> + return
>>> false;
>>> + }
>>>+
>>>
>>> I think we need a read barrier here, so that this check doesn't get
>>> reordered with the for loop below it.
>
> You mean between the end of this block and the for loop?
>

Yes.

>>>  Also, see if you find the code
>>> more readable by moving the after && part of check to next line.
>
> I think I'll just pgindent the file.
>

make sense.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

29 июня 2016 г., 00:27:22

On 28/06/2016 04:44, Amit Kapila wrote:
> On Mon, Jun 27, 2016 at 10:35 PM, Julien Rouhaud
>>
>> There's already a pg_memory_barrier() call in
>> BackgroundWorkerStateChange(), to avoid reordering the notify_pid load.
>> Couldn't we use it to also make sure the parallel_terminate_count
>> increment happens before the slot->in_use store?
>>
>
> Yes, that is enough, as memory barrier ensures that both loads and
> stores are completed before any loads and stores that are after
> barrier.
>
>>  I guess that a write
>> barrier will be needed in ForgetBacgroundWorker().
>>
>
> Yes.
>
>>>> 2.
>>>> + if (parallel && (BackgroundWorkerData->parallel_register_count -
>>>> +
>>>> BackgroundWorkerData->parallel_terminate_count) >=
>>>> +
>>>> max_parallel_workers)
>>>> + {
>>>> + LWLockRelease(BackgroundWorkerLock);
>>>> + return
>>>> false;
>>>> + }
>>>> +
>>>>
>>>> I think we need a read barrier here, so that this check doesn't get
>>>> reordered with the for loop below it.
>>
>> You mean between the end of this block and the for loop?
>>
>
> Yes.
>
>>>>  Also, see if you find the code
>>>> more readable by moving the after && part of check to next line.
>>
>> I think I'll just pgindent the file.
>>
>
> make sense.
>
>

Thanks a lot for the help!

PFA v6 which should fix all the issues mentioned.  Also, after second
thought I didn't add the extra hint about max_worker_processes in the
max_parallel_worker paragraph, since this line was a duplicate of the
precedent paragraph, it seemed better to leave the text as is.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вложения

global_max_parallel_workers-v6.diff

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

29 июня 2016 г., 07:29:54

On Wed, Jun 29, 2016 at 2:57 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
>
> Thanks a lot for the help!
>
> PFA v6 which should fix all the issues mentioned.

Couple of minor suggestions.

-         <xref linkend="guc-max-worker-processes">.  Note that the requested
+         <xref linkend="guc-max-worker-processes">, limited by
+         <xref linked="guc-max-parallel-workers">.  Note that the requested

Typo.
/linked/linkend

You can always find such mistakes by doing make check in doc/src/sgml/

+ /*
+ * We need a memory barrier here to make sure the above test doesn't get
+ * reordered
+ */
+ pg_read_barrier();

/memory barrier/read barrier

+ if (max_parallel_workers == 0)
+ {
+ ereport(elevel,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("background worker \"%s\": cannot request parallel worker if
no parallel worker allowed",

" ..no parallel worker is allowed".  'is' seems to be missing.


>  Also, after second
> thought I didn't add the extra hint about max_worker_processes in the
> max_parallel_worker paragraph, since this line was a duplicate of the
> precedent paragraph, it seemed better to leave the text as is.
>

not a big problem, we can leave it for committer to decide on same.
However just by reading the description of max_parallel_worker, user
can set its value more than max_wroker_processes which we don't want.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

29 июня 2016 г., 09:24:22

On 29/06/2016 06:29, Amit Kapila wrote:
> On Wed, Jun 29, 2016 at 2:57 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>>
>> Thanks a lot for the help!
>>
>> PFA v6 which should fix all the issues mentioned.
>
> Couple of minor suggestions.
>
> -         <xref linkend="guc-max-worker-processes">.  Note that the requested
> +         <xref linkend="guc-max-worker-processes">, limited by
> +         <xref linked="guc-max-parallel-workers">.  Note that the requested
>
> Typo.
> /linked/linkend
>

Oops, fixed.

> You can always find such mistakes by doing make check in doc/src/sgml/
>

I wasn't aware of that, it's really a nice thing to know, thanks!

> + /*
> + * We need a memory barrier here to make sure the above test doesn't get
> + * reordered
> + */
> + pg_read_barrier();
>
> /memory barrier/read barrier
>

fixed

> + if (max_parallel_workers == 0)
> + {
> + ereport(elevel,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("background worker \"%s\": cannot request parallel worker if
> no parallel worker allowed",
>
> " ..no parallel worker is allowed".  'is' seems to be missing.
>

fixed

>
>>  Also, after second
>> thought I didn't add the extra hint about max_worker_processes in the
>> max_parallel_worker paragraph, since this line was a duplicate of the
>> precedent paragraph, it seemed better to leave the text as is.
>>
>
> not a big problem, we can leave it for committer to decide on same.
> However just by reading the description of max_parallel_worker, user
> can set its value more than max_wroker_processes which we don't want.
>

Right.  On the other hand I'm not sure that's really an issue, because
such a case is handled in the code, and setting max_parallel_workers way
above max_worker_processes could be a way to configure it as unlimited.
Or should we allow setting it to -1 for instance to disable the limit?

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вложения

global_max_parallel_workers-v7.diff

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

29 июня 2016 г., 09:51:50

On Wed, Jun 29, 2016 at 11:54 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> Or should we allow setting it to -1 for instance to disable the limit?
>

By disabling the limit, do you mean to say that only
max_parallel_workers_per_gather will determine the workers required or
something else?  If earlier, then I am not sure if it is good idea,
because it can cause some confusion to the user about usage of both
the parameters together.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

29 июня 2016 г., 17:00:59

On 29/06/2016 08:51, Amit Kapila wrote:
> On Wed, Jun 29, 2016 at 11:54 AM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> Or should we allow setting it to -1 for instance to disable the limit?
>>
> 
> By disabling the limit, do you mean to say that only
> max_parallel_workers_per_gather will determine the workers required or
> something else?

I meant what the current behavior (before this patch) is, which is
probably what some user without custom dynamic bgworker may like to
have.  Otherwise, you'd have to change two parameters to effectively
increase parallelism, and I think it can also cause some confusion.

>  If earlier, then I am not sure if it is good idea,
> because it can cause some confusion to the user about usage of both
> the parameters together.
> 

I also agree.  I don't see an ideal solution, so just keeping this patch
behavior is fine for me.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

30 июня 2016 г., 05:46:31

On Wed, Jun 29, 2016 at 11:54 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 29/06/2016 06:29, Amit Kapila wrote:
>> On Wed, Jun 29, 2016 at 2:57 AM, Julien Rouhaud
>> <julien.rouhaud@dalibo.com> wrote:
>>>
>>> Thanks a lot for the help!
>>>
>>> PFA v6 which should fix all the issues mentioned.
>>
>> Couple of minor suggestions.
>>
>> -         <xref linkend="guc-max-worker-processes">.  Note that the requested
>> +         <xref linkend="guc-max-worker-processes">, limited by
>> +         <xref linked="guc-max-parallel-workers">.  Note that the requested
>>
>> Typo.
>> /linked/linkend
>>
>
> Oops, fixed.
>
>> You can always find such mistakes by doing make check in doc/src/sgml/
>>
>
> I wasn't aware of that, it's really a nice thing to know, thanks!
>
>> + /*
>> + * We need a memory barrier here to make sure the above test doesn't get
>> + * reordered
>> + */
>> + pg_read_barrier();
>>
>> /memory barrier/read barrier
>>
>
> fixed
>
>> + if (max_parallel_workers == 0)
>> + {
>> + ereport(elevel,
>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> + errmsg("background worker \"%s\": cannot request parallel worker if
>> no parallel worker allowed",
>>
>> " ..no parallel worker is allowed".  'is' seems to be missing.
>>
>
> fixed
>

Your patch looks good to me and is ready for a committer's look.

Notes for committer -
a. Verify if description of newly added Guc max_parallel_workers looks
okay to you, me and Julien are not in 100% agreement on that.
b. Comments might need some improvement.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

16 сентября 2016 г., 21:24:39

On Wed, Jun 29, 2016 at 10:46 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Your patch looks good to me and is ready for a committer's look.
>
> Notes for committer -
> a. Verify if description of newly added Guc max_parallel_workers looks
> okay to you, me and Julien are not in 100% agreement on that.
> b. Comments might need some improvement.

This patch needs to be rebased.  I hope somebody can volunteer to do
that, because I'd like to commit it once we've hashed out the details.

Would it bother anybody very much if we bumped up these values, say by
increasing max_worker_processes from 8 to 16 and max_parallel_workers
from 4 (as it is in the current patch version) to 8?  I feel like 4 is
a bit more conservative than I'd like to be by default, and I'd like
to make sure that we leave room for other sorts of background workers
between the two limits.

I'd suggest renaming the "parallel" flag to BackgroundWorkerSlot to
"is_parallel_worker".  Or, actually, what I think would be better is
to give it a name like worker_class, and then we can have
BGWORKER_CLASS_PARALLEL and perhaps eventually
BGWORKER_CLASS_REPLICATION, etc.

+ * terminated ones.  These counters can of course overlaps, but it's not
+ * important here since the substraction will still give the right number.

overlaps -> overflow.  substraction -> subtraction.

+       /*
+        * We need a write barrier to make sure the update of
+        * parallel_terminate_count is done before the store to in_use
+        */

Does the order actually matter here?

+               {"max_parallel_workers", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
+                       gettext_noop("Sets the maximum number of
parallel processes for the cluster."),

I suggest: sets the maximum number of parallel workers that can be
active at one time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

16 сентября 2016 г., 22:54:20

On 16/09/2016 20:24, Robert Haas wrote:
> On Wed, Jun 29, 2016 at 10:46 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Your patch looks good to me and is ready for a committer's look.
>>
>> Notes for committer -
>> a. Verify if description of newly added Guc max_parallel_workers looks
>> okay to you, me and Julien are not in 100% agreement on that.
>> b. Comments might need some improvement.
>
> This patch needs to be rebased.  I hope somebody can volunteer to do
> that, because I'd like to commit it once we've hashed out the details.
>

I just rebased the previous patch on current HEAD, with some other
modifications, see below (attached v8 if that helps).

> Would it bother anybody very much if we bumped up these values, say by
> increasing max_worker_processes from 8 to 16 and max_parallel_workers
> from 4 (as it is in the current patch version) to 8?  I feel like 4 is
> a bit more conservative than I'd like to be by default, and I'd like
> to make sure that we leave room for other sorts of background workers
> between the two limits.
>

That's fine by me.  Should this be done (if there's no objection) in the
same patch, or in another one?


> I'd suggest renaming the "parallel" flag to BackgroundWorkerSlot to
> "is_parallel_worker".  Or, actually, what I think would be better is
> to give it a name like worker_class, and then we can have
> BGWORKER_CLASS_PARALLEL and perhaps eventually
> BGWORKER_CLASS_REPLICATION, etc.
>

For now I just renamed "parallel" to "is_parallel_worker", since this is
straightforward.  For a new "worker_class", I guess we'd need a new enum
stored in BackgroundWorker struct instead of the
BGWORKER_IS_PARALLEL_WORKER flag, and store it in the
BackgroundWorkerSlot. Should I do that instead?


> + * terminated ones.  These counters can of course overlaps, but it's not
> + * important here since the substraction will still give the right number.
>
> overlaps -> overflow.  substraction -> subtraction.
>

oops sorry, fixed

> +       /*
> +        * We need a write barrier to make sure the update of
> +        * parallel_terminate_count is done before the store to in_use
> +        */
>
> Does the order actually matter here?
>

After some more thinking, it looks like a reorder here won't have any
impact. I'll remove it, unless Amit has an objection about it.

> +               {"max_parallel_workers", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
> +                       gettext_noop("Sets the maximum number of
> parallel processes for the cluster."),
>
> I suggest: sets the maximum number of parallel workers that can be
> active at one time.
>

changed

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вложения

global_max_parallel_workers-v8.diff

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

17 сентября 2016 г., 09:40:32

On Fri, Sep 16, 2016 at 11:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> +       /*
> +        * We need a write barrier to make sure the update of
> +        * parallel_terminate_count is done before the store to in_use
> +        */
>
> Does the order actually matter here?
>

I think so.  If slot->in_use is reordered before the check of
is_parallel_worker, then it is possible that concurrent registration
of worker can mark the is_parallel_worker as false before we check the
flag here.  See explanation in previous e-mail [1].

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BQ_DdJ28qXYSHZiBKNf2MY1QFrv5XAOAw4ZXHw4TPMxA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

19 сентября 2016 г., 21:18:41

On Fri, Sep 16, 2016 at 3:53 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> That's fine by me.  Should this be done (if there's no objection) in the
> same patch, or in another one?

I'd say "same patch".

>> I'd suggest renaming the "parallel" flag to BackgroundWorkerSlot to
>> "is_parallel_worker".  Or, actually, what I think would be better is
>> to give it a name like worker_class, and then we can have
>> BGWORKER_CLASS_PARALLEL and perhaps eventually
>> BGWORKER_CLASS_REPLICATION, etc.
>
> For now I just renamed "parallel" to "is_parallel_worker", since this is
> straightforward.  For a new "worker_class", I guess we'd need a new enum
> stored in BackgroundWorker struct instead of the
> BGWORKER_IS_PARALLEL_WORKER flag, and store it in the
> BackgroundWorkerSlot. Should I do that instead?

I suggest that we make it part of bgw_flags, but use a bitmask for it,
like this:

#define BGWORKER_CLASS_MASK   0x00f0
#define BGWORKER_CLASS_PARALLEL  0x0010
/* add additional bgworker classes here */

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

19 сентября 2016 г., 21:19:32

On Sat, Sep 17, 2016 at 2:40 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Sep 16, 2016 at 11:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> +       /*
>> +        * We need a write barrier to make sure the update of
>> +        * parallel_terminate_count is done before the store to in_use
>> +        */
>>
>> Does the order actually matter here?
>>
>
> I think so.  If slot->in_use is reordered before the check of
> is_parallel_worker, then it is possible that concurrent registration
> of worker can mark the is_parallel_worker as false before we check the
> flag here.  See explanation in previous e-mail [1].

Tricky.  I believe you're right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Peter Eisentraut

Дата:

20 сентября 2016 г., 18:28:08

On 9/19/16 2:18 PM, Robert Haas wrote:
> I suggest that we make it part of bgw_flags, but use a bitmask for it,
> like this:
> 
> #define BGWORKER_CLASS_MASK   0x00f0
> #define BGWORKER_CLASS_PARALLEL  0x0010
> /* add additional bgworker classes here */

Unless we have a mechanism that makes use of this somehow, this attempt
at categorizing might be premature.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

20 сентября 2016 г., 18:47:13

On Tue, Sep 20, 2016 at 11:27 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 9/19/16 2:18 PM, Robert Haas wrote:
>> I suggest that we make it part of bgw_flags, but use a bitmask for it,
>> like this:
>>
>> #define BGWORKER_CLASS_MASK   0x00f0
>> #define BGWORKER_CLASS_PARALLEL  0x0010
>> /* add additional bgworker classes here */
>
> Unless we have a mechanism that makes use of this somehow, this attempt
> at categorizing might be premature.

My main reason for wanting is that I suspect there will be a similar
desire to limit the number of replication workers at some point.

However, that could be wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Petr Jelinek

Дата:

20 сентября 2016 г., 20:30:54

On 20/09/16 17:47, Robert Haas wrote:
> On Tue, Sep 20, 2016 at 11:27 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 9/19/16 2:18 PM, Robert Haas wrote:
>>> I suggest that we make it part of bgw_flags, but use a bitmask for it,
>>> like this:
>>>
>>> #define BGWORKER_CLASS_MASK   0x00f0
>>> #define BGWORKER_CLASS_PARALLEL  0x0010
>>> /* add additional bgworker classes here */
>>
>> Unless we have a mechanism that makes use of this somehow, this attempt
>> at categorizing might be premature.
> 
> My main reason for wanting is that I suspect there will be a similar
> desire to limit the number of replication workers at some point.
> 

Yes there definitely is desire to use this for replication workers as
well. The logical replication launcher currently handles the limit by
itself but I'd definitely prefer to reuse something more generic.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services

Re: Rename max_parallel_degree?

От

Peter Eisentraut

Дата:

20 сентября 2016 г., 22:32:05

On 9/20/16 11:47 AM, Robert Haas wrote:
>>> >> #define BGWORKER_CLASS_MASK   0x00f0
>>> >> #define BGWORKER_CLASS_PARALLEL  0x0010
>>> >> /* add additional bgworker classes here */
>> >
>> > Unless we have a mechanism that makes use of this somehow, this attempt
>> > at categorizing might be premature.
> My main reason for wanting is that I suspect there will be a similar
> desire to limit the number of replication workers at some point.

Would that work for something developed as an extension?  Would we all
have to agree on non-conflicting numbers?  Maybe a string tag would work
better?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

20 сентября 2016 г., 23:07:44

On Tue, Sep 20, 2016 at 3:31 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 9/20/16 11:47 AM, Robert Haas wrote:
>>>> >> #define BGWORKER_CLASS_MASK   0x00f0
>>>> >> #define BGWORKER_CLASS_PARALLEL  0x0010
>>>> >> /* add additional bgworker classes here */
>>> >
>>> > Unless we have a mechanism that makes use of this somehow, this attempt
>>> > at categorizing might be premature.
>> My main reason for wanting is that I suspect there will be a similar
>> desire to limit the number of replication workers at some point.
>
> Would that work for something developed as an extension?  Would we all
> have to agree on non-conflicting numbers?  Maybe a string tag would work
> better?

No, I'm assuming that the classes would be built-in.  A string tag
seems like over-engineering to me, particularly because the postmaster
needs to switch on the tag, and we need to be very careful about the
degree to which the postmaster trusts the contents of shared memory.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Peter Eisentraut

Дата:

23 сентября 2016 г., 15:55:00

On 9/20/16 4:07 PM, Robert Haas wrote:
> No, I'm assuming that the classes would be built-in.  A string tag
> seems like over-engineering to me, particularly because the postmaster
> needs to switch on the tag, and we need to be very careful about the
> degree to which the postmaster trusts the contents of shared memory.

I'm hoping that we can come up with something that extensions can
participate in, without the core having to know ahead of time what those
extensions are or how they would be categorized.

My vision is something like

max_processes = 512  # requires restart

process_allowances = 'connection:300 superuser:10 autovacuum:10
parallel:30 replication:10 someextension:20 someotherextension:20'
# does not require restart

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

23 сентября 2016 г., 22:10:27

On Fri, Sep 23, 2016 at 8:54 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 9/20/16 4:07 PM, Robert Haas wrote:
>> No, I'm assuming that the classes would be built-in.  A string tag
>> seems like over-engineering to me, particularly because the postmaster
>> needs to switch on the tag, and we need to be very careful about the
>> degree to which the postmaster trusts the contents of shared memory.
>
> I'm hoping that we can come up with something that extensions can
> participate in, without the core having to know ahead of time what those
> extensions are or how they would be categorized.
>
> My vision is something like
>
> max_processes = 512  # requires restart
>
> process_allowances = 'connection:300 superuser:10 autovacuum:10
> parallel:30 replication:10 someextension:20 someotherextension:20'
> # does not require restart

I don't think it's going to be very practical to allow extensions to
participate in the mechanism because there have to be a finite number
of slots that is known at the time we create the main shared memory
segment.

Also, it's really important that we don't add lots more surface area
for the postmaster to crash and burn.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

30 сентября 2016 г., 19:23:45

On 23/09/2016 21:10, Robert Haas wrote:
> On Fri, Sep 23, 2016 at 8:54 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 9/20/16 4:07 PM, Robert Haas wrote:
>>> No, I'm assuming that the classes would be built-in.  A string tag
>>> seems like over-engineering to me, particularly because the postmaster
>>> needs to switch on the tag, and we need to be very careful about the
>>> degree to which the postmaster trusts the contents of shared memory.
>>
>> I'm hoping that we can come up with something that extensions can
>> participate in, without the core having to know ahead of time what those
>> extensions are or how they would be categorized.
>>
>> My vision is something like
>>
>> max_processes = 512  # requires restart
>>
>> process_allowances = 'connection:300 superuser:10 autovacuum:10
>> parallel:30 replication:10 someextension:20 someotherextension:20'
>> # does not require restart
> 
> I don't think it's going to be very practical to allow extensions to
> participate in the mechanism because there have to be a finite number
> of slots that is known at the time we create the main shared memory
> segment.
> 
> Also, it's really important that we don't add lots more surface area
> for the postmaster to crash and burn.
> 

It seems that there's no objection on Robert's initial proposal, so I'll
try to implement it.

I've already fixed every other issues mentioned upthread, but I'm facing
a problem for this one.  Assuming that the bgworker classes are supposed
to be mutually exclusive, I don't see a simple and clean way to add such
a check in SanityCheckBackgroundWorker().  Am I missing something
obvious, or can someone give me some advice for this?

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Re: Rename max_parallel_degree?

От

Michael Paquier

Дата:

03 октября 2016 г., 05:20:32

On Sat, Oct 1, 2016 at 1:23 AM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> On 23/09/2016 21:10, Robert Haas wrote:
>> On Fri, Sep 23, 2016 at 8:54 AM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>> On 9/20/16 4:07 PM, Robert Haas wrote:
>>>> No, I'm assuming that the classes would be built-in.  A string tag
>>>> seems like over-engineering to me, particularly because the postmaster
>>>> needs to switch on the tag, and we need to be very careful about the
>>>> degree to which the postmaster trusts the contents of shared memory.
>>>
>>> I'm hoping that we can come up with something that extensions can
>>> participate in, without the core having to know ahead of time what those
>>> extensions are or how they would be categorized.
>>>
>>> My vision is something like
>>>
>>> max_processes = 512  # requires restart
>>>
>>> process_allowances = 'connection:300 superuser:10 autovacuum:10
>>> parallel:30 replication:10 someextension:20 someotherextension:20'
>>> # does not require restart
>>
>> I don't think it's going to be very practical to allow extensions to
>> participate in the mechanism because there have to be a finite number
>> of slots that is known at the time we create the main shared memory
>> segment.
>>
>> Also, it's really important that we don't add lots more surface area
>> for the postmaster to crash and burn.
>>
>
> It seems that there's no objection on Robert's initial proposal, so I'll
> try to implement it.
>
> I've already fixed every other issues mentioned upthread, but I'm facing
> a problem for this one.  Assuming that the bgworker classes are supposed
> to be mutually exclusive, I don't see a simple and clean way to add such
> a check in SanityCheckBackgroundWorker().  Am I missing something
> obvious, or can someone give me some advice for this?

Okay, so marking it as returned with feedback is adapted? I have done
so but feel free to contradict me.
-- 
Michael

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

03 октября 2016 г., 22:27:12

On Fri, Sep 30, 2016 at 12:23 PM, Julien Rouhaud
<julien.rouhaud@dalibo.com> wrote:
> I've already fixed every other issues mentioned upthread, but I'm facing
> a problem for this one.  Assuming that the bgworker classes are supposed
> to be mutually exclusive, I don't see a simple and clean way to add such
> a check in SanityCheckBackgroundWorker().  Am I missing something
> obvious, or can someone give me some advice for this?

My advice is "don't worry about it".   If somebody thinks of something
that can be usefully added there, it will take very little time to add
it and test that it works.  Don't get hung up on that for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

04 октября 2016 г., 17:16:55

On 03/10/2016 21:27, Robert Haas wrote:
> On Fri, Sep 30, 2016 at 12:23 PM, Julien Rouhaud
> <julien.rouhaud@dalibo.com> wrote:
>> I've already fixed every other issues mentioned upthread, but I'm facing
>> a problem for this one.  Assuming that the bgworker classes are supposed
>> to be mutually exclusive, I don't see a simple and clean way to add such
>> a check in SanityCheckBackgroundWorker().  Am I missing something
>> obvious, or can someone give me some advice for this?
>
> My advice is "don't worry about it".   If somebody thinks of something
> that can be usefully added there, it will take very little time to add
> it and test that it works.  Don't get hung up on that for now.
>

Ok, thanks!

Please find attached v9 of the patch, adding the parallel worker class
and changing max_worker_processes default to 16 and max_parallel_workers
to 8.  I also added Amit's explanation for the need of a write barrier
in ForgetBackgroundWorker().

I'll add this patch to the next commitfest.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вложения

global_max_parallel_workers-v9.diff

Re: Rename max_parallel_degree?

От

Peter Eisentraut

Дата:

12 октября 2016 г., 18:14:00

On 10/4/16 10:16 AM, Julien Rouhaud wrote:
> Please find attached v9 of the patch, adding the parallel worker class
> and changing max_worker_processes default to 16 and max_parallel_workers
> to 8.  I also added Amit's explanation for the need of a write barrier
> in ForgetBackgroundWorker().

This approach totally messes up the decoupling between the background
worker facilities and consumers of those facilities.  Having dozens of
lines of code in bgworker.c that does the accounting and resource
checking on behalf of parallel.c looks very suspicious.  Once we add
logical replication workers or whatever, we'll be tempted to add even
more stuff in there and it will become a mess.

I think we should think of a way where we can pass the required
information during background worker setup, like a pointer to the
resource-limiting GUC variable.

For style, I would also prefer the "class" to be a separate struct field
from the flags.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

13 октября 2016 г., 02:58:14

On Wed, Oct 12, 2016 at 8:13 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 10/4/16 10:16 AM, Julien Rouhaud wrote:
>> Please find attached v9 of the patch, adding the parallel worker class
>> and changing max_worker_processes default to 16 and max_parallel_workers
>> to 8.  I also added Amit's explanation for the need of a write barrier
>> in ForgetBackgroundWorker().
>
> This approach totally messes up the decoupling between the background
> worker facilities and consumers of those facilities.  Having dozens of
> lines of code in bgworker.c that does the accounting and resource
> checking on behalf of parallel.c looks very suspicious.  Once we add
> logical replication workers or whatever, we'll be tempted to add even
> more stuff in there and it will become a mess.

I attach a new version of the patch that I've been hacking on in my
"spare time", which reduces the footprint in bgworker.c quite a bit.
I don't think it's wrong that the handling is done there, though.  The
process that is registering the background worker is well-placed to
check whether there are already too many, and if it does not then the
slot is consumed at least temporarily even if it busts the cap.  On
the flip side, the postmaster is the only process that is well-placed
to know when a background worker terminates.  The worker process
itself can't be made responsible for it, as you suggest below, because
it may never even start up in the first place (e.g. fork() returns
EAGAIN).  And the registering process can't be made responsible,
because it might die before the worker.

> I think we should think of a way where we can pass the required
> information during background worker setup, like a pointer to the
> resource-limiting GUC variable.

I don't think this can work, per the above.

> For style, I would also prefer the "class" to be a separate struct field
> from the flags.

I think that it makes sense to keep the class as part of the flags
because then a large amount of the stuff in this patch that is
concerned with propagating the flags around becomes unnecessary.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

max-parallel-workers-v10.patch

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

24 октября 2016 г., 10:48:54

On Thu, Oct 13, 2016 at 5:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Oct 12, 2016 at 8:13 AM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>> On 10/4/16 10:16 AM, Julien Rouhaud wrote:
>>> Please find attached v9 of the patch, adding the parallel worker class
>>> and changing max_worker_processes default to 16 and max_parallel_workers
>>> to 8.  I also added Amit's explanation for the need of a write barrier
>>> in ForgetBackgroundWorker().
>>
>> This approach totally messes up the decoupling between the background
>> worker facilities and consumers of those facilities.  Having dozens of
>> lines of code in bgworker.c that does the accounting and resource
>> checking on behalf of parallel.c looks very suspicious.  Once we add
>> logical replication workers or whatever, we'll be tempted to add even
>> more stuff in there and it will become a mess.
>
> I attach a new version of the patch that I've been hacking on in my
> "spare time", which reduces the footprint in bgworker.c quite a bit.
>

Couple of comments -

@@ -370,6 +388,9 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
 Assert(rw->rw_shmem_slot <
max_worker_processes); slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
+ if ((rw-
>rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
+ BackgroundWorkerData-
>parallel_terminate_count++;
+ slot->in_use = false;

It seems upthread [1], we agreed to have a write barrier before the
setting slot->in_use, but I don't see the same in patch.


Isn't it better to keep planner related checks from Julien's patch,
especially below one:

--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -249,6 +249,7 @@ standard_planner(Query *parse, int cursorOptions,
ParamListInfo boundParams) parse->utilityStmt == NULL && !parse->hasModifyingCTE && max_parallel_workers_per_gather > 0
&&
+ max_parallel_workers > 0 && !IsParallelWorker() && !IsolationIsSerializable())

> I don't think it's wrong that the handling is done there, though.  The
> process that is registering the background worker is well-placed to
> check whether there are already too many, and if it does not then the
> slot is consumed at least temporarily even if it busts the cap.  On
> the flip side, the postmaster is the only process that is well-placed
> to know when a background worker terminates.  The worker process
> itself can't be made responsible for it, as you suggest below, because
> it may never even start up in the first place (e.g. fork() returns
> EAGAIN).  And the registering process can't be made responsible,
> because it might die before the worker.
>
>> I think we should think of a way where we can pass the required
>> information during background worker setup, like a pointer to the
>> resource-limiting GUC variable.
>
> I don't think this can work, per the above.
>

I see the value in Peter's point, but could not think of a way to
implement the same.


[1] - https://www.postgresql.org/message-id/CA%2BTgmoZS7DFFGz7kKWLoTAsNnXRKUiQFDt5yHgf4L73z52dgAQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Peter Eisentraut

Дата:

24 октября 2016 г., 23:04:30

On 10/12/16 7:58 PM, Robert Haas wrote:
> I don't think it's wrong that the handling is done there, though.  The
> process that is registering the background worker is well-placed to
> check whether there are already too many, and if it does not then the
> slot is consumed at least temporarily even if it busts the cap.  On
> the flip side, the postmaster is the only process that is well-placed
> to know when a background worker terminates.  The worker process
> itself can't be made responsible for it, as you suggest below, because
> it may never even start up in the first place (e.g. fork() returns
> EAGAIN).  And the registering process can't be made responsible,
> because it might die before the worker.

Those are valid technical points.  I have not worked out any alternatives.

I'm concerned that all this makes background workers managed by
extensions second-class citizens.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

26 октября 2016 г., 15:24:40

On Mon, Oct 24, 2016 at 3:48 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Oct 13, 2016 at 5:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Oct 12, 2016 at 8:13 AM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>>> On 10/4/16 10:16 AM, Julien Rouhaud wrote:
>>>> Please find attached v9 of the patch, adding the parallel worker class
>>>> and changing max_worker_processes default to 16 and max_parallel_workers
>>>> to 8.  I also added Amit's explanation for the need of a write barrier
>>>> in ForgetBackgroundWorker().
>>>
>>> This approach totally messes up the decoupling between the background
>>> worker facilities and consumers of those facilities.  Having dozens of
>>> lines of code in bgworker.c that does the accounting and resource
>>> checking on behalf of parallel.c looks very suspicious.  Once we add
>>> logical replication workers or whatever, we'll be tempted to add even
>>> more stuff in there and it will become a mess.
>>
>> I attach a new version of the patch that I've been hacking on in my
>> "spare time", which reduces the footprint in bgworker.c quite a bit.
>>
>
> Couple of comments -
>
> @@ -370,6 +388,9 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
>
>   Assert(rw->rw_shmem_slot <
> max_worker_processes);
>   slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
> + if ((rw-
>>rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
> + BackgroundWorkerData-
>>parallel_terminate_count++;
> +
>   slot->in_use = false;
>
> It seems upthread [1], we agreed to have a write barrier before the
> setting slot->in_use, but I don't see the same in patch.

That's because I removed it.  The reason given for the barrier was
that otherwise it might be reordered before the check of
is_parallel_worker, but that's now done by checking the postmaster's
backend-private copy of the flags, not the copy in shared memory.  So
the reordering can't affect the result.

> Isn't it better to keep planner related checks from Julien's patch,
> especially below one:
>
> --- a/src/backend/optimizer/plan/planner.c
> +++ b/src/backend/optimizer/plan/planner.c
> @@ -249,6 +249,7 @@ standard_planner(Query *parse, int cursorOptions,
> ParamListInfo boundParams)
>   parse->utilityStmt == NULL &&
>   !parse->hasModifyingCTE &&
>   max_parallel_workers_per_gather > 0 &&
> + max_parallel_workers > 0 &&
>   !IsParallelWorker() &&
>   !IsolationIsSerializable())

I don't really think we need to do that.  If you set
max_parallel_workers_per_gather > max_parallel_workers, you might get
some plans that will never succeed in obtaining the number of workers
for which they planned.  If that bothers you, don't do it.  It's
actually quite useful to do that for testing purposes, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Amit Kapila

Дата:

26 октября 2016 г., 16:03:18

On Wed, Oct 26, 2016 at 5:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 24, 2016 at 3:48 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Oct 13, 2016 at 5:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Wed, Oct 12, 2016 at 8:13 AM, Peter Eisentraut
>>> <peter.eisentraut@2ndquadrant.com> wrote:
>>>> On 10/4/16 10:16 AM, Julien Rouhaud wrote:
>>>>> Please find attached v9 of the patch, adding the parallel worker class
>>>>> and changing max_worker_processes default to 16 and max_parallel_workers
>>>>> to 8.  I also added Amit's explanation for the need of a write barrier
>>>>> in ForgetBackgroundWorker().
>>>>
>>>> This approach totally messes up the decoupling between the background
>>>> worker facilities and consumers of those facilities.  Having dozens of
>>>> lines of code in bgworker.c that does the accounting and resource
>>>> checking on behalf of parallel.c looks very suspicious.  Once we add
>>>> logical replication workers or whatever, we'll be tempted to add even
>>>> more stuff in there and it will become a mess.
>>>
>>> I attach a new version of the patch that I've been hacking on in my
>>> "spare time", which reduces the footprint in bgworker.c quite a bit.
>>>
>>
>> Couple of comments -
>>
>> @@ -370,6 +388,9 @@ ForgetBackgroundWorker(slist_mutable_iter *cur)
>>
>>   Assert(rw->rw_shmem_slot <
>> max_worker_processes);
>>   slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
>> + if ((rw-
>>>rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
>> + BackgroundWorkerData-
>>>parallel_terminate_count++;
>> +
>>   slot->in_use = false;
>>
>> It seems upthread [1], we agreed to have a write barrier before the
>> setting slot->in_use, but I don't see the same in patch.
>
> That's because I removed it.  The reason given for the barrier was
> that otherwise it might be reordered before the check of
> is_parallel_worker, but that's now done by checking the postmaster's
> backend-private copy of the flags, not the copy in shared memory.  So
> the reordering can't affect the result.
>

You are right.  I missed to notice that.

The patch looks good to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

26 октября 2016 г., 19:38:25

On Mon, Oct 24, 2016 at 4:04 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 10/12/16 7:58 PM, Robert Haas wrote:
>> I don't think it's wrong that the handling is done there, though.  The
>> process that is registering the background worker is well-placed to
>> check whether there are already too many, and if it does not then the
>> slot is consumed at least temporarily even if it busts the cap.  On
>> the flip side, the postmaster is the only process that is well-placed
>> to know when a background worker terminates.  The worker process
>> itself can't be made responsible for it, as you suggest below, because
>> it may never even start up in the first place (e.g. fork() returns
>> EAGAIN).  And the registering process can't be made responsible,
>> because it might die before the worker.
>
> Those are valid technical points.  I have not worked out any alternatives.
>
> I'm concerned that all this makes background workers managed by
> extensions second-class citizens.

I suppose at some level that's true, but I have a hard time getting
worked up about it.  There are always some areas where core code is
going to be privileged over non-core code, but I'd say that background
workers stand out as a shining example of enabling interesting
non-core code.  In fact, the main reason why this patch exists at all
is so that the sole in-core user of the background worker facility -
parallel query - doesn't crowd out interesting non-core uses of that
facility.  This isn't a parallel query feature; it's an
everything-that-uses-background-workers-that-is-not-parallel-query
feature.

Also, for many types of background workers, this kind of limiting
behavior isn't really useful or doesn't really make sense.  Obviously,
any application that uses exactly one background worker (or any fixed
number of background workers) doesn't need any feature similar to this
one.  Also, any application that uses a "launcher" process and some
number of per-database workers can limit the number of workers it
consumes by teaching the launcher process to enforce a limit.  These
two patterns are quite common, I think, and probably cover a lot of
what background workers might like to do.  Parallel query is a little
bit unusual in that any backend in the system might launch workers
without having any idea what workers other backends are already doing.
It seems unlikely to me that many extensions would share this pattern,
but perhaps I lack sufficient imagination.  pg_background, or any
facility derived from it, might qualify, but I can't think what else
would.

In any event, this is not written in stone.  I don't think it would be
a huge deal to change this later if we come up with something better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Haribabu Kommi

Дата:

02 декабря 2016 г., 06:07:55

On Thu, Oct 27, 2016 at 3:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 24, 2016 at 4:04 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 10/12/16 7:58 PM, Robert Haas wrote:
>> I don't think it's wrong that the handling is done there, though. The
>> process that is registering the background worker is well-placed to
>> check whether there are already too many, and if it does not then the
>> slot is consumed at least temporarily even if it busts the cap. On
>> the flip side, the postmaster is the only process that is well-placed
>> to know when a background worker terminates. The worker process
>> itself can't be made responsible for it, as you suggest below, because
>> it may never even start up in the first place (e.g. fork() returns
>> EAGAIN). And the registering process can't be made responsible,
>> because it might die before the worker.
>
> Those are valid technical points. I have not worked out any alternatives.
>
> I'm concerned that all this makes background workers managed by
> extensions second-class citizens.

I suppose at some level that's true, but I have a hard time getting
worked up about it. There are always some areas where core code is
going to be privileged over non-core code, but I'd say that background
workers stand out as a shining example of enabling interesting
non-core code. In fact, the main reason why this patch exists at all
is so that the sole in-core user of the background worker facility -
parallel query - doesn't crowd out interesting non-core uses of that
facility. This isn't a parallel query feature; it's an
everything-that-uses-background-workers-that-is-not-parallel-query
feature.

Also, for many types of background workers, this kind of limiting
behavior isn't really useful or doesn't really make sense. Obviously,
any application that uses exactly one background worker (or any fixed
number of background workers) doesn't need any feature similar to this
one. Also, any application that uses a "launcher" process and some
number of per-database workers can limit the number of workers it
consumes by teaching the launcher process to enforce a limit. These
two patterns are quite common, I think, and probably cover a lot of
what background workers might like to do. Parallel query is a little
bit unusual in that any backend in the system might launch workers
without having any idea what workers other backends are already doing.
It seems unlikely to me that many extensions would share this pattern,
but perhaps I lack sufficient imagination. pg_background, or any
facility derived from it, might qualify, but I can't think what else
would.

In any event, this is not written in stone. I don't think it would be
a huge deal to change this later if we come up with something better.

From the recent mails, it is not clear to me what is the status of this patch.

moved to next CF with "needs review" state. Please feel free to update

the status.

Regards,

Hari Babu

Fujitsu Australia

Re: Rename max_parallel_degree?

От

Robert Haas

Дата:

02 декабря 2016 г., 15:46:01

On Thu, Dec 1, 2016 at 10:07 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> From the recent mails, it is not clear to me what is the status of this
> patch.
> moved to next CF with "needs review" state. Please feel free to update
> the status.

I have committed this patch.  And updated the status, too!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Rename max_parallel_degree?

От

Julien Rouhaud

Дата:

04 декабря 2016 г., 16:29:13

On Fri, Dec 02, 2016 at 07:45:56AM -0500, Robert Haas wrote:
> On Thu, Dec 1, 2016 at 10:07 PM, Haribabu Kommi
> <kommi.haribabu@gmail.com> wrote:
> > From the recent mails, it is not clear to me what is the status of this
> > patch.
> > moved to next CF with "needs review" state. Please feel free to update
> > the status.
> 
> I have committed this patch.  And updated the status, too!

Thanks!

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Rename max_parallel_degree?

Вложения

Вложения

Вложения