Discussion: max_parallel_degree context level


max_parallel_degree context level

From
Thom Brown
Date:
Hi all,

As it currently stands, max_parallel_degree is set to a superuser
context, but we probably want to discuss whether we want to keep it
this way prior to releasing 9.6.  Might we want to reduce its level so
that users can adjust it accordingly?  They'd still be limited by
max_worker_processes, so they'd at least be constrained by that
setting.
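
For illustration (hypothetical session, with the GUC at its current
superuser context), an unprivileged user is simply refused:

    -- hypothetical unprivileged session
    => SET max_parallel_degree = 4;
    ERROR:  permission denied to set parameter "max_parallel_degree"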

Opinions?

Thom



Re: max_parallel_degree context level

From
Robert Haas
Date:
On Thu, Feb 11, 2016 at 7:40 AM, Thom Brown <thom@linux.com> wrote:
> As it currently stands, max_parallel_degree is set to a superuser
> context, but we probably want to discuss whether we want to keep it
> this way prior to releasing 9.6.  Might we want to reduce its level so
> that users can adjust it accordingly?  They'd still be limited by
> max_worker_processes, so they'd at least be constrained by that
> setting.

I don't have a clue why it's like that.  It seems like it should be
PGC_USERSET just like, say, work_mem.  I think that's just brain fade
on my part, and I think the current setting will be really
inconvenient for unprivileged users: as it is, they have no way to
turn parallel query off.  Unless somebody objects, I'll go change
that.
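
Once it's USERSET, an unprivileged session would be able to turn it
off for itself, e.g. (a sketch, assuming 0 disables parallel query):

    -- disable parallel query for the current session only
    SET max_parallel_degree = 0;

much as anyone can already adjust work_mem.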

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: max_parallel_degree context level

From
Dean Rasheed
Date:
On 11 February 2016 at 13:18, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Feb 11, 2016 at 7:40 AM, Thom Brown <thom@linux.com> wrote:
>> As it currently stands, max_parallel_degree is set to a superuser
>> context
>
> I don't have a clue why it's like that.  It seems like it should be
> PGC_USERSET just like, say, work_mem.  I think that's just brain fade
> on my part, and I think the current setting will be really
> inconvenient for unprivileged users: as it is, they have no way to
> turn parallel query off.  Unless somebody objects, I'll go change
> that.
>

+1. I would want it to be user-settable.

Regards,
Dean



Re: max_parallel_degree context level

From
Simon Riggs
Date:
On 11 February 2016 at 12:40, Thom Brown <thom@linux.com> wrote:
Hi all,

As it currently stands, max_parallel_degree is set to a superuser
context, but we probably want to discuss whether we want to keep it
this way prior to releasing 9.6.  Might we want to reduce its level so
that users can adjust it accordingly?  They'd still be limited by
max_worker_processes, so they'd at least be constrained by that
setting.

A few questions and thoughts to help decide...

Does it take into account the parallel degree during planning?
Does it take into account the actual parallel degree available at execution time?

If you make max_worker_processes USERSET won't everybody just set it to max_worker_processes?

How do you prevent or control that? Is that limited by the user's connection limit?

How does the server behave when fewer workers are available than max_parallel_degree?

Is it slower if you request N workers, yet only 1 is available?

Does pg_stat_activity show the number of parallel workers active for a controlling process?
Do parallel workers also show in pg_stat_activity at all?
If so, does it show who currently has them?
Does pg_stat_statements record how many workers were available during execution?

Is there a way to prevent execution if too few parallel workers are available?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: max_parallel_degree context level

From
Robert Haas
Date:
On Thu, Feb 11, 2016 at 10:32 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> A few questions and thoughts to help decide...
>
> Does it take into account the parallel degree during planning?
> Does it take into account the actual parallel degree during planning?

max_parallel_degree is a query planner GUC, just like work_mem.  Just
as we can't know at planning time how much memory will be available at
execution time, we can't know at planning time how many worker
processes will be available at execution time.  In each case, we have
a GUC that tells the system what to
assume.  In each case also, some better model might be possible, but
today we don't have it.
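
Concretely, it is only a planning assumption, something like:

    -- hypothetical example; the table name is made up
    SET max_parallel_degree = 4;
    EXPLAIN SELECT count(*) FROM big_table;
    -- the planner costs a parallel plan assuming up to 4 workers,
    -- whether or not that many can actually be started at runtime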

> If you make max_worker_processes USERSET won't everybody just set it to
> max_worker_processes?

I think that you meant for the first instance of max_worker_processes
in that sentence to be max_parallel_degree.  I'll respond as if that's
what you meant.  Basically, I think this is like asking whether everybody
won't just set work_mem to the entire amount of free memory on the
machine and try to use it all themselves.  We really have never tried
very hard to prevent that sort of thing in PostgreSQL.  Maybe we
should, but we'd have to fix an awful lot of stuff.  There are many
ways for malicious users to do things that interfere with the ability
of other users to use the system.  I admit that the same problem
exists here, but I don't think it's any more severe than any of the
cases that already exist.  In some ways I think it's a whole lot LESS
serious than what a bad work_mem setting can do to your system.

> How does the server behave when fewer workers are available than
> max_parallel_degree?

The same query plan is executed with fewer workers, even with 0
workers.  If we chose a parallel plan that is a mirror of the
non-parallel plan we would have chosen, this doesn't cost much.  If
there's some other non-parallel plan that would be much faster and we
only picked this parallel plan because we thought we would have
several workers available, and then we get fewer or none, that might
be expensive.  One can imagine a system that always computes both a
parallel plan and a non-parallel plan and chooses between them at
runtime, or even multiple plans for varying number of workers, but we
don't have that today.  I am not actually sure it would be worth it.
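
(You can see the effect in EXPLAIN (ANALYZE) output, which looks
roughly like this, with labels and numbers invented for illustration:

    Gather
      Workers Planned: 4
      Workers Launched: 1
      ->  Parallel Seq Scan on big_table

i.e. the plan assumed four workers but only one could be started.)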

Basically, I think this comes back to the analogy between
max_parallel_degree and work_mem.  If you set work_mem too high and
the system starts swapping and becomes very slow, that's your fault
(we say) for setting an unreasonable value of work_mem.  Similarly, if
you set max_parallel_degree to an unreasonable value such that the
system is unlikely to be able to obtain that number of workers at
execution time, you have configured your query planner settings
poorly.  This is no different than setting random_page_cost lower than
seq_page_cost or any number of other dumb things you could do.

> Is it slower if you request N workers, yet only 1 is available?

I sure hope so.  There may be some cases where more workers are slower
than fewer workers, but those cases are defects that we should try to
fix.

> Does pg_stat_activity show the number of parallel workers active for a
> controlling process?
> Do parallel workers also show in pg_stat_activity at all?
> If so, does it show who currently has them?
> Does pg_stat_statements record how many workers were available during
> execution?

Background workers show up in pg_stat_activity, but the number of
workers used by a parallel query isn't reported anywhere.  It's
usually pretty easy to figure out from the EXPLAIN (ANALYZE, VERBOSE)
output, but clearly there might be some benefit in reporting it to
other monitoring facilities.  I hadn't really thought about that idea
before, but it's a good thought.
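
For instance, a monitoring query along these lines (hypothetical):

    -- each parallel worker appears as its own row,
    -- but nothing links it back to its leader process
    SELECT pid, state, query FROM pg_stat_activity;

shows one row per backend, including the background workers, but with
no indication of which leader they are working for.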

> Is there a way to prevent execution if too few parallel workers are
> available?

No. That might be a useful feature, but I don't have any plans to
implement it myself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: max_parallel_degree context level

From
Joe Conway
Date:
On 02/11/2016 07:55 AM, Robert Haas wrote:
> On Thu, Feb 11, 2016 at 10:32 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> If you make max_worker_processes USERSET won't everybody just set it to
>> max_worker_processes?
>
> I think that you meant for the first instance of max_worker_processes
> in that sentence to be max_parallel_degree.  I'll respond as if that's
> what you meant.  Basically, I think this is like asking whether everybody
> won't just set work_mem to the entire amount of free memory on the
> machine and try to use it all themselves.  We really have never tried
> very hard to prevent that sort of thing in PostgreSQL.  Maybe we
> should, but we'd have to fix an awful lot of stuff.  There are many
> ways for malicious users to do things that interfere with the ability
> of other users to use the system.  I admit that the same problem
> exists here, but I don't think it's any more severe than any of the
> cases that already exist.  In some ways I think it's a whole lot LESS
> serious than what a bad work_mem setting can do to your system.

This is pretty much exactly what I was thinking -- work_mem is already a
bigger potential problem than this. In general I think we need to
eventually provide more admin control over USERSET GUCs, but that is a
whole other conversation.
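
(Right now about all an admin can do for a USERSET GUC is install a
per-role default, e.g.:

    -- hypothetical role name
    ALTER ROLE app_user SET max_parallel_degree = 2;

which the role remains free to override in its own sessions.)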

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Re: max_parallel_degree context level

From
David Rowley
Date:
On 12 February 2016 at 04:55, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Feb 11, 2016 at 10:32 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Is it slower if you request N workers, yet only 1 is available?
>
> I sure hope so.  There may be some cases where more workers are slower
> than fewer workers, but those cases are defects that we should try to
> fix.

It would only take something other than the CPU being the bottleneck
for this to very likely be the case.
If a non-parallel query is bound on I/O, then adding workers is most
likely going to slow it down further. I've seen this when testing
parallel aggregates.

--
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: max_parallel_degree context level

From
Robert Haas
Date:
On Sun, Mar 20, 2016 at 3:01 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 12 February 2016 at 04:55, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Thu, Feb 11, 2016 at 10:32 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> Is it slower if you request N workers, yet only 1 is available?
>>
>> I sure hope so.  There may be some cases where more workers are slower
>> than fewer workers, but those cases are defects that we should try to
>> fix.
>
> It would only take something other than the CPU being the bottleneck
> for this to very likely be the case.
> If a non-parallel query is bound on I/O, then adding workers is most
> likely going to slow it down further. I've seen this when testing
> parallel aggregates.

Yeah.  If you're bottlenecked on I/O, having more workers fighting
over the limited amount of CPU work available just adds context
switching and communication overhead.  That's not a particularly easy
problem to solve.  One can imagine a system where the workers exit if
they turn out not to be needed, but then of course you might end up
needing them later if the situation shifts.  I think eventually we
should have the ability for workers to both dynamically leave queries
that are I/O bound and dynamically join queries that become CPU bound,
but that is going to be a bit more than we can fit into 9.6.

Meanwhile, I made the change that was the original purpose of this thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company