Обсуждение: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup as parallel-restricted.

Поиск
Список
Период
Сортировка

[COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup as parallel-restricted.

От
Robert Haas
Дата:
Mark pg_start_backup and pg_stop_backup as parallel-restricted.

They depend on backend-private state that will not be synchronized by
the parallel machinery, so they should not be marked parallel-safe.
This issue also exists in 9.6, but we obviously can't do anything
about 9.6 clusters that already exist.  Possibly this could be
back-patched so that future 9.6 clusters would come out OK, or
possibly we should back-patch some other fix, but that would need more
discussion.

David Steele, reviewed by Michael Paquier

Discussion: http://postgr.es/m/CA+TgmoYCWfO2UM-t=HUMFJyxJywLDiLL0nAJpx88LKtvBvNECw@mail.gmail.com

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/9fe3c644a73198941e9a502958c24727dc4a6434

Modified Files
--------------
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_proc.h    | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)


Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup asparallel-restricted.

От
David Steele
Дата:
On 3/6/17 12:48 PM, Robert Haas wrote:
> Mark pg_start_backup and pg_stop_backup as parallel-restricted.
>
> They depend on backend-private state that will not be synchronized by
> the parallel machinery, so they should not be marked parallel-safe.
> This issue also exists in 9.6, but we obviously can't do anything
> about 9.6 clusters that already exist.  Possibly this could be
> back-patched so that future 9.6 clusters would come out OK, or
> possibly we should back-patch some other fix, but that would need more
> discussion.

I think it would be worth back-patching the catalog fix for future 9.6
clusters as a start.  Parallelism is off by default in 9.6 so that
mitigates some of the problem.

I don't have any regression tests that cover backups when parallelism is
enabled in 9.6, but I'm willing to do that and see if this is a
realistic issue or not.

--
-David
david@pgmasters.net


Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup as parallel-restricted.

От
Tom Lane
Дата:
David Steele <david@pgmasters.net> writes:
> On 3/6/17 12:48 PM, Robert Haas wrote:
>> This issue also exists in 9.6, but we obviously can't do anything
>> about 9.6 clusters that already exist.  Possibly this could be
>> back-patched so that future 9.6 clusters would come out OK, or
>> possibly we should back-patch some other fix, but that would need more
>> discussion.

> I think it would be worth back-patching the catalog fix for future 9.6
> clusters as a start.

Yes, I think it's rather silly not to do so.  We have made comparable
backpatched fixes multiple times in the past.  What is worth discussing is
whether there are *additional* things we ought to do in 9.6 to prevent
misbehavior in installations initdb'd pre-9.6.3.

If there's a cheap way of testing "AmInParallelWorker", I'd be in favor of
adding a quick-n-dirty test and ereport(ERROR) to these functions in the
9.6 branch, so that at least you get a clean error and not some weird
misbehavior.  Not sure if there's anything more we can do than that.

            regards, tom lane


Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup asparallel-restricted.

От
Stephen Frost
Дата:
Tom,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> David Steele <david@pgmasters.net> writes:
> > On 3/6/17 12:48 PM, Robert Haas wrote:
> >> This issue also exists in 9.6, but we obviously can't do anything
> >> about 9.6 clusters that already exist.  Possibly this could be
> >> back-patched so that future 9.6 clusters would come out OK, or
> >> possibly we should back-patch some other fix, but that would need more
> >> discussion.
>
> > I think it would be worth back-patching the catalog fix for future 9.6
> > clusters as a start.
>
> Yes, I think it's rather silly not to do so.  We have made comparable
> backpatched fixes multiple times in the past.  What is worth discussing is
> whether there are *additional* things we ought to do in 9.6 to prevent
> misbehavior in installations initdb'd pre-9.6.3.
>
> If there's a cheap way of testing "AmInParallelWorker", I'd be in favor of
> adding a quick-n-dirty test and ereport(ERROR) to these functions in the
> 9.6 branch, so that at least you get a clean error and not some weird
> misbehavior.  Not sure if there's anything more we can do than that.

That's more-or-less what I was thinking (and suggested to David over IM
a little while ago, actually).  I don't know if there's an easy way to
do such a check, but I don't think it would really need to be
particularly cheap, just not overly complex.  These code paths are
certainly not ones that need to be high-performance.

Thanks!

Stephen

Вложения

Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup as parallel-restricted.

От
Robert Haas
Дата:
On Mon, Mar 6, 2017 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yes, I think it's rather silly not to do so.  We have made comparable
> backpatched fixes multiple times in the past.  What is worth discussing is
> whether there are *additional* things we ought to do in 9.6 to prevent
> misbehavior in installations initdb'd pre-9.6.3.
>
> If there's a cheap way of testing "AmInParallelWorker", I'd be in favor of
> adding a quick-n-dirty test and ereport(ERROR) to these functions in the
> 9.6 branch, so that at least you get a clean error and not some weird
> misbehavior.  Not sure if there's anything more we can do than that.

Sounds like you want IsParallelWorker().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup asparallel-restricted.

От
David Steele
Дата:
On 3/6/17 3:28 PM, Stephen Frost wrote:
> 
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> David Steele <david@pgmasters.net> writes:
>>> On 3/6/17 12:48 PM, Robert Haas wrote:
>>>> This issue also exists in 9.6, but we obviously can't do anything
>>>> about 9.6 clusters that already exist.  Possibly this could be
>>>> back-patched so that future 9.6 clusters would come out OK, or
>>>> possibly we should back-patch some other fix, but that would need more
>>>> discussion.
>>
>>> I think it would be worth back-patching the catalog fix for future 9.6
>>> clusters as a start.
>>
>> Yes, I think it's rather silly not to do so.  We have made comparable
>> backpatched fixes multiple times in the past.  What is worth discussing is
>> whether there are *additional* things we ought to do in 9.6 to prevent
>> misbehavior in installations initdb'd pre-9.6.3.
>>
>> If there's a cheap way of testing "AmInParallelWorker", I'd be in favor of
>> adding a quick-n-dirty test and ereport(ERROR) to these functions in the
>> 9.6 branch, so that at least you get a clean error and not some weird
>> misbehavior.  Not sure if there's anything more we can do than that.
> 
> That's more-or-less what I was thinking (and suggested to David over IM
> a little while ago, actually).  I don't know if there's an easy way to
> do such a check, but I don't think it would really need to be
> particularly cheap, just not overly complex.  These code paths are
> certainly not ones that need to be high-performance.

Way back when, I tried to get backups on 9.6 to fail due to 
pg_stop_backup() running in a parallel worker and I was not able to make 
it happen so I gave up and moved on to other things.

However, we just got a report from the field that a user ran into this 
exact situation:

ERROR: [057]: raised from remote-0 protocol on 'XX.XX.XXX.XXX': unable 
to execute query 'select lsn::text as lsn,
pg_catalog.pg_xlogfile_name(lsn)::text as wal_segment_name,
labelfile::text as backuplabel_file,
spcmapfile::text as tablespacemap_file
from pg_catalog.pg_stop_backup(false)'
ERROR: non-exclusive backup is not in progress
HINT: Did you mean to use pg_stop_backup('t')?
CONTEXT: parallel worker

https://github.com/pgbackrest/pgbackrest/issues/1083

So apparently it is possible. To get them working as soon as possible I 
recommended that they run:

alter role postgres set max_parallel_workers_per_gather = 0;

And that solved their problem. 9.6 is getting on in years so I'm not 
sure how much time/effort we want to spend on this, but I figured it was 
worth mentioning.

I did another round of trying to reproduce the issue but came up short a 
second time.

I'm willing to put together a patch for 9.6 to update the catalog and/or 
add the error if there is interest.

Thoughts?

Regards,
-- 
-David
david@pgmasters.net



Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup as parallel-restricted.

От
Tom Lane
Дата:
David Steele <david@pgmasters.net> writes:
> ... So apparently it is possible. To get them working as soon as possible I 
> recommended that they run:
> alter role postgres set max_parallel_workers_per_gather = 0;
> And that solved their problem. 9.6 is getting on in years so I'm not 
> sure how much time/effort we want to spend on this, but I figured it was 
> worth mentioning.

> I did another round of trying to reproduce the issue but came up short a 
> second time.

I was able to force it like this:

regression=# select pg_start_backup('foo', 'f', 'f');
 pg_start_backup 
-----------------
 0/18000028
(1 row)

regression=# set force_parallel_mode TO 'on';
SET
regression=# set max_parallel_workers_per_gather = 4;
SET
regression=# explain select * from pg_catalog.pg_stop_backup(false);
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Gather  (cost=1000.00..1000.11 rows=1 width=72)
   Workers Planned: 1
   Single Copy: true
   ->  Function Scan on pg_stop_backup  (cost=0.00..0.01 rows=1 width=72)
(4 rows)

regression=# select * from pg_catalog.pg_stop_backup(false);
ERROR:  non-exclusive backup is not in progress
HINT:  Did you mean to use pg_stop_backup('t')?
CONTEXT:  parallel worker

It doesn't seem terribly likely that anybody would be using
force_parallel_mode = on in production, but perhaps there's some
reasonable combination of the other parallelization planning GUCs
that can make this plan look attractive.

I'm kind of inclined not to bother with a code fix at this late date,
given that we've had so few trouble reports.  The right answer is
to tell anyone who's affected to fix their catalogs manually, viz

update pg_proc set proparallel = 'r' where proname = 'pg_start_backup';
update pg_proc set proparallel = 'r' where proname = 'pg_stop_backup';

If we had back-patched 9fe3c644a (sans catversion bump of course)
at the time, then initdb's done with 9.6.3 or later would have gotten
this right.  In hindsight, not doing that was clearly wrong.  But it
seems a bit late to do it now.

            regards, tom lane



Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup asparallel-restricted.

От
David Steele
Дата:
On 6/24/20 6:27 PM, Tom Lane wrote:
> David Steele <david@pgmasters.net> writes:
>> ... So apparently it is possible. To get them working as soon as possible I
>> recommended that they run:
>> alter role postgres set max_parallel_workers_per_gather = 0;
>> And that solved their problem. 9.6 is getting on in years so I'm not
>> sure how much time/effort we want to spend on this, but I figured it was
>> worth mentioning.
> 
>> I did another round of trying to reproduce the issue but came up short a
>> second time.
> 
> I was able to force it like this:
> 
> regression=# set force_parallel_mode TO 'on';
> SET

Ah, yes, that works. Now at least I can test it.

> It doesn't seem terribly likely that anybody would be using
> force_parallel_mode = on in production, but perhaps there's some
> reasonable combination of the other parallelization planning GUCs
> that can make this plan look attractive.

I'll also ask the user if they have this GUC enabled.

> I'm kind of inclined not to bother with a code fix at this late date,
> given that we've had so few trouble reports.  The right answer is
> to tell anyone who's affected to fix their catalogs manually, viz
> 
> update pg_proc set proparallel = 'r' where proname = 'pg_start_backup';
> update pg_proc set proparallel = 'r' where proname = 'pg_stop_backup';

Only pg_stop_backup() needs to be updated, but that's probably the best 
solution.

> If we had back-patched 9fe3c644a (sans catversion bump of course)
> at the time, then initdb's done with 9.6.3 or later would have gotten
> this right.  In hindsight, not doing that was clearly wrong.  But it
> seems a bit late to do it now.

OK by me.

Regards,
-- 
-David
david@pgmasters.net



Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup as parallel-restricted.

От
Magnus Hagander
Дата:


On Thu, Jun 25, 2020 at 2:11 PM David Steele <david@pgmasters.net> wrote:
On 6/24/20 6:27 PM, Tom Lane wrote:
> David Steele <david@pgmasters.net> writes:
>> ... So apparently it is possible. To get them working as soon as possible I
>> recommended that they run:
>> alter role postgres set max_parallel_workers_per_gather = 0;
>> And that solved their problem. 9.6 is getting on in years so I'm not
>> sure how much time/effort we want to spend on this, but I figured it was
>> worth mentioning.
>
>> I did another round of trying to reproduce the issue but came up short a
>> second time.
>
> I was able to force it like this:
>
> regression=# set force_parallel_mode TO 'on';
> SET

Ah, yes, that works. Now at least I can test it.

> It doesn't seem terribly likely that anybody would be using
> force_parallel_mode = on in production, but perhaps there's some
> reasonable combination of the other parallelization planning GUCs
> that can make this plan look attractive.

I'll also ask the user if they have this GUC enabled.

Maybe have pgbackrest issue an explicit SET force_parallel_mode=off when it runs against a 9.6? 


> I'm kind of inclined not to bother with a code fix at this late date,
> given that we've had so few trouble reports.  The right answer is
> to tell anyone who's affected to fix their catalogs manually, viz
>
> update pg_proc set proparallel = 'r' where proname = 'pg_start_backup';
> update pg_proc set proparallel = 'r' where proname = 'pg_stop_backup';

Only pg_stop_backup() needs to be updated, but that's probably the best
solution.

> If we had back-patched 9fe3c644a (sans catversion bump of course)
> at the time, then initdb's done with 9.6.3 or later would have gotten
> this right.  In hindsight, not doing that was clearly wrong.  But it
> seems a bit late to do it now.

OK by me.

+1. Seems infrequent enough or we would've heard much about it before. And it's not going to be *all* that common with brand new clusters on 9.6 these days (I hope).
 
--

Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup asparallel-restricted.

От
David Steele
Дата:
On 6/25/20 8:43 AM, Magnus Hagander wrote:
> On Thu, Jun 25, 2020 at 2:11 PM David Steele <david@pgmasters.net 
> <mailto:david@pgmasters.net>> wrote:
>     On 6/24/20 6:27 PM, Tom Lane wrote:
>      >
>      > I was able to force it like this:
>      >
>      > regression=# set force_parallel_mode TO 'on';
>      > SET
> 
>     Ah, yes, that works. Now at least I can test it.
> 
>      > It doesn't seem terribly likely that anybody would be using
>      > force_parallel_mode = on in production, but perhaps there's some
>      > reasonable combination of the other parallelization planning GUCs
>      > that can make this plan look attractive.
> 
>     I'll also ask the user if they have this GUC enabled.

The user confirmed they are running with force_parallel_mode=on so that 
probably explains why we have never seen this in the field before.

> Maybe have pgbackrest issue an explicit SET force_parallel_mode=off when 
> it runs against a 9.6?

According to the documentation the way to disable parallelism is:

set max_parallel_workers_per_gather = 0

So we added that to session initialization in pgBackRest:

https://github.com/pgbackrest/pgbackrest/commit/ea04ec7b3f4c6cf25c1b827c25c6a3c5896159a8

I'm worried that (as Tom said) the planner might find another reason to 
choose a parallel query.

I'm looking at this as more than a fix for 9.6. I can't see any reason 
for the backup control session to run queries in parallel and possibly 
use more resources, so parallelism is disabled for all versions that 
support it.

Regards,
-- 
-David
david@pgmasters.net



Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup as parallel-restricted.

От
Magnus Hagander
Дата:


On Thu, Jun 25, 2020 at 4:56 PM David Steele <david@pgmasters.net> wrote:
On 6/25/20 8:43 AM, Magnus Hagander wrote:
> On Thu, Jun 25, 2020 at 2:11 PM David Steele <david@pgmasters.net
> <mailto:david@pgmasters.net>> wrote:
>     On 6/24/20 6:27 PM, Tom Lane wrote:
>      >
>      > I was able to force it like this:
>      >
>      > regression=# set force_parallel_mode TO 'on';
>      > SET
>
>     Ah, yes, that works. Now at least I can test it.
>
>      > It doesn't seem terribly likely that anybody would be using
>      > force_parallel_mode = on in production, but perhaps there's some
>      > reasonable combination of the other parallelization planning GUCs
>      > that can make this plan look attractive.
>
>     I'll also ask the user if they have this GUC enabled.

The user confirmed they are running with force_parallel_mode=on so that
probably explains why we have never seen this in the field before.

That's good.


> Maybe have pgbackrest issue an explicit SET force_parallel_mode=off when
> it runs against a 9.6?

According to the documentation the way to disable parallelism is:

set max_parallel_workers_per_gather = 0

Interesting. I had somehow gotten into my head that force_parallel_mode would override that. But it doesn't.


So we added that to session initialization in pgBackRest:

https://github.com/pgbackrest/pgbackrest/commit/ea04ec7b3f4c6cf25c1b827c25c6a3c5896159a8

Personally I would've done it *just* for 9.6 and not for 10+, but that's mostly semantic :) But if you do it for 9.6 then *eventually* you will be able to retire it.



I'm worried that (as Tom said) the planner might find another reason to
choose a parallel query.

I'm looking at this as more than a fix for 9.6. I can't see any reason
for the backup control session to run queries in parallel and possibly
use more resources, so parallelism is disabled for all versions that
support it.

Right. But since the parameters are flagged as parallel restricted in 10+...

Or are you saying you're worried about other things, completely unrelated to start/stop backup, that the session might run? 

--

Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup as parallel-restricted.

От
Tom Lane
Дата:
David Steele <david@pgmasters.net> writes:
> I'm looking at this as more than a fix for 9.6. I can't see any reason 
> for the backup control session to run queries in parallel and possibly 
> use more resources, so parallelism is disabled for all versions that 
> support it.

Post-9.6, these functions are correctly marked as parallel-restricted,
so there really shouldn't be an issue.

            regards, tom lane



Re: [COMMITTERS] pgsql: Mark pg_start_backup and pg_stop_backup asparallel-restricted.

От
David Steele
Дата:
On 6/25/20 11:02 AM, Magnus Hagander wrote:
> On Thu, Jun 25, 2020 at 4:56 PM David Steele <david@pgmasters.net 
> <mailto:david@pgmasters.net>> wrote:
> 
>     So we added that to session initialization in pgBackRest:
> 
>     https://github.com/pgbackrest/pgbackrest/commit/ea04ec7b3f4c6cf25c1b827c25c6a3c5896159a8
> 
> Personally I would've done it *just* for 9.6 and not for 10+, but that's 
> mostly semantic :) But if you do it for 9.6 then *eventually* you will 
> be able to retire it.

We still support 8.3 so the day when we drop 9.6 support seems so far 
away that I'm just not worried about it.

And yes, we still see users with 8.3/8.4 databases. Mostly they just 
want to update but backups are an essential part of that process and 
some of them are multi-TB databases.

We are planning to drop 8.3 support soon, though, and just tell users to 
go back and use X version of pgbackrest if they really need it. This has 
more to do with the availability of packages than anything else. 
Christoph has built packages all the way back to 8.4 for all recent 
Debian distros (thanks Christoph!) but the only place we can get 8.3 
packages is on EOL distros, e.g. Ubuntu 12.04.

>     I'm worried that (as Tom said) the planner might find another reason to
>     choose a parallel query.
> 
>     I'm looking at this as more than a fix for 9.6. I can't see any reason
>     for the backup control session to run queries in parallel and possibly
>     use more resources, so parallelism is disabled for all versions that
>     support it.
> 
> Right. But since the parameters are flagged as parallel restricted in 10+...
> 
> Or are you saying you're worried about other things, completely 
> unrelated to start/stop backup, that the session might run?

That's what I'm worried about. We run other queries and functions 
besides pg_start_backup()/pg_stop_backup(). None of them need to be 
parallelized so why not just disable it? One less variable to worry about.

Regards,
-- 
-David
david@pgmasters.net