Обсуждение: pg_promote not marked as parallel-restricted in pg_proc.dat

Поиск
Список
Период
Сортировка

pg_promote not marked as parallel-restricted in pg_proc.dat

От
Michael Paquier
Дата:
Hi all,

It looks that I forgot to mark pg_promote as parallel-restricted in
1007465 in pg_proc.dat.  It seems to me that this is not a huge issue as
system_views.sql redefines the function for its default values and
enforces parallel-restricted, but let's be right from the start.

Attached is a patch to fix that.  Any comments or objections?  Please
note that the catalog version bump is included as a personal reminder..

Thanks,
--
Michael

Вложения

Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Laurenz Albe
Дата:
Michael Paquier wrote:
> It looks that I forgot to mark pg_promote as parallel-restricted in
> 1007465 in pg_proc.dat.  It seems to me that this is not a huge issue as
> system_views.sql redefines the function for its default values and
> enforces parallel-restricted, but let's be right from the start.
> 
> Attached is a patch to fix that.  Any comments or objections?

Hmm, I should have noticed that.

I think that the question if pg_promote allows a parallel plan or not
is mostly academic, but the two definitions should be kept in sync.

Yours,
Laurenz Albe



Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Michael Paquier
Дата:
On Mon, Oct 29, 2018 at 09:41:08AM +0100, Laurenz Albe wrote:
> I think that the question if pg_promote allows a parallel plan or not
> is mostly academic, but the two definitions should be kept in sync.

It seems to me that the presence of the trigger file written in $PGDATA
means that the function should be restricted.
--
Michael

Вложения

Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Laurenz Albe
Дата:
Michael Paquier wrote:
> On Mon, Oct 29, 2018 at 09:41:08AM +0100, Laurenz Albe wrote:
> > I think that the question if pg_promote allows a parallel plan or not
> > is mostly academic, but the two definitions should be kept in sync.
> 
> It seems to me that the presence of the trigger file written in $PGDATA
> means that the function should be restricted.

Yes, it should be PARALLEL RESTRICTED or PARALLEL UNSAFE, but it won't matter
much in practice which of the two we choose.

Yours,
Laurenz Albe



Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Tom Lane
Дата:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> Yes, it should be PARALLEL RESTRICTED or PARALLEL UNSAFE, but it won't matter
> much in practice which of the two we choose.

I'd vote for PARALLEL UNSAFE myself.  Otherwise you have to ask questions
about whether it's really safe to do this while parallel workers are doing
things.  Perhaps the answer is "yes", but what's the point of having to
verify that?

            regards, tom lane


Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Michael Paquier
Дата:
On Mon, Oct 29, 2018 at 10:08:45AM -0400, Tom Lane wrote:
> Laurenz Albe <laurenz.albe@cybertec.at> writes:
> I'd vote for PARALLEL UNSAFE myself.  Otherwise you have to ask questions
> about whether it's really safe to do this while parallel workers are doing
> things.  Perhaps the answer is "yes", but what's the point of having to
> verify that?

All the backup-related functions doing on-disk activity are marked as
parallel-restricted:
=# select proparallel, proname from pg_proc where proname ~ 'backup';
 proparallel |       proname
-------------+----------------------
 s           | pg_is_in_backup
 s           | pg_backup_start_time
 r           | pg_stop_backup
 r           | pg_start_backup
 r           | pg_stop_backup
(5 rows)

As restricted, only the group leader is allowed to execute the
function.  So as long as we enforce the execution uniqueness, I don't
think that there is any point in enforcing a serial scan for the whole
query.  Queries launching pg_promote are unlikely going to be much
complicated, still it does not seem a consistent experience to me to not
use the same level for such operations.
--
Michael

Вложения

Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Robert Haas
Дата:
On Mon, Oct 29, 2018 at 6:48 PM Michael Paquier <michael@paquier.xyz> wrote:
> All the backup-related functions doing on-disk activity are marked as
> parallel-restricted:
> =# select proparallel, proname from pg_proc where proname ~ 'backup';
>  proparallel |       proname
> -------------+----------------------
>  s           | pg_is_in_backup
>  s           | pg_backup_start_time
>  r           | pg_stop_backup
>  r           | pg_start_backup
>  r           | pg_stop_backup
> (5 rows)
>
> As restricted, only the group leader is allowed to execute the
> function.  So as long as we enforce the execution uniqueness, I don't
> think that there is any point in enforcing a serial scan for the whole
> query.  Queries launching pg_promote are unlikely going to be much
> complicated, still it does not seem a consistent experience to me to not
> use the same level for such operations.

There's no rule whatsoever that a parallel worker can't write to the
disk.  pg_start_backup and pg_stop_backup have to be
parallel-restricted because, when used in non-exclusive mode, they
establish backend-local state that wouldn't be synchronized with the
state in the workers -- namely the information that a non-exclusive
backup is in progress.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Michael Paquier
Дата:
On Wed, Oct 31, 2018 at 01:09:53PM -0400, Robert Haas wrote:
> There's no rule whatsoever that a parallel worker can't write to the
> disk.  pg_start_backup and pg_stop_backup have to be
> parallel-restricted because, when used in non-exclusive mode, they
> establish backend-local state that wouldn't be synchronized with the
> state in the workers -- namely the information that a non-exclusive
> backup is in progress.

Okay, but likely we would not want to signal the postmaster
unnecessarily, no?  FALLBACK_PROMOTE_SIGNAL_FILE gets discarded if
promotion is triggered more than once, but that does not like a sane
thing to do if not necessary.

As far as I understand, there has been some input on this thread:
- I would prefer marking the function as parallel-restricted.
- Tom would make it parallel-unsafe.
- Laurenz (author of the feature) is fine with restricted or unsafe.

It is a bit hard to make a decision from that :)
--
Michael

Вложения

Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Amit Kapila
Дата:
On Thu, Nov 1, 2018 at 6:14 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Oct 31, 2018 at 01:09:53PM -0400, Robert Haas wrote:
> > There's no rule whatsoever that a parallel worker can't write to the
> > disk.  pg_start_backup and pg_stop_backup have to be
> > parallel-restricted because, when used in non-exclusive mode, they
> > establish backend-local state that wouldn't be synchronized with the
> > state in the workers -- namely the information that a non-exclusive
> > backup is in progress.
>
> Okay, but likely we would not want to signal the postmaster
> unnecessarily, no?

Right, but I think the question here is whether it is safe to execute
this function in parallel workers?  I don't see any meaningful use
cases where anyone wants to run this via parallel workers even if it
is safe to execute via them, but I think that is not how we decide
parallel-safe property of any functions.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Robert Haas
Дата:
On Wed, Oct 31, 2018 at 8:43 PM Michael Paquier <michael@paquier.xyz> wrote:
> Okay, but likely we would not want to signal the postmaster
> unnecessarily, no?  FALLBACK_PROMOTE_SIGNAL_FILE gets discarded if
> promotion is triggered more than once, but that does not like a sane
> thing to do if not necessary.

Uh, what in the world does that have to do with the topic at hand?
Parallel-safety has nothing to do with preventing functions from being
called "unnecessarily" or "more than once".

I don't like the fact that Tom keeps arguing for marking things
parallel-unsafe without any real justification.  During the
development of parallel query, he argued that no such concept should
exist, because everything should be work in parallel mode just like it
does otherwise.  I eventually convinced him (or so it seemed) that we
had to have the concept because it was impractical to eliminate all
the corner cases where things were parallel-unsafe in the short/medium
term.  However, in the long term, the fact that we have
parallel-restricted and parallel-unsafe is an implementation
restriction that we should be trying to eliminate.  Making the list of
parallel-unsafe things longer makes that goal harder to achieve,
especially when the marking is applied not out of any real
justification based on what the code does, as was my intent, but based
on an ill-defined feeling of unease.

Eventually, after some future round of infrastructure improvements,
we're going to end up having discussions about changing things that
are marked parallel-unsafe or parallel-restricted to parallel-safe,
and when somebody can't find any reason why the existing markings are
like that in the first place, they're going to use that as
justification for not changing them ("there must've been a reason!").
Parallel-safety shouldn't be some kind of quasi-religious thing where
markings bleed from a function that is properly marked to one with a
similar name, or irrational values are applied in the name of
preventing unspecified or only-vaguely-connected evils.  It should be
based on things that can be determined scientifically, like "hey, does
this code access any global variables other than those which are
automatically synchronized between the master and the workers?" and
"hey, does this code ever attempt
heap_insert/heap_update/heap_delete?" and "hey, does this ever call
other user-defined code that might itself be parallel-unsafe?".

The right way to figure out the appropriate parallel-safety marking
for a function is to read the code and see what it does.  Now you
might miss something, as with any code-reading exercise, but if you
don't, then you should come up with the right answer.  In the case of
pg_promote, it calls out to the following functions:

RecoveryInProgress - checks shared memory state.  equally available to
a worker as to the leader.
AllocateFile/FreeFile - no problem here; these work fine in workers
just like any other backend.
kill, unlink - same thing
ResetLatch, WaitLatch, CHECK_FOR_INTERRUPTS() - all fine in a worker;
in fact, used as part of the worker machinery
ereport - works anywhere, worker machinery propagates errors back to leader

That's all.  There are no obvious references to global variables or
anything here.  So, it looks to be parallel-safe.

If you do the same analysis for pg_start_backup(), you'll immediately
notice that it calls get_backup_status(), and if you look at that
function, you'll see that it returns the value of a global variable.
If you check parallel.c, you'll find that that global variable is not
one of the special ones that gets synchronized between a leader and
the workers.  Therefore, if you ran this function in a worker, it
might do the wrong thing, because it might get the wrong value of that
global variable.  So it's at least (and in fact exactly)
parallel-restricted.  It doesn't need to be parallel-restricted
because it's scary in some ill-defined way or any of these other
reasons advanced on this thread -- it needs to be parallel-restricted
because it *relies on a global variable that is not synchronized to
parallel workers*.  If somebody writes code to move that state out of
a global variables and into the main shared memory segment or to a DSM
shared between the leader and the workers, then it can be marked
parallel-safe (unless, for example, it also depends on OTHER global
variables that are NOT moved into shared storage).  Otherwise not.

I hope I'm making sense here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Michael Paquier
Дата:
On Thu, Nov 01, 2018 at 12:59:47PM -0400, Robert Haas wrote:
> If you do the same analysis for pg_start_backup(), you'll immediately
> notice that it calls get_backup_status(), and if you look at that
> function, you'll see that it returns the value of a global variable.
> If you check parallel.c, you'll find that that global variable is not
> one of the special ones that gets synchronized between a leader and
> the workers.  Therefore, if you ran this function in a worker, it
> might do the wrong thing, because it might get the wrong value of that
> global variable.  So it's at least (and in fact exactly)
> parallel-restricted.  It doesn't need to be parallel-restricted
> because it's scary in some ill-defined way or any of these other
> reasons advanced on this thread -- it needs to be parallel-restricted
> because it *relies on a global variable that is not synchronized to
> parallel workers*.  If somebody writes code to move that state out of
> a global variables and into the main shared memory segment or to a DSM
> shared between the leader and the workers, then it can be marked
> parallel-safe (unless, for example, it also depends on OTHER global
> variables that are NOT moved into shared storage).  Otherwise not.

My name is all over the commit logs for this area, so I kind of heard
about what those things rely on..  Both exclusive and non-exclusive
backup interfaces are definitely unsafe to run across multiple processes
in the same query.  Well, unsafe is incorrect for the pg_proc
definition, that's restricted :)

> I hope I'm making sense here.

You actually do a lot, moving just one person with MP as initials to
consider moving the function as being parallel-safe.  Thanks for the
points you raised, what needs to be done looks clear now.
--
Michael

Вложения

Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Michael Paquier
Дата:
On Fri, Nov 02, 2018 at 09:27:39AM +0900, Michael Paquier wrote:
> You actually do a lot, moving just one person with MP as initials to
> consider moving the function as being parallel-safe.  Thanks for the
> points you raised, what needs to be done looks clear now.

So anybody has an objection with marking the function as parallel-safe?
I'd like to do so if that's not the case and close the case.

What has been raised on this thread is more than I hoped first.  Thanks
Amit and Robert for the additional input!
--
Michael

Вложения

Re: pg_promote not marked as parallel-restricted in pg_proc.dat

От
Michael Paquier
Дата:
On Sat, Nov 03, 2018 at 08:02:36AM +0900, Michael Paquier wrote:
> So anybody has an objection with marking the function as parallel-safe?
> I'd like to do so if that's not the case and close the case.

And committed.
--
Michael

Вложения