Обсуждение: pg_promote not marked as parallel-restricted in pg_proc.dat
Hi all, It looks that I forgot to mark pg_promote as parallel-restricted in 1007465 in pg_proc.dat. It seems to me that this is not a huge issue as system_views.sql redefines the function for its default values and enforces parallel-restricted, but let's be right from the start. Attached is a patch to fix that. Any comments or objections? Please note that the catalog version bump is included as a personal reminder.. Thanks, -- Michael
Вложения
Michael Paquier wrote: > It looks that I forgot to mark pg_promote as parallel-restricted in > 1007465 in pg_proc.dat. It seems to me that this is not a huge issue as > system_views.sql redefines the function for its default values and > enforces parallel-restricted, but let's be right from the start. > > Attached is a patch to fix that. Any comments or objections? Hmm, I should have noticed that. I think that the question if pg_promote allows a parallel plan or not is mostly academic, but the two definitions should be kept in sync. Yours, Laurenz Albe
On Mon, Oct 29, 2018 at 09:41:08AM +0100, Laurenz Albe wrote: > I think that the question if pg_promote allows a parallel plan or not > is mostly academic, but the two definitions should be kept in sync. It seems to me that the presence of the trigger file written in $PGDATA means that the function should be restricted. -- Michael
Вложения
Michael Paquier wrote: > On Mon, Oct 29, 2018 at 09:41:08AM +0100, Laurenz Albe wrote: > > I think that the question if pg_promote allows a parallel plan or not > > is mostly academic, but the two definitions should be kept in sync. > > It seems to me that the presence of the trigger file written in $PGDATA > means that the function should be restricted. Yes, it should be PARALLEL RESTRICTED or PARALLEL UNSAFE, but it won't matter much in practice which of the two we choose. Yours, Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes: > Yes, it should be PARALLEL RESTRICTED or PARALLEL UNSAFE, but it won't matter > much in practice which of the two we choose. I'd vote for PARALLEL UNSAFE myself. Otherwise you have to ask questions about whether it's really safe to do this while parallel workers are doing things. Perhaps the answer is "yes", but what's the point of having to verify that? regards, tom lane
On Mon, Oct 29, 2018 at 10:08:45AM -0400, Tom Lane wrote: > Laurenz Albe <laurenz.albe@cybertec.at> writes: > I'd vote for PARALLEL UNSAFE myself. Otherwise you have to ask questions > about whether it's really safe to do this while parallel workers are doing > things. Perhaps the answer is "yes", but what's the point of having to > verify that? All the backup-related functions doing on-disk activity are marked as parallel-restricted: =# select proparallel, proname from pg_proc where proname ~ 'backup'; proparallel | proname -------------+---------------------- s | pg_is_in_backup s | pg_backup_start_time r | pg_stop_backup r | pg_start_backup r | pg_stop_backup (5 rows) As restricted, only the group leader is allowed to execute the function. So as long as we enforce the execution uniqueness, I don't think that there is any point in enforcing a serial scan for the whole query. Queries launching pg_promote are unlikely going to be much complicated, still it does not seem a consistent experience to me to not use the same level for such operations. -- Michael
Вложения
On Mon, Oct 29, 2018 at 6:48 PM Michael Paquier <michael@paquier.xyz> wrote: > All the backup-related functions doing on-disk activity are marked as > parallel-restricted: > =# select proparallel, proname from pg_proc where proname ~ 'backup'; > proparallel | proname > -------------+---------------------- > s | pg_is_in_backup > s | pg_backup_start_time > r | pg_stop_backup > r | pg_start_backup > r | pg_stop_backup > (5 rows) > > As restricted, only the group leader is allowed to execute the > function. So as long as we enforce the execution uniqueness, I don't > think that there is any point in enforcing a serial scan for the whole > query. Queries launching pg_promote are unlikely going to be much > complicated, still it does not seem a consistent experience to me to not > use the same level for such operations. There's no rule whatsoever that a parallel worker can't write to the disk. pg_start_backup and pg_stop_backup have to be parallel-restricted because, when used in non-exclusive mode, they establish backend-local state that wouldn't be synchronized with the state in the workers -- namely the information that a non-exclusive backup is in progress. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Oct 31, 2018 at 01:09:53PM -0400, Robert Haas wrote: > There's no rule whatsoever that a parallel worker can't write to the > disk. pg_start_backup and pg_stop_backup have to be > parallel-restricted because, when used in non-exclusive mode, they > establish backend-local state that wouldn't be synchronized with the > state in the workers -- namely the information that a non-exclusive > backup is in progress. Okay, but likely we would not want to signal the postmaster unnecessarily, no? FALLBACK_PROMOTE_SIGNAL_FILE gets discarded if promotion is triggered more than once, but that does not like a sane thing to do if not necessary. As far as I understand, there has been some input on this thread: - I would prefer marking the function as parallel-restricted. - Tom would make it parallel-unsafe. - Laurenz (author of the feature) is fine with restricted or unsafe. It is a bit hard to make a decision from that :) -- Michael
Вложения
On Thu, Nov 1, 2018 at 6:14 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Oct 31, 2018 at 01:09:53PM -0400, Robert Haas wrote: > > There's no rule whatsoever that a parallel worker can't write to the > > disk. pg_start_backup and pg_stop_backup have to be > > parallel-restricted because, when used in non-exclusive mode, they > > establish backend-local state that wouldn't be synchronized with the > > state in the workers -- namely the information that a non-exclusive > > backup is in progress. > > Okay, but likely we would not want to signal the postmaster > unnecessarily, no? Right, but I think the question here is whether it is safe to execute this function in parallel workers? I don't see any meaningful use cases where anyone wants to run this via parallel workers even if it is safe to execute via them, but I think that is not how we decide parallel-safe property of any functions. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Oct 31, 2018 at 8:43 PM Michael Paquier <michael@paquier.xyz> wrote: > Okay, but likely we would not want to signal the postmaster > unnecessarily, no? FALLBACK_PROMOTE_SIGNAL_FILE gets discarded if > promotion is triggered more than once, but that does not like a sane > thing to do if not necessary. Uh, what in the world does that have to do with the topic at hand? Parallel-safety has nothing to do with preventing functions from being called "unnecessarily" or "more than once". I don't like the fact that Tom keeps arguing for marking things parallel-unsafe without any real justification. During the development of parallel query, he argued that no such concept should exist, because everything should be work in parallel mode just like it does otherwise. I eventually convinced him (or so it seemed) that we had to have the concept because it was impractical to eliminate all the corner cases where things were parallel-unsafe in the short/medium term. However, in the long term, the fact that we have parallel-restricted and parallel-unsafe is an implementation restriction that we should be trying to eliminate. Making the list of parallel-unsafe things longer makes that goal harder to achieve, especially when the marking is applied not out of any real justification based on what the code does, as was my intent, but based on an ill-defined feeling of unease. Eventually, after some future round of infrastructure improvements, we're going to end up having discussions about changing things that are marked parallel-unsafe or parallel-restricted to parallel-safe, and when somebody can't find any reason why the existing markings are like that in the first place, they're going to use that as justification for not changing them ("there must've been a reason!"). Parallel-safety shouldn't be some kind of quasi-religious thing where markings bleed from a function that is properly marked to one with a similar name, or irrational values are applied in the name of preventing unspecified or only-vaguely-connected evils. It should be based on things that can be determined scientifically, like "hey, does this code access any global variables other than those which are automatically synchronized between the master and the workers?" and "hey, does this code ever attempt heap_insert/heap_update/heap_delete?" and "hey, does this ever call other user-defined code that might itself be parallel-unsafe?". The right way to figure out the appropriate parallel-safety marking for a function is to read the code and see what it does. Now you might miss something, as with any code-reading exercise, but if you don't, then you should come up with the right answer. In the case of pg_promote, it calls out to the following functions: RecoveryInProgress - checks shared memory state. equally available to a worker as to the leader. AllocateFile/FreeFile - no problem here; these work fine in workers just like any other backend. kill, unlink - same thing ResetLatch, WaitLatch, CHECK_FOR_INTERRUPTS() - all fine in a worker; in fact, used as part of the worker machinery ereport - works anywhere, worker machinery propagates errors back to leader That's all. There are no obvious references to global variables or anything here. So, it looks to be parallel-safe. If you do the same analysis for pg_start_backup(), you'll immediately notice that it calls get_backup_status(), and if you look at that function, you'll see that it returns the value of a global variable. If you check parallel.c, you'll find that that global variable is not one of the special ones that gets synchronized between a leader and the workers. Therefore, if you ran this function in a worker, it might do the wrong thing, because it might get the wrong value of that global variable. So it's at least (and in fact exactly) parallel-restricted. It doesn't need to be parallel-restricted because it's scary in some ill-defined way or any of these other reasons advanced on this thread -- it needs to be parallel-restricted because it *relies on a global variable that is not synchronized to parallel workers*. If somebody writes code to move that state out of a global variables and into the main shared memory segment or to a DSM shared between the leader and the workers, then it can be marked parallel-safe (unless, for example, it also depends on OTHER global variables that are NOT moved into shared storage). Otherwise not. I hope I'm making sense here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Nov 01, 2018 at 12:59:47PM -0400, Robert Haas wrote: > If you do the same analysis for pg_start_backup(), you'll immediately > notice that it calls get_backup_status(), and if you look at that > function, you'll see that it returns the value of a global variable. > If you check parallel.c, you'll find that that global variable is not > one of the special ones that gets synchronized between a leader and > the workers. Therefore, if you ran this function in a worker, it > might do the wrong thing, because it might get the wrong value of that > global variable. So it's at least (and in fact exactly) > parallel-restricted. It doesn't need to be parallel-restricted > because it's scary in some ill-defined way or any of these other > reasons advanced on this thread -- it needs to be parallel-restricted > because it *relies on a global variable that is not synchronized to > parallel workers*. If somebody writes code to move that state out of > a global variables and into the main shared memory segment or to a DSM > shared between the leader and the workers, then it can be marked > parallel-safe (unless, for example, it also depends on OTHER global > variables that are NOT moved into shared storage). Otherwise not. My name is all over the commit logs for this area, so I kind of heard about what those things rely on.. Both exclusive and non-exclusive backup interfaces are definitely unsafe to run across multiple processes in the same query. Well, unsafe is incorrect for the pg_proc definition, that's restricted :) > I hope I'm making sense here. You actually do a lot, moving just one person with MP as initials to consider moving the function as being parallel-safe. Thanks for the points you raised, what needs to be done looks clear now. -- Michael
Вложения
On Fri, Nov 02, 2018 at 09:27:39AM +0900, Michael Paquier wrote: > You actually do a lot, moving just one person with MP as initials to > consider moving the function as being parallel-safe. Thanks for the > points you raised, what needs to be done looks clear now. So anybody has an objection with marking the function as parallel-safe? I'd like to do so if that's not the case and close the case. What has been raised on this thread is more than I hoped first. Thanks Amit and Robert for the additional input! -- Michael
Вложения
On Sat, Nov 03, 2018 at 08:02:36AM +0900, Michael Paquier wrote: > So anybody has an objection with marking the function as parallel-safe? > I'd like to do so if that's not the case and close the case. And committed. -- Michael