Discussion: Interrupts vs signals
Hi,

I wonder if we really need signals to implement interrupts. Given that they are non-preemptive/cooperative (work happens at the next CFI()), why not just use shared memory flags and latches? That skips a bunch of code, global variables and scary warnings about programming in signal handlers. I sketched out some code to try that a few months back, while speculating about bite-sized subproblems that would come up if each backend is, one day, a thread.

There are several other conditions that are also handled by CHECK_FOR_INTERRUPTS(), but are not triggered by other backends sending signals, or are set by other signal handlers (SIGALRM, SIGQUIT). One idea is to convert those into "procsignals" too, for consistency. In the attached, they can be set (ie by the same backend) with ProcSignalRaise(), but it's possible that in future we might have a reason for another backend to set them too, so it seems like a good idea to have a single system, effectively merging the concepts of "procsignals" and "interrupts".

There are still a few more ad hoc (non-ProcSignal) uses of SIGUSR1 in the tree. For one thing, we don't allow the postmaster to set latches; if we gave up on that rule, we wouldn't need the bgworker please-signal-me thing. Also the logical replication launcher does the same sort of thing for no apparent reason. Changed in the attached -- mainly so I could demonstrate that check-world passes with SIGUSR1 ignored.

The attached is only experiment-grade code: in particular, I didn't quite untangle the recovery conflict flags properly. It's also doing function calls where some kind of fast inlined magic is probably required, and I probably have a few other details wrong, but I figured it was good enough to demonstrate the concept.
Attachments
Hi,

On 2021-10-21 07:55:54 +1300, Thomas Munro wrote:
> I wonder if we really need signals to implement interrupts. Given
> that they are non-preemptive/cooperative (work happens at the next
> CFI()), why not just use shared memory flags and latches? That skips
> a bunch of code, global variables and scary warnings about programming
> in signal handlers.

Depending on how you implement them, one difference could be whether / when "slow" system calls (recv, poll, etc) are interrupted.

Another is that signal handling provides a memory barrier in the receiving process. For things that rarely change (like most interrupts), it can be more efficient to move the cost of that out-of-line, instead of incurring it on every check.

One nice thing of putting the state variables into shared memory would be that it'd allow seeing the pending interrupts of other backends for debugging purposes.

> One idea is to convert those into "procsignals" too, for
> consistency. In the attached, they can be set (ie by the same
> backend) with ProcSignalRaise(), but it's possible that in future we
> might have a reason for another backend to set them too, so it seems
> like a good idea to have a single system, effectively merging the
> concepts of "procsignals" and "interrupts".

This seems a bit confusing to me. For one, we need to have interrupts working before we have a proc, IIRC. But leaving details like that aside, it just seems a bit backwards to me. I'm on board with other backends directly setting interrupt flags, but it seems to me that the procsignal stuff should be a "client" of the process-local interrupt infrastructure, rather than the other way round.

> +bool
> +ProcSignalAnyPending(void)
> +{
> +	volatile ProcSignalSlot *slot = MyProcSignalSlot;
>
> -	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
> -		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
> +	/* XXX make this static inline? */
> +	/* XXX point to a dummy entry instead of using NULL to avoid a branch */
> +	return slot && slot->pss_signaled;
> +}

ISTM it might be easier to make this stuff efficiently race-free if we made this a count of pending operations.

> @@ -3131,12 +3124,13 @@ ProcessInterrupts(void)
> 	/* OK to accept any interrupts now? */
> 	if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
> 		return;
> -	InterruptPending = false;
> +	ProcSignalClearAnyPending();
> +
> +	pg_read_barrier();
>
> -	if (ProcDiePending)
> +	if (ProcSignalConsume(PROCSIG_DIE))
> 	{

I think making all of these checks into function calls isn't great. How about making the set of pending signals a bitmask? That'd allow to efficiently check a bunch of interrupts together, and even where not, it'd just be a single test of the mask, likely already in a register.

Greetings,

Andres Freund
On Thu, Oct 21, 2021 at 8:27 AM Andres Freund <andres@anarazel.de> wrote:
> On 2021-10-21 07:55:54 +1300, Thomas Munro wrote:
> > I wonder if we really need signals to implement interrupts. Given
> > that they are non-preemptive/cooperative (work happens at the next
> > CFI()), why not just use shared memory flags and latches? That skips
> > a bunch of code, global variables and scary warnings about programming
> > in signal handlers.
>
> Depending on how you implement them, one difference could be whether / when
> "slow" system calls (recv, poll, etc) are interrupted.

Hopefully by now all such waits are implemented with latch.c facilities?

> Another is that signal handling provides a memory barrier in the
> receiving process. For things that rarely change (like most interrupts), it
> can be more efficient to move the cost of that out-of-line, instead of
> incurring it on every check.

Agreed, but in this experiment I was trying out the idea that a memory barrier is not really needed at all, unless you're about to go to sleep. We already insert one of those before a latch wait. That is, if we see !set->latch->is_set, we do pg_memory_barrier() and check again before sleeping, so your next CFI must see the flag. For computation loops (sort, hash, query execution, ...), I speculate that a relaxed read of memory is fine... you'll see the flag pretty soon, and you certainly won't be allowed to finish your computation and go to sleep.

> One nice thing of putting the state variables into shared memory would be that
> it'd allow seeing the pending interrupts of other backends for debugging
> purposes.

+1

> > One idea is to convert those into "procsignals" too, for
> > consistency. In the attached, they can be set (ie by the same
> > backend) with ProcSignalRaise(), but it's possible that in future we
> > might have a reason for another backend to set them too, so it seems
> > like a good idea to have a single system, effectively merging the
> > concepts of "procsignals" and "interrupts".
>
> This seems a bit confusing to me. For one, we need to have interrupts working
> before we have a proc, IIRC. But leaving details like that aside, it just
> seems a bit backwards to me. I'm on board with other backends directly setting
> interrupt flags, but it seems to me that the procsignal stuff should be a
> "client" of the process-local interrupt infrastructure, rather than the other
> way round.

Hmm. Yeah, I see your point. But I can also think of some arguments for merging the concepts of local and shared interrupts; see below. In this new sketch, I tried doing it the other way around. That is, completely removing the concept of "ProcSignal", leaving only "Interrupts". Initially, MyPendingInterrupts points to something private, and once you're connected to shared memory it points to MyProc->pending_interrupts.

> > +bool
> > +ProcSignalAnyPending(void)
> > +{
> > +	volatile ProcSignalSlot *slot = MyProcSignalSlot;
> >
> > -	if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
> > -		RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
> > +	/* XXX make this static inline? */
> > +	/* XXX point to a dummy entry instead of using NULL to avoid a branch */
> > +	return slot && slot->pss_signaled;
> > +}
>
> ISTM it might be easier to make this stuff efficiently race-free if we made
> this a count of pending operations.

Hmm, with a unified interrupt system and a bitmap it's not necessary to have a separate flag/counter at all.

> > @@ -3131,12 +3124,13 @@ ProcessInterrupts(void)
> > 	/* OK to accept any interrupts now? */
> > 	if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
> > 		return;
> > -	InterruptPending = false;
> > +	ProcSignalClearAnyPending();
> > +
> > +	pg_read_barrier();
> >
> > -	if (ProcDiePending)
> > +	if (ProcSignalConsume(PROCSIG_DIE))
> > 	{
>
> I think making all of these checks into function calls isn't great. How about
> making the set of pending signals a bitmask? That'd allow to efficiently check
> a bunch of interrupts together and even where not, it'd just be a single test
> of the mask, likely already in a register.

+1. Some assorted notes:

1. Aside from doing interrupts in this new way, I also have the postmaster setting latches (!) instead of sending ad hoc SIGUSR1 here and there. My main reason for doing that was to be able to chase out all reasons to register a SIGUSR1 handler, so I could prove that check-world passes. I like it, though. But maybe it's really a separate topic.

2. I moved this stuff into interrupt.{h,c}. There is nothing left in procsignal.c except code relating to ProcSignalBarrier. I guess that thing could use another name, anyway. It's a ... SystemInterruptBarrier?

3. Child-level SIGINT and SIGTERM handlers probably aren't really necessary, either: maybe the sender could do InterruptSend(INTERRUPT_{DIE,QUERY_CANCEL}, pgprocno) instead? But perhaps people are attached to being able to send those signals from external programs directly to backends.

4. Like the above, a SIGALRM handler might need to do eg InterruptRaise(INTERRUPT_STATEMENT_TIMEOUT). That's a problem for systems using spinlocks (self-deadlock against user context in InterruptRaise()), so I'd need to come up with some flag protocol for dinosaurs to make that safe, OR revert to having these "local only" interrupts done with separate flags, as you were getting at earlier.

5. The reason I prefer to put currently "local only" interrupts into the same atomic system is that I speculate that ultimately all of the backend-level signal handlers won't be needed. They all fall into three categories: (1) could be replaced with these interrupts directly, (2) could be replaced by the new timer infrastructure that multithreaded postgres would need in order to deliver interrupts to the right recipients, (3) quickdie, which can be handled at the containing process level. Then the only signal handlers left are top-level external ones. But perhaps you're right and I should try reintroducing separate local interrupts for now. I dunno, I like the simplicity of the unified system; if only it weren't for those spinlock-backed atomics.
Attachments
On Thu, Nov 11, 2021 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Depending on how you implement them, one difference could be whether / when
> > "slow" system calls (recv, poll, etc) are interrupted.
>
> Hopefully by now all such waits are implemented with latch.c facilities?

Do read(), write(), etc. count? Because we certainly have raw calls to those functions in lots of places.

--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,

On 2021-11-11 09:06:01 -0500, Robert Haas wrote:
> On Thu, Nov 11, 2021 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > Depending on how you implement them, one difference could be whether / when
> > > "slow" system calls (recv, poll, etc) are interrupted.
> >
> > Hopefully by now all such waits are implemented with latch.c facilities?
>
> Do read(), write(), etc. count? Because we certainly have raw calls to
> those functions in lots of places.

They can count, but only when used for network sockets or pipes ("slow devices" or whatever the POSIX language is). Disk IO doesn't count as that. So I don't think it'd be a huge issue.

Greetings,

Andres Freund
On Thu, Nov 11, 2021 at 2:50 PM Andres Freund <andres@anarazel.de> wrote:
> They can count, but only when used for network sockets or pipes ("slow
> devices" or whatever the POSIX language is). Disk IO doesn't count as that. So
> I don't think it'd be a huge issue.

Somehow the idea that the network is a slow device and the disk a fast one does not seem like it's necessarily accurate on modern hardware, but I guess the spec is what it is.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Nov 12, 2021 at 9:24 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Nov 11, 2021 at 2:50 PM Andres Freund <andres@anarazel.de> wrote:
> > They can count, but only when used for network sockets or pipes ("slow
> > devices" or whatever the POSIX language is). Disk IO doesn't count as that. So
> > I don't think it'd be a huge issue.
>
> Somehow the idea that the network is a slow device and the disk a fast
> one does not seem like it's necessarily accurate on modern hardware,
> but I guess the spec is what it is.

[Somehow I managed to reply to Robert only; let me try that again, this time to the list...]

Network filesystems have in the past been confusing because they're both disk-like and network-like, and also slow as !@#$, which is why there have been mount point options like "intr" and "nointr" (now ignored on Linux) to control what happens if you receive an async signal during a sleepy read/write. But even if you had some kind of Deathstation 9000 with a switch on the front panel that ignores SA_RESTART and produces EINTR for disk I/O when a signal arrives, PostgreSQL already doesn't work today: our pread() and pwrite() paths for data and WAL don't have EINTR loops or CHECK_FOR_INTERRUPTS() (we just can't take interrupts in the middle of eg a synchronous write), so I think we'd produce an ERROR or PANIC.

So I think disk I/O is irrelevant, and network/pipe I/O is already handled everywhere via latch.c facilities. If there are any eg blocking reads on a socket in PostgreSQL, we should fix that to use latch.c non-blocking techniques, because such a place already ignores postmaster death and interrupts.

To be more precise: such a place could of course wake up for EINTR on SIGUSR1 from procsignal.c, and that would no longer happen with my patch, but if we're relying on that anywhere, it's dangerous and unreliable. If SIGUSR1 is delivered right before you enter a blocking read(), you'll sleep waiting for the socket or whatever. That's precisely the problem that latch.c solves, and why it's already a bug if there are such places.