Обсуждение: Interrupts vs signals

Поиск

Список

Период

Сортировка

Interrupts vs signals

От

Thomas Munro

Дата:

20 октября 2021 г., 18:55:54

Hi,

I wonder if we really need signals to implement interrupts.  Given
that they are non-preemptive/cooperative (work happens at the next
CFI()), why not just use shared memory flags and latches?  That skips
a bunch of code, global variables and scary warnings about programming
in signal handlers.

I sketched out some code to try that a few months back, while
speculating about bite-sized subproblems that would come up if each
backend is, one day, a thread.

There are several other conditions that are also handled by
CHECK_FOR_INTERRUPTS(), but are not triggered by other backends
sending  signals, or are set by other signal handlers (SIGALRM,
SIGQUIT).  One idea is to convert those into "procsignals" too, for
consistency.  In the attached, they can be set (ie by the same
backend) with ProcSignalRaise(), but it's possible that in future we
might have a reason for another backend to set them too, so it seems
like a good idea to have a single system, effectively merging the
concepts of "procsignals" and "interrupts".

There are still a few more ad hoc (non-ProcSignal) uses of SIGUSR1 in
the tree.  For one thing, we don't allow the postmaster to set
latches; if we gave up on that rule, we wouldn't need the bgworker
please-signal-me thing.  Also the logical replication launcher does
the same sort of thing for no apparent reason.  Changed in the
attached -- mainly so I could demonstrate that check-world passes with
SIGUSR1 ignored.

The attached is only experiment grade code: in particular, I didn't
quite untangle the recovery conflict flags properly.  It's also doing
function calls where some kind of fast inlined magic is probably
required, and I probably have a few other details wrong, but I figured
it was good enough to demonstrate the concept.

Вложения

0001-WIP-Refactor-procsignals-and-interrupts.patch

Re: Interrupts vs signals

От

Andres Freund

Дата:

20 октября 2021 г., 19:27:48

Hi,

On 2021-10-21 07:55:54 +1300, Thomas Munro wrote:
> I wonder if we really need signals to implement interrupts.  Given
> that they are non-preemptive/cooperative (work happens at the next
> CFI()), why not just use shared memory flags and latches?  That skips
> a bunch of code, global variables and scary warnings about programming
> in signal handlers.

Depending on how you implement them, one difference could be whether / when
"slow" system calls (recv, poll, etc) are interrupted.

Another is that that signal handling provides a memory barrier in the
receiving process. For things that rarely change (like most interrupts), it
can be more efficient to move the cost of that out-of-line, instead of
incurring them on every check.

One nice thing of putting the state variables into shared memory would be that
that'd allow to see the pending interrupts of other backends for debugging
purposes.

> One idea is to convert those into "procsignals" too, for
> consistency.  In the attached, they can be set (ie by the same
> backend) with ProcSignalRaise(), but it's possible that in future we
> might have a reason for another backend to set them too, so it seems
> like a good idea to have a single system, effectively merging the
> concepts of "procsignals" and "interrupts".

This seems a bit confusing to me. For one, we need to have interrupts working
before we have a proc, IIRC. But leaving details like that aside, it just
seems a bit backwards to me. I'm on board with other backends directly setting
interrupt flags, but it seems to me that the procsignal stuff should be
"client" of the process-local interrupt infrastructure, rather than the other
way round.

> +bool
> +ProcSignalAnyPending(void)
> +{
> +    volatile ProcSignalSlot *slot = MyProcSignalSlot;
>  
> -    if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
> -        RecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);
> +    /* XXX make this static inline? */
> +    /* XXX point to a dummy entry instead of using NULL to avoid a branch */
> +    return slot && slot->pss_signaled;
> +}

ISTM it might be easier to make this stuff efficiently race-free if we made
this a count of pending operations.

> @@ -3131,12 +3124,13 @@ ProcessInterrupts(void)
>      /* OK to accept any interrupts now? */
>      if (InterruptHoldoffCount != 0 || CritSectionCount != 0)
>          return;
> -    InterruptPending = false;
> +    ProcSignalClearAnyPending();
> +
> +    pg_read_barrier();
>  
> -    if (ProcDiePending)
> +    if (ProcSignalConsume(PROCSIG_DIE))
>      {

I think making all of these checks into function calls isn't great. How about
making the set of pending signals a bitmask? That'd allow to efficiently check
a bunch of interrupts together and even where not, it'd just be a single test
of the mask, likely already in a register.

Greetings,

Andres Freund

Re: Interrupts vs signals

От

Robert Haas

Дата:

11 ноября 2021 г., 14:06:01

On Thu, Nov 11, 2021 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > Depending on how you implement them, one difference could be whether / when
> > "slow" system calls (recv, poll, etc) are interrupted.
>
> Hopefully by now all such waits are implemented with latch.c facilities?

Do read(), write(), etc. count? Because we certainly have raw calls to
those functions in lots of places.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Interrupts vs signals

От

Andres Freund

Дата:

11 ноября 2021 г., 19:50:00

Hi,

On 2021-11-11 09:06:01 -0500, Robert Haas wrote:
> On Thu, Nov 11, 2021 at 12:27 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > Depending on how you implement them, one difference could be whether / when
> > > "slow" system calls (recv, poll, etc) are interrupted.
> >
> > Hopefully by now all such waits are implemented with latch.c facilities?
> 
> Do read(), write(), etc. count? Because we certainly have raw calls to
> those functions in lots of places.

They can count, but only when used for network sockets or pipes ("slow
devices" or whatever the posix language is). Disk IO doesn't count as that. So
I don't think it'd be a huge issue.

Greetings,

Andres Freund

Re: Interrupts vs signals

От

Robert Haas

Дата:

11 ноября 2021 г., 20:24:41

On Thu, Nov 11, 2021 at 2:50 PM Andres Freund <andres@anarazel.de> wrote:
> They can count, but only when used for network sockets or pipes ("slow
> devices" or whatever the posix language is). Disk IO doesn't count as that. So
> I don't think it'd be a huge issue.

Somehow the idea that the network is a slow device and the disk a fast
one does not seem like it's necessarily accurate on modern hardware,
but I guess the spec is what it is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Interrupts vs signals

От

Thomas Munro

Дата:

11 ноября 2021 г., 20:57:38

On Fri, Nov 12, 2021 at 9:24 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Nov 11, 2021 at 2:50 PM Andres Freund <andres@anarazel.de> wrote:
> > They can count, but only when used for network sockets or pipes ("slow
> > devices" or whatever the posix language is). Disk IO doesn't count as that. So
> > I don't think it'd be a huge issue.
>
> Somehow the idea that the network is a slow device and the disk a fast
> one does not seem like it's necessarily accurate on modern hardware,
> but I guess the spec is what it is.

[Somehow I managed to reply to Robert only; let me try that again,
this time to the list...]

Network filesystems have in the past been confusing because they're
both disk-like and network-like, and also slow as !@#$, which is why
there have been mount point options like "intr", "nointr" (now ignored
on Linux) to control what happens if you receive an async signal
during a sleepy read/write.  But even if you had some kind of
Deathstation 9000 that had a switch on the front panel that ignores
SA_RESTART and produces EINTR for disk I/O when a signal arrives,
PostgreSQL already doesn't work today.  Our pread() and pwrite() paths
for data and WAL don't not have a EINTR loops or
CHECK_FOR_INTERRUPTS() (we just can't take interrupts in the middle of
eg a synchronous write), so I think we'd produce an ERROR or PANIC.
So I think disk I/O is irrelevant, and network/pipe I/O is already
handled everywhere via latch.c facilities.

If there are any eg blocking reads on a socket in PostgreSQL, we
should fix that to use latch.c non-blocking techniques, because such a
place is already a place that ignores postmaster death and interrupts.
To be more precise: such a place could of course wake up for EINTR on
SIGUSR1 from procsignal.c, and that would no longer happen with my
patch, but if we're relying on that anywhere, it's dangerous and
unreliable.  If SIGUSR1 is delivered right before you enter a blocking
read(), you'll sleep waiting for the socket or whatever.  That's
precisely the problem that latch.c solves, and why it's already a bug
if there are such places.

Re: Interrupts vs signals

От

Thomas Munro

Дата:

08 июля 2024 г., 02:56:38

Here's an updated version of this patch.

The main idea is that SendProcSignal(pid, PROCSIGNAL_XXX, procno)
becomes SendInterrupt(INTERRUPT_XXX, procno), and all the pending
interrupt global variables and pss_procsignalFlags[] go away, along
with the SIGUSR1 handler.  The interrupts are compressed into a single
bitmap.  See commit message for more details.

The patch is failing on Windows CI for reasons I haven't debugged yet,
but I wanted to share what I have so far.  Work in progress!

Here is my attempt to survey the use of signals and write down what I
think we might do about them all so far, to give the context for this
patch:

https://wiki.postgresql.org/wiki/Signals

Comments, corrections, edits very welcome.

Вложения

v2-0001-Redesign-interrupts-remove-ProcSignals.patch

Re: Interrupts vs signals

От

Heikki Linnakangas

Дата:

08 июля 2024 г., 09:38:22

On 08/07/2024 05:56, Thomas Munro wrote:
> Here's an updated version of this patch.
> 
> The main idea is that SendProcSignal(pid, PROCSIGNAL_XXX, procno)
> becomes SendInterrupt(INTERRUPT_XXX, procno), and all the pending
> interrupt global variables and pss_procsignalFlags[] go away, along
> with the SIGUSR1 handler.  The interrupts are compressed into a single
> bitmap.  See commit message for more details.
> 
> The patch is failing on Windows CI for reasons I haven't debugged yet,
> but I wanted to share what I have so far.  Work in progress!
> 
> Here is my attempt to survey the use of signals and write down what I
> think we might do about them all so far, to give the context for this
> patch:
> 
> https://wiki.postgresql.org/wiki/Signals
> 
> Comments, corrections, edits very welcome.

Nice, thanks!

> Background worker state notifications are also changed from raw
> kill(SIGUSR1) to SetLatch().  That means that SetLatch() is now called
> from the postmaster.  The main purpose of including that change is to be
> able to remove procsignal_sigusr1_handler completely and set SIGUSR1 to
> SIG_IGN, and show the system working.
> 
> XXX Do we need to invent SetLatchRobust() that doesn't trust anything in
> shared memory, to be able to set latches from the postmaster?

The patch actually does both: it still does kill(SIGUSR1) and also sets 
the latch.

I think it would be nice if RegisterDynamicBackgroundWorker() had a 
"bool notify_me" argument, instead of requiring the caller to set 
"bgw_notify_pid = MyProcPid" before the call. That's a 
backwards-compatibility break, but maybe we should bite the bullet and 
do it. Or we could do this in RegisterDynamicBackgroundWorker():

if (worker->bgw_notify_pid == MyProcPid)
     worker->bgw_notify_pgprocno = MyProcNumber;

I think we can forbid setting pgw_notify_pid to anything else than 0 or 
MyProcPid.

A SetLatchRobust would be nice. Looking at SetLatch(), I don't think it 
can do any damage if you called it on a pointer to garbage, except if 
the pointer itself is bogus, then just dereferencing it an cause a 
segfault. So it would be nice to have a version specifically designed 
with that in mind. For example, it could assume that the Latch's pid is 
never legally equal to MyProcPid, because postmaster cannot own any latches.

Another approach would be to move the responsibility of background 
worker state notifications out of postmaster completely. When a new 
background worker is launched, the worker process itself could send the 
notification that it has started. And similarly, when a worker exits, it 
could send the notification just before exiting. There's a little race 
condition with exiting: if a process is waiting for the bgworker to 
exit, and launches a new worker immediately when the old one exits, 
there will be a brief period when the old and new process are alive at 
the same time. The old worker wouldn't be doing anything interesting 
anymore since it's exiting, but it still counts towards 
max_worker_processes, so launching the new process might fail because of 
hitting the limit. Maybe we should just bump up max_worker_processes. Or 
postmaster could check PMChildFlags and not count processes that have 
already deregistered from PMChildFlags towards the limit.

> -volatile uint32 InterruptHoldoffCount = 0;
> -volatile uint32 QueryCancelHoldoffCount = 0;
> -volatile uint32 CritSectionCount = 0;
> +uint32 InterruptHoldoffCount = 0;
> +uint32 QueryCancelHoldoffCount = 0;
> +uint32 CritSectionCount = 0;

I wondered if these are used in PG_TRY-CATCH blocks in a way that would 
still require volatile. I couldn't find any such usage by some quick 
grepping, so I think we're good, but I thought I'd mention it.

> +/*
> + * The set of "standard" interrupts that CHECK_FOR_INTERRUPTS() and
> + * ProcessInterrupts() handle.  These perform work that is safe to run whenever
> + * interrupts are not "held".  Other kinds of interrupts are only handled at
> + * more restricted times.
> + */
> +#define INTERRUPT_STANDARD_MASK                               \

Some interrupts are missing from this mask:

- INTERRUPT_PARALLEL_APPLY_MESSAGE
- INTERRUPT_IDLE_STATS_UPDATE_TIMEOUT
- INTERRUPT_SINVAL_CATCHUP
- INTERRUPT_NOTIFY

Is that on purpose?

> -/*
> - * Because backends sitting idle will not be reading sinval events, we
> - * need a way to give an idle backend a swift kick in the rear and make
> - * it catch up before the sinval queue overflows and forces it to go
> - * through a cache reset exercise.  This is done by sending
> - * PROCSIG_CATCHUP_INTERRUPT to any backend that gets too far behind.
> - *
> - * The signal handler will set an interrupt pending flag and will set the
> - * processes latch. Whenever starting to read from the client, or when
> - * interrupted while doing so, ProcessClientReadInterrupt() will call
> - * ProcessCatchupEvent().
> - */
> -volatile sig_atomic_t catchupInterruptPending = false;

Would be nice to move that comment somewhere else rather than remove it 
completely.

> --- a/src/backend/storage/lmgr/proc.c
> +++ b/src/backend/storage/lmgr/proc.c
> @@ -444,6 +444,10 @@ InitProcess(void)
>      OwnLatch(&MyProc->procLatch);
>      SwitchToSharedLatch();
>  
> +    /*We're now ready to accept interrupts from other processes. */
> +    pg_atomic_init_u32(&MyProc->pending_interrupts, 0);
> +    SwitchToSharedInterrupts();
> +
>      /* now that we have a proc, report wait events to shared memory */
>      pgstat_set_wait_event_storage(&MyProc->wait_event_info);
>  
> @@ -611,6 +615,9 @@ InitAuxiliaryProcess(void)
>      OwnLatch(&MyProc->procLatch);
>      SwitchToSharedLatch();
>  
> +    /* We're now ready to accept interrupts from other processes. */
> +    SwitchToSharedInterrupts();
> +
>      /* now that we have a proc, report wait events to shared memory */
>      pgstat_set_wait_event_storage(&MyProc->wait_event_info);
>  

Is there a reason for the different initialization between a regular 
backend and aux process?

> +/*
> + * Switch to shared memory interrupts.  Other backends can send interrupts
> + * to this one if they know its ProcNumber.
> + */
> +void
> +SwitchToSharedInterrupts(void)
> +{
> +    pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, pg_atomic_read_u32(MyPendingInterrupts));
> +    MyPendingInterrupts = &MyProc->pending_interrupts;
> +}

Hmm, I think there's a race condition here (and similarly in 
SwitchToLocalInterrupts), if the signal handler runs between the 
pg_atomic_fetch_or_u32, and changing MyPendingInterrupts. Maybe 
something like this instead:

MyPendingInterrupts = &MyProc->pending_interrupts;
pg_memory_barrier();
pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, 
pg_atomic_read_u32(LocalPendingInterrupts));

And perhaps this should also clear LocalPendingInterrupts, just to be tidy.

> @@ -138,7 +139,8 @@
>  typedef struct ProcState
>  {
>      /* procPid is zero in an inactive ProcState array entry. */
> -    pid_t        procPid;        /* PID of backend, for signaling */
> +    pid_t        procPid;        /* pid of backend */
> +    ProcNumber    pgprocno;        /* for sending interrupts */
>      /* nextMsgNum is meaningless if procPid == 0 or resetState is true. */
>      int            nextMsgNum;        /* next message number to read */
>      bool        resetState;        /* backend needs to reset its state */

We can easily remove procPid altogether now that we have pgprocno here. 
Similarly with the pid/pgprocno in ReplicationSlot and WalSndState.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Interrupts vs signals

От

Robert Haas

Дата:

08 июля 2024 г., 20:18:01

On Mon, Jul 8, 2024 at 5:38 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Another approach would be to move the responsibility of background
> worker state notifications out of postmaster completely. When a new
> background worker is launched, the worker process itself could send the
> notification that it has started. And similarly, when a worker exits, it
> could send the notification just before exiting. There's a little race
> condition with exiting: if a process is waiting for the bgworker to
> exit, and launches a new worker immediately when the old one exits,
> there will be a brief period when the old and new process are alive at
> the same time. The old worker wouldn't be doing anything interesting
> anymore since it's exiting, but it still counts towards
> max_worker_processes, so launching the new process might fail because of
> hitting the limit. Maybe we should just bump up max_worker_processes. Or
> postmaster could check PMChildFlags and not count processes that have
> already deregistered from PMChildFlags towards the limit.

I can testify that the current system is the result of a lot of trial
and error. I'm not saying it can't be made better, but my initial
attempts at getting this to work (back in the 9.4 era) resembled what
you proposed here, were consequently a lot simpler than what we have
now, and also did not work. Race conditions like you mention here were
part of that. Another consideration is that fork() can fail, and in
that case, the process that tried to register the new background
worker needs to find out that the background worker won't ever be
starting. Yet another problem is that, even if fork() succeeds, the
new process might fail before it executes any of our code e.g. because
it seg faults very early, a case that actually happened to me -
inadvertently - while I was testing these facilities. I ended up
deciding that we can't rely on the new process to do anything until
it's given us some signal that it is alive and able to carry out its
duties. If it dies before telling us that, or never starts in the
first place, we have to have some other way of finding that out, and
it's difficult to see how that can happen without postmaster
involvement.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Interrupts vs signals

От

Thomas Munro

Дата:

10 июля 2024 г., 06:48:48

On Mon, Jul 8, 2024 at 9:38 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> The patch actually does both: it still does kill(SIGUSR1) and also sets
> the latch.

Yeah, I had some ideas about supporting old extension code that really
wanted a SIGUSR1, but on reflection, the only reason anyone ever wants
that is so that sigusr1_handler can SetLatch(), which pairs with
WaitLatch() in WaitForBackgroundWorker*().  Let's go all the way and
assume that.

> I think it would be nice if RegisterDynamicBackgroundWorker() had a
> "bool notify_me" argument, instead of requiring the caller to set
> "bgw_notify_pid = MyProcPid" before the call. That's a
> backwards-compatibility break, but maybe we should bite the bullet and
> do it. Or we could do this in RegisterDynamicBackgroundWorker():
>
> if (worker->bgw_notify_pid == MyProcPid)
>      worker->bgw_notify_pgprocno = MyProcNumber;
>
> I think we can forbid setting pgw_notify_pid to anything else than 0 or
> MyProcPid.

Another idea: we could keep the bgw_notify_pid field around for a
while, documented as unused and due to be removed in future.  We could
automatically capture the caller's proc number.  So you get latch
wakeups by default, which I expect many people want, and most people
could cope with even if they don't want them.  If you really really
don't want them, you could set a new flag BGW_NO_NOTIFY.

I have now done this part of the change in a new first patch.  This
particular use of kill(SIGUSR1) is separate from the ProcSignal
removal, it's just that it relies on ProcSignal's handler's default
action of calling SetLatch().  It's needed so the ProcSignal-ectomy
can fully delete sigusr1_handler(), but it's not directly the same
thing, so it seemed good to split the patch.

> A SetLatchRobust would be nice. Looking at SetLatch(), I don't think it
> can do any damage if you called it on a pointer to garbage, except if
> the pointer itself is bogus, then just dereferencing it an cause a
> segfault. So it would be nice to have a version specifically designed
> with that in mind. For example, it could assume that the Latch's pid is
> never legally equal to MyProcPid, because postmaster cannot own any latches.

Yeah I'm starting to think that all we need to do here is range-check
the proc number.  Here's a version that adds:
ProcSetLatch(proc_number).  Another idea would be for SetLatch(latch)
to sanitise the address of a latch, ie its offset and range.

What the user really wants to be able to do with this bgworker API, I
think, is wait for a given a handle, which could find a condition
variable + generation in the slot, so that we don't have to register
any proc numbers anywhere until we're actually waiting.  But *clearly*
the postmaster can't use the condition variable API without risking
following corrupted pointers in shared memory.  Whereas AFAICT
ProcSetLatch() from the patched postmaster can't really be corrupted
in any new way that it couldn't already be corrupted in master (where
it runs in the target process), if we're just a bit paranoid about how
we find our way to the latch.

Receiving latch wakeups in the postmaster might be another question,
but I don't think we need to confront that question just yet.

> > -volatile uint32 InterruptHoldoffCount = 0;
> > -volatile uint32 QueryCancelHoldoffCount = 0;
> > -volatile uint32 CritSectionCount = 0;
> > +uint32 InterruptHoldoffCount = 0;
> > +uint32 QueryCancelHoldoffCount = 0;
> > +uint32 CritSectionCount = 0;
>
> I wondered if these are used in PG_TRY-CATCH blocks in a way that would
> still require volatile. I couldn't find any such usage by some quick
> grepping, so I think we're good, but I thought I'd mention it.

Hmm.  Still thinking about this.

> > +/*
> > + * The set of "standard" interrupts that CHECK_FOR_INTERRUPTS() and
> > + * ProcessInterrupts() handle.  These perform work that is safe to run whenever
> > + * interrupts are not "held".  Other kinds of interrupts are only handled at
> > + * more restricted times.
> > + */
> > +#define INTERRUPT_STANDARD_MASK                                                         \
>
> Some interrupts are missing from this mask:
>
> - INTERRUPT_PARALLEL_APPLY_MESSAGE

Oops, that one ^ is a rebasing mistake.  I wrote the ancestor of this
patch in 2021, and that new procsignal arrived in 2023, and I put the
code in to handle it, but I forgot to add it to the mask, which gives
me an idea (see below)...

> - INTERRUPT_IDLE_STATS_UPDATE_TIMEOUT
> - INTERRUPT_SINVAL_CATCHUP
> - INTERRUPT_NOTIFY
>
> Is that on purpose?

INTERRUPT_SINVAL_CATCHUP and INTERRUPT_NOTIFY are indeed handled
differently on purpose.  In master, they don't set InterruptPending,
and they are not handled by regular CHECK_FOR_INTERRUPTS() sites, but
in the patch they still need a bit in pending_interrupts, and that is
what that mask hides from CHECK_FOR_INTERRUPTS().  They are checked
explicitly in ProcessClientReadInterrupt().  I think the idea is that
we can't handle sinval at random places because that might create
dangling pointers to cached objects where we don't expect them, and we
can't emit NOTIFY-related protocol messages at random times either.

There is something a little funky about _IDLE_STATS_UPDATE_TIMEOUT,
though.  It has a different scheme for running only when idle, where
if it opts not to do anything, it doesn't consume the interrupt (a
later CFI() will, but the latch is not set so we might sleep).  I was
confused by that.  I think I have changed to be more faithful to
master's behaviour now.

Hmm, a better terminology for the interupts that CFI handles would be
s/standard/regular/, so I have changed that.

New idea: it would be less error-prone if we instead had a mask of
these special cases, of which there are now only two.  Tried that way!

> > -/*
> > - * Because backends sitting idle will not be reading sinval events, we
> > - * need a way to give an idle backend a swift kick in the rear and make
> > - * it catch up before the sinval queue overflows and forces it to go
> > - * through a cache reset exercise.  This is done by sending
> > - * PROCSIG_CATCHUP_INTERRUPT to any backend that gets too far behind.
> > - *
> > - * The signal handler will set an interrupt pending flag and will set the
> > - * processes latch. Whenever starting to read from the client, or when
> > - * interrupted while doing so, ProcessClientReadInterrupt() will call
> > - * ProcessCatchupEvent().
> > - */
> > -volatile sig_atomic_t catchupInterruptPending = false;
>
> Would be nice to move that comment somewhere else rather than remove it
> completely.

OK, I moved it to the top of ProcessCatchupInterrupt().

> > --- a/src/backend/storage/lmgr/proc.c
> > +++ b/src/backend/storage/lmgr/proc.c
> > @@ -444,6 +444,10 @@ InitProcess(void)
> >       OwnLatch(&MyProc->procLatch);
> >       SwitchToSharedLatch();
> >
> > +     /*We're now ready to accept interrupts from other processes. */
> > +     pg_atomic_init_u32(&MyProc->pending_interrupts, 0);
> > +     SwitchToSharedInterrupts();
> > +
> >       /* now that we have a proc, report wait events to shared memory */
> >       pgstat_set_wait_event_storage(&MyProc->wait_event_info);
> >
> > @@ -611,6 +615,9 @@ InitAuxiliaryProcess(void)
> >       OwnLatch(&MyProc->procLatch);
> >       SwitchToSharedLatch();
> >
> > +     /* We're now ready to accept interrupts from other processes. */
> > +     SwitchToSharedInterrupts();
> > +
> >       /* now that we have a proc, report wait events to shared memory */
> >       pgstat_set_wait_event_storage(&MyProc->wait_event_info);
> >
>
> Is there a reason for the different initialization between a regular
> backend and aux process?

No.  But I thought about something else to fix here.  Really we don't
want to switch to shared interrupts until we are ready for CFI() to do
stuff.  I think that should probably be at the places where master
unblocks signals.  Otherwise, for example, if someone sends you an
interrupt while you're starting up, something as innocent as
elog(DEBUG, ...), which reaches CFI(), might try to do things for
which the infrastructure is not yet fully set up, for example
INTERRUPT_BARRIER.

Not done yet, but wanted to share this new version.

> > +/*
> > + * Switch to shared memory interrupts.  Other backends can send interrupts
> > + * to this one if they know its ProcNumber.
> > + */
> > +void
> > +SwitchToSharedInterrupts(void)
> > +{
> > +     pg_atomic_fetch_or_u32(&MyProc->pending_interrupts, pg_atomic_read_u32(MyPendingInterrupts));
> > +     MyPendingInterrupts = &MyProc->pending_interrupts;
> > +}
>
> Hmm, I think there's a race condition here (and similarly in
> SwitchToLocalInterrupts), if the signal handler runs between the
> pg_atomic_fetch_or_u32, and changing MyPendingInterrupts. Maybe
> something like this instead:
>
> MyPendingInterrupts = &MyProc->pending_interrupts;
> pg_memory_barrier();
> pg_atomic_fetch_or_u32(&MyProc->pending_interrupts,
> pg_atomic_read_u32(LocalPendingInterrupts));

Yeah, right, done.

> And perhaps this should also clear LocalPendingInterrupts, just to be tidy.

I used atomic_exchange() to read and clear the bits in one go.

> > @@ -138,7 +139,8 @@
> >  typedef struct ProcState
> >  {
> >       /* procPid is zero in an inactive ProcState array entry. */
> > -     pid_t           procPid;                /* PID of backend, for signaling */
> > +     pid_t           procPid;                /* pid of backend */
> > +     ProcNumber      pgprocno;               /* for sending interrupts */
> >       /* nextMsgNum is meaningless if procPid == 0 or resetState is true. */
> >       int                     nextMsgNum;             /* next message number to read */
> >       bool            resetState;             /* backend needs to reset its state */
>
> We can easily remove procPid altogether now that we have pgprocno here.

Since other things access those values, I propose to remove them in
separate patches.

> Similarly with the pid/pgprocno in ReplicationSlot and WalSndState.

Same.  Those pids show up in user interfaces, so I think they should
be handled in separate patches.

Note to self: I need to change some pgprocno to proc_number...

The next problems to remove are, I think, the various SIGUSR2, SIGINT,
SIGTERM signals sent by the postmaster.  These should clearly become
SendInterrupt() or ProcSetLatch().  The problem here is that the
postmaster doesn't have the proc numbers yet.  One idea is to teach
the postmaster to assign them!  Not explored yet.

This version is passing on Windows.  I'll create a CF entry.  Still
work in progress!

On 28/01/2025 19:01, Andres Freund wrote:
> On 2024-12-02 16:39:28 +0200, Heikki Linnakangas wrote:
>> This also moves the WaitEventSet functions to a different source file,
>> waiteventset.c. This separates the platform-dependent code waiting and
>> signalling code from the platform-independent parts.
> 
> I think this should be split into a separate commit. It's painful to verify
> that code-movement didn't change anything else if there's lots of other
> changes in the same commit.

Here's a patch set to do just that, split WaitEventSet stuff to a 
separate source file. That's actually a pretty nice separation even 
without any of the rest of the patches.

I noticed that the ShutdownLatchSupport() function is unused. The first 
patch removes it.

The second patch makes it possible to use ModifyWaitEvent() to switch 
between WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH. WaitLatch() used to 
modify WaitEventSet->exit_on_postmaster_death directly, now it uses 
ModifyWaitEvent() for that. That's needed because with the final patch, 
WaitLatch() is in a different source file than WaitEventSet, so it 
cannot directly modify its field anymore.

The third patch is mechanical and moves existing code. The file header 
comments in the modified files are perhaps worth reviewing. They are 
also just existing text moved around, but there was some small decisions 
on what exactly should go where.

I'll continue working on the other parts, but these patches seems ready 
for commit already.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Here's a new patch set. It includes the previous work, and also goes the 
whole hog and replaces procsignals and many other signalling with 
interrupts. It's based on Thomas's 
v3-0002-Redesign-interrupts-remove-ProcSignals.patch much earlier in 
this thread.

That's included as a separate patch in the patch set, on top of the 
previous ones. I like the end state, but I'm not sure if that's the most 
sensible way to commit these. It might make more sense to commit the 
procsignal->interrupt changes first, and the commit to remove latches 
second. Or just commit all in one giant commit. Not sure.

In any case, I hope this split is easier to review at least than just 
squashing them all tgoether...

I haven't addressed all your comments yet, and there are some TODO/FIXME 
comments. More details below.

On 28/01/2025 19:01, Andres Freund wrote:
> On 2024-12-02 16:39:28 +0200, Heikki Linnakangas wrote:
>>  From eff8de11fbfea4e2aadce9c1d71452b0f5a1b80b Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Mon, 2 Dec 2024 15:51:54 +0200
>> Subject: [PATCH v5 1/4] Replace Latches with Interrupts
> 
> Your email has only three attachements - I assume the fourth is something
> local that didn't need to be shared?

Yeah, it was a work-in-progress patch to convert config-reload requests 
to the interrupts, to see how it works.

>> The Latch was a flag, with an inter-process signalling mechanism that
>> allowed a process to wait for the Latch to be set. My original vision
>> for Latches was that you could have different latches for different
>> things that you need to wait for, but in practice, almost all code
>> used one "process latch" that was shared for all wakeups. The only
>> exception was the "recoveryWakeupLatch". I think the reason that we
>> ended up just sharing the same latch for all wakeups is that it was
>> cumbersome in practice to deal with multiple latches. For starters,
>> there was no machinery for waiting for multiple latches at the same
>> time. Secondly, changing the "ownership" of a latch needed extra
>> locking or other coordination. Thirdly, each inter-process latch
>> needed to be initialized at postmaster startup time.
>>
>> This patch embraces the reality of how Latches were used, and replaces
>> the Latches with a per-process interrupt mask. You can no longer
>> allocate Latches in arbitrary shared memory structs and an interrupt
>> is always directed at a particular process, addressed by its
>> ProcNumber.  Each process has a bitmask of pending interrupts in
>> PGPROC.
> 
> That doesn't really remove the need for inter-process coordination to know
> what ProcNumber corresponds to what...

Right. Was there a comment that claims it does or something, or why do 
you point that out?

>> This commit introduces two interrupt bits. INTERRUPT_GENERAL replaces
>> the general-purpose per-process latch. All code that previously set a
>> process's process latch now sets its INTERRUPT_GENERAL interrupt bit
>> instead.
> 
> Half-formed-at-best thought: I wonder if we should split the "interrupt
> yourself in response to a signal" cases from "another process wakes you up",
> even in the initial commit. They seem rather different to me.

Hmm. Some interrupts are indeed clearly only sent within the process, 
while others are sent across processes. Not sure where that distinction 
would help though. Just group them together and have comments? Or what 
did you have in mind?

There are other ways to group the interrupts. Some are harmless if 
they're sent to a wrong process. Others are not. Some are handled by 
CHECK_FOR_INTERRUTPS, others are not.

>> diff --git a/src/include/storage/interrupt.h b/src/include/storage/interrupt.h
>> new file mode 100644
>> index 0000000000..1da05369c3
>> --- /dev/null
>> +++ b/src/include/storage/interrupt.h
> 
> So we now have postmaster/interrupt.[ch] and storage/interrupt.h /
> storage/ipc/interrupt.c.

In the attached version, I renamed the existing 
postmaster/interrupt.[ch] to postmaster/interrupt_handlers.[ch]. I think 
that helps.

Now that I look at the big picture again, though, I'm thinking that they 
should perhaps be merged, so that there would be only 
postmaster/interrupt.[ch]. That's how Thomas had them in the original 
procsignal->interrupt patches. There's not much left of the old 
postmaster/interrupt.[ch], and it's all related to interrupts anyway.

>> + * The "standard" set of interrupts is handled by CHECK_FOR_INTERRUPTS(), and
>> + * consists of tasks that are safe to perform at most times.  They can be
>> + * suppressed by HOLD_INTERRUPTS()/RESUME_INTERRUPTS().
>> + *
>> + *
>> + * The correct pattern to wait for event(s) using INTERRUPT_GENERAL is:
>> + *
>> + * for (;;)
>> + * {
>> + *       ClearInterrupt(INTERRUPT_GENERAL);
>> + *       if (work to do)
>> + *           Do Stuff();
>> + *       WaitInterrupt(1 << INTERRUPT_GENERAL, ...);
>> + * }
> 
> I don't particularly like that there's now dozens of places that need to do
> bit-shifting.
> 
> One reason I don't like it is that, for example, this should actually be 1U <<
> INTERRUPT_GENERAL, and at some later point we might need it to be a 64bit
> integer due to a larger number of interrupts, which will require going to all
> extensions and updating them.
> 
> Perhaps WaitInterrupt should accept just a single interrupt type and
> WaitEventSets should allow registering/reporting different interrupts to wait
> for as different events?
> 
> Or perhaps we could introduce an interrupt mask type?

I changed the values of INTERRUPT_* to be bitmasks, so that you can do 
e.g. "INTERRUPT_GENERAL | INTERRUPT_CONFIG_RELOAD" without any 
bit-shifting. All the functions like ClearInterrupt, RaiseInterrupt, 
SendInterrupt now work with a bitmask, it doesn't have to be just a 
single flag anymore.

>> +extern PGDLLIMPORT pg_atomic_uint32 *MyPendingInterrupts;
>> +
>> +/*
>> + * Flags in the pending interrupts bitmask. Each value represents one bit in
>> + * the bitmask.
>> + */
>> +typedef enum
> 
> Personally I prefer if we name the objects underlying typedefs, as otherwise
> some tools will label them "anonymous". And, while it doesn't matter for
> enums, which we can't forward declare in our version of C, it also is required
> to make forward declarations work.

Ok

>> +/*
>> + * Test an interrupt flag.
>> + */
>> +static inline bool
>> +InterruptIsPending(InterruptType reason)
>> +{
>> +    return (pg_atomic_read_u32(MyPendingInterrupts) & (1 << reason)) != 0;
>> +}
> 
> This has no memory ordering guarantees implied - is that correct?  I think
> that could lead to missing interrupts in some cases.

I added pg_read_barrier() and pg_write_barrier() calls to these. I'm not 
100% if that's really needed, or if I got them right, though.

> Given functions like ClearInterrupt() are <Verb><Object>, why differ here?
> 
> 
>> +/*
>> + * Test an interrupt flag.
>> + */
>> +static inline bool
>> +InterruptsPending(uint32 mask)
>> +{
>> +    return (pg_atomic_read_u32(MyPendingInterrupts) & (mask)) != 0;
>> +}
> 
> It seems somewhat confusing to have two naming patterns for this and the
> closely related InterruptIsPending().
> 
> Maybe IsInterruptPending() and AreInterruptsPending()? Or
> IsInterruptPendingMask()?
> 
> I think I like IsInterruptPendingMask() better, because we're not actually
> waiting for multiple interrupts, we're waiting for one out of multiple
> possible interrupts to arrive

There's now just a single function, InterruptPending(mask), which works 
with a single bit or mask of several bits.

I agree it's not a great name, I just didn't get around to think about 
that yet.

>> @@ -4476,7 +4458,10 @@ CheckPromoteSignal(void)
>>   void
>>   WakeupRecovery(void)
>>   {
>> -    SetLatch(&XLogRecoveryCtl->recoveryWakeupLatch);
>> +    ProcNumber    procno = ((volatile PROC_HDR *) ProcGlobal)->startupProc;
>> +
>> +    if (procno != INVALID_PROC_NUMBER)
>> +        SendInterrupt(INTERRUPT_RECOVERY_CONTINUE, procno);
>>   }
> 
> Not a new problem, and not really a problem for the startup process, that we
> don't expect to die, but: Code like this could obviously lead to interrupts
> being sent to the "former" owner of a procno.
> 
> I think either the file header of SendInterrupt() should mention that risk and
> that code has to be aware of it.

Yeah, that's fair. As the patch stands, the rules are different for 
different interrupts. Some interrupts are OK to send to random backends; 
no harm done. But others, like INTERRUPT_DIE or INTERRUPT_QUERY_CANCEL, 
are not. I think we need a mechanism to reliably send an interrupt to 
the right backend, without race conditions if the backend exits 
concurrently. Holding ProcArrayLock works, but that's a pretty big hammer.

> Should we explicitly define the interrupt state a process has at startup?

Makes sense. (Not done yet, it's a TODO)

>> +/*
>> + * Set an interrupt flag in this backend.
>> + */
>> +void
>> +RaiseInterrupt(InterruptType reason)
> 
>> +{
>> +    uint32        old_pending;
>> +
>> +    old_pending = pg_atomic_fetch_or_u32(MyPendingInterrupts, 1 << reason);
>> +
>> +    /*
>> +     * If the process is currently blocked waiting for an interrupt to arrive,
>> +     * and the interrupt wasn't already pending, wake it up.
>> +     */
>> +    if ((old_pending & (1 << reason | 1 << SLEEPING_ON_INTERRUPTS)) == 1 << SLEEPING_ON_INTERRUPTS)
> 
> This is a somewhat hard to read line. Maybe it's worth breaking it out into
> two conditions? Or use a local variable?

It's a little better now that the bitshifts are gone, but still a fair 
point..

> I was chatting with Heikki about this patch and he mentioned that he recalls a
> patch that did some work to unify the signal replacement, procsignal.h and
> CHECK_FOR_INTERRUPTS(). Thomas, that was probably from you? Do you have a
> pointer, if so?
> 
> It does seem like we're going to have to do some unification here. We have too
> many different partially overlapping, partially collaborating systems here.
> 
> 
> - procsignals - kinda like the interrupts here, but not quite. Probably can't
>   just merge them 1:1 into the proposed mechanism, or we'll run out of bits
>   rather soon.  I don't know if we want the relevant multiplexing to be built
>   into interrupt.h or not.
> 
>   Or we ought to redesign this mechanism to deal with more than 32 types of
>   interrupt.

In this patch, 29 out of 32 bits are used. Yeah, that doesn't leave much 
headroom.

Some ideas for addressing that:

1. Use 64-bit atomics. Are there any architectures left where that would 
not be acceptable?

2. Use 64-bit integers, but for the actual signaling part, split it into 
two 32-bit atomic words. Waiting on the interrupts would need to check 
both, but that seems fine from a performance point of view.

3. If we need > 64 bits, things get a bit awkward as you can no longer 
have a single integer to represent a bitmask that includes all the bits. 
That makes the function signatures more ugly, you need to have two 
arguments like interruptMask1 and interruptMask2 or somehting. But 
should basically work otherwise.

4. Multiplex the interrupts so that we need fewer of them. I think all 
the recovery conflict interrupts could be merged into one, for example, 
if we had a separate bitmask somewhere else in PGPROC to indicate which 
ones are set. The limitation would be that if interrupts are multiplexed 
together into one bit, you could only wait for all or none of them.

I'd like to not have to treat these bits as a scarce resource. It'd be 
nice to use even more distinct interrupts than what's in the patch now 
for different things, just for clarity. And I'd like to make it possible 
for extensions to allocate interrupt bits too. (Not included in these 
patches yet)

I'm leaning towards 1. or 2. at the moment. 64 bits should be enough for 
a long time.

> - pmsignal.h - basically the same thing as here, except for signals sent *too*
>   postmaster.
> 
>   It might or might not be required to keep this separate. There are different
>   "reliability" requirements...

I think pmsignal.c could be rewritten to use interrupts fairly easily. I 
didn't do that yet though. For that, postmaster needs to have a PGPROC 
slot, or at least a special reserved ProcNumber, but it doesn't seem hard.

> - CHECK_FOR_INTERRUPTS(), which uses the mechanism introduced here to react to
>   signals while blocked, using RaiseInterrupt() (whereas it did SetLatch()
>   before).
> 
>   A fairly simple improvement would be to use different InterruptType for
>   e.g. termination and query cancellation.

That's done now.

>   But even with this infrastructure in place, we can't *quite* get away from
>   signal handlers, because we need some way to set at the very least
>   InterruptPending (and perhaps ProcDiePending, QueryCancelPending etc,
>   although those could be replaced with InterruptIsPending()). The
>   infrastructure introduced with these patches provides neither a way to set
>   those variables in response to delivery of an interrupt, nor can we
>   currently set them from another backend, as they are global variables.
> 
>   We could of course do InterruptsPending(~0) in in
>   INTERRUPTS_PENDING_CONDITION(), but it would require analyzing the increased
>   indirection overhead as well as the "timeliness" guarantees.
> 
>   Today we don't need a memory barrier around checking InterruptPending,
>   because delivery of a signal delivery (via SetLatch()) ensures that. But I
>   think we would need one if we were to not send signals for
>   cancellation/terminations etc. I.e. right now the overhead of delivery is
>   bigger, but checking if there pending signals is cheaper.

That's changed heavily in this latest patch. All the *Pending global 
variables are gone, they are replaced with InterruptPending() calls.

>   Another aspect of this is that we're cramming more and more code into
>   ProcessInterrupts(), in a long weave of ifs.  I wonder if somewhere along
>   ipc/interrupt.h there should be a facility to register callbacks to react to
>   delivered interrupts.  It'd need to be a bit more than just a mapping of
>   InterruptType to callback, because of things like InterruptHoldoffCount,
>   CritSectionCount, QueryCancelHoldoffCount.

+1 for the general idea, but I didn't pursue it yet.

There are some changes to CHECK_FOR_INTERRUPTS() and the holdoff counts 
here already though. The pattern for waiting for something now looks 
like this:

for (;;)
{
     CHECK_FOR_INTERRUPTS();

     /*
      * The baker will wake us up with INTERRUPT_GENERAL when the cake
      * is ready.
      */
     ClearInterrupt(INTERRUPT_GENERAL);
     if (IsTheCakeBakedYet())
         break;

     WaitInterrupt(CheckForInterruptsMask | INTERRUPT_GENERAL, ...);
}

CheckForInterruptsMask includes INTERRUPT_BARRIER, INTERRUPT_DIE, 
INTERRUPT_QUERY_CANCEL, and all the other interrupts that are handled by 
CHECK_FOR_INTERRUPTS(). However, when you call HOLD_INTERRUPTS(), 
INTERRUPT_DIE is removed from CheckForInterruptsMask.

(I'm not wedded to the name CheckForInterruptsMask. Thomas called them 
"regular interrupts" in his patch. I think I used the term "standard 
interrupts" somewhere.)

> Other questions:
> 
> - Can ipc/interrupts.c style interrupts be sent by postmaster? I think they
>   ought to before long.

Yes. The postmaster does that now to notify backends when background 
workers are launched or terminated.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Вложения

Re: Interrupts vs signals

От

Heikki Linnakangas

Дата:

06 марта, 19:43:47

On 06/03/2025 02:47, Heikki Linnakangas wrote:
> Here's a new patch set. It includes the previous work, and also goes the 
> whole hog and replaces procsignals and many other signalling with 
> interrupts. It's based on Thomas's v3-0002-Redesign-interrupts-remove- 
> ProcSignals.patch much earlier in this thread.

And here's yet another version. It's the same at high level, but with a 
ton of little fixes.

One notable change is that I merged storage/interrupt.[ch] with 
postmaster/interrupt.[ch], so the awkwardness of having two files with 
same name is gone.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

On Wed, Jul 23, 2025, at 09:42, Joel Jacobson wrote:
> Great work in this thread. I'll try to help, definitively benchmarking,
> but will also try to load the new design into my brain, to get the
> correct mental model of it, so I can hopefully help with code review as
> well.

First, my apologies for joining this long-running discussion late.

Following up on my email in the "Optimize LISTEN/NOTIFY" thread [1], I
wanted to elaborate on the architectural pattern my patches demonstrate,
as I believe it's highly complementary to the work being done here.

The fundamental distinction is between decoupled vs. combined state
management. The v7 patch set creates a new Interrupt system where the
wakeup mechanism and the 'pending' state for all subsystems are combined
into a single, centralized atomic bitmask.

My work explores an alternative: decoupling the two. The insight is that
many of our legacy subsystems were designed to be stateless because they
lacked the atomics and WaitEventSet abstractions we have today. The
signal was a workaround for this state management limitation.

This suggests a powerful, three-step migration pattern:

Step 1: Decouple state management with a lock-free atomic FSM.

Each subsystem should be empowered to manage its own state. My first
patch for async.c [1] demonstrates this by introducing a lock-free,
atomic finite state machine (FSM). Using a subsystem-specific atomic
integer and CAS operations, async.c now robustly manages its own IDLE,
SIGNALLED, PROCESSING states without any locks. This solves the state
synchronization problem directly and eliminates the vast majority of
redundant wakeups.

Step 2: Trivialize the wake-up with a generic poke.

Once state is managed reliably, the expensive kill(pid, SIGUSR1) syscall
can be trivially replaced with a direct SetLatch, as shown in my second
patch [1]. This is a simple and effective intermediate step that leverages
the existing WaitEventSet infrastructure to make the wakeup much
cheaper.

Step 3: Decouple event dispatch with specific interrupts.

The final goal is to replace the generic poke with SendInterrupt using a
specific reason (e.g., INTERRUPT_ASYNC_NOTIFY). This lets the event loop
wait on a bitmask of interrupt reasons and, on wakeup, dispatch only to
the relevant subsystem handler—fully decoupling event handling from
subsystem details.

This three-step, vertical migration strategy (subsystem by subsystem)
seems to offer a powerful alternative to a horizontal, layer-by-layer
replacement. It allows each subsystem's logic to be self-contained and
avoids constraints on a single global interrupt mask. The core
WaitEventSet is kept simple: its only job is to provide the efficient,
multiplexed wait.

I believe the ideal architecture uses your unified Interrupt system as
the low-level primitive, while encouraging key subsystems to adopt these
decoupled FSMs to determine *when* to call SendInterrupt.

I'm happy to help with this effort in any way I can.

/Joel

[1] https://www.postgresql.org/message-id/0b4d402a-9ac2-4aa8-acf8-8231dbe579ea%40app.fastmail.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Interrupts vs signals

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения