Обсуждение: add assertion for palloc in signal handlers

Поиск
Список
Период
Сортировка

add assertion for palloc in signal handlers

От
Nathan Bossart
Дата:
(new thread)

On Tue, Feb 17, 2026 at 03:30:57PM -0600, Nathan Bossart wrote:
> On Tue, Feb 17, 2026 at 11:18:00PM +0200, Heikki Linnakangas wrote:
>> On 14/02/2026 23:56, Andres Freund wrote:
>>> We really need some instrumentation that fails if we do allocations in signal
>>> handlers etc.
>> 
>> Yeah, that would be nice..
> 
> In theory we could pretty easily add assertions for that, given the
> wrapper_handler business added a couple of years ago.  I'll put together a
> patch...

As promised...  Fortunately, check-world didn't uncover any existing
issues.  I was able to manually verify the assertion by switching a
background worker to use bgworker_die() and sending it SIGTERM.  Probably
could use some additional commentary, which I'll add if the idea seems
reasonable to you.

-- 
nathan

Вложения

Re: add assertion for palloc in signal handlers

От
Andres Freund
Дата:
Hi,

On 2026-02-17 16:24:58 -0600, Nathan Bossart wrote:
> On Tue, Feb 17, 2026 at 03:30:57PM -0600, Nathan Bossart wrote:
> > On Tue, Feb 17, 2026 at 11:18:00PM +0200, Heikki Linnakangas wrote:
> >> On 14/02/2026 23:56, Andres Freund wrote:
> >>> We really need some instrumentation that fails if we do allocations in signal
> >>> handlers etc.
> >> 
> >> Yeah, that would be nice..
> > 
> > In theory we could pretty easily add assertions for that, given the
> > wrapper_handler business added a couple of years ago.  I'll put together a
> > patch...
> 
> As promised...  Fortunately, check-world didn't uncover any existing
> issues.  I was able to manually verify the assertion by switching a
> background worker to use bgworker_die() and sending it SIGTERM.  Probably
> could use some additional commentary, which I'll add if the idea seems
> reasonable to you.

Seems reasonable to me.  I guess we could put the various asserts into a
helper function, but it's ok as-is I think.

I think the spinlock functions should also assert this.

I'd advocate for adding an InSpinlock or such at the same time, but admittedly
there's not really anything forcing that to happen together.

Greetings,

Andres Freund



Re: add assertion for palloc in signal handlers

От
Heikki Linnakangas
Дата:
On 18/02/2026 01:11, Andres Freund wrote:
> Hi,
> 
> On 2026-02-17 16:24:58 -0600, Nathan Bossart wrote:
>> On Tue, Feb 17, 2026 at 03:30:57PM -0600, Nathan Bossart wrote:
>>> On Tue, Feb 17, 2026 at 11:18:00PM +0200, Heikki Linnakangas wrote:
>>>> On 14/02/2026 23:56, Andres Freund wrote:
>>>>> We really need some instrumentation that fails if we do allocations in signal
>>>>> handlers etc.
>>>>
>>>> Yeah, that would be nice..
>>>
>>> In theory we could pretty easily add assertions for that, given the
>>> wrapper_handler business added a couple of years ago.  I'll put together a
>>> patch...
>>
>> As promised...  Fortunately, check-world didn't uncover any existing
>> issues.  I was able to manually verify the assertion by switching a
>> background worker to use bgworker_die() and sending it SIGTERM.  Probably
>> could use some additional commentary, which I'll add if the idea seems
>> reasonable to you.
> 
> Seems reasonable to me.  I guess we could put the various asserts into a
> helper function, but it's ok as-is I think.

+1

> I think the spinlock functions should also assert this.

+1

> I'd advocate for adding an InSpinlock or such at the same time, but admittedly
> there's not really anything forcing that to happen together.

What would you do with the InSpinlock flag? Forbid palloc()'s etc. while 
holding a spinlock? I guess, although I'm not too worried about that. 
Spinlocks are not held for long.

- Heikki




Re: add assertion for palloc in signal handlers

От
Andres Freund
Дата:
Hi,

On 2026-02-18 02:17:34 +0200, Heikki Linnakangas wrote:
> > I'd advocate for adding an InSpinlock or such at the same time, but admittedly
> > there's not really anything forcing that to happen together.
> 
> What would you do with the InSpinlock flag? Forbid palloc()'s etc. while
> holding a spinlock? I guess, although I'm not too worried about that.

Forbid other spinlocks, elog, palloc, lwlock, for starters. I've seen all of
those in patches in the last years.


> Spinlocks are not held for long.

That's what should be the case, yet we semi-regularly get patches that don't
follow that rule...

Greetings,

Andres Freund



Re: add assertion for palloc in signal handlers

От
Kirill Reshke
Дата:
On Wed, 18 Feb 2026 at 03:25, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> (new thread)
>
> On Tue, Feb 17, 2026 at 03:30:57PM -0600, Nathan Bossart wrote:
> > On Tue, Feb 17, 2026 at 11:18:00PM +0200, Heikki Linnakangas wrote:
> >> On 14/02/2026 23:56, Andres Freund wrote:
> >>> We really need some instrumentation that fails if we do allocations in signal
> >>> handlers etc.
> >>
> >> Yeah, that would be nice..
> >
> > In theory we could pretty easily add assertions for that, given the
> > wrapper_handler business added a couple of years ago.  I'll put together a
> > patch...
>
> As promised...  Fortunately, check-world didn't uncover any existing
> issues.  I was able to manually verify the assertion by switching a
> background worker to use bgworker_die() and sending it SIGTERM.  Probably
> could use some additional commentary, which I'll add if the idea seems
> reasonable to you.
>
> --
> nathan


Hi! I tested patch under --single (single user mode), and ISTM that we
can reach palloc from ProcessInterrups here?
Maybe we should not elog in single user mode inside sighandler...

```
(gdb)
1398        Assert(!InSignalHandler);
(gdb) p InSignalHandler
$11 = 1
(gdb) bt
#0  palloc (size=size@entry=1024) at mcxt.c:1398
#1  0x0000555555c85a42 in initStringInfoInternal (initsize=1024,
str=0x7fffffffcdc0) at stringinfo.c:45
#2  initStringInfo (str=str@entry=0x7fffffffcdc0) at stringinfo.c:99
#3  0x0000555555c34923 in errmsg (fmt=fmt@entry=0x555555d45c28
"terminating connection due to administrator command") at elog.c:1091
#4  0x00005555556b0b92 in ProcessInterrupts () at postgres.c:3393
#5  0x0000555555c90622 in wrapper_handler (postgres_signal_arg=15) at
pqsignal.c:116
#6  <signal handler called>
#7  0x00007ffff731ba8f in __GI___libc_read (fd=0, buf=0x55555615d5d0,
nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:26
#8  0x00007ffff72927a5 in _IO_new_file_underflow (fp=0x7ffff74038e0
<_IO_2_1_stdin_>) at ./libio/libioP.h:1030
#9  0x00007ffff72955d2 in __GI__IO_default_uflow (fp=0x7ffff74038e0
<_IO_2_1_stdin_>) at ./libio/libioP.h:1030
#10 0x0000555555ad6c3f in interactive_getc () at postgres.c:336
#11 InteractiveBackend (inBuf=0x7fffffffde10) at postgres.c:251
#12 ReadCommand (inBuf=0x7fffffffde10) at postgres.c:487
#13 PostgresMain (dbname=<optimized out>,
username=username@entry=0x5555560949f0 "reshke") at postgres.c:4741
#14 0x0000555555ad8eba in PostgresSingleUserMain (argc=argc@entry=4,
argv=argv@entry=0x55555608e150, username=0x5555560949f0 "reshke") at
postgres.c:4211
#15 0x00005555556dcd5d in main (argc=4, argv=0x55555608e150) at main.c:227
(gdb)

```


and then

```
(gdb) n
TRAP: failed Assert("!InSignalHandler"), File: "mcxt.c", Line: 1398,
PID: 1437985
/home/reshke/cpg/bin/bin/postgres(ExceptionalCondition+0x70)[0x555555c2ecb0]
/home/reshke/cpg/bin/bin/postgres(+0x70c616)[0x555555c60616]
/home/reshke/cpg/bin/bin/postgres(initStringInfo+0x12)[0x555555c85a42]
/home/reshke/cpg/bin/bin/postgres(errmsg+0xe3)[0x555555c34923]
/home/reshke/cpg/bin/bin/postgres(+0x15cb92)[0x5555556b0b92]
/home/reshke/cpg/bin/bin/postgres(+0x73c622)[0x555555c90622]
/lib/x86_64-linux-gnu/libc.so.6(+0x45330)[0x7ffff7245330]
/lib/x86_64-linux-gnu/libc.so.6(read+0xf)[0x7ffff731ba8f]
/lib/x86_64-linux-gnu/libc.so.6(_IO_file_underflow+0x165)[0x7ffff72927a5]
/lib/x86_64-linux-gnu/libc.so.6(_IO_default_uflow+0x32)[0x7ffff72955d2]
/home/reshke/cpg/bin/bin/postgres(PostgresMain+0x3ff)[0x555555ad6c3f]
/home/reshke/cpg/bin/bin/postgres(PostgresSingleUserMain+0xfa)[0x555555ad8eba]
/home/reshke/cpg/bin/bin/postgres(main+0x4ad)[0x5555556dcd5d]
/lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7ffff722a1ca]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7ffff722a28b]
/home/reshke/cpg/bin/bin/postgres(_start+0x25)[0x5555556dcdf5]

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized
out>) at ./nptl/pthread_kill.c:44
warning: 44    ./nptl/pthread_kill.c: No such file or directory
(gdb)

```

-- 
Best regards,
Kirill Reshke



Re: add assertion for palloc in signal handlers

От
Nathan Bossart
Дата:
On Wed, Feb 18, 2026 at 12:13:46PM +0500, Kirill Reshke wrote:
> Hi! I tested patch under --single (single user mode), and ISTM that we
> can reach palloc from ProcessInterrups here?
> Maybe we should not elog in single user mode inside sighandler...

This is reproducible even without single-user mode by sending SIGQUIT to a
client backend.  Both die() and quickdie() call ereport(), and both are
commonly used as signal handlers for SIGTERM and SIGQUIT.

-- 
nathan



Re: add assertion for palloc in signal handlers

От
Nathan Bossart
Дата:
On Wed, Feb 18, 2026 at 09:46:43AM -0600, Nathan Bossart wrote:
> On Wed, Feb 18, 2026 at 12:13:46PM +0500, Kirill Reshke wrote:
>> Hi! I tested patch under --single (single user mode), and ISTM that we
>> can reach palloc from ProcessInterrups here?
>> Maybe we should not elog in single user mode inside sighandler...
> 
> This is reproducible even without single-user mode by sending SIGQUIT to a
> client backend.  Both die() and quickdie() call ereport(), and both are
> commonly used as signal handlers for SIGTERM and SIGQUIT.

I also just got an assertion failure due to something in the startup
process.  Unfortunately I lost it before digging further, but I think it
was the proc_exit() call in StartupProcShutdownHandler().  I'll try to
reproduce it to get more details.  In any case, I'm a bit worried that
these assertions will cause quite a bit of noise.  I think we'd better
concoct a plan for dealing with them prior to inflicting the buildfarm.

-- 
nathan



Re: add assertion for palloc in signal handlers

От
Andres Freund
Дата:
Hi,

On 2026-02-18 09:46:43 -0600, Nathan Bossart wrote:
> On Wed, Feb 18, 2026 at 12:13:46PM +0500, Kirill Reshke wrote:
> > Hi! I tested patch under --single (single user mode), and ISTM that we
> > can reach palloc from ProcessInterrups here?
> > Maybe we should not elog in single user mode inside sighandler...
> 
> This is reproducible even without single-user mode by sending SIGQUIT to a
> client backend.  Both die() and quickdie() call ereport(), and both are
> commonly used as signal handlers for SIGTERM and SIGQUIT.

I don't think die() should call ereport() without being in single user mode. I
guess there's a corner case though, which is that the signal handler is
executed during exit processing, when we already reset whereToSendOutput.  I
think we probably should make sure this stuff is only reached in actual single
user mode.

I'm somewhat confused by

    /* Don't joggle the elbow of proc_exit */
    if (!proc_exit_inprogress)
    {
        InterruptPending = true;
        ProcDiePending = true;
    }

because the only thing that avoids is actually harmless stuff, we're not going
to notice that , whereas the ProcessInterrupts() done further down is actually
certainly "joggling the elbow of proc_exit().

<digs>

I think the comment hasn't quite kept up with the time, the block used to do
more. Some dude named Andres didn't remove the comment in commit 2505ce0be0b.

And another dude, confusingly also named Andres, in commit 4f85fde8eb86,
didn't think about the scenario that whereToSendOutput could already be reset
to DestNone in die()

Ooops.



quickdie() obviously does does reach the ereport().  I think it's a bad idea,
with a bad justification: For one, libpq makes up the same error reason if the
client just vanishes.  For another, because it just uses
WARNING[_CLIENT_ONLY], it'll not even reach clients if they use a higher
client_min_messages.

Just hoping that the client communication, including openssl, is in a state
that makes it somewhat safe to send messages from a signal handler, is
bonkers, IMNSHO.  Yes, we have the "safety" mechanism of postmaster SIGKILLing
processes, but that only protects against self deadlocks, not against
corrupted datastructures etc.

It's one thing to ereport() in a signal handler during a crash triggered
quickdie(), the shit already has hit the fan after all, but doing it as part
of an intended immediate shutdown is a seriously bad idea.


Greetings,

Andres Freund



Re: add assertion for palloc in signal handlers

От
Kirill Reshke
Дата:
On Thu, 19 Feb 2026 at 02:50, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2026-02-18 09:46:43 -0600, Nathan Bossart wrote:
> > On Wed, Feb 18, 2026 at 12:13:46PM +0500, Kirill Reshke wrote:
> > > Hi! I tested patch under --single (single user mode), and ISTM that we
> > > can reach palloc from ProcessInterrups here?
> > > Maybe we should not elog in single user mode inside sighandler...
> >
> > This is reproducible even without single-user mode by sending SIGQUIT to a
> > client backend.  Both die() and quickdie() call ereport(), and both are
> > commonly used as signal handlers for SIGTERM and SIGQUIT.
>
> I don't think die() should call ereport() without being in single user mode. I
> guess there's a corner case though, which is that the signal handler is
> executed during exit processing, when we already reset whereToSendOutput.  I
> think we probably should make sure this stuff is only reached in actual single
> user mode.

Hi! Thank you for your input.
Stupid question from me: in singleuser mode, how should this work? We
still should not ereport inside signal handlers, are we?
If singleuser mode is somehow safe with die & ereport then... I don't
understand how.

> I'm somewhat confused by
>
>         /* Don't joggle the elbow of proc_exit */
>         if (!proc_exit_inprogress)
>         {
>                 InterruptPending = true;
>                 ProcDiePending = true;
>         }
>
> because the only thing that avoids is actually harmless stuff, we're not going
> to notice that , whereas the ProcessInterrupts() done further down is actually
> certainly "joggling the elbow of proc_exit().
>
> <digs>
>
> I think the comment hasn't quite kept up with the time, the block used to do
> more. Some dude named Andres didn't remove the comment in commit 2505ce0be0b.
>
> And another dude, confusingly also named Andres, in commit 4f85fde8eb86,
> didn't think about the scenario that whereToSendOutput could already be reset
> to DestNone in die()
>
> Ooops.

Thanks for the history overview here. I think you can actually reach
Andres and ask him in-person about those commits, I heard he is also
employed by Microsoft.

>
> quickdie() obviously does does reach the ereport().  I think it's a bad idea,
> with a bad justification: For one, libpq makes up the same error reason if the
> client just vanishes.  For another, because it just uses
> WARNING[_CLIENT_ONLY], it'll not even reach clients if they use a higher
> client_min_messages.
>
> Just hoping that the client communication, including openssl, is in a state
> that makes it somewhat safe to send messages from a signal handler, is
> bonkers, IMNSHO.  Yes, we have the "safety" mechanism of postmaster SIGKILLing
> processes, but that only protects against self deadlocks, not against
> corrupted datastructures etc.
>
> It's one thing to ereport() in a signal handler during a crash triggered
> quickdie(), the shit already has hit the fan after all, but doing it as part
> of an intended immediate shutdown is a seriously bad idea.
>

So, after all, we need to remove ereport from quick die completely, no
backup plan?
This will make whole system less verbose about its shutdown reasons...
Not sure how vital that is

-- 
Best regards,
Kirill Reshke



Re: add assertion for palloc in signal handlers

От
Heikki Linnakangas
Дата:
On 19/02/2026 09:21, Kirill Reshke wrote:
> On Thu, 19 Feb 2026 at 02:50, Andres Freund <andres@anarazel.de> wrote:
>> I don't think die() should call ereport() without being in single user mode. I
>> guess there's a corner case though, which is that the signal handler is
>> executed during exit processing, when we already reset whereToSendOutput.  I
>> think we probably should make sure this stuff is only reached in actual single
>> user mode.

+1

> Hi! Thank you for your input.
> Stupid question from me: in singleuser mode, how should this work? We
> still should not ereport inside signal handlers, are we?
> If singleuser mode is somehow safe with die & ereport then... I don't
> understand how.

It's not safe to ereport() from die() in single-user mode either. We've 
just accepted it as the lesser evil:

>     /*
>      * If we're in single user mode, we want to quit immediately - we can't
>      * rely on interrupts as they wouldn't work when stdin/stdout is a file.
>      * Rather ugly, but it's unlikely to be worthwhile to invest much more
>      * effort just for the benefit of single user mode.
>      */
>     if (DoingCommandRead && whereToSendOutput != DestRemote)
>         ProcessTerminateInterrupt();

I don't quite understand that though. I understand that a read/write on 
a file might be uninterruptible. But so what? It presumably won't take 
very long, so it's fine to wait for the read/write to finish. Are we 
worried about getting stuck on an NFS read when the network is down, or 
something like that?

This was the discussion back then 
[https://www.postgresql.org/message-id/20150203202811.GA12566%40awork2.anarazel.de]:

>> Could you use WaitLatchOrSocket to sleep on stdin?
> 
> Unfortunately I don't think so. poll() etc only works properly on
> network handles, pipes etc - but stdin can be a file :(. And I think
> what exactly happens if it's a file fd isn't super well defined. On
> linux the file is always marked as ready, which would probably actually
> work...

I think that's actually wrong. POSIX says 
[https://pubs.opengroup.org/onlinepubs/9799919799/functions/poll.html]:

> The poll() and ppoll() functions shall support regular files,
> terminal and pseudo-terminal devices, FIFOs, pipes, and sockets. The
> behavior of poll() and ppoll() on elements of fds that refer to
> other types of file is unspecified.
> 
> Regular files shall always poll TRUE for reading and writing.

So that is pretty well-defined, and we could use poll() on stdin too.

>> I'm somewhat confused by
>>
>>          /* Don't joggle the elbow of proc_exit */
>>          if (!proc_exit_inprogress)
>>          {
>>                  InterruptPending = true;
>>                  ProcDiePending = true;
>>          }
>>
>> because the only thing that avoids is actually harmless stuff, we're not going
>> to notice that , whereas the ProcessInterrupts() done further down is actually
>> certainly "joggling the elbow of proc_exit().
>>
>> <digs>
>>
>> I think the comment hasn't quite kept up with the time, the block used to do
>> more. Some dude named Andres didn't remove the comment in commit 2505ce0be0b.
>>
>> And another dude, confusingly also named Andres, in commit 4f85fde8eb86,
>> didn't think about the scenario that whereToSendOutput could already be reset
>> to DestNone in die()
>>
>> Ooops.

Yeah, I've been wondering about that "joggle the elbow" comment too. We 
don't have that check in any other signal handlers that set 
InterruptPending. I believe we could just remove the if(...) and set 
InterruptPending and ProcDiePending unconditionally here.

>> quickdie() obviously does does reach the ereport().  I think it's a bad idea,
>> with a bad justification: For one, libpq makes up the same error reason if the
>> client just vanishes.  For another, because it just uses
>> WARNING[_CLIENT_ONLY], it'll not even reach clients if they use a higher
>> client_min_messages.
>>
>> Just hoping that the client communication, including openssl, is in a state
>> that makes it somewhat safe to send messages from a signal handler, is
>> bonkers, IMNSHO.  Yes, we have the "safety" mechanism of postmaster SIGKILLing
>> processes, but that only protects against self deadlocks, not against
>> corrupted datastructures etc.

PqCommBusy gives some protection from messing with openssl state. But 
yes, it's generally unsafe.

>> It's one thing to ereport() in a signal handler during a crash triggered
>> quickdie(), the shit already has hit the fan after all, but doing it as part
>> of an intended immediate shutdown is a seriously bad idea.

I don't see much difference between a crash and an "intended immediate 
shutdown". In both cases, you don't want to corrupt datastructures etc. 
more than you already have, but on the other hand the system is going 
down anyway.

> So, after all, we need to remove ereport from quick die completely, no
> backup plan?
> This will make whole system less verbose about its shutdown reasons...
> Not sure how vital that is

What's the proposed alternative to ereporting() from quickdie? If we 
just remove the ereport(), yeah, clients will miss out on the messages. 
Are we OK with that?

One alternative is to add more checks to quickdie(), and refrain from 
ereporting() if we're in the middle of an existing ereport(), for 
example, or some other such state. We probably cannot make it 100% safe 
with that approach, but maybe that's the right tradeoff.

Another alternative is to try to handle SIGQUIT more like SIGTERM, and 
delay the exiting until next CHECK_FOR_INTERRUPTS(). Maybe with a short 
timer to just SIGKILL if it the interrupt isn't processed quickly. It 
increases the risk that the process won't exit as quickly as we'd like, 
but maybe that's the right tradeoff.

- Heikki




Re: add assertion for palloc in signal handlers

От
Heikki Linnakangas
Дата:
On 19/02/2026 11:25, Heikki Linnakangas wrote:
> On 19/02/2026 09:21, Kirill Reshke wrote:
>> On Thu, 19 Feb 2026 at 02:50, Andres Freund <andres@anarazel.de> wrote:
>>> I don't think die() should call ereport() without being in single 
>>> user mode. I
>>> guess there's a corner case though, which is that the signal handler is
>>> executed during exit processing, when we already reset 
>>> whereToSendOutput.  I
>>> think we probably should make sure this stuff is only reached in 
>>> actual single
>>> user mode.
> 
> +1

Attached is a quick patch for that.

At first I considered replacing the "if (DoingCommandRead && 
whereToSendOutput != DestRemote)" check with "if (DoingCommandRead && 
whereToSendOutput == DestDebug)". That would be the minimal change to 
fix the confusion when a regular backend is exiting. But I think it's 
better to make this exception as narrow as possible in general. We only 
need the exit from the signal handler while we're in the middle of the 
uninterruptible getc(), not for any longer.

> So that is pretty well-defined, and we could use poll() on stdin too.

That said, WaitLatchOrSocket() et al. currently assume that the fd is a 
socket rather than a pipe or a file. It might work the same on most 
platforms, but I wonder about Windows. So it could be done, but it might 
require changes to waiteventset.c, which  might not be worth the trouble 
just for single-user mode.

Hmm, how do we do this for pipes in COPY? If waiteventset supported 
pipes, would it be useful for COPY too ?

- Heikki
Вложения

Re: add assertion for palloc in signal handlers

От
Andres Freund
Дата:
Hi,

On 2026-02-19 11:25:43 +0200, Heikki Linnakangas wrote:
> On 19/02/2026 09:21, Kirill Reshke wrote:
> > Hi! Thank you for your input.
> > Stupid question from me: in singleuser mode, how should this work? We
> > still should not ereport inside signal handlers, are we?
> > If singleuser mode is somehow safe with die & ereport then... I don't
> > understand how.
> 
> It's not safe to ereport() from die() in single-user mode either.

Well, the consequences of doing something stupid in single user mode are just
much less severe. There is the potential that doing invalid stuff in a signal
handler in a normal backend could lead to vulnerabilities that a user could
exploit, particularly if TLS is used, but that's not a threat in single user
mode.

That said, it'd obviously be much better if we stopped doing that stuff.


> We've just accepted it as the lesser evil:

> >     /*
> >      * If we're in single user mode, we want to quit immediately - we can't
> >      * rely on interrupts as they wouldn't work when stdin/stdout is a file.
> >      * Rather ugly, but it's unlikely to be worthwhile to invest much more
> >      * effort just for the benefit of single user mode.
> >      */
> >     if (DoingCommandRead && whereToSendOutput != DestRemote)
> >         ProcessTerminateInterrupt();
> 
> I don't quite understand that though. I understand that a read/write on a
> file might be uninterruptible. But so what? It presumably won't take very
> long, so it's fine to wait for the read/write to finish.

It can take a while, in some cases, even leaving NFS and such aside, think of
a redirect to a file where the kernel suddenly blocks you dirtying more
data. That can be a while. Of course, whether we actually care about that is
another question.

Right now the main issue would be that we don't actually do anything to get
the interrupt until further input has arrived. Think of the single user code
being in InteractiveBackend()->getc().


> > > Could you use WaitLatchOrSocket to sleep on stdin?
> > 
> > Unfortunately I don't think so. poll() etc only works properly on
> > network handles, pipes etc - but stdin can be a file :(. And I think
> > what exactly happens if it's a file fd isn't super well defined. On
> > linux the file is always marked as ready, which would probably actually
> > work...
> 
> I think that's actually wrong. POSIX says
> [https://pubs.opengroup.org/onlinepubs/9799919799/functions/poll.html]:
> 
> > The poll() and ppoll() functions shall support regular files,
> > terminal and pseudo-terminal devices, FIFOs, pipes, and sockets. The
> > behavior of poll() and ppoll() on elements of fds that refer to
> > other types of file is unspecified.
> > 
> > Regular files shall always poll TRUE for reading and writing.
> 
> So that is pretty well-defined, and we could use poll() on stdin too.

Pretty sure poll on pipes etc doesn't work on windows :(.



> > > quickdie() obviously does does reach the ereport().  I think it's a bad idea,
> > > with a bad justification: For one, libpq makes up the same error reason if the
> > > client just vanishes.  For another, because it just uses
> > > WARNING[_CLIENT_ONLY], it'll not even reach clients if they use a higher
> > > client_min_messages.
> > > 
> > > Just hoping that the client communication, including openssl, is in a state
> > > that makes it somewhat safe to send messages from a signal handler, is
> > > bonkers, IMNSHO.  Yes, we have the "safety" mechanism of postmaster SIGKILLing
> > > processes, but that only protects against self deadlocks, not against
> > > corrupted datastructures etc.
> 
> PqCommBusy gives some protection from messing with openssl state. But yes,
> it's generally unsafe.

Not sure how far that even helps us, as PqCommBusy isn't volatile, the
compiler is free to keep it in a register if it can prove that no code is
called that could use PqCommBusy. C runtime functions are often marked to not
access user variables outside of arguments...



> > > It's one thing to ereport() in a signal handler during a crash triggered
> > > quickdie(), the shit already has hit the fan after all, but doing it as part
> > > of an intended immediate shutdown is a seriously bad idea.
> 
> I don't see much difference between a crash and an "intended immediate
> shutdown". In both cases, you don't want to corrupt datastructures etc. more
> than you already have, but on the other hand the system is going down
> anyway.

An immediate shutdown may be executed in completely normal operations, whereas
a crash shutdown should only happen after shit already has hit the fan (and
thus there are already other issues).  I think there's some difference.



> > So, after all, we need to remove ereport from quick die completely, no
> > backup plan?
> > This will make whole system less verbose about its shutdown reasons...
> > Not sure how vital that is
> 
> What's the proposed alternative to ereporting() from quickdie? If we just
> remove the ereport(), yeah, clients will miss out on the messages. Are we OK
> with that?

I don't think the information that we do print is actually particularly
useful right now. What's the client going to do differently based on the
error messages?

The "unexpected SIGQUIT" one really is only relevant for PG developers.

I guess there's some possibility of a client benefiting from the difference
between a crash and an immediate shutdown. Hampered by it already not being
reliable that we can send this information to the client.


Without TLS in the game, this would be a lot easier. We could do a
non-blocking write to the socket with some non-translated message. But with
TLS...



> Another alternative is to try to handle SIGQUIT more like SIGTERM, and delay
> the exiting until next CHECK_FOR_INTERRUPTS(). Maybe with a short timer to
> just SIGKILL if it the interrupt isn't processed quickly. It increases the
> risk that the process won't exit as quickly as we'd like, but maybe that's
> the right tradeoff.

I'm doubtful that that's the right thing, we really don't want to have
backends continue any longer than possible if there was a
stuff-is-inconsistent style PANIC. They could screw up things further.

I guess I could get more on board doing so for an immediate shutdown, just to
give backends a chance to actually send a connection termination.

Greetings,

Andres Freund



Re: add assertion for palloc in signal handlers

От
Andres Freund
Дата:
Hi,

Thomas, added you to To for the last paragraph.


On 2026-02-19 11:55:20 +0200, Heikki Linnakangas wrote:
> On 19/02/2026 11:25, Heikki Linnakangas wrote:
> > On 19/02/2026 09:21, Kirill Reshke wrote:
> > > On Thu, 19 Feb 2026 at 02:50, Andres Freund <andres@anarazel.de> wrote:
> > > > I don't think die() should call ereport() without being in
> > > > single user mode. I
> > > > guess there's a corner case though, which is that the signal handler is
> > > > executed during exit processing, when we already reset
> > > > whereToSendOutput.  I
> > > > think we probably should make sure this stuff is only reached in
> > > > actual single
> > > > user mode.
> >
> > +1
>
> Attached is a quick patch for that.
>
> At first I considered replacing the "if (DoingCommandRead &&
> whereToSendOutput != DestRemote)" check with "if (DoingCommandRead &&
> whereToSendOutput == DestDebug)".

I had been wondering about !IsUnderPostmaster...


> That would be the minimal change to fix the confusion when a regular backend
> is exiting. But I think it's better to make this exception as narrow as
> possible in general. We only need the exit from the signal handler while
> we're in the middle of the uninterruptible getc(), not for any longer.

What if the client is printing out a lot of data and is stuck in something
like this:

#0  __GI__IO_fwrite (buf=0x7ffcc56b3f70, size=1, count=61, fp=0x7fa4765285c0 <_IO_2_1_stdout_>) at
./libio/iofwrite.c:32
#1  0x0000000000dd31fb in flushbuffer (target=0x7ffcc56b4370) at
../../../../../home/andres/src/postgresql/src/port/snprintf.c:310
#2  0x0000000000dd2ff3 in pg_vfprintf (stream=0x7fa4765285c0 <_IO_2_1_stdout_>,
    fmt=0x1003f30 "\t%2d: %s%s%s%s\t(typeid = %u, len = %d, typmod = %d, byval = %c)\n", args=0x7ffcc56b43c0)
    at ../../../../../home/andres/src/postgresql/src/port/snprintf.c:259
#3  0x0000000000dd3193 in pg_printf (fmt=0x1003f30 "\t%2d: %s%s%s%s\t(typeid = %u, len = %d, typmod = %d, byval =
%c)\n")
    at ../../../../../home/andres/src/postgresql/src/port/snprintf.c:288
#4  0x000000000061c5fe in printatt (attributeId=1, attributeP=0xe1554d0, value=0x0)
    at ../../../../../home/andres/src/postgresql/src/backend/access/common/printtup.c:428
#5  0x000000000061c657 in debugStartup (self=0x15eace0 <debugtupDR>, operation=1, typeinfo=0xe1554a8)
    at ../../../../../home/andres/src/postgresql/src/backend/access/common/printtup.c:454
#6  0x00000000008a3182 in standard_ExecutorRun (queryDesc=0xe145c10, direction=ForwardScanDirection, count=0)
    at ../../../../../home/andres/src/postgresql/src/backend/executor/execMain.c:351
#7  0x00000000008a3008 in ExecutorRun (queryDesc=0xe145c10, direction=ForwardScanDirection, count=0)
    at ../../../../../home/andres/src/postgresql/src/backend/executor/execMain.c:303
#8  0x0000000000b82699 in PortalRunSelect (portal=0xdfe68f0, forward=true, count=0, dest=0x15eace0 <debugtupDR>)
    at ../../../../../home/andres/src/postgresql/src/backend/tcop/pquery.c:916
#9  0x0000000000b822f0 in PortalRun (portal=0xdfe68f0, count=9223372036854775807, isTopLevel=true, dest=0x15eace0
<debugtupDR>,
    altdest=0x15eace0 <debugtupDR>, qc=0x7ffcc56b47a0) at
../../../../../home/andres/src/postgresql/src/backend/tcop/pquery.c:760
#10 0x0000000000b7a66b in exec_simple_query (query_string=0xe143160 "select 1;\n")
    at ../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:1277
#11 0x0000000000b80302 in PostgresMain (dbname=0xdf9fb80 "postgres", username=0xdf1e520 "andres")
    at ../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4812
#12 0x0000000000b7f596 in PostgresSingleUserMain (argc=135, argv=0xdf17810, username=0xdf1e520 "andres")
    at ../../../../../home/andres/src/postgresql/src/backend/tcop/postgres.c:4214
#13 0x000000000093edbf in main (argc=135, argv=0xdf17810) at
../../../../../home/andres/src/postgresql/src/backend/main/main.c:227




> > So that is pretty well-defined, and we could use poll() on stdin too.
>
> That said, WaitLatchOrSocket() et al. currently assume that the fd is a
> socket rather than a pipe or a file. It might work the same on most
> platforms, but I wonder about Windows. So it could be done, but it might
> require changes to waiteventset.c, which  might not be worth the trouble
> just for single-user mode.

It'd be nice to support pipes, but the last time I looked at it, it'd be
rather hard on windows. From what I understand windows only supports
completion based IO for pipes, there's no support at all for readiness based
interfaces.  Which in turn doesn't mesh very well with portable assumptions
that can be make on other systems.

IIUC IPC::Run does some really nasty gymnastics to make pipes work
transparently-ish on windows (something like a proxy subprocess that then
executes the piped command, allowing perl to communicate with the proxy
process via a readiness supporting socket connection).


> Hmm, how do we do this for pipes in COPY? If waiteventset supported pipes,
> would it be useful for COPY too ?

I wonder if the PROGRAM is properly interruptible on windows right now? I
think we just rely on SIGINT on a process group reaching subprocesses, which
then terminates the executed program, stopping being blocked on the
program. But on windows there's no process groups IIRC.

So yes, proper pipe support would be rather useful.

Another place it'd be really useful for is the startup process, what we do
there right now to react immediately to signals is quite nasty.

I think Thomas did some work around this?


Greetings,

Andres Freund