Обсуждение: Re: [HACKERS] [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.

Поиск

Список

Период

Сортировка

Re: [HACKERS] [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.

От

Tom Lane

Дата:

16 сентября 2017 г., 20:32:29

Andres Freund <andres@anarazel.de> writes:
> Perform only one ReadControlFile() during startup.

This patch or something closely related to it has broken the postmaster's
ability to recover from a backend crash.  For example, after exercising
the backend crash Andreas just reported:

regression=# select from information_schema.user_mapping_options;
server closed the connection unexpectedly       This probably means the server terminated abnormally       before or
whileprocessing the request.
 
The connection to the server was lost. Attempting reset: Failed.
!> \q

attempting to reconnect fails, because the postmaster isn't there.
It left a core file behind though, in which I find

Program terminated with signal 11, Segmentation fault.
#0  0x000000000088b792 in GetMemoryChunkContext (pointer=0x7fdb36fc3f00)   at
../../../../src/include/utils/memutils.h:124
124             AssertArg(MemoryContextIsValid(context));
(gdb) bt
#0  0x000000000088b792 in GetMemoryChunkContext (pointer=0x7fdb36fc3f00)   at
../../../../src/include/utils/memutils.h:124
#1  pfree (pointer=0x7fdb36fc3f00) at mcxt.c:951
#2  0x0000000000512843 in XLOGShmemInit () at xlog.c:4897
#3  0x0000000000737fd9 in CreateSharedMemoryAndSemaphores (   makePrivate=0 '\000', port=5440) at ipci.c:220
#4  0x00000000006e4a78 in reset_shared () at postmaster.c:2516
#5  PostmasterStateMachine () at postmaster.c:3832
#6  0x00000000006e541d in reaper (postgres_signal_arg=<value optimized out>)   at postmaster.c:3081
#7  <signal handler called>
#8  0x0000003b78ae1603 in __select_nocancel ()   at ../sysdeps/unix/syscall-template.S:82
#9  0x00000000008a432a in pg_usleep (microsec=<value optimized out>)   at pgsleep.c:56
#10 0x00000000006e75d7 in ServerLoop (argc=<value optimized out>,    argv=<value optimized out>) at postmaster.c:1705
#11 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)   at postmaster.c:1364

It's dying at "pfree(localControlFile)".  localControlFile seems to
be pointing at a region of memory that's entirely zeroes; certainly
the data that it just moved into shared memory is all zeroes.
It looks like someone didn't think hard enough about when to reset
ControlFile to null.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: [COMMITTERS] pgsql: Perform only one ReadControlFile() duringstartup.

От

Andres Freund

Дата:

17 сентября 2017 г., 00:26:34

Hi,

On 2017-09-16 10:32:29 -0400, Tom Lane wrote:
> It's dying at "pfree(localControlFile)".  localControlFile seems to
> be pointing at a region of memory that's entirely zeroes; certainly
> the data that it just moved into shared memory is all zeroes.
> It looks like someone didn't think hard enough about when to reset
> ControlFile to null.

Yep. Just adding a LocalProcessControlFile() reset_shared() works after
removing the assert from LocalProcessControlFile().  That's not what I'm
proposing, just confirming that that's the issue.

Looking into it.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.

От

Tom Lane

Дата:

17 сентября 2017 г., 00:30:21

Andres Freund <andres@anarazel.de> writes:
> Looking into it.

I wonder whether we shouldn't just revert this patch altogether.
Certainly, extra reads of pg_control are not a noticeable performance
problem.  I'm now quite worried about whether we aren't introducing
hazards of using stale values from the file; if a system crash isn't
enough to get it to flush its cache, then what is?
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Andres Freund

Дата:

17 сентября 2017 г., 00:38:49

On 2017-09-16 14:30:21 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Looking into it.
> 
> I wonder whether we shouldn't just revert this patch altogether.
> Certainly, extra reads of pg_control are not a noticeable performance
> problem.

The problem is that the patch that makes the segment size configurable
also adds a bunch more ordering constraints due to the fact that the
contents of the control file influence how much shared buffers are
needed (via wal_buffers = -1, which requires the segment size, which is
read from the control file).  Reading things in the wrong order leads to
bad results too.

> I'm now quite worried about whether we aren't introducing
> hazards of using stale values from the file; if a system crash isn't
> enough to get it to flush its cache, then what is?

I don't think the problem here is stale values, it's "just" a stale
pointer pointing into shared memory that gets reiniitalized?

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.

От

Tom Lane

Дата:

17 сентября 2017 г., 01:59:01

Andres Freund <andres@anarazel.de> writes:
> On 2017-09-16 14:30:21 -0400, Tom Lane wrote:
>> I wonder whether we shouldn't just revert this patch altogether.

> The problem is that the patch that makes the segment size configurable
> also adds a bunch more ordering constraints due to the fact that the
> contents of the control file influence how much shared buffers are
> needed (via wal_buffers = -1, which requires the segment size, which is
> read from the control file).  Reading things in the wrong order leads to
> bad results too.

Maybe we could redefine wal_buffers to avoid that.  Or read the control
file at the time where we need it.  This does not seem like a problem
that justifies a system-wide change that's much more delicate than
you thought.

>> I'm now quite worried about whether we aren't introducing
>> hazards of using stale values from the file; if a system crash isn't
>> enough to get it to flush its cache, then what is?

> I don't think the problem here is stale values, it's "just" a stale
> pointer pointing into shared memory that gets reiniitalized?

The code that's crashing is attempting to install some pre-existing
copy of pg_control into shared memory.  Even if there were good reason
to think the copy were up to date, I'm not sure we should trust it
in a crash recovery context.  I'd rather see this hunk of code just
playing dumb and reading the file (which, I'm pretty sure, is what
it did up till this week).
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Andres Freund

Дата:

17 сентября 2017 г., 02:27:05

On 2017-09-16 15:59:01 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-09-16 14:30:21 -0400, Tom Lane wrote:
> >> I wonder whether we shouldn't just revert this patch altogether.
> 
> > The problem is that the patch that makes the segment size configurable
> > also adds a bunch more ordering constraints due to the fact that the
> > contents of the control file influence how much shared buffers are
> > needed (via wal_buffers = -1, which requires the segment size, which is
> > read from the control file).  Reading things in the wrong order leads to
> > bad results too.
> 
> Maybe we could redefine wal_buffers to avoid that.  Or read the control
> file at the time where we need it.

I'm curious - do you have a good idea how to do so? It's a bit too much
magic right now (assuming the wal_segment_size patch is applied),
depending on both wal_segment_size and shared_buffers.

> This does not seem like a problem that justifies a system-wide change
> that's much more delicate than you thought.

We need one more initialization call during crash-restart - that doesn't
seem particularly hard a fix. And see below, the alternatives aren't
pretty.

> >> I'm now quite worried about whether we aren't introducing
> >> hazards of using stale values from the file; if a system crash isn't
> >> enough to get it to flush its cache, then what is?
> 
> > I don't think the problem here is stale values, it's "just" a stale
> > pointer pointing into shared memory that gets reiniitalized?
> 
> The code that's crashing is attempting to install some pre-existing
> copy of pg_control into shared memory.  Even if there were good reason
> to think the copy were up to date, I'm not sure we should trust it
> in a crash recovery context.

We definitely should re-read it, agreed.  Note there's not really a copy
here, we just assume so because the ControlFile variable isn't NULL - in
hindsight a separate bool signalling the existance of the already read
data would've probably been a good idea.

> I'd rather see this hunk of code just playing dumb and reading the
> file (which, I'm pretty sure, is what it did up till this week).

The problem is that every process needs the results of the
ReadControlFile() call. Which adjusts GUCs and other process local
variables.  We can either start ferrying a lot more knowledge through
the backend variable stuff, by adding a bunch more special-case code to
the restore patch, or we're going to have to execute ReadControlFile()
in EXEC_BACKEND processes. But we can't do so once shared memory is
set-up because that'd constantly happen while other processes write to
the ControlFile.

Therefore my approach was to do a ReadControlFile() into local memory at
postmaster start (missing the crash-restart case that should also have
done so), and only do other reads in the EXEC_BACKEND case, before it
uses shared memory.

Alternatively or additionally we could split ReadControlFile() into two
parts, one that does various SetConfigOption({data_checksums,
wal_segment_size})s, computes UsableBytesInSegment etc, and one that
actually reads the file. The former would then liberally sprinkled
everywhere, because it has only process-local effects.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Andres Freund

Дата:

17 сентября 2017 г., 14:07:52

On 2017-09-16 13:27:05 -0700, Andres Freund wrote:
> > This does not seem like a problem that justifies a system-wide change
> > that's much more delicate than you thought.
> 
> We need one more initialization call during crash-restart - that doesn't
> seem particularly hard a fix.

FWIW, attached is that simple fix. Not quite ready for commit - needs
more comments, thoughts and a glass of wine less to commit.

I'll try to come up with a tap test that tests crash restarts tomorrow -
not sure if there's a decent way to trigger one on windows without
writing a C function. Will play around with that tomorrow.

- Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

0001-Preliminary-fix-for-crash-restart.patch

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Andres Freund

Дата:

18 сентября 2017 г., 16:32:13

On 2017-09-17 01:07:52 -0700, Andres Freund wrote:
> On 2017-09-16 13:27:05 -0700, Andres Freund wrote:
> > > This does not seem like a problem that justifies a system-wide change
> > > that's much more delicate than you thought.
> > 
> > We need one more initialization call during crash-restart - that doesn't
> > seem particularly hard a fix.
> 
> FWIW, attached is that simple fix. Not quite ready for commit - needs
> more comments, thoughts and a glass of wine less to commit.
> 
> I'll try to come up with a tap test that tests crash restarts tomorrow -
> not sure if there's a decent way to trigger one on windows without
> writing a C function. Will play around with that tomorrow.

This took a bit longer than I anticipated. Attached.  There's
surprisingly many issues with timing that make this not as easy as I
hoped.  I think in later commits we should extend this to cover things
like the autovacuum / stats process / ... being killed - but that seems
like material for a separate commit.

One thing that I've noticed for a while, but that I was reminded of
again here. We very frequently allow psql to reconnect in case of crash,
just for postmaster to notice a child has gone and kill that session. I
don't recall that frequently happening, but these days it happens nearly
every time.

Regards,

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

0001-Add-tests-for-postmaster-crash-restarts.patch

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Robert Haas

Дата:

18 сентября 2017 г., 22:16:42

On Mon, Sep 18, 2017 at 6:32 AM, Andres Freund <andres@anarazel.de> wrote:
> One thing that I've noticed for a while, but that I was reminded of
> again here. We very frequently allow psql to reconnect in case of crash,
> just for postmaster to notice a child has gone and kill that session. I
> don't recall that frequently happening, but these days it happens nearly
> every time.

I don't understand what you're talking about here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Simon Riggs

Дата:

18 сентября 2017 г., 22:55:20

On 16 September 2017 at 21:27, Andres Freund <andres@anarazel.de> wrote:
> On 2017-09-16 15:59:01 -0400, Tom Lane wrote:

>> This does not seem like a problem that justifies a system-wide change
>> that's much more delicate than you thought.

+1

The change/reason ratio is too high.


-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Andres Freund

Дата:

19 сентября 2017 г., 00:04:39

On 2017-09-18 12:16:42 -0400, Robert Haas wrote:
> On Mon, Sep 18, 2017 at 6:32 AM, Andres Freund <andres@anarazel.de> wrote:
> > One thing that I've noticed for a while, but that I was reminded of
> > again here. We very frequently allow psql to reconnect in case of crash,
> > just for postmaster to notice a child has gone and kill that session. I
> > don't recall that frequently happening, but these days it happens nearly
> > every time.
> 
> I don't understand what you're talking about here.

I often see a backend crash, psql reacting to that crash by
reconnecting, successfully establish a new connection, just to be kicked
off by postmaster that does the crash restart cycle.  I've not yet
figured out when exactly this happens and when not.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.

От

Tom Lane

Дата:

19 сентября 2017 г., 01:33:29

Andres Freund <andres@anarazel.de> writes:
> On 2017-09-18 12:16:42 -0400, Robert Haas wrote:
>> I don't understand what you're talking about here.

> I often see a backend crash, psql reacting to that crash by
> reconnecting, successfully establish a new connection, just to be kicked
> off by postmaster that does the crash restart cycle.  I've not yet
> figured out when exactly this happens and when not.

It seems like this must indicate that psql sees the connection drop
significantly before the postmaster gets SIGCHLD.  Which is weird, but
if you have core dumps enabled, maybe your kernel is doing something like
(1) close files, (2) dump core, (3) exit process (resulting in SIGCHLD)?

I don't think I've ever seen this myself.  Peculiarity of some kernels,
perhaps.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Robert Haas

Дата:

19 сентября 2017 г., 17:48:42

On Mon, Sep 18, 2017 at 2:04 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-09-18 12:16:42 -0400, Robert Haas wrote:
>> On Mon, Sep 18, 2017 at 6:32 AM, Andres Freund <andres@anarazel.de> wrote:
>> > One thing that I've noticed for a while, but that I was reminded of
>> > again here. We very frequently allow psql to reconnect in case of crash,
>> > just for postmaster to notice a child has gone and kill that session. I
>> > don't recall that frequently happening, but these days it happens nearly
>> > every time.
>>
>> I don't understand what you're talking about here.
>
> I often see a backend crash, psql reacting to that crash by
> reconnecting, successfully establish a new connection, just to be kicked
> off by postmaster that does the crash restart cycle.  I've not yet
> figured out when exactly this happens and when not.

Oh, I've not seen that.  Mostly, what I think we should fix is the
fact that the libpq messages tend to report that the server crashed
even if it was an orderly shutdown.

[rhaas ~]$ psql
psql (11devel)
Type "help" for help.

rhaas=# select 1;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly   This probably means the server terminated abnormally   before or while
processingthe request.

The connection to the server was lost. Attempting reset: Failed.

It's sort of funny (but also sort of sad) that we've got libpq talking
down the server's reliability (and even in the face of evidence which
manifestly contradicts it).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Andres Freund

Дата:

19 сентября 2017 г., 22:10:12

On 2017-09-19 07:48:42 -0400, Robert Haas wrote:
> Oh, I've not seen that.  Mostly, what I think we should fix is the
> fact that the libpq messages tend to report that the server crashed
> even if it was an orderly shutdown.
> 
> [rhaas ~]$ psql
> psql (11devel)
> Type "help" for help.
> 
> rhaas=# select 1;
> FATAL:  terminating connection due to administrator command
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> 
> It's sort of funny (but also sort of sad) that we've got libpq talking
> down the server's reliability (and even in the face of evidence which
> manifestly contradicts it).

Yea, I'm not very happy with that either. I really think we should stop
emitting that if we got an actual message from the server.

Unfortunately the backends themselves also react with inaccurate error
messages to things like immediate shutdowns...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.

От

Tom Lane

Дата:

19 сентября 2017 г., 22:24:00

Andres Freund <andres@anarazel.de> writes:
> Unfortunately the backends themselves also react with inaccurate error
> messages to things like immediate shutdowns...

Yeah, those signals are kind of overloaded these days.  Not sure if
there's any good way to improve that.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Andres Freund

Дата:

19 сентября 2017 г., 22:51:11

On 2017-09-19 12:24:00 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Unfortunately the backends themselves also react with inaccurate error
> > messages to things like immediate shutdowns...
> 
> Yeah, those signals are kind of overloaded these days.  Not sure if
> there's any good way to improve that.

We could use the procsignal infrastructure (or some pared down
equivalent with just one 'server-status' byte in shmem) for the
non-crash immediate shutdown. That'd not be that a crazy amount of
shared memory that'd need to be touched in shared memory, and we're not
in an actual crash in that case, so no corrupted shmem. OTOH, that'd
still leave us with some processes that aren't connected to shmem
receiving the bare SIGQUIT - but I think they mostly already have more
terse error messages anyway.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Robert Haas

Дата:

19 сентября 2017 г., 23:00:33

On Tue, Sep 19, 2017 at 12:51 PM, Andres Freund <andres@anarazel.de> wrote:
> That'd not be that a crazy amount of
> shared memory that'd need to be touched in shared memory, ...

You mean, in the postmaster?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Andres Freund

Дата:

19 сентября 2017 г., 23:08:21

On 2017-09-19 13:00:33 -0400, Robert Haas wrote:
> On Tue, Sep 19, 2017 at 12:51 PM, Andres Freund <andres@anarazel.de> wrote:
> > That'd not be that a crazy amount of
> > shared memory that'd need to be touched in shared memory, ...
> 
> You mean, in the postmaster?

Yes. We try to avoid touch shmem there, but it's not like we're
succeeding fully. See e.g. the pgstat_get_crashed_backend_activity()
calls (which do rely on shmem being ok to some extent), pmsignal,
BackgroundWorkerStateChange(), ...

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.

От

Tom Lane

Дата:

19 сентября 2017 г., 23:15:28

Andres Freund <andres@anarazel.de> writes:
> On 2017-09-19 13:00:33 -0400, Robert Haas wrote:
>> You mean, in the postmaster?

> Yes. We try to avoid touch shmem there, but it's not like we're
> succeeding fully. See e.g. the pgstat_get_crashed_backend_activity()
> calls (which do rely on shmem being ok to some extent), pmsignal,
> BackgroundWorkerStateChange(), ...

Well, the point is to avoid touching data structures that could be
corrupted enough to confuse the postmaster.  I don't have any problem with
adding some more functionality to pmsignal, say.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Re: [COMMITTERS] pgsql: Perform only oneReadControlFile() during startup.

От

Andres Freund

Дата:

20 сентября 2017 г., 00:24:17

On 2017-09-19 13:15:28 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-09-19 13:00:33 -0400, Robert Haas wrote:
> >> You mean, in the postmaster?
>
> > Yes. We try to avoid touch shmem there, but it's not like we're
> > succeeding fully. See e.g. the pgstat_get_crashed_backend_activity()
> > calls (which do rely on shmem being ok to some extent), pmsignal,
> > BackgroundWorkerStateChange(), ...
>
> Well, the point is to avoid touching data structures that could be
> corrupted enough to confuse the postmaster.  I don't have any problem with
> adding some more functionality to pmsignal, say.

Given that we're ok with reading pgstat shared memory entries, I think
adding a carefully coded variant of SendProcSignal() should be doable in
a safe manner.

Something roughly like

int
PostmasterSendProcSignal(pid_t pid, ProcSignalReason reason)
{   volatile ProcSignalSlot *slot;
   /*    * As this is running in postmaster, be careful not to dereference    * any pointers from shared memory that
couldbe corrupted, and to    * not to throw errors.    */

   for (i = 0; i < NumProcSignalSlots; i++)   {       slot = &ProcSignalSlots[i];
       if (slot->pss_pid == pid)       {           /*            * The note about race conditions in SendProcSignal
applies           * here, too            */

           /* Atomically set the proper flag */           slot->pss_signalFlags[reason] = true;           /* Send
signal*/           return kill(pid, SIGUSR1);       }   }

   errno = ESRCH;   return -1;
}

As all the memory offsets are computed based on postmaster process-local
variables, this should be safe.

I'd rather like to avoid a copy of the procsignal infrastructure if we
don't need it...

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Re: [HACKERS] [COMMITTERS] pgsql: Perform only one ReadControlFile() during startup.

Вложения

Вложения