Обсуждение: Order of operations in SubPostmasterMain()

Поиск
Список
Период
Сортировка

Order of operations in SubPostmasterMain()

От
Tom Lane
Дата:
I noticed that buildfarm member culicidae, which is running an
EXEC_BACKEND build on Linux, occasionally falls over like this:

FATAL:  could not reattach to shared memory (key=6280001, addr=0x7fa9df845000): Invalid argument

That's probably because Andres failed to disable ASLR on that machine,
but the exact cause isn't too important right now.  What I'm finding
distressing is that that message doesn't show up on the client side,
making it look like a postmaster crash:

*** /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/contrib/intarray/expected/_int.out    2016-09-15
22:02:39.512951557+0000
 
--- /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/contrib/intarray/results/_int.out    2016-09-29
17:52:56.921948612+0000
 
***************
*** 1,568 ****
! .... lots ...
--- 1,3 ----
! psql: server closed the connection unexpectedly
!     This probably means the server terminated abnormally
!     before or while processing the request.

The reason we don't see anything is that backend libpq hasn't been
initialized yet, so it won't send the ereport message to the client.

The original design of SubPostmasterMain() intended that such cases
would be reported, because it runs BackendInitialize (which calls
pq_init()) before CreateSharedMemoryAndSemaphores (which is where
the reattach used to happen --- the comments at postmaster.c 4730ff
and 4747ff reflect this expectation).  We broke it when we added
the earlier reattach call at lines 4669ff.

We could probably refactor things enough so that we do pq_init()
before PGSharedMemoryReAttach().  It would be a little bit ugly,
and it would fractionally increase the chance of a reattach failure
because pq_init() palloc's a few KB worth of buffers.  I'm not quite
sure if it's worth it; thoughts?  In any case the mentioned comments
are obsolete and need to be moved/rewritten.

Another thing that I'm finding distressing while I look at this is
the placement of ClosePostmasterPorts().  That needs to happen *early*,
because as long as the child process is holding open the postmaster side
of the postmaster death pipe, we are risking unhappiness.  Not only is
it not particularly early, it's after process_shared_preload_libraries()
which could invoke arbitrarily stupid stuff.  Whether or not we do
anything about moving pq_init(), I'm thinking we'd better move the
ClosePostmasterPorts() call so that it's done as soon as possible,
probably right after read_backend_variables() which loads the data
it needs.

Comments?
        regards, tom lane



Re: Order of operations in SubPostmasterMain()

От
Greg Stark
Дата:
On Thu, Sep 29, 2016 at 8:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We could probably refactor things enough so that we do pq_init()
> before PGSharedMemoryReAttach().  It would be a little bit ugly,
> and it would fractionally increase the chance of a reattach failure
> because pq_init() palloc's a few KB worth of buffers.  I'm not quite
> sure if it's worth it; thoughts?  In any case the mentioned comments
> are obsolete and need to be moved/rewritten.


Just speaking off the cuff without reviewing the code in detail...

Alternately we could call pq_init in the error path if it hasn't been
called yet. I'm sure there are problems with doing that in general but
for the specific errors that can happen before pq_init it might be
feasible since they obviously can't have very much shared state yet to
have corrupted.

-- 
greg



Re: Order of operations in SubPostmasterMain()

От
Tom Lane
Дата:
Greg Stark <stark@mit.edu> writes:
> On Thu, Sep 29, 2016 at 8:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We could probably refactor things enough so that we do pq_init()
>> before PGSharedMemoryReAttach().  It would be a little bit ugly,
>> and it would fractionally increase the chance of a reattach failure
>> because pq_init() palloc's a few KB worth of buffers.  I'm not quite
>> sure if it's worth it; thoughts?  In any case the mentioned comments
>> are obsolete and need to be moved/rewritten.

> Alternately we could call pq_init in the error path if it hasn't been
> called yet. I'm sure there are problems with doing that in general but
> for the specific errors that can happen before pq_init it might be
> feasible since they obviously can't have very much shared state yet to
> have corrupted.

Thanks for the suggestion, but I don't think it helps much.  Most of
the refactoring pain comes from the fact that we'd have to set up
MyProcPort earlier (since pqcomm.c needs that to be valid), and
it's not very clear just where to do that.  Having pq_init be called
behind the scenes does nothing to fix that issue. 
        regards, tom lane



Re: Order of operations in SubPostmasterMain()

От
Andres Freund
Дата:
On 2016-09-29 15:46:00 -0400, Tom Lane wrote:
> I noticed that buildfarm member culicidae, which is running an
> EXEC_BACKEND build on Linux, occasionally falls over like this:
> 
> FATAL:  could not reattach to shared memory (key=6280001, addr=0x7fa9df845000): Invalid argument
> 
> That's probably because Andres failed to disable ASLR on that machine,

Intentionally so, FWIW.  Want to enable PIE as well soon-ish.

Newer versions of windows / msvc want to enable ASLR by default, and I
think that'll break parallel query bgworkers, because they pass the
entry point around as a pointer. So I don't think we'll be able to duck
this issue for much longer.


> but the exact cause isn't too important right now.  What I'm finding
> distressing is that that message doesn't show up on the client side,
> making it look like a postmaster crash:

Indeed.


> The reason we don't see anything is that backend libpq hasn't been
> initialized yet, so it won't send the ereport message to the client.
> 
> The original design of SubPostmasterMain() intended that such cases
> would be reported, because it runs BackendInitialize (which calls
> pq_init()) before CreateSharedMemoryAndSemaphores (which is where
> the reattach used to happen --- the comments at postmaster.c 4730ff
> and 4747ff reflect this expectation).  We broke it when we added
> the earlier reattach call at lines 4669ff.
> 
> We could probably refactor things enough so that we do pq_init()
> before PGSharedMemoryReAttach().  It would be a little bit ugly,
> and it would fractionally increase the chance of a reattach failure
> because pq_init() palloc's a few KB worth of buffers.  I'm not quite
> sure if it's worth it; thoughts?  In any case the mentioned comments
> are obsolete and need to be moved/rewritten.

Hm.  It doesn't really seem necessary to directly allocate that
much. Seems pretty harmless to essentially let it start at 0 bytes (so
we can repalloc, but don't use a lot of memory)?


> Another thing that I'm finding distressing while I look at this is
> the placement of ClosePostmasterPorts().  That needs to happen *early*,
> because as long as the child process is holding open the postmaster side
> of the postmaster death pipe, we are risking unhappiness.  Not only is
> it not particularly early, it's after process_shared_preload_libraries()
> which could invoke arbitrarily stupid stuff.  Whether or not we do
> anything about moving pq_init(), I'm thinking we'd better move the
> ClosePostmasterPorts() call so that it's done as soon as possible,
> probably right after read_backend_variables() which loads the data
> it needs.

That seems like a good idea. Besides the benefit you mention, it looks
like that'll also reduce duplication.

Greetings,

Andres Freund