Re: "could not reattach to shared memory" on buildfarm member dory

From Tom Lane
Subject Re: "could not reattach to shared memory" on buildfarm member dory
Date
Msg-id 14554.1537811585@sss.pgh.pa.us
In response to Re: "could not reattach to shared memory" on buildfarm member dory  (Noah Misch <noah@leadboat.com>)
Responses Re: "could not reattach to shared memory" on buildfarm member dory  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
Noah Misch <noah@leadboat.com> writes:
> On Tue, May 01, 2018 at 11:31:50AM -0400, Tom Lane wrote:
>> Well, at this point the only thing that's entirely clear is that none
>> of the ideas I had work.  I think we are going to be forced to pursue
>> Noah's idea of doing an end-to-end retry.  Somebody else will need to
>> take point on that; I lack a Windows environment and have already done
>> a lot more blind patch-pushing than I like in this effort.

> Having tried this, I find a choice between performance and complexity.  Both
> of my designs use proc_exit(4) to indicate failure to reattach.  The simpler,
> slower design has WIN32 internal_forkexec() block until the child reports (via
> SetEvent()) that it reattached to shared memory.  This caused a fivefold
> reduction in process creation performance[1].

Ouch.

> The less-simple, faster design
> stashes the Port structure and retry count in the BackendList entry, which
> reaper() uses to retry the fork upon seeing status 4.  Notably, this requires
> new code for regular backends, for bgworkers, and for others.

Messy as that is, I think actually the worse problem with it is:

> In this proof of concept, the
> postmaster does not close its copy of a backend socket until the backend
> exits.

That seems unworkable because it would interfere with detection of client
connection drops.  But since you say this is just a POC, maybe you
intended to fix that?  It'd probably be all right for the postmaster to
hold onto the socket until the new backend reports successful attach,
using the same signaling mechanism you had in mind for the other way.
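
The idea of holding the socket only until the backend reports a successful attach can be sketched roughly as follows. This is a minimal illustration, not PostgreSQL code: it uses POSIX fork() and a pipe as stand-ins for the WIN32 process launch and the SetEvent() signaling discussed above, and the name start_backend and the child_attaches flag are hypothetical. For brevity the parent blocks on the report here, whereas a real postmaster would presumably wait for it asynchronously.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch: the parent keeps its copy of client_sock until the
 * child reports a successful attach over a pipe.
 * Returns 1 if the child reported and the parent closed its copy,
 *         0 if the child exited without reporting (socket kept for a retry),
 *        -1 on a system error.
 */
static int
start_backend(int client_sock, int child_attaches)
{
    int     report[2];
    pid_t   pid;
    char    buf;
    ssize_t n;
    int     status;

    if (pipe(report) < 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(report[0]);
        close(report[1]);
        return -1;
    }

    if (pid == 0)
    {
        /* Child: simulate the attach, then report success to the parent. */
        char ok = 1;

        close(report[0]);
        if (!child_attaches)
            _exit(4);           /* status 4 = failed to reattach */
        if (write(report[1], &ok, 1) != 1)
            _exit(1);
        close(report[1]);
        _exit(0);               /* a real backend would now serve the client */
    }

    /* Parent: close the write end so read() sees EOF if the child dies. */
    close(report[1]);
    n = read(report[0], &buf, 1);
    close(report[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    if (n == 1)
    {
        /* Attach confirmed: the parent no longer needs the socket. */
        close(client_sock);
        return 1;
    }
    return 0;                   /* no report: keep the socket for a retry */
}
```

With this shape, the parent's copy of the socket stays open only for the short window between launch and the attach report, so a later client disconnect is still visible through the backend's copy alone.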

Overall, I agree that neither of these approaches is exactly attractive.
We're paying a heck of a lot of performance or complexity to solve a
problem that shouldn't even be there, and that we don't understand well.
In particular, the theory that some privileged code is injecting a thread
into every new process doesn't square with my results at
https://www.postgresql.org/message-id/15345.1525145612%40sss.pgh.pa.us

I think our best course of action at this point is to do nothing until
we have a clearer understanding of what's actually happening on dory.
Perhaps such understanding will yield an idea for a less painful fix.

            regards, tom lane
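
For readers following the thread, the retry-on-status-4 design quoted above can be sketched roughly as below. This is only an illustration under stated assumptions: POSIX fork() and a synchronous waitpid() stand in for the WIN32 internal_forkexec() and for reaper(), which in the real postmaster handles child exits asynchronously. The names launch_with_retry, child_main, and MAX_FORK_RETRIES, as well as the retry limit of 3, are hypothetical and not taken from the patch.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REATTACH_FAILED_STATUS 4    /* exit status meaning "could not reattach" */
#define MAX_FORK_RETRIES       3    /* hypothetical retry limit */

/*
 * Hypothetical child body: pretend the reattach fails until the given
 * attempt number is reached.
 */
static int
child_main(int attempt, int succeed_on)
{
    return (attempt >= succeed_on) ? 0 : REATTACH_FAILED_STATUS;
}

/*
 * Launch a child and relaunch it whenever it exits with status 4.
 * Returns the number of launches needed on success, or -1 if the retries
 * were exhausted or the child failed for some other reason.
 */
static int
launch_with_retry(int succeed_on)
{
    int attempt;

    for (attempt = 1; attempt <= MAX_FORK_RETRIES + 1; attempt++)
    {
        int   status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(child_main(attempt, succeed_on));

        if (waitpid(pid, &status, 0) < 0)
            return -1;

        /* Status 4: the child could not reattach, so try the launch again. */
        if (WIFEXITED(status) && WEXITSTATUS(status) == REATTACH_FAILED_STATUS)
            continue;

        return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? attempt : -1;
    }
    return -1;                      /* gave up after MAX_FORK_RETRIES retries */
}
```

In the design as described, the state needed to relaunch (the Port structure and the retry count) would live in the BackendList entry rather than in a local loop variable, which is where the extra complexity comes from.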

