Re: Weird failure with latches in curculio on v15

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Weird failure with latches in curculio on v15
Дата
Msg-id 20230203074422.wx2ysoraqh46zqg2@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Weird failure with latches in curculio on v15  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Weird failure with latches in curculio on v15  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Weird failure with latches in curculio on v15  (Nathan Bossart <nathandbossart@gmail.com>)
Список pgsql-hackers
Hi,

On 2023-02-03 02:24:03 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Ugh, I think I might understand what's happening:
> 
> > The signal arrives just after the fork() (within system()). Because we
> > have all our processes configure themselves as process group leaders,
> > and we signal the entire process group (c.f. signal_child()), both the
> > child process and the parent will process the signal. So we'll end up
> > doing a proc_exit() in both. As both are trying to remove themselves
> > from the same PGPROC etc entry, that doesn't end well.
> 
> Ugh ...
> 
> > I don't see how we can solve that properly as long as we use system().
> 
> ... but I don't see how that's system()'s fault?  Doing the fork()
> ourselves wouldn't change anything about that.

If we did the fork ourselves, we'd temporarily change the signal mask
before the fork() and reset it immediately in the parent, but not in the
child. We can't do that with system(), because we don't get control back
early enough - we'd just block signals for the entire duration of
system().

I wonder if this shows a problem with the change in 14 to make pgarch.c
be attached to shared memory. Before that it didn't have to worry about
problems like the above in the archiver, but now we do. It's less severe
than the startup process issue, because we don't have a comparable
signal handler in pgarch, but still.

I'm e.g. not sure that there aren't issues with
procsignal_sigusr1_handler() or such executing in a forked process.


> > A workaround for the back branches could be to have a test in
> > StartupProcShutdownHandler() that tests if MyProcPid == getpid(), and
> > not do the proc_exit() if they don't match. We probably should just do
> > an _exit() in that case.
> 
> Might work.

I wonder if we should add code complaining loudly about such a mismatch
to proc_exit(), in addition to handling it more silently in
StartupProcShutdownHandler().   Also, an assertion in
[Auxiliary]ProcKill that proc->xid == MyProcPid == getpid() seems like a
good idea.


> > OTOH, the current approach only works on systems with setsid(2) support,
> > so we probably shouldn't rely so hard on it anyway.
> 
> setsid(2) is required since SUSv2, so I'm not sure which systems
> are of concern here ... other than Redmond's of course.

I was thinking of windows, yes.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: In-placre persistance change of a relation
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Weird failure with latches in curculio on v15