Re: Parallel worker hangs while handling errors.

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Parallel worker hangs while handling errors.
Дата
Msg-id CA+TgmoZAya0STWjtHdhr1FgeSV4EtLf69DkYFSG4ZjbHPLe7gw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Parallel worker hangs while handling errors.  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы Re: Parallel worker hangs while handling errors.  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Re: Parallel worker hangs while handling errors.  (vignesh C <vignesh21@gmail.com>)
Список pgsql-hackers
On Tue, Jul 28, 2020 at 5:35 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> The v4 patch looks good to me. Hang is not seen, make check and make
> check-world passes. I moved this to the committer for further review
> in https://commitfest.postgresql.org/29/2636/.

I don't think I agree with this approach. In particular, I don't
understand the rationale for unblocking only SIGUSR1. Above, Vignesh
says that he feels that unblocking only that signal would be the right
approach, but no reason is given. I have two reasons why I suspect
it's not the right approach. One, it doesn't seem to be what we do
elsewhere; the only existing cases where we have special handling for
particular signals are SIGQUIT and SIGPIPE, and those places have
comments explaining the reason why they are handled in a special way.
Two, SIGUSR1 is used for a LOT of things: look at all the different
cases procsignal_sigusr1_handler() checks. If the intention is to only
allow the things we know are safe, rather than all the signals there
are, I think this coding utterly fails to achieve that - and for
reasons that I don't think are really fixable.

My first idea about how to fix this was just to call
BackgroundWorkerUnblockSignals() before sigsetjmp(), but that doesn't
really work, because ParallelWorkerMain() needs to set the handler for
SIGTERM before unblocking signals. When you really look at it, the
code that does sigsetjmp() in StartBackgroundWorker() is entirely
bogus. The comment says "See notes in postgres.c about the design of
this coding," but if you go read that comment, it says that the point
of using sigsetjmp() is to make sure that signals are unblocked within
the if-block that follows, but the use in bgworker.c actually achieves
exactly the opposite, because signals have not yet been unblocked at
this point. So, whereas the postgres.c code unblocks signals if they
are blocked, this code blocks signals if they are unblocked. Given
that, maybe the right thing to do is to just start the if-block with a
call to BackgroundWorkerUnblockSignals(). Perhaps there's some reason
that would be unsafe, if the failure occurs too early: postgres.c
doesn't unblock signals until after BaseInit() and InitProcess() have
been called, but here an error in those functions would unblock
signals while it's being handled. Off-hand, I don't see why that would
matter, though. In the postgres.c case, there wouldn't be a
PG_exception_stack yet, so we'd end up in the long part of
pg_re_throw(), which basically promotes the ERROR to FATAL but
otherwise does pretty similar things to what this handler does anyway.
So I'm not really sure there's any reason not to just go this way.

Another approach would be to establish a new PG_exception_stack() with
a free sigsetjmp() call and fresh local buffer, inside
ParallelWorkerMain(). It would do the same thing as the existing
handler, but because it would be established after unblocking signals,
sigsetjmp() would behave as desired. This doesn't seem quite as good
to me because I think this pattern might end up getting copied into
many background workers, and it's a bunch of extra code for which I
don't really see a clear need. So at the moment I think a one line fix
in StartBackgroundWorker(), to just unblock signals while handling
errors, looks better.

Adding Alvaro to the CC line, since I think he wrote this code
originally. Not sure if he or anyone else might have an opinion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Release notes for next week's back-branch releases
Следующее
От: Fabrízio de Royes Mello
Дата:
Сообщение: Re: pg_dump bug for extension owned tables