Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher
Дата
Msg-id CAB7nPqTSX45uxdopuWwosodpZHSQXReWAw3gOqv2G6oyzsPJWg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher  (Andres Freund <andres@anarazel.de>)
Ответы Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Fri, Jun 2, 2017 at 9:29 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-06-02 08:38:51 +0900, Michael Paquier wrote:
>> On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund <andres@anarazel.de> wrote:
>> > I'm a unhappy how this is reusing SIGINT for WalSndLastCycleHandler.
>> > Normally INT is used cancel interrupts, and since walsender is now also
>> > working as a normal backend, this overlap is bad.  Even for plain
>> > walsender backend this seems bad, because now non-superusers replication
>> > users will terminate replication connections when they do
>> > pg_cancel_backend().  For replication=dbname users it's especially bad
>> > because there can be completely legitimate uses of pg_cancel_backend().
>>
>> Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
>> for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
>> for a non-am_walsender backend. Am I missing something?
>
> Yes, but nothing in those observeration actually addresses my point?

I am still confused by your previous email, which, at least it seems
to me, implies that logical WAL senders have been working correctly
with query cancellations. Now SIGINT is just ignored, which means that
pg_cancel_backend() has never worked for WAL senders until now, and
this behavior has not changed with 086221c. So there is no new
breakage introduced by this commit. I get your point to reuse SIGINT
for query cancellations though, but that's a new feature.

> Some points:
>
> 1) 086221cf6b1727c2baed4703c582f657b7c5350e changes things so walsender
>    backends use SIGINT for WalSndLastCycleHandler(), which is now
>    triggerable by pg_cancel_backend(). Especially for logical rep
>    walsenders it's not absurd sending that.
> 2) Walsenders now can run normal queries.
> 3) Afaict 086221cf6b1727c2baed4703c582f657b7c5350e doesn't really
>    address the PANIC problem for database connected walsenders at all,
>    because it doesn't even cancel non-replication commands.  I.e. an
>    already running query can just continue to run.  Which afaict just
>    entirely breaks shutdown.  If the connection is idle, or running a
>    query, we'll just wait forever.
> 4) the whole logic introduced in the above commit doesn't actually
>    appear to handle logical decoding senders properly - wasn't the whole
>    issue at hand that those can write WAL in some case?  But
>    nevertheless WalSndWaitForWal() does a
>    WalSndSetState(WALSNDSTATE_STOPPING); *and then continues decoding
>    and waiting* - which seems to obviate the entire point of that commit.
>
> I'm working on a patch rejiggering things so:
>
> a) upon shutdown checkpointer (so we can use procsignal), not
>    postmaster, sends PROCSIG_WALSND_INIT_STOPPING to all walsenders (so
>    we don't have to use up a normal signal handler)

You'll need a second one that wakes up the latch of the WAL senders to
send more WAL records.

> b) Upon reception walsenders immediately exit if !replication_active,
>    otherwise sets got_STOPPING

Okay, that's what happens now anyway, any new replication command
received results in an error. I actually prefer the way of doing in
HEAD, which at least reports an error.

> c) Logical decoding will finish if got_STOPPING is sent, *WITHOUT*, as
>    currently done, moving to WALSNDSTATE_STOPPING.  Not yet quite sure
>    how to best handle the WALSNDSTATE_STOPPING transition in WalSndLoop().

Wouldn't it make sense to have the logical receivers be able to
receive WAL up to the end of checkpoint record?

> d) Once all remaining walsenders are in stopping state, postmaster sends
>    SIGUSR2 to trigger shutdown (basically as before)

OK.
-- 
Michael



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: [JDBC] [HACKERS] Channel binding support for SCRAM-SHA-256