Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher
Дата
Msg-id 20170602002912.tqlwn4gymzlxpvs2@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher  (Michael Paquier <michael.paquier@gmail.com>)
Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On 2017-06-02 08:38:51 +0900, Michael Paquier wrote:
> On Fri, Jun 2, 2017 at 7:05 AM, Andres Freund <andres@anarazel.de> wrote:
> > I'm a unhappy how this is reusing SIGINT for WalSndLastCycleHandler.
> > Normally INT is used cancel interrupts, and since walsender is now also
> > working as a normal backend, this overlap is bad.  Even for plain
> > walsender backend this seems bad, because now non-superusers replication
> > users will terminate replication connections when they do
> > pg_cancel_backend().  For replication=dbname users it's especially bad
> > because there can be completely legitimate uses of pg_cancel_backend().
>
> Signals for WAL senders are set in WalSndSignals() which uses SIG_IGN
> for SIGINT now in ~9.6, and StatementCancelHandler does not get set up
> for a non-am_walsender backend. Am I missing something?

Yes, but nothing in those observeration actually addresses my point?

Some points:

1) 086221cf6b1727c2baed4703c582f657b7c5350e changes things so walsender  backends use SIGINT for
WalSndLastCycleHandler(),which is now  triggerable by pg_cancel_backend(). Especially for logical rep  walsenders it's
notabsurd sending that.
 
2) Walsenders now can run normal queries.
3) Afaict 086221cf6b1727c2baed4703c582f657b7c5350e doesn't really  address the PANIC problem for database connected
walsendersat all,  because it doesn't even cancel non-replication commands.  I.e. an  already running query can just
continueto run.  Which afaict just  entirely breaks shutdown.  If the connection is idle, or running a  query, we'll
justwait forever.
 
4) the whole logic introduced in the above commit doesn't actually  appear to handle logical decoding senders properly
-wasn't the whole  issue at hand that those can write WAL in some case?  But  nevertheless WalSndWaitForWal() does a
WalSndSetState(WALSNDSTATE_STOPPING);*and then continues decoding  and waiting* - which seems to obviate the entire
pointof that commit.
 

I'm working on a patch rejiggering things so:

a) upon shutdown checkpointer (so we can use procsignal), not  postmaster, sends PROCSIG_WALSND_INIT_STOPPING to all
walsenders(so  we don't have to use up a normal signal handler)
 
b) Upon reception walsenders immediately exit if !replication_active,  otherwise sets got_STOPPING
c) Logical decoding will finish if got_STOPPING is sent, *WITHOUT*, as  currently done, moving to WALSNDSTATE_STOPPING.
Not yet quite sure  how to best handle the WALSNDSTATE_STOPPING transition in WalSndLoop().
 
d) Once all remaining walsenders are in stopping state, postmaster sends  SIGUSR2 to trigger shutdown (basically as
before)

Does that seem to make sense?

- Andres



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: [HACKERS] Perfomance bug in v10
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher