Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher
Дата
Msg-id CAB7nPqTHQJ7zODLCLmhg9oz03qow6eh27NhEqzT+CT64eku5Ug@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] logical replication and PANIC during shutdowncheckpoint in publisher  (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Список pgsql-hackers
On Sun, Apr 23, 2017 at 10:15 AM, Petr Jelinek
<petr.jelinek@2ndquadrant.com> wrote:
> On 21/04/17 06:11, Michael Paquier wrote:
>> On Fri, Apr 21, 2017 at 12:29 AM, Peter Eisentraut
>> <peter.eisentraut@2ndquadrant.com> wrote:
>> Hmm. I have been actually looking at this solution and I am having
>> doubts regarding its robustness. In short this would need to be
>> roughly a two-step process:
>> - In PostmasterStateMachine(), SIGUSR2 is sent to the checkpoint to
>> make it call ShutdownXLOG(). Prior doing that, a first signal should
>> be sent to all the WAL senders with
>> SignalSomeChildren(BACKEND_TYPE_WALSND). SIGUSR2 or SIGINT could be
>> used.
>> - At reception of this signal, all WAL senders switch to a stopping
>> state, refusing commands that can generate WAL.
>> - Checkpointer looks at the state of all WAL senders, looping with a
>> sleep call of a couple of ms, refusing to launch the shutdown
>> checkpoint as long as all WAL senders have not switched to the
>> stopping state.
>> - In reaper(), once checkpointer is confirmed as stopped, signal again
>> the WAL senders, and tell them to perform the last loop.

OK, I have been hacking that, finishing with the attached. In the
attached I am using SIGUSR2 to instruct the WAL senders to prepare for
stopping, and SIGINT to handle the last WAL flush loop. The shutdown
checkpoint moves on only if all active WAL senders are marked with a
STOPPING state. Reviews as welcome.

>> After that, I got a second, more simple idea.
>> CheckpointerShmem->ckpt_flags holds the information about checkpoints
>> currently running, so we could have the WAL senders look at this data
>> and prevent any commands generating WAL. The checkpointer may be
>> already stopped at the moment the WAL senders finish their loop, so we
>> need also to check if the checkpointer is running or not on those code
>> paths. Such safeguards may actually be enough for the problem of this
>> thread. Thoughts?
>>
>
> Hmm but how do we handle statements that are already in progress by the
> time ckpt_flags changes?

Yup, this does not handle well race conditions.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] TAP tests - installcheck vs check
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] identity columns