Re: walsender vs. XLogBackgroundFlush during shutdown

From Alexander Kukushkin
Subject Re: walsender vs. XLogBackgroundFlush during shutdown
Msg-id CAFh8B=mOHwum5x3OhZ-P2KLuz3ObHyc97p2cdOxF5tZFusw5sg@mail.gmail.com
In response to Re: walsender vs. XLogBackgroundFlush during shutdown  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Thu, 2 May 2019 at 14:35, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >From the client side perspective, it confirmed everything that it
> >should, but from the postgres side, this is not enough to shut down
> >cleanly. Maybe it is possible to change the check (sentPtr ==
> >replicatedPtr) to something like (lastMsgSentPtr <= replicatedPtr) or
> >it would be unsafe?
>
> I don't know.
>
> In general I think it's a bit strange that we're waiting for walsender
> processes to catch up even in fast shutdown mode, instead of just aborting
> them like other backends. But I assume there are reasons for that. OTOH it
> makes us vulnerable to issues like this, when a (presumably) misbehaving
> downstream prevents a shutdown.

IMHO waiting until the remote side has received and flushed all changes is
the right strategy, but physical and logical replication should be handled
slightly differently.
For physical replication we want to make sure that the remote side has
received and flushed all changes; otherwise, after a switchover, we
won't be able to rejoin the former primary as a new standby.
The logical replication case is a bit different. I think we can safely
shut down the walsender once the client has confirmed the last XLogData
message, whereas currently we wait until the client confirms the wal_end
received in the keepalive message. If we shut down the walsender too early
and do a switchover, the client might miss some events, because
logical slots are not replicated :(
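
To make the difference between the two conditions concrete, here is a small model in Python (not PostgreSQL code). The names mirror the ones used in this thread; `last_msg_sent_ptr` is hypothetical and does not exist in walsender.c, and the real shutdown check has additional conditions that are left out here.

```python
from dataclasses import dataclass


@dataclass
class WalSenderState:
    sent_ptr: int           # how far the walsender has read/decoded WAL
    last_msg_sent_ptr: int  # hypothetical: end LSN of the last XLogData sent
    replicated_ptr: int     # flush position confirmed by the client


def done_current(state: WalSenderState) -> bool:
    """Existing check: the client must confirm exactly sentPtr."""
    return state.sent_ptr == state.replicated_ptr


def done_proposed(state: WalSenderState) -> bool:
    """Proposed check: the client must confirm the last XLogData it was sent."""
    return state.last_msg_sent_ptr <= state.replicated_ptr


# The client confirmed the last XLogData (LSN 1000), but the walsender has
# since advanced over WAL that produced nothing for this client (LSN 1200).
stuck = WalSenderState(sent_ptr=1200, last_msg_sent_ptr=1000, replicated_ptr=1000)
print(done_current(stuck), done_proposed(stuck))
```

In this scenario the existing check never becomes true unless the client also confirms the wal_end from a keepalive, while the relaxed check is satisfied as soon as the last XLogData is confirmed. Whether relaxing it is safe is exactly the open question above.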


> >No, it didn't stuck there. During the shutdown postgres starts sending
> >a few thousand keepalive messages per second and receives back so many
> >feedback message, therefore the chances of interrupting somewhere in
> >the send are quite high.
>
> Uh, that seems a bit broken, perhaps?

Indeed, this is broken psycopg2 behavior :(
I am thinking about submitting a patch to fix it.
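
For illustration only, here is a sketch of how a logical replication client could choose the flush position it reports when a keepalive arrives. It is not the psycopg2 implementation and not a proposed patch. The message layouts follow the streaming replication protocol (primary keepalive 'k', standby status update 'r'); the helper names and the rule in `flush_lsn_to_report` are assumptions: if every XLogData received so far has been durably processed, the client confirms the keepalive's wal_end, otherwise it reports only what it has flushed.

```python
import struct

KEEPALIVE_FMT = "!cQqB"   # 'k', wal_end, server clock, reply-requested flag
STATUS_FMT = "!cQQQqB"    # 'r', write, flush, apply, client clock, reply flag


def parse_keepalive(payload: bytes):
    """Return (wal_end, reply_requested) from a primary keepalive message."""
    tag, wal_end, _server_clock, reply = struct.unpack(KEEPALIVE_FMT, payload)
    if tag != b"k":
        raise ValueError("not a primary keepalive message")
    return wal_end, bool(reply)


def flush_lsn_to_report(flushed_lsn: int, received_lsn: int, wal_end: int) -> int:
    """Hypothetical rule: confirm wal_end only when nothing is pending."""
    if flushed_lsn >= received_lsn:
        # Every XLogData received so far is flushed, so (by assumption) the
        # client can also confirm the position announced in the keepalive.
        return max(wal_end, flushed_lsn)
    # Some received data is not flushed yet: report only what is durable.
    return flushed_lsn


def build_status_update(write_lsn: int, flush_lsn: int, apply_lsn: int,
                        client_clock_us: int, reply: bool = False) -> bytes:
    """Encode a standby status update message."""
    return struct.pack(STATUS_FMT, b"r", write_lsn, flush_lsn, apply_lsn,
                       client_clock_us, int(reply))


# Example: last XLogData ended at 1000 and is flushed; keepalive says 1200.
keepalive = struct.pack(KEEPALIVE_FMT, b"k", 1200, 0, 1)
wal_end, reply_requested = parse_keepalive(keepalive)
flush = flush_lsn_to_report(flushed_lsn=1000, received_lsn=1000, wal_end=wal_end)
reply = build_status_update(flush, flush, flush, client_clock_us=0)
```

With a rule like this the walsender would see its own wal_end confirmed and could finish the shutdown; without it, the client keeps confirming the last XLogData position and the server keeps waiting.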

Actually, I quickly skimmed through the pgjdbc logical replication
source code and the example at
https://jdbc.postgresql.org/documentation/head/replication.html and I
think that it will also cause problems with the shutdown.

Regards,
--
Alexander Kukushkin


