Re: Failback to old master

From Robert Haas
Subject Re: Failback to old master
Date
Msg-id CA+TgmoYDRgOBKY5L4rnpJTNfdE8YJf3dLf76o9t7h5qv=U=TJw@mail.gmail.com
In reply to Failback to old master  ("Maeldron T." <maeldron@gmail.com>)
Responses Re: Failback to old master
List pgsql-hackers
On Wed, Oct 29, 2014 at 6:21 AM, Maeldron T. <maeldron@gmail.com> wrote:
> I swear I have read a couple of old threads. Yet I am not sure if it is safe to
> failback to the old master in case of async replication without base backup.
>
> Considering:
> I have the latest 9.3 server
> A: master
> B: slave
> B is actively connected to A
>
> I shut down A manually with -m fast (it's the default FreeBSD init script
> setting)
> I remove the recovery.conf from B
> I restart B
> I create a recovery.conf on A
> I start A
> I see nothing wrong in the logs
> I go for a lunch
> I shut down B
> I remove the recovery.conf on A
> I restart A
> I restore the recovery.conf on B
> I start B
> I see nothing wrong in the logs and I see that replication is working
>
> Can I say that my data is safe in this case?
>
> If the answer is yes, is it safe to do this if there was a power outage on A
> instead of manual shutdown? Considering that the log says nothing wrong. (Of
> course if it complains I'd do base backup from B).

The threshold question here is whether the original master might have
written (and thus, perhaps, applied) write-ahead log records that were
not replayed on the slave.  If A crashed, that is definitely possible,
so this is definitely not safe.  If A was shut down cleanly, then
streaming replication *should* take everything up through the shutdown
checkpoint and replicate those to the standby, which *should* replay
them.  If all goes according to plan, I think this will work.
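One way to check the "should" above by hand is to compare WAL locations before promoting the slave. The following is only a sketch, not an established procedure: it assumes you read "Latest checkpoint location" from pg_controldata on the cleanly stopped master and the result of pg_last_xlog_replay_location() on the slave, and that the replay location marks the end of the last replayed record, so it has to lie strictly past the start of the shutdown checkpoint record. The function names are made up for illustration.

```python
def parse_lsn(text: str) -> int:
    """Convert a WAL location such as '0/3000060' into a 64-bit integer.

    The text form is two hexadecimal numbers: the high 32 bits, a slash,
    then the low 32 bits.
    """
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def standby_replayed_shutdown_checkpoint(master_checkpoint: str,
                                         standby_replay: str) -> bool:
    """Return True if the standby has replayed past the master's checkpoint.

    Assumption (hypothetical check): standby_replay is the end of the last
    replayed record, so equality with the checkpoint's start location means
    the checkpoint record itself has not been replayed yet.
    """
    return parse_lsn(standby_replay) > parse_lsn(master_checkpoint)
```

For example, standby_replayed_shutdown_checkpoint("0/3000028", "0/3000090") is true, while passing "0/3000028" for both is false. A true result only says the slave got past that record; it does not prove the two servers have no other divergence.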

I'm not sure we really have enough safeties to make this robust,
though: for example, at the point when the shutdown checkpoint is
written, I believe that the master is no longer accepting new
connections - so if the connection to the slave is broken before the
shutdown checkpoint record is replicated, then it's not safe any more,
but how will we detect that?  And, if you remove recovery.conf on the
slave, it will abort replay and enter normal running as soon as it
reaches what it thinks is end-of-WAL, with no cross-check to make sure
that's really the same point that the master was actually at.  So
it strikes me that it might be quite difficult to really have
confidence that nothing will go wrong.
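Since the server does no such cross-check itself, an operator would have to do it from outside. As a rough sketch under stated assumptions, the snippet below reads the text printed by pg_controldata for the old master's data directory and returns the checkpoint location only if the cluster state is "shut down". The field labels "Database cluster state" and "Latest checkpoint location" are the ones pg_controldata prints; the function names and the idea of treating anything else as "not safe" are my own illustration.

```python
def read_control_fields(controldata_output: str) -> dict:
    """Split 'label:   value' lines of pg_controldata output into a dict."""
    fields = {}
    for line in controldata_output.splitlines():
        # Split at the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def clean_shutdown_checkpoint(controldata_output: str):
    """Return the latest checkpoint location, or None if the state is not
    'shut down' (for example after a crash or power outage)."""
    fields = read_control_fields(controldata_output)
    if fields.get("Database cluster state") != "shut down":
        return None
    return fields.get("Latest checkpoint location")
```

A None result corresponds to the crash case discussed earlier, where taking a fresh base backup from the new master is the conservative choice.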

I'm definitely not the expert in this area on this mailing list, so
I'm curious what others think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


