Thread: Failback with log shipping

Failback with log shipping

From
Heikki Linnakangas
Date:
At PGCon, several people asked me about restarting an old master as a 
standby after failover has happened. And it wasn't the first time people 
have asked me about it, even before 9.0. We have no mention of that in 
the docs, which is a pretty serious oversight. What can we say about it?

I believe the current official policy is that you have to take a new 
base backup and restore from that. Rsync can be used to speed that up.

However, someone once asked me for a comment on a script he wrote to do 
that in a smarter way. I forget who and when and how exactly it worked, 
but it seems possible to do safely.

First of all, you have to shut down the master cleanly for this to work, 
otherwise there can be changes in the old master that never made it to 
the standby.

Assuming controlled shutdown and that the standby received all WAL from 
the old master before it was promoted, I think you can simply create a 
recovery.conf in the old master's data directory to turn it into a 
standby server, and restart. Am I missing something?

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: Failback with log shipping

From
Dimitri Fontaine
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Assuming controlled shutdown and that the standby received all WAL from the
> old master before it was promoted, I think you can simply create a
> recovery.conf in the old master's data directory to turn it into a standby
> server, and restart. Am I missing something?

Would that mean that a controlled restart of the old master so that the
recovery stops before applying the logs that were not shipped to the
slave would put it in the same situation?

How easy is it to script that? It seems all we need is the current XID
of the slave at the end of recovery. It should be in the log, maybe it's
easy enough to expose it at the SQL level…

Regards,
--
dim


Re: Failback with log shipping

From
Heikki Linnakangas
Date:
On 28/05/10 16:11, Dimitri Fontaine wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
>> Assuming controlled shutdown and that the standby received all WAL from the
>> old master before it was promoted, I think you can simply create a
>> recovery.conf in the old master's data directory to turn it into a standby
>> server, and restart. Am I missing something?
>
> Would that mean that a controlled restart of the old master so that the
> recovery stops before applying the logs that were not shipped to the
> slave would put it in the same situation?

Not shipped before the first failover you mean? No, if any WAL records 
were created in the old master that were not shipped to the standby 
before failover, the corresponding changes to the data files might've 
been flushed to disk already, and you can't undo those by not replaying 
the WAL record on restart.

> How easy is it to script that? It seems all we need is the current XID
> of the slave at the end of recovery. It should be in the log, maybe it's
> easy enough to expose it at the SQL level…

XID doesn't help at all, LSN more likely, but I feel that I don't fully 
understand what you're saying.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: Failback with log shipping

From
Dimitri Fontaine
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Not shipped before the first failover you mean? No, if any WAL records were
> created in the old master that were not shipped to the standby before
> failover, the corresponding changes to the data files might've been flushed
> to disk already, and you can't undo those by not replaying the WAL record on
> restart.

Ah yes, the failure needs to happen after the WAL is written but not yet 
sent, and before the next CHECKPOINT, for this to be possible. But 
automatic testing of the situation (is the data already safe in PGDATA) 
might still be possible?

>> How easy is it to script that? It seems all we need is the current XID
>> of the slave at the end of recovery. It should be in the log, maybe it's
>> easy enough to expose it at the SQL level…
>
> XID doesn't help at all, LSN more likely, but I feel that I don't fully
> understand what you're saying.

Sorry I was unclear, I was thinking in terms of recovery.conf file and
either recovery_target_xid or recovery_target_time. The idea being that
if the old-master didn't CHECKPOINT the changes that the slave missed,
then we can do crash recovery and choose to stop before that point, then
apply WALs from the new master.

That might sound like a strange thing to do, but if switching from 
master to slave allows skipping the base backup to get a slave again, I 
guess we'll see people choosing fully automated failover scripting 
(with heartbeat and so on). The goal would be to reduce downtime as 
much as possible.

When possible I'd still choose manual failover to the slave after a
master's restart and crash recovery, but the downtime constraint might
not allow that everywhere.

So you're saying controlled failover could possibly skip base backup to
reuse old master as new slave, and I'm asking if by some luck (crash
happened before CHECKPOINT) and some recovery.conf setup we could get to
the same situation in case of hard failure. That would allow completely
automatic switchover / failover with no need to resync.

I'm not sure how much clearer I managed to be :)

Regards,
--
dim


Re: Failback with log shipping

From
Heikki Linnakangas
Date:
On 28/05/10 22:20, Dimitri Fontaine wrote:
> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com>  writes:
>> Not shipped before the first failover you mean? No, if any WAL records were
>> created in the old master that were not shipped to the standby before
>> failover, the corresponding changes to the data files might've been flushed
>> to disk already, and you can't undo those by not replaying the WAL record on
>> restart.
>
> Ah yes, the failure needs to happen after the WAL is written but not yet
> sent, and before the next CHECKPOINT, for this to be possible.

Checkpoint only guarantees that everything before that is flushed to 
disk. It doesn't guarantee that nothing is flushed to disk until that. 
If there's a checkpoint that hasn't been shipped to the standby, you're 
certainly hosed, but if there is no checkpoint you don't know if the 
data files have changed or not.

> But automatic testing of the
> situation (is the data already safe in PGDATA) might still be possible?

Hmm, so the situation is this:
            D - E - crash!
           /
A - B - C
           \
            d - f - g - h

The letters represent WAL records. C is the last WAL record that was 
shipped to the standby, D & E are WAL records that were generated in the 
old master before the crash but never sent to the standby, and d-h are 
WAL records created in the standby after failover.

I guess you could read the WAL in the old master and compare it with the 
WAL from the standby to figure out where the failover happened (C), and 
then scan all the data pages involved in records D - E, checking that 
the LSNs on the data pages touched by those records are earlier than C. 
That's a bit laborious, and requires knowledge of all different kinds of 
WAL records to figure out which data pages they touch, but seems 
possible in theory.
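
To make that concrete, here's the check in rough pseudocode. Every 
helper in it (read_wal_records, last_common_lsn, pages_touched_by, 
read_page_lsn) is a hypothetical stand-in for WAL and page inspection 
tooling that would have to be written; nothing like this exists today:

# Pseudocode sketch only. All helper functions are hypothetical.
def old_master_reusable_as_standby(old_wal_dir, standby_wal_dir, old_datadir):
    old_records = read_wal_records(old_wal_dir)        # hypothetical
    new_records = read_wal_records(standby_wal_dir)    # hypothetical

    # C: the last WAL record the standby received before it was promoted,
    # i.e. the point where the two WAL histories diverge.
    c = last_common_lsn(old_records, new_records)      # hypothetical

    # D, E, ...: records generated on the old master but never shipped.
    for rec in (r for r in old_records if r.lsn > c):
        for page in pages_touched_by(rec):             # hypothetical
            # If such a page already carries an LSN past C, the change was
            # flushed to disk and can't be undone by skipping WAL replay;
            # a new base backup is needed.
            if read_page_lsn(old_datadir, page) > c:   # hypothetical
                return False
    return True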

>>> How easy is it to script that? It seems all we need is the current XID
>>> of the slave at the end of recovery. It should be in the log, maybe it's
>>> easy enough to expose it at the SQL level…
>>
>> XID doesn't help at all, LSN more likely, but I feel that I don't fully
>> understand what you're saying.
>
> Sorry I was unclear, I was thinking in terms of recovery.conf file and
> either recovery_target_xid or recovery_target_time. The idea being that
> if the old-master didn't CHECKPOINT the changes that the slave missed,
> then we can do crash recovery and choose to stop before that point, then
> apply WALs from the new master.

Ah, I see. No, you don't want to use a recovery target, that would end 
the recovery and start the server. You just need to make sure to use 
WALs from the new master instead of the old one when both exist.

> So you're saying controlled failover could possibly skip base backup to
> reuse old master as new slave, and I'm asking if by some luck (crash
> happened before CHECKPOINT) and some recovery.conf setup we could get to
> the same situation in case of hard failure. That would allow completely
> automatic switchover / failover with no need to resync.

Yeah, that would be nice. In practice, I think you would get lucky more 
often than not, because whenever you modify and dirty a page, writing a 
WAL record, the usage count on the buffer is incremented and it won't be 
evicted from the buffer cache for a while.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com


Re: Failback with log shipping

From
Fujii Masao
Date:
On Fri, May 28, 2010 at 7:58 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> At PGCon, several people asked me about restarting an old master as a
> standby after failover has happened. And it wasn't the first time people have
> asked me about it, even before 9.0. We have no mention of that in the docs, which
> is a pretty serious oversight. What can we say about it?
>
> I believe the current official policy is that you have to take a new base
> backup and restore from that. Rsync can be used to speed that up.
>
> However, someone once asked me for a comment on a script he wrote to do that
> in a smarter way. I forget who and when and how exactly it worked, but it
> seems possible to do safely.
>
> First of all, you have to shut down the master cleanly for this to work,
> otherwise there can be changes in the old master that never made it to the
> standby.
>
> Assuming controlled shutdown and that the standby received all WAL from the
> old master before it was promoted, I think you can simply create a
> recovery.conf in the old master's data directory to turn it into a standby
> server, and restart. Am I missing something?

Failover always increments the timeline ID of the old standby (i.e., the
new master).
Can that procedure work around the resulting timeline ID mismatch between the servers?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center