Discussion: Hot Standby Failover Scenario

Hot Standby Failover Scenario

From: Lucky Haryadi

Date: February 28, 2012, 02:06:13

Hi everybody.

I would like to ask about a hot-standby issue. First, let me describe my Postgres master-slave scenario.

1. Master A and Slave B are in different locations, assume different regions of the country.

2. Master A and Slave B are configured for hot standby as described in the documentation.

3. When Master A fails, the database is failed over to Slave B by creating the trigger file.

4. As soon as Slave B becomes a standalone pg server, run pg_start_backup() so that all transactions are recorded only in WAL files.

5. Applications are switched to standalone B until the recovery of Server A is done.

6. When Server A has been recovered (but is still offline), run pg_stop_backup() and copy all WAL files from B to A.

7. Once the WAL files are copied to A, set A's configuration back to master and B's back to slave (for B, rename recovery.done to recovery.conf and remove the trigger file).

8. Bring up A, restart B, and switch all applications back to A.

I've tried this method with no luck. Before the failure, A had 10 million records and B had 10 million records too. Then I failed over to B manually, simulating a failure of A. I ran pg_start_backup() and inserted a batch of data, so that A still had 10 million records and B had 20 million. I then copied the WAL files from B to A, hoping that when A came up again it would catch up with B (A 20 million, B 20 million) and that hot-standby streaming would resume. But my experiments failed to do so.

I've checked the log and found that the timeline is invalid. Slave B's log reports that the timeline of the primary server (Master A) does not match the target timeline of the standby server. Can anyone suggest what to do in this case? Any suggestions will be greatly appreciated. Thank you.
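
For anyone hitting the same timeline mismatch, one quick check is to look at the WAL segment file names on each server. A segment name is 24 hex digits, and the first 8 are the timeline ID. The sketch below is only an illustration: the helper names `wal_timeline` and `same_timeline` and the two sample file names are made up for this example, not taken from the servers discussed here.

```python
import re

# A WAL segment file name is 24 hex digits:
# 8 for the timeline ID, 8 for the log file number, 8 for the segment number.
_WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def wal_timeline(segment_name: str) -> int:
    """Return the timeline ID encoded in a WAL segment file name."""
    match = _WAL_NAME.match(segment_name)
    if match is None:
        raise ValueError(f"not a WAL segment file name: {segment_name!r}")
    return int(match.group(1), 16)


def same_timeline(segment_a: str, segment_b: str) -> bool:
    """True if both WAL segments belong to the same timeline."""
    return wal_timeline(segment_a) == wal_timeline(segment_b)


# Hypothetical sample names: one segment written before promotion,
# one written by the promoted standby.
before_promotion = "000000010000000000000005"
after_promotion = "000000020000000000000005"

print(wal_timeline(before_promotion))                    # 1
print(wal_timeline(after_promotion))                     # 2
print(same_timeline(before_promotion, after_promotion))  # False
```

If the segments copied from B start with a different 8-digit prefix than the ones A wrote itself, they belong to a different timeline, which would be consistent with the error in the log.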

Re: Hot Standby Failover Scenario

From: Greg Smith

Date: February 28, 2012, 15:40:45

On 02/27/2012 10:05 PM, Lucky Haryadi wrote:
> 3. When Master A fails to service, the database will failovered to Slave
> B by triggering with trigger file.

As soon as you trigger a standby, it changes it to a new timeline. At
that point, the series of WAL files diverges. It's no longer possible
to apply them to a system that is still on the original timeline, such
as your original master A in this situation. There's a good reason for
that. Let's say that A committed an additional transaction before it
went down, but that commit wasn't replicated to B. You can't just move
records from B over anymore in that case. The only way to make sure A
is in sync again is to do a new base backup, which you can potentially
accelerate using rsync to only copy what has changed. I see a lot of
people try to bypass one of the steps recommended in the manual using
various schemes like yours, and they usually have a bug like this in
there--sometimes obvious like this, sometimes subtle. Trying to get too
clever here is dangerous to your database.
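
The divergence described above can be pictured with a toy model. This is not how PostgreSQL stores or replays WAL; it only treats each server's history as a plain list of labels (all of them invented for the example) to show why records written on B after promotion cannot simply be appended to A once A holds a commit that B never received.

```python
def common_prefix_len(log_a, log_b):
    """Number of leading records the two histories share."""
    count = 0
    for rec_a, rec_b in zip(log_a, log_b):
        if rec_a != rec_b:
            break
        count += 1
    return count


def can_replay(target_log, source_log):
    """The source's records can extend the target only if the target's
    whole history is a prefix of the source's history."""
    return source_log[:len(target_log)] == target_log


# Toy histories; the record labels are invented for illustration.
shared = ["tx1", "tx2", "tx3"]                 # replicated to both servers
a_log = shared + ["tx4-on-A"]                  # committed on A, never shipped to B
b_log = shared + ["tx4-on-B", "tx5-on-B"]      # written on B after promotion

print(common_prefix_len(a_log, b_log))  # 3
print(can_replay(a_log, b_log))         # False: A has a record B never saw
print(can_replay(shared, b_log))        # True: a copy with only the shared history could follow B
```

In this picture, a fresh base backup amounts to replacing A's history with a copy of B's, which is why it works where copying only the new WAL files does not.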

Warning: pgsql-hackers is the mailing list for people to discuss the
development of PostgreSQL, not how to use it. Questions like this
should be asked on either the pgsql-admin or pgsql-general mailing
list. I'm not going to answer additional questions like this from you
on this list, and I doubt anyone else will either.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
