Discussion: WAL scenario valid?


WAL scenario valid?

From
Keith Ouellette
Date:

I am trying a new configuration and was wondering if it is valid. We have three PostgreSQL servers (DB1, DB2 and DB3). DB1 is currently the master, with DB2 and DB3 set up as hot standbys using streaming WAL. DB2 is the backup in case DB1 fails. A virtual IP managed by Pacemaker moves between DB1 and DB2 depending on which one is running and is the master. DB3 is used for read-only reporting queries. Its recovery.conf file points to the virtual IP so that, in theory, it will connect to whichever server (DB1 or DB2) is active/master.
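For illustration, a minimal sketch of roughly what DB3's recovery.conf looks like in this setup (the virtual IP address and replication user below are placeholders, not the real values):

   standby_mode = 'on'
   # host is the Pacemaker-managed virtual IP (placeholder address)
   primary_conninfo = 'host=192.168.1.100 port=5432 user=replicationuser'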

 

I got the initial WAL replication going between DB1 and DB2 and between the virtual IP (currently on DB1) and DB3. So far all is well. However, if I simulate a failover to DB2 (promote DB2 and move the virtual IP to it), WAL streaming does not resume between the virtual IP and DB3. I tried restarting PostgreSQL on DB2 to kick-start it, but that did not help, and restarting it again made no difference. The only way I could get WAL flowing between the virtual IP and DB3 was to do a manual sync using rsync.

 

I have seen claims of similar configurations working, with three servers in replication and the third server re-establishing streaming WAL with a new server after failure of the previous master. However, not much is said about how that is configured.

 

Is what I am trying to do possible?

 

Thanks,

Keith

Re: WAL scenario valid?

From
Sergey Konoplev
Date:
On Mon, Jun 17, 2013 at 8:14 AM, Keith Ouellette
<Keith.Ouellette@airgas.com> wrote:
> I got the initial WAL replication going between DB1 and DB2 and between the
> virtual IP (currently on DB1) and DB3. So far all is well. However, If I
> simulate a failover to DB2 (promote DB2 and move the virtual IP to it), WAL
> does not resume between the virtual IP and DB3. I tried restarting
> PostgreSQL on DB2, but that did not help. I also tried restarting PostgreSQL
> on DB2 to see if that would kick start it, but it did not. The only way I
> could get WAL between the Virutal IP and DB3 is to do a manual sync using
> rsync.

[skipped]

> Is what I am trying to do possible?

It is.

In your situation, where both replicas follow the master, in case of
the master's failure you need to find the most caught-up replica. To
do this, compare the WAL replay locations on the replicas and choose
the one with the highest value:

SELECT pg_last_xlog_replay_location();
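For example, the value can be collected from each replica like this (host names are placeholders):

   psql -h db2 -At -c "SELECT pg_last_xlog_replay_location();"
   psql -h db3 -At -c "SELECT pg_last_xlog_replay_location();"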

Note that if you do not choose the most caught-up one, then any other
replicas that have replayed later WAL entries must be reconfigured
from scratch. Otherwise their data might be corrupted, and you will
not get any warnings about it.

In the case of a planned switchover, choose the new master yourself.

Then stop all the slaves except the new master.

Use the command below to guarantee that the master and the remaining
slave are ahead of the other (stopped) slaves, if you are not sure that
they already are. The command creates a minimal WAL entry.

SELECT txid_current();

Touch the failover (trigger) file on the remaining slave to promote it to be the new master.
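For example (the file is whatever trigger_file is set to in that slave's recovery.conf; the path below is a placeholder):

   touch /tmp/postgresql.trigger
   # alternatively, on 9.1+: pg_ctl promote -D /path/to/data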

On the stopped slaves delete everything from the pg_xlog directory and
copy the pg_xlog/*.history there from the new master.
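A rough sketch of those two steps (paths and host name are placeholders), run as the postgres user on each stopped slave:

   rm -rf /data/pgsql/9.2/data/pg_xlog/*
   scp 'newmaster:/data/pgsql/9.2/data/pg_xlog/*.history' /data/pgsql/9.2/data/pg_xlog/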

Then change the DSN to point to the new master and add the following
setting to recovery.conf. This will make the replicas follow the
latest created timeline:

recovery_target_timeline = 'latest'

Start postgres on these slaves and that is it.
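Putting it together, recovery.conf on each re-pointed slave ends up looking roughly like this sketch (connection details are placeholders):

   standby_mode = 'on'
   primary_conninfo = 'host=newmaster port=5432 user=replicationuser'
   recovery_target_timeline = 'latest'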

Alternatively, if your postgres version is 9.2, you can set up
cascading replication, so that db2 follows db1 and db3 follows db2.
In case of db1's failure, all you need to do is promote db2.
However, remember that if db2 fails, you will have to change the DSN
in recovery.conf on db3 to point to db1, or redirect db2's virtual IP
to db1. I should probably have started with this solution.
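In that variant, the only difference on db3 is the DSN, which points at db2 rather than at the virtual IP, e.g. (host name and user are placeholders):

   primary_conninfo = 'host=db2 port=5432 user=replicationuser'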

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com


Re: WAL scenario valid?

From
prakhar jauhari
Date:
Hey..

I am facing a similar kind of problem, but in a two-node setup. Streaming replication is being used with a high wal_keep_segments, and log shipping is used to bump up timelines on the standby server in order to set up SR (streaming replication).
DB1 - master
DB2 - standby

When a switchover happens, DB2 becomes the new master, and when DB1 comes back up it acts as a standby to the new master (.history files from the new master are copied to DB1 to bump up its timeline so that SR can be set up). DB1 is not recreated from scratch. This runs fine in a normal switchover, but there seems to be a problem in the following situation, leading to database corruption:

Current state :
DB1 - master
DB2 - standby

Now the failing scenario:


1. DB2's machine goes down.
2. After some time, DB1's machine also goes down (DB2 is still down).
3. DB2 comes back up (it will join the cluster as master, since DB1 is still down).
4. PostgreSQL on DB2 is started as the master.
5. DB1 then comes back up (it will join the cluster as a standby to DB2).
6. When DB1 attempts to set up SR with DB2, DB1 gets corrupted.

Looking into the issue, I found that when DB1 went down initially, it had created some WAL that was never synced to DB2, because DB2 was already down.
When DB2 was later started as master, it had not replayed those last few WAL records created by DB1 (while DB1 was master); DB2 nevertheless starts as master properly.
When DB1 came back as a standby to DB2, it bumped its timeline using the history file from DB2, but once SR was set up with DB2, DB1 got corrupted.

Now the questions are:

1. Is this a theoretically valid approach?
2. If it is a valid approach, how can I detect such a scenario (where SR will corrupt the DB), so that I can fall back to a base backup in that situation?

regards,
Prakhar
 



Re: WAL scenario valid?

From
Albe Laurenz
Date:
prakhar jauhari wrote:
> I am facing a similar kind of problem, but in a two node setup. Streaming replication is being used
> with a high wal_keep_segments, using log shipping to bump up timelines on the standby server to setup
> SR(streaming replication).
> DB1 - master
> 
> DB2 - standby
> 
> 
> When a switchover happens DB2 becomes the new master and when DB1 comes up it will act as the standby
> to the new master (.history files from new master are copied to DB1 to bump up its timeline so as to
> setup SR). DB1 is not recreated from scratch. This runs fine in normal switchover, but there seems to
> be problem in the following situation, leading to database corruption:
> 
> 
> Current state :
> DB1 - master
> DB2 - standby
> 
> 
> Now the failing scenario:
> 
> 
> 
> 1. DB2 machine goes down.
> 
> 2. After some time DB1 machine also goes down (DB2 is still down).
> 
> 3. Now DB2 comes up (it will join the cluster as master as DB1 is still down).
> 
> 4. DB2 is started as master postgresql.
> 
> 5. Now DB1 comes up (it will join the cluster as standby to DB2)
> 
> 6. Now when DB1 attempts to SR with DB2, DB1 gets corrupted.
> 
> 
> Looked into the issue and found that when DB1 went down initially, it created some WAL's which were
> not synced to DB2 as it was already down.
> 
> Now when DB2 started as master it still had not played the last few WAL's created by DB1(when it was
> master). DB2 starts as master properly.
> 
> When DB1 came as standby to DB2, it bumped it timeline using history file from DB2, but when SR was
> setup with DB2, DB1 gets corrupted.
> 
> 
> Now the question is:
> 
> 
> 1. Is this a theoretically valid approach?
> 
> 2. If it is a valid approach, then how can i detect such a scenario (where SR will corrupt the DB)? So
> that i can go for a basebackup in such situation.

If you want to use the old primary as new standby without a new backup,
you have to ascertain that all transactions from the former have
been replayed at the latter.

To figure out where the primary currently is, you can
   SELECT pg_current_xlog_location();

To figure how much the standby has replayed, you can
   SELECT pg_last_xlog_replay_location();

Of course this only works if both are up.
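For example (9.2 and later), the two values can be compared with pg_xlog_location_diff(); the location literal below is just a sample value read from the primary:

   -- on the primary:
   SELECT pg_current_xlog_location();      -- say it returns '0/13000080'
   -- on the standby, using the value just read from the primary:
   SELECT pg_xlog_location_diff('0/13000080', pg_last_xlog_replay_location());
   -- a result of 0 means the standby has replayed everything the primary has written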

I think that it would be tricky to automate that; I'd choose
to make a new backup after each failover.

In the event of a controlled failover it might be an option.

I am surprised that the scenario you described leads to
corruption; I would have expected an error message.

Yours,
Laurenz Albe

Re: WAL scenario valid?

From
prakhar jauhari
Date:
Hi,

Yes, error messages did appear, but only once DB1 was already corrupted. Here is an excerpt from the logs:

LOG:  database system was shut down at 2013-06-19 09:40:55 UTC
LOG:  restored log file "00000004.history" from archive
cp: cannot stat `/data/pgsql/archivedir/00000005.history': No such file or directory
LOG:  restored log file "00000004.history" from archive
LOG:  entering standby mode
cp: cannot stat `/data/pgsql/archivedir/000000040000000000000013': No such file or directory
cp: cannot stat `/data/pgsql/archivedir/000000030000000000000013': No such file or directory
LOG:  consistent recovery state reached at 0/13000080
LOG:  record with zero length at 0/13000080
LOG:  database system is ready to accept read only connections
cp: cannot stat `/data/pgsql/archivedir/000000040000000000000013': No such file or directory
cp: cannot stat `/data/pgsql/archivedir/000000030000000000000013': No such file or directory
cp: cannot stat `/data/pgsql/archivedir/00000005.history': No such file or directory
LOG:  streaming replication successfully connected to primary
ERROR:  cannot execute CREATE ROLE in a read-only transaction
STATEMENT:  CREATE USER replicationuser REPLICATION;
LOG:  invalid record length at 0/13000080
FATAL:  terminating walreceiver process due to administrator command
cp: cannot stat `/data/pgsql/archivedir/000000040000000000000013': No such file or directory
cp: cannot stat `/data/pgsql/archivedir/000000040000000000000013': No such file or directory
cp: cannot stat `/data/pgsql/archivedir/00000005.history': No such file or directory
LOG:  invalid record length at 0/13000080


This is what happened:
When DB1 came up as the new standby, it established SR with DB2 (the new master), and then after some time it terminated the walreceiver process.
The "invalid record length" messages then repeated until I took a base backup on DB1 and restarted it.


Also, on DB1 (before DB1 was started in standby mode), pg_controldata /data/pgsql/9.2/data/ shows:

pg_control version number:            922
Catalog version number:               201204301
Database system identifier:           5891091665573732008
Database cluster state:               shut down
pg_control last modified:             Wed Jun 19 09:40:55 2013
Latest checkpoint location:           0/13000020
Prior checkpoint location:            0/124F1BC0
Latest checkpoint's REDO location:    0/13000020
Latest checkpoint's TimeLineID:       3
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:          0/8409
Latest checkpoint's NextOID:          18470
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        669
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Wed Jun 19 09:40:54 2013
Minimum recovery ending location:     0/0
Backup start location:                0/0
Backup end location:                  0/0
End-of-backup record required:        no
Current wal_level setting:            hot_standby
Current max_connections setting:      300
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value


Would it be wrong to compare the "Latest checkpoint location" on DB1 (the old master) with the last replayed xlog location on DB2 (the new master) before starting DB1 in standby mode, and, if DB1's "Latest checkpoint location" is greater than DB2's last replayed location, fall back to a base backup?
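Roughly, such a check might look like the sketch below (the host name is a placeholder, the checkpoint location literal is the one from the pg_controldata output above, and I compare against DB2's current write position since it is already running as a master):

   # on DB1, before starting it in standby mode:
   pg_controldata /data/pgsql/9.2/data | grep "Latest checkpoint location"
   # on DB2 (the new master); a positive result means DB1 is ahead of DB2,
   # so fall back to a base backup:
   psql -h db2 -At -c "SELECT pg_xlog_location_diff('0/13000020', pg_current_xlog_location());"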


regards,
Prakhar.



Re: WAL scenario valid?

From
Sergey Konoplev
Date:
On Tue, Jun 18, 2013 at 11:20 PM, prakhar jauhari <prak840@gmail.com> wrote:
> When a switchover happens DB2 becomes the new master and when DB1 comes up
> it will act as the standby to the new master (.history files from new master
> are copied to DB1 to bump up its timeline so as to setup SR). DB1 is not
> recreated from scratch. This runs fine in normal switchover, but there seems

I am not sure this works fine. You have probably got some silent
corruption on db1.

What you need is to resync db1 from db2. You can do it either the
canonical way (a base backup with pg_basebackup, or rsync) or using
pg_rewind.
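For the base backup route, a minimal sketch (host, user and data directory are placeholders):

   # on db1, with postgres stopped and the old data directory moved aside:
   pg_basebackup -h db2 -U replicationuser -D /data/pgsql/9.2/data -X stream -P
   # then recreate recovery.conf pointing at db2 and start postgres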

From pg_rewind docs:

pg_rewind is a tool for synchronizing a PostgreSQL data directory with another
PostgreSQL data directory that was forked from the first one. The result is
equivalent to rsyncing the first data directory (referred to as the old cluster
from now on) with the second one (the new cluster). The advantage of pg_rewind
over rsync is that pg_rewind uses the WAL to determine changed data blocks,
and does not require reading through all files in the cluster. That makes it
a lot faster when the database is large and only a small portion of it differs
between the clusters.

https://github.com/vmware/pg_rewind

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com