Обсуждение: WAL Log Shipping - Warm Standby not working under 8.3.7

Поиск
Список
Период
Сортировка

WAL Log Shipping - Warm Standby not working under 8.3.7

От
Keaton Adams
Дата:
We had WAL Log shipping (warm standby) working fine under 8.1.4 but under 8.3.7 we can’t get the slave to come up properly.  Nothing has changed in our process with regard to start_backup, rsync, stop_backup, bring up the warm standby server in continuous recovery mode, but the failover DB won’t start with the following error:

<2010-01-07 10:54:23 MST>LOG:  received immediate shutdown request
/mxl/var/pgsql/data/ha_copy.sh: line 103: 13976 Quit                    sleep 5
File /mxl/var/pgsql/data/stopslave found. Aborting Process.
<2010-01-07 10:54:28 MST>LOG:  could not open file "pg_xlog/00000001000000F6000000E9" (log file 246, segment 233): No such file or directory
<2010-01-07 10:54:28 MST>LOG:  redo done at F6/E8FFE378
<2010-01-07 10:54:28 MST>LOG:  last completed transaction was at log time 2009-12-28 10:18:04.893307-07
Waiting for log: 00000001000000F6000000E8
<2010-01-07 11:24:49 MST>FATAL:  could not restore file "00000001000000F6000000E8" from archive: return code 15

The log file in reference is very old and is not on the Master PG server in pg_xlogs and the “last completed transaction” can’t be right either.  Is this a bug or it is something we are doing wrong?

Thanks,

Keaton


psql (PostgreSQL) 8.3.7
contains support for command-line editing
RHEL 5 64 Bit
Linux ourservername 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Re: WAL Log Shipping - Warm Standby not working under 8.3.7

От
Keaton Adams
Дата:
I did find some references to a fix of last-completed transaction time and I looked in the postgresql-bugs archive, but I’m not having any luck confirming that this is a problem in 8.3.7 and an upgrade to 8.3.9 would fix the issue.

postgresql 8.3.7 .... Fix incorrect logging of last-completed-transaction time during PITR ..... Last transaction end time is now logged at end of recovery and at each logged restart point (Simon) ...


On 1/7/10 12:53 PM, "Keaton Adams" <kadams@mxlogic.com> wrote:

We had WAL Log shipping (warm standby) working fine under 8.1.4 but under 8.3.7 we can’t get the slave to come up properly.  Nothing has changed in our process with regard to start_backup, rsync, stop_backup, bring up the warm standby server in continuous recovery mode, but the failover DB won’t start with the following error:

<2010-01-07 10:54:23 MST>LOG:  received immediate shutdown request
/mxl/var/pgsql/data/ha_copy.sh: line 103: 13976 Quit                    sleep 5
File /mxl/var/pgsql/data/stopslave found. Aborting Process.
<2010-01-07 10:54:28 MST>LOG:  could not open file "pg_xlog/00000001000000F6000000E9" (log file 246, segment 233): No such file or directory
<2010-01-07 10:54:28 MST>LOG:  redo done at F6/E8FFE378
<2010-01-07 10:54:28 MST>LOG:  last completed transaction was at log time 2009-12-28 10:18:04.893307-07
Waiting for log: 00000001000000F6000000E8
<2010-01-07 11:24:49 MST>FATAL:  could not restore file "00000001000000F6000000E8" from archive: return code 15

The log file in reference is very old and is not on the Master PG server in pg_xlogs and the “last completed transaction” can’t be right either.  Is this a bug or it is something we are doing wrong?

Thanks,

Keaton


psql (PostgreSQL) 8.3.7
contains support for command-line editing
RHEL 5 64 Bit
Linux ourservername 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux


Re: WAL Log Shipping - Warm Standby not working under 8.3.7

От
Keaton Adams
Дата:
OK,

So what am I doing wrong here?

Installed PG 8.3.7 on Slave machine

Restored from last evening's backup from the master DB to make the rsync across the network finish sooner.

Shut down the PG instance on the slave machine

Ran a script that does the following:

select pg_start_backup('Master_Backup');
rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}
select pg_stop_backup();
ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null
ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null
ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start

When the slave PG database attempts to come up in recovery mode, it aborts because it is looking for a log file that is extremely old and does not exist on the master DB server.  I believe the Master PG instance was restarted on 12/28/09 and has been running ever since.  Is there a way to reset the “last completed transaction” on a DB?  Why is PG looking so far back for a WAL log to begin recovery when so much has been done since the 28th including daily backups?

<2010-01-07 10:54:23 MST>LOG:  received immediate shutdown request
/mxl/var/pgsql/data/ha_copy.sh: line 103: 13976 Quit                    sleep 5
File /mxl/var/pgsql/data/stopslave found. Aborting Process.
<2010-01-07 10:54:28 MST>LOG:  could not open file "pg_xlog/00000001000000F6000000E9" (log file 246, segment 233): N
o such file or directory
<2010-01-07 10:54:28 MST>LOG:  redo done at F6/E8FFE378
<2010-01-07 10:54:28 MST>LOG:  last completed transaction was at log time 2009-12-28 10:18:04.893307-07
Waiting for log: 00000001000000F6000000E8
<2010-01-07 11:24:49 MST>FATAL:  could not restore file "00000001000000F6000000E8" from archive: return code 15

Again, nothing was changed with the scripts or the replication process and this worked just fine under 8.1.4.

Thanks!



On 1/8/10 8:10 AM, "Keaton Adams" <kadams@mxlogic.com> wrote:

I did find some references to a fix of last-completed transaction time and I looked in the postgresql-bugs archive, but I’m not having any luck confirming that this is a problem in 8.3.7 and an upgrade to 8.3.9 would fix the issue.

postgresql 8.3.7 .... Fix incorrect logging of last-completed-transaction time during PITR ..... Last transaction end time is now logged at end of recovery and at each logged restart point (Simon) ...


On 1/7/10 12:53 PM, "Keaton Adams" <kadams@mxlogic.com> wrote:

We had WAL Log shipping (warm standby) working fine under 8.1.4 but under 8.3.7 we can’t get the slave to come up properly.  Nothing has changed in our process with regard to start_backup, rsync, stop_backup, bring up the warm standby server in continuous recovery mode, but the failover DB won’t start with the following error:

<2010-01-07 10:54:23 MST>LOG:  received immediate shutdown request
/mxl/var/pgsql/data/ha_copy.sh: line 103: 13976 Quit                    sleep 5
File /mxl/var/pgsql/data/stopslave found. Aborting Process.
<2010-01-07 10:54:28 MST>LOG:  could not open file "pg_xlog/00000001000000F6000000E9" (log file 246, segment 233): No such file or directory
<2010-01-07 10:54:28 MST>LOG:  redo done at F6/E8FFE378
<2010-01-07 10:54:28 MST>LOG:  last completed transaction was at log time 2009-12-28 10:18:04.893307-07
Waiting for log: 00000001000000F6000000E8
<2010-01-07 11:24:49 MST>FATAL:  could not restore file "00000001000000F6000000E8" from archive: return code 15

The log file in reference is very old and is not on the Master PG server in pg_xlogs and the “last completed transaction” can’t be right either.  Is this a bug or it is something we are doing wrong?

Thanks,

Keaton


psql (PostgreSQL) 8.3.7
contains support for command-line editing
RHEL 5 64 Bit
Linux ourservername 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux



Re: WAL Log Shipping - Warm Standby not working under 8.3.7

От
Erik Jones
Дата:
On Jan 8, 2010, at 9:50 AM, Keaton Adams wrote:

> OK,
>
> So what am I doing wrong here?
>
> Installed PG 8.3.7 on Slave machine
>
> Restored from last evening's backup from the master DB to make the rsync across the network finish sooner.
>
> Shut down the PG instance on the slave machine
>
> Ran a script that does the following:
>
> select pg_start_backup('Master_Backup');
> rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}
> select pg_stop_backup();
> ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null
> ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null
> ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start

Is ${slave_backup_path} your archive directory?  Why are you deleting all of you archives there?

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k






Re: WAL Log Shipping - Warm Standby not working under 8.3.7

От
Erik Jones
Дата:
On Jan 8, 2010, at 4:50 PM, Erik Jones wrote:

>
> On Jan 8, 2010, at 9:50 AM, Keaton Adams wrote:
>
>> OK,
>>
>> So what am I doing wrong here?
>>
>> Installed PG 8.3.7 on Slave machine
>>
>> Restored from last evening's backup from the master DB to make the rsync across the network finish sooner.
>>
>> Shut down the PG instance on the slave machine
>>
>> Ran a script that does the following:
>>
>> select pg_start_backup('Master_Backup');
>> rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}
>> select pg_stop_backup();
>> ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null
>> ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null
>> ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start
>
> Is ${slave_backup_path} your archive directory?  Why are you deleting all of you archives there?

Also, what are the contents of your recovery.conf file?  Are you using pg_standby?  The typical setup is to clear
<datadir>/pg_xlogon your standby and use pg_standby to recovery files directly from your archive directory? 

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k






Re: WAL Log Shipping - Warm Standby not working under 8.3.7

От
Keaton Adams
Дата:
We are still on PG 8.3.7 and I don’t see pg_standby.  It looks like it was added with 8.3.8.  I will put in a request to update to 8.3.9 as a part of our next software upgrade (internal product we write) but for now  I need to try to figure out why this isn’t working under 8.3.7.  I did manage to bring up replication under 8.3.7 in the test environment just fine, so the question is now why the standby PG instance won’t come up in recovery mode and why it is looking so far back for the next WAL log file.

-K


On 1/8/10 5:55 PM, "Erik Jones" <ejones@engineyard.com> wrote:



On Jan 8, 2010, at 4:50 PM, Erik Jones wrote:

>
> On Jan 8, 2010, at 9:50 AM, Keaton Adams wrote:
>
>> OK,
>>
>> So what am I doing wrong here?
>>
>> Installed PG 8.3.7 on Slave machine
>>
>> Restored from last evening's backup from the master DB to make the rsync across the network finish sooner.
>>
>> Shut down the PG instance on the slave machine
>>
>> Ran a script that does the following:
>>
>> select pg_start_backup('Master_Backup');
>> rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}
>> select pg_stop_backup();
>> ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null
>> ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null
>> ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start
>
> Is ${slave_backup_path} your archive directory?  Why are you deleting all of you archives there?

Also, what are the contents of your recovery.conf file?  Are you using pg_standby?  The typical setup is to clear <datadir>/pg_xlog on your standby and use pg_standby to recovery files directly from your archive directory?

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k






Re: WAL Log Shipping - Warm Standby not working under 8.3.7

От
Erik Jones
Дата:
On Jan 9, 2010, at 7:24 AM, Keaton Adams wrote:

> Restored from last evening's backup from the master DB to make the rsync across the network finish sooner.
> >>
> >> Shut down the PG instance on the slave machine
> >>
> >> Ran a script that does the following:
> >>
> >> select pg_start_backup('Master_Backup');
> >> rsync -rvlpogtz ${masterdb}/* ${slave_dbuser}@${slave_host}:${slavedb}
> >> select pg_stop_backup();
> >> ssh ${slave_dbuser}@${slave_host} rm ${slavedb}/postmaster.pid 2>/dev/null
> >> ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null
> >> ssh ${slave_dbuser}@${slave_host} ${PSQL_BIN}/pg_ctl -D ${slavedb} -l logfile start
> >
> > Is ${slave_backup_path} your archive directory?  Why are you deleting all of you archives there?
>
> Also, what are the contents of your recovery.conf file?  Are you using pg_standby?  The typical setup is to clear
<datadir>/pg_xlogon your standby and use pg_standby to recovery files directly from your archive directory? 

> We are still on PG 8.3.7 and I don’t see pg_standby.  It looks like it was added with 8.3.8.  I will put in a request
toupdate to 8.3.9 as a part of our next software upgrade (internal product we write) but for now  I need to try to
figureout why this isn’t working under 8.3.7.  I did manage to bring up replication under 8.3.7 in the test environment
justfine, so the question is now why the standby PG instance won’t come up in recovery mode and why it is looking so
farback for the next WAL log file. 
>
> -K

Keaton,

Please, don't top post (I've fixed it in this reply), it makes it difficult for others to join the thread and know
what'scurrently going on. 

Anyways, with regards to pg_standby, it should be included in all versions >= 8.3.0.  Regardless, it will work with
anythingversion since at least 8.2 so you can just grab it from a source install and build it against your built source
treeor use your package manager of choice to get it. 

Regardless:

> >> ssh ${slave_dbuser}@${slave_host} rm ${slave_backup_path}/0* 2>/dev/null

You are doing that *after* you run pg_start_backup() so you're deleting WAL files that are needed for the standby -- it
needsall WAL files generated from the start_backup() call on. 


Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k