Thread: PG 9.1 Looking for old WAL when promoting from recovery to master

PG 9.1 Looking for old WAL when promoting from recovery to master

From
David Morton
Date:
I'm implementing replica servers which will use a trigger file to promote from hot standby to full read/write. I've configured streaming replication as well as a recovery.conf which copies old WAL files from a repository if required.

When I place the trigger file, the system assumes the read/write role without issue but insists on looking for a really old WAL file. The log below shows the restoration from the previous night's full online backup (rsync), the trigger file detection, and then the attempts to find the old WAL file.

Is this behavior normal? From what I can see, it's not writing any new WAL files until it is satisfied with the state of this old one. If I create the file it's expecting to see, it archives it off and then complains about the next in the series.

2012-08-28 23:30:33 UTC   LOG:  restored log file "000000010000002E00000030" from archive
2012-08-28 23:30:34 UTC   LOG:  restored log file "000000010000002E00000031" from archive
2012-08-28 23:30:36 UTC   LOG:  restored log file "000000010000002E00000032" from archive
2012-08-28 23:30:37 UTC   LOG:  restored log file "000000010000002E00000033" from archive
2012-08-28 23:30:39 UTC   LOG:  restored log file "000000010000002E00000034" from archive
2012-08-28 23:30:42 UTC   LOG:  restored log file "000000010000002E00000035" from archive
2012-08-28 23:30:44 UTC   LOG:  restored log file "000000010000002E00000036" from archive
2012-08-28 23:30:45 UTC   LOG:  restored log file "000000010000002E00000037" from archive
cp: cannot stat `/NFS/current/wal/depot/000000010000002E00000038': No such file or directory
2012-08-28 23:30:47 UTC   LOG:  streaming replication successfully connected to primary
2012-08-28 23:42:09 UTC   LOG:  trigger file found: /home/depot/data/transition_to_master.trigger
2012-08-28 23:42:09 UTC   FATAL:  terminating walreceiver process due to administrator command
cp: cannot stat `/NFS/current/wal/depot/000000010000002E00000039': No such file or directory
2012-08-28 23:42:09 UTC   LOG:  record with zero length at 2E/39079E00
cp: cannot stat `/NFS/current/wal/depot/000000010000002E00000039': No such file or directory
2012-08-28 23:42:09 UTC   LOG:  redo done at 2E/39079DC0
2012-08-28 23:42:09 UTC   LOG:  last completed transaction was at log time 2012-08-28 23:42:02.226546+00
cp: cannot stat `/NFS/current/wal/depot/000000010000002E00000039': No such file or directory
cp: cannot stat `/NFS/current/wal/depot/00000002.history': No such file or directory
2012-08-28 23:42:09 UTC   LOG:  selected new timeline ID: 2
cp: cannot stat `/NFS/current/wal/depot/00000001.history': No such file or directory
2012-08-28 23:42:10 UTC   LOG:  archive recovery complete
2012-08-28 23:42:10 UTC   LOG:  database system is ready to accept connections
2012-08-28 23:42:10 UTC   LOG:  autovacuum launcher started
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:42:10 UTC   LOG:  archive command failed with exit code 1
2012-08-28 23:42:10 UTC   DETAIL:  The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:42:11 UTC   LOG:  archive command failed with exit code 1
2012-08-28 23:42:11 UTC   DETAIL:  The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:42:13 UTC   LOG:  archive command failed with exit code 1
2012-08-28 23:42:13 UTC   DETAIL:  The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
2012-08-28 23:42:13 UTC   WARNING:  transaction log file "000000010000001D00000023" could not be archived: too many failures
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:43:13 UTC   LOG:  archive command failed with exit code 1
2012-08-28 23:43:13 UTC   DETAIL:  The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:43:14 UTC   LOG:  archive command failed with exit code 1
2012-08-28 23:43:14 UTC   DETAIL:  The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
pg_xlog/000000010000001D00000023: No such file or directory
2012-08-28 23:43:15 UTC   LOG:  archive command failed with exit code 1
2012-08-28 23:43:15 UTC   DETAIL:  The failed archive command was: /DB_SHARED/dbcommon/scripts/logarchive.sh pg_xlog/000000010000001D00000023 000000010000001D00000023
2012-08-28 23:43:15 UTC   WARNING:  transaction log file "000000010000001D00000023" could not be archived: too many failures
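
Editor's note: to make "really old" concrete, the segment named in the failing archive_command can be compared with the last segment the standby asked for during recovery. The following is a minimal sketch, assuming the usual 24-character WAL file name made of three 8-digit hexadecimal fields (timeline, log, segment); the names `WalName` and `parse_wal_name` are hypothetical helpers, not part of PostgreSQL.

```python
from typing import NamedTuple


class WalName(NamedTuple):
    """The three fields of a WAL segment file name (hypothetical helper type)."""
    timeline: int
    log: int
    segment: int


def parse_wal_name(name: str) -> WalName:
    """Split a 24-character WAL file name into its three hexadecimal fields."""
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL file name: %r" % name)
    # int(..., 16) raises ValueError if a field is not valid hexadecimal.
    return WalName(
        timeline=int(name[0:8], 16),
        log=int(name[8:16], 16),
        segment=int(name[16:24], 16),
    )


# File names taken from the log above.
failing = parse_wal_name("000000010000001D00000023")
last_requested = parse_wal_name("000000010000002E00000039")

print(failing)
print(last_requested)
# Tuples compare field by field, so this is True when the failing
# segment sorts before the last segment requested during recovery.
print(failing < last_requested)
```

Both names are on timeline 1, and the failing one has a lower log number (0x1D against 0x2E), so it sorts well before anything replayed in this recovery run.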

Re: PG 9.1 Looking for old WAL when promoting from recovery to master

From
Fujii Masao
Date:
On Tue, Sep 4, 2012 at 7:01 AM, David Morton <davidmorton78@gmail.com> wrote:
> I'm implementing replica servers which will use a trigger file to promote
> from hot standby to full read/write. I've configured streaming replication
> as well as a recovery.conf which copies old WAL files from a repository if
> required.
>
> When placing the trigger file the system assumes the read/write role without
> issue but insists on looking for a really old WAL file ... the below log
> file shows restoration from the previous night's full online backup (rsync)
> along with the trigger file detection and then attempting to find the old
> WAL file.
>
> Is this behavior normal ?

No. I think archive_command is failing because an archive status file
for the old WAL file still exists in the pg_xlog/archive_status
directory, though I'm not sure why that happened. That's strange,
since the archive status file should be removed when its corresponding
WAL file is removed. Anyway, if you delete that archive status file,
archive_command should complete successfully.

Regards,

--
Fujii Masao
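
Editor's note: the check suggested in the reply above could be sketched as follows. This is not from the thread. It assumes that pending archive requests appear as `<name>.ready` files under `pg_xlog/archive_status`, and it only reports the ones whose corresponding file is missing from `pg_xlog`; it deletes nothing. The function name `find_orphaned_status_files` is hypothetical.

```python
import os
import sys


def find_orphaned_status_files(xlog_dir):
    """Return .ready status files whose corresponding file is missing from xlog_dir."""
    status_dir = os.path.join(xlog_dir, "archive_status")
    orphans = []
    for entry in sorted(os.listdir(status_dir)):
        base, ext = os.path.splitext(entry)
        if ext != ".ready":
            # .done files are already archived; ignore them here.
            continue
        if not os.path.exists(os.path.join(xlog_dir, base)):
            orphans.append(os.path.join(status_dir, entry))
    return orphans


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: find_orphans.py /path/to/pg_xlog")
    for path in find_orphaned_status_files(sys.argv[1]):
        print(path)
```

Run against the promoted server's `pg_xlog` directory, a listing that includes `000000010000001D00000023.ready` would be consistent with the explanation given in the reply. Reviewing the list before removing anything keeps the step reversible.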