Missing WAL file after running pg_rewind

From Dylan Luong
Subject Missing WAL file after running pg_rewind
Date
Msg-id ab82d7fd35ef4394bc5dfc6a6e2f1266@ITUPW-EXMBOX3B.UniNet.unisa.edu.au
Replies Re: Missing WAL file after running pg_rewind  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-general

Hi

 

We had a failover situation where our monitoring watchdog processes promoted the slave to become the new master.

I restarted the old master database to ensure a clean stop/start, then ran pg_rewind on the old master to resync it with the new master. The rewind completed successfully, but the old master (now the new slave) failed to start afterwards.

The steps I took were:

1. Stop all watchdogs
2. Start/stop the old master
3. Run 'checkpoint' on the new master
4. Run pg_rewind on the old master to resync it with the new master
5. Start the old master (as the new slave)
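For readers who want to see steps 2 to 5 as concrete commands, here is a minimal sketch that only builds and prints the command lines; it does not run anything. The data directory and connection string are hypothetical placeholders, not values from my setup, and step 1 is left out because the watchdogs are site-specific.

```python
import shlex


def resync_plan(old_pgdata, new_master_conninfo):
    """Build the command lines for steps 2-5 as argument lists (dry run only).

    old_pgdata and new_master_conninfo are hypothetical placeholders supplied
    by the caller; nothing here is executed.
    """
    return [
        # Step 2: start and then cleanly stop the old master.
        ["pg_ctl", "-D", old_pgdata, "-w", "start"],
        ["pg_ctl", "-D", old_pgdata, "-w", "-m", "fast", "stop"],
        # Step 3: force a checkpoint on the new master.
        ["psql", "-d", new_master_conninfo, "-c", "CHECKPOINT"],
        # Step 4: rewind the old master's data directory against the new master.
        [
            "pg_rewind",
            "--target-pgdata=" + old_pgdata,
            "--source-server=" + new_master_conninfo,
        ],
        # Step 5: start the old master again, now as the new slave.
        ["pg_ctl", "-D", old_pgdata, "start"],
    ]


if __name__ == "__main__":
    plan = resync_plan("/hypothetical/pgdata", "host=new-master dbname=postgres")
    for argv in plan:
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the plan before running it makes it easy to review the exact order of operations with someone else.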

 

Step 4 (pg_rewind) succeeded, and the new slave was rewound to the same timeline as the new master. However, the new slave then failed to start, with the following errors:

 

80) FATAL:  the database system is starting up

cp: cannot stat ‘/pg_backup/backup/archive_sync/0000000400000383000000BF’: No such file or directory

cp: cannot stat ‘/pg_backup/backup/archive_sync/0000000300000383000000BF’: No such file or directory

cp: cannot stat ‘/pg_backup/backup/archive_sync/0000000200000383000000BF’: No such file or directory

cp: cannot stat ‘/pg_backup/backup/archive_sync/0000000100000383000000BF’: No such file or directory

2018-01-11 23:21:59 ACDT [112235]: [1-1] db=,user= app=,host= LOG:  started streaming WAL from primary at 383/BE000000 on timeline 6

2018-01-11 23:21:59 ACDT [112235]: [2-1] db=,user= app=,host= FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 0000000600000383000000BE has already been removed
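To relate the segment name in the error to the streaming start position, here is a small sketch that decodes a WAL file name into its timeline, log id and segment number. It assumes the default 16 MB segment size; the function names are my own, not part of PostgreSQL.

```python
import re

# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024

# A WAL file name is 24 hex digits: timeline, log id, segment (8 digits each).
_WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def parse_wal_name(name):
    """Return (timeline, log_id, segment) for a WAL segment file name."""
    match = _WAL_NAME.match(name)
    if match is None:
        raise ValueError("not a complete WAL segment name: %r" % name)
    timeline, log_id, segment = (int(part, 16) for part in match.groups())
    return timeline, log_id, segment


def segment_start_lsn(name):
    """Return the LSN at which the segment starts, in the usual X/X notation."""
    _, log_id, segment = parse_wal_name(name)
    return "%X/%X" % (log_id, segment * WAL_SEGMENT_SIZE)


if __name__ == "__main__":
    requested = "0000000600000383000000BE"
    print(parse_wal_name(requested))     # (6, 899, 190)
    print(segment_start_lsn(requested))  # 383/BE000000
```

So the segment the new master reports as removed is exactly the one containing the position 383/BE000000 on timeline 6 that the slave asked to stream from.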

 

I checked both the archive and pg_xlog directories on the new master and cannot locate the missing file.

 

Has anyone experienced this before with pg_rewind?

 

The earliest WAL files in the archive directory date from just after the failover occurred.

 

E.g., in the archive directory on the new master:

$ ls -l

total 15745032

-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000500000383000000C0.partial

-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C0

-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C1

-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C2

-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C

 

And in the pg_xlog directory on the new master:

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000080

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000081

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000082

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000083

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000084

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000085

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000086

-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000087
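To make the gap explicit, here is a rough sketch that compares the requested segment with the complete file names from the two listings above and reports which segments fall before the earliest one available on the same timeline. It assumes 16 MB segments (256 per log id); the helper names are hypothetical, and the .partial file and the truncated name in the archive listing are left out.

```python
SEGMENTS_PER_LOG_ID = 256  # assumption: 16 MB segments, 4 GB per log id


def segment_number(name):
    """Return (timeline, absolute segment number) for a 24-hex-digit WAL name."""
    if len(name) != 24:
        raise ValueError("not a complete WAL segment name: %r" % name)
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_id * SEGMENTS_PER_LOG_ID + segment


def gap_before_earliest(requested, available):
    """List segment names from `requested` up to, but not including, the
    earliest available segment on the same timeline."""
    timeline, start = segment_number(requested)
    same_timeline = [
        number
        for tli, number in (segment_number(name) for name in available)
        if tli == timeline
    ]
    if not same_timeline:
        raise ValueError("no available segments on timeline %d" % timeline)
    earliest = min(same_timeline)
    return [
        "%08X%08X%08X" % (timeline, n // SEGMENTS_PER_LOG_ID, n % SEGMENTS_PER_LOG_ID)
        for n in range(start, earliest)
    ]


if __name__ == "__main__":
    # Complete names copied from the archive and pg_xlog listings.
    available = [
        "0000000600000383000000C0",
        "0000000600000383000000C1",
        "0000000600000383000000C2",
        "000000060000038500000080",
        "000000060000038500000087",
    ]
    for name in gap_before_earliest("0000000600000383000000BE", available):
        print(name)
```

With these inputs the sketch lists the BE and BF segments of log 383 on timeline 6, which matches what I see: nothing earlier than C0 exists on the new master.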

 

Thanks

Dylan

 
