point-in-time-recovery issue when applying WAL files, "invalid magic number 0000 in log file" error

Поиск
Список
Период
Сортировка
От Radovan Jablonovsky
Тема point-in-time-recovery issue when applying WAL files, "invalid magic number 0000 in log file" error
Дата
Msg-id CAJYcdTv9uLPejVdiL4hUmmAUL50jL2MkkMkp0qKAoc1_nfCWzg@mail.gmail.com
обсуждение исходный текст
Список pgsql-admin
Hello,

We have issue with point-in-time recovery. When applying WAL files on restored postgresql server (we used pg_basebackup to create full cluster backup) we got "invalid magic number 0000 in log file" error. The recovery process applied WAL files from archive location and it was set to specific point in time.

In recovery.conf file were parameters set to: 
restore_command = 'cp /db/archive/transaction_logs/%f %p'
recovery_target_time = '2014-06-11 16:45:00'

The recovery stopped at 
2014-06-16 15:40:30.514 MDT [32021]: [16580-1] LOG:  restored log file "0000001A00001172000000D1" from archive
2014-06-16 15:40:35.957 MDT [32021]: [16581-1] LOG:  restored log file "0000001A00001172000000D2" from archive
2014-06-16 15:40:46.457 MDT [32021]: [16582-1] LOG:  invalid magic number 0000 in log file 4466, segment 210, offset 6594560
2014-06-16 15:40:46.457 MDT [32021]: [16583-1] LOG:  redo done at 1172/D2649068
2014-06-16 15:40:46.457 MDT [32021]: [16584-1] LOG:  last completed transaction was at log time 2014-06-11 12:34:13.496717-06
2014-06-16 15:40:46.481 MDT [32021]: [16585-1] LOG:  restored log file "0000001A00001172000000D2" from archive
cp: cannot stat `/db/archive/transaction_logs/0000001B.history': No such file or directory
2014-06-16 15:40:46.499 MDT [32021]: [16586-1] LOG:  selected new timeline ID: 27
cp: cannot stat `/db/archive/transaction_logs/0000001A.history': No such file or directory
2014-06-16 15:40:49.766 MDT [32021]: [16587-1] LOG:  archive recovery complete
2014-06-16 15:55:16.171 MDT [6791]: [1-1] LOG:  autovacuum launcher started
2014-06-16 15:55:16.172 MDT [32019]: [1-1] LOG:  database system is ready to accept connections

What is interesting, the postgresql server streaming replica/slave, which is set to asynchronous replication had no issue. When we checked the log from replica/slave there was no issue with applying the WAL stream on slave.

Streaming replica/slave log:
2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26862-1] LOG:  restartpoint complete: wrote 77860 buffers (7.4%); 0 transaction log file(s) added, 0 removed, 23 recycled; write=719.761 s, sync=0.983 s, total=720.748 s; sync files=5119, longest=0.261 s, average=0.000 s
2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26863-1] LOG:  recovery restart point at 1172/6C678380
2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26864-1] DETAIL:  last completed transaction was at log time 2014-06-11 12:20:45.928542-06
2014-06-11 12:23:40.170 MDT [11787] - [] - [] - []: [26865-1] LOG:  restartpoint starting: time
2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26866-1] LOG:  restartpoint complete: wrote 110827 buffers (10.6%); 0 transaction log file(s) added, 0 removed, 30 recycled; write=719.931 s, sync=3.832 s, total=723.766 s; sync files=6957, longest=1.349 s, average=0.000 s
2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26867-1] LOG:  recovery restart point at 1172/88B75B48
2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26868-1] DETAIL:  last completed transaction was at log time 2014-06-11 12:35:49.145232-06
2014-06-11 12:35:44.095 MDT [11787] - [] - [] - []: [26869-1] LOG:  restartpoint starting: xlog
2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26870-1] LOG:  restartpoint complete: wrote 176758 buffers (16.9%); 0 transaction log file(s) added, 0 removed, 28 recycled; write=670.320 s, sync=10.588 s, total=680.945 s; sync files=7398, longest=3.899 s, average=0.001 s
2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26871-1] LOG:  recovery restart point at 1172/BE5F4928
2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26872-1] DETAIL:  last completed transaction was at log time 2014-06-11 12:47:10.216659-06
2014-06-11 12:47:05.224 MDT [11787] - [] - [] - []: [26873-1] LOG:  restartpoint starting: xlog
2014-06-11 12:58:27.067 MDT [11787] - [] - [] - []: [26874-1] LOG:  restartpoint complete: wrote 495857 buffers (47.3%); 0 transaction log file(s) added, 0 removed, 54 recycled; write=670.682 s, sync=11.148 s, total=681.843 s; sync files=5627, longest=4.773 s, average=0.001 s

What could cause this issue? I suspect one possibility could be the WAL file itself was corrupted when it was archived. 
Is it possible to recover/fix WAL file?

--

Radovan Jablonovsky | SaaS DBA | Phone 1-403-262-6519 (ext. 256) | Fax 1-403-233-8046


Replicon | Hassle-Free Time & Expense Management Software - 7,300 Customers - 70 Countries
www.replicon.com | facebook | twitter | blog | contact us

We are hiring! | search jobs

В списке pgsql-admin по дате отправления:

Предыдущее
От: Oliver
Дата:
Сообщение: Re: Best backup strategy for production systems
Следующее
От: Oliver
Дата:
Сообщение: Postgresql not getting assigned memory