point-in-time-recovery issue when applying WAL files, "invalid magic number 0000 in log file" error

Поиск

Список

Период

Сортировка

От	Radovan Jablonovsky
Тема	point-in-time-recovery issue when applying WAL files, "invalid magic number 0000 in log file" error
Дата	19 июня 2014 г. 23:09:01
Msg-id	CAJYcdTv9uLPejVdiL4hUmmAUL50jL2MkkMkp0qKAoc1_nfCWzg@mail.gmail.com обсуждение исходный текст
Список	pgsql-admin

Дерево обсуждения

Hello,

We have issue with point-in-time recovery. When applying WAL files on restored postgresql server (we used pg_basebackup to create full cluster backup) we got "invalid magic number 0000 in log file" error. The recovery process applied WAL files from archive location and it was set to specific point in time.

In recovery.conf file were parameters set to:

restore_command = 'cp /db/archive/transaction_logs/%f %p'

recovery_target_time = '2014-06-11 16:45:00'

The recovery stopped at

2014-06-16 15:40:30.514 MDT [32021]: [16580-1] LOG: restored log file "0000001A00001172000000D1" from archive

2014-06-16 15:40:35.957 MDT [32021]: [16581-1] LOG: restored log file "0000001A00001172000000D2" from archive

2014-06-16 15:40:46.457 MDT [32021]: [16582-1] LOG: invalid magic number 0000 in log file 4466, segment 210, offset 6594560

2014-06-16 15:40:46.457 MDT [32021]: [16583-1] LOG: redo done at 1172/D2649068

2014-06-16 15:40:46.457 MDT [32021]: [16584-1] LOG: last completed transaction was at log time 2014-06-11 12:34:13.496717-06

2014-06-16 15:40:46.481 MDT [32021]: [16585-1] LOG: restored log file "0000001A00001172000000D2" from archive

cp: cannot stat `/db/archive/transaction_logs/0000001B.history': No such file or directory

2014-06-16 15:40:46.499 MDT [32021]: [16586-1] LOG: selected new timeline ID: 27

cp: cannot stat `/db/archive/transaction_logs/0000001A.history': No such file or directory

2014-06-16 15:40:49.766 MDT [32021]: [16587-1] LOG: archive recovery complete

2014-06-16 15:55:16.171 MDT [6791]: [1-1] LOG: autovacuum launcher started

2014-06-16 15:55:16.172 MDT [32019]: [1-1] LOG: database system is ready to accept connections

What is interesting, the postgresql server streaming replica/slave, which is set to asynchronous replication had no issue. When we checked the log from replica/slave there was no issue with applying the WAL stream on slave.

Streaming replica/slave log:

2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26862-1] LOG: restartpoint complete: wrote 77860 buffers (7.4%); 0 transaction log file(s) added, 0 removed, 23 recycled; write=719.761 s, sync=0.983 s, total=720.748 s; sync files=5119, longest=0.261 s, average=0.000 s

2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26863-1] LOG: recovery restart point at 1172/6C678380

2014-06-11 12:20:40.832 MDT [11787] - [] - [] - []: [26864-1] DETAIL: last completed transaction was at log time 2014-06-11 12:20:45.928542-06

2014-06-11 12:23:40.170 MDT [11787] - [] - [] - []: [26865-1] LOG: restartpoint starting: time

2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26866-1] LOG: restartpoint complete: wrote 110827 buffers (10.6%); 0 transaction log file(s) added, 0 removed, 30 recycled; write=719.931 s, sync=3.832 s, total=723.766 s; sync files=6957, longest=1.349 s, average=0.000 s

2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26867-1] LOG: recovery restart point at 1172/88B75B48

2014-06-11 12:35:43.936 MDT [11787] - [] - [] - []: [26868-1] DETAIL: last completed transaction was at log time 2014-06-11 12:35:49.145232-06

2014-06-11 12:35:44.095 MDT [11787] - [] - [] - []: [26869-1] LOG: restartpoint starting: xlog

2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26870-1] LOG: restartpoint complete: wrote 176758 buffers (16.9%); 0 transaction log file(s) added, 0 removed, 28 recycled; write=670.320 s, sync=10.588 s, total=680.945 s; sync files=7398, longest=3.899 s, average=0.001 s

2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26871-1] LOG: recovery restart point at 1172/BE5F4928

2014-06-11 12:47:05.041 MDT [11787] - [] - [] - []: [26872-1] DETAIL: last completed transaction was at log time 2014-06-11 12:47:10.216659-06

2014-06-11 12:47:05.224 MDT [11787] - [] - [] - []: [26873-1] LOG: restartpoint starting: xlog

2014-06-11 12:58:27.067 MDT [11787] - [] - [] - []: [26874-1] LOG: restartpoint complete: wrote 495857 buffers (47.3%); 0 transaction log file(s) added, 0 removed, 54 recycled; write=670.682 s, sync=11.148 s, total=681.843 s; sync files=5627, longest=4.773 s, average=0.001 s

What could cause this issue? I suspect one possibility could be the WAL file itself was corrupted when it was archived.

Is it possible to recover/fix WAL file?

Radovan Jablonovsky | SaaS DBA | Phone 1-403-262-6519 (ext. 256) | Fax 1-403-233-8046

We are hiring! | search jobs

В списке pgsql-admin по дате отправления:

Предыдущее

От: Oliver
Дата: 19 июня 2014 г., 13:44:22
Сообщение: Re: Best backup strategy for production systems

Следующее

От: Oliver
Дата: 20 июня 2014 г., 14:51:36
Сообщение: Postgresql not getting assigned memory

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

point-in-time-recovery issue when applying WAL files, "invalid magic number 0000 in log file" error

Предыдущее

Следующее