Обсуждение: BUG #13977: Strange behavior with WAL archive recovery

Поиск
Список
Период
Сортировка

BUG #13977: Strange behavior with WAL archive recovery

От
eshkinkot@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      13977
Logged by:          Sergey Burladyan
Email address:      eshkinkot@gmail.com
PostgreSQL version: 9.4.6
Operating system:   Debian GNU/Linux 8.2 (jessie)
Description:

Hello

I see in postgresql.log what server constantly receives the same file from
WAL archive. I does not see this in 9.2 version.

recovery.conf:
restore_command = 'restore_cmd /mnt/WALs %f %p'
recovery_target_timeline = 'latest'
standby_mode = on

2016-02-22 17:23:15.820 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C1" from archive
2016-02-22 17:23:16.323 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C2" from archive
2016-02-22 17:23:16.639 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C3" from archive
2016-02-22 17:23:16.983 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C4" from archive
2016-02-22 17:23:17.119 MSK pid=19436,user=,db=,host= LOG: unexpected
pageaddr 74A4/C8000000 in log segment 00000001000074A7000000C5, offset 0
2016-02-22 17:23:20.279 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C4" from archive
2016-02-22 17:23:20.473 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C5" from archive
2016-02-22 17:23:20.809 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C6" from archive
2016-02-22 17:23:21.189 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C7" from archive
2016-02-22 17:23:21.361 MSK pid=19436,user=,db=,host= LOG: unexpected
pageaddr 74A4/D5000000 in log segment 00000001000074A7000000C8, offset 0
2016-02-22 17:23:25.527 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C7" from archive
2016-02-22 17:23:30.695 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C7" from archive
2016-02-22 17:23:35.864 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C7" from archive
2016-02-22 17:23:41.033 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C7" from archive
2016-02-22 17:23:45.200 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C7" from archive
2016-02-22 17:23:50.374 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C7" from archive
2016-02-22 17:23:55.546 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C7" from archive
2016-02-22 17:24:00.717 MSK pid=19436,user=,db=,host= LOG: restored log file
"00000001000074A7000000C7" from archive

Re: BUG #13977: Strange behavior with WAL archive recovery

От
Oleksii Kliukin
Дата:
Hi,

> On 22 Feb 2016, at 15:31, eshkinkot@gmail.com wrote:
>=20
> The following bug has been logged on the website:
>=20
> Bug reference:      13977
> Logged by:          Sergey Burladyan
> Email address:      eshkinkot@gmail.com
> PostgreSQL version: 9.4.6
> Operating system:   Debian GNU/Linux 8.2 (jessie)
> Description:       =20
>=20
> Hello
>=20
> I see in postgresql.log what server constantly receives the same file =
from
> WAL archive. I does not see this in 9.2 version.
>=20
> recovery.conf:
> restore_command =3D 'restore_cmd /mnt/WALs %f %p'
> recovery_target_timeline =3D 'latest'
> standby_mode =3D on
>=20
> 2016-02-22 17:23:15.820 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C1" from archive
> 2016-02-22 17:23:16.323 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C2" from archive
> 2016-02-22 17:23:16.639 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C3" from archive
> 2016-02-22 17:23:16.983 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C4" from archive
> 2016-02-22 17:23:17.119 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
unexpected
> pageaddr 74A4/C8000000 in log segment 00000001000074A7000000C5, offset =
0
> 2016-02-22 17:23:20.279 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C4" from archive
> 2016-02-22 17:23:20.473 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C5" from archive
> 2016-02-22 17:23:20.809 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C6" from archive
> 2016-02-22 17:23:21.189 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C7" from archive
> 2016-02-22 17:23:21.361 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
unexpected
> pageaddr 74A4/D5000000 in log segment 00000001000074A7000000C8, offset =
0
> 2016-02-22 17:23:25.527 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C7" from archive
> 2016-02-22 17:23:30.695 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C7" from archive
> 2016-02-22 17:23:35.864 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C7" from archive
> 2016-02-22 17:23:41.033 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C7" from archive
> 2016-02-22 17:23:45.200 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C7" from archive
> 2016-02-22 17:23:50.374 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C7" from archive
> 2016-02-22 17:23:55.546 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C7" from archive
> 2016-02-22 17:24:00.717 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: =
restored log file
> "00000001000074A7000000C7" from archive

I think it=E2=80=99s expected. PostgreSQL reports the latest file it has =
restored, meaning that once it restored 00000001000074A7000000C7 it was =
not able to get the following one (presumably, 00000001000074A7000000C8, =
unless there was a timeline change).=20

If there is a 00000001000074A7000000C8 in the archive, then perhaps =
there is something wrong with it (check the md5). Sometimes turning on =
log_min_messages=3D=E2=80=98debug=E2=80=99 helps finding out the reason =
why PostgreSQL is unable to restore the given segment.

--
Oleksii

Re: BUG #13977: Strange behavior with WAL archive recovery

От
Michael Paquier
Дата:
On Thu, Feb 25, 2016 at 12:54 AM, Oleksii Kliukin <alexk@hintbits.com> wrot=
e:
>> On 22 Feb 2016, at 15:31, eshkinkot@gmail.com wrote:
>> 2016-02-22 17:23:16.983 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: resto=
red log file
>> "00000001000074A7000000C4" from archive
>> 2016-02-22 17:23:17.119 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: unexp=
ected
>> pageaddr 74A4/C8000000 in log segment 00000001000074A7000000C5, offset 0

[...]

>> 2016-02-22 17:23:21.189 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: resto=
red log file
>> "00000001000074A7000000C7" from archive
>> 2016-02-22 17:23:21.361 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: unexp=
ected
>> pageaddr 74A4/D5000000 in log segment 00000001000074A7000000C8, offset 0

Those two entries are weird. A segment has been restored, but then a
complain comes from segment+1 in the error context given here. Looking
at xlogreader.c there is no one-off kind of problems in the reader
machinery.

>> 2016-02-22 17:23:25.527 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: resto=
red log file
>> "00000001000074A7000000C7" from archive
>> [...]
>> 2016-02-22 17:23:30.695 MSK pid=3D19436,user=3D,db=3D,host=3D LOG: resto=
red log file
>> "00000001000074A7000000C7" from archive
>
> I think it=E2=80=99s expected. PostgreSQL reports the latest file it has =
restored, meaning that once it restored 00000001000074A7000000C7 it was not=
 able to get the following one (presumably, 00000001000074A7000000C8, unles=
s there was a timeline change).

This error message is reported by Postgres once a segment is being
correctly restored in pg_xlog and that it is found to have a correct
size. What is actually restore_cmd doing?
--=20
Michael