On Thu, Nov 30, 2017, at 00:22, Alvaro Herrera wrote:
> Alex Kliukin wrote:
>
> > 2017-11-15 13:15:46.673 CET,,,15154,,5a0c2ff1.3b32,5,,2017-11-15
> > 13:15:45 CET,,0,PANIC,XX000,"replication checkpoint has wrong magic
> > 5714534 instead of 307747550",,,,,,,,,""
>
> Uhh ... I had never heard of this "replication checkpoint" thing. It is
> part of replication origins feature, which is fairly new stuff (see
> src/backend/replication/logical/origin.c). I'd bet this problem is
> related to a bug in logical replication "origins" code rather than any
> procedural problems in your base-backup taking setup ...
We are not using logical replication or logical decoding on those hosts.
On the master, both pg_replication_origin and pg_replication_slots are
empty.
Those masters were upgraded from 9.3 fairly recently (around 2 months
ago).
>
> I wonder if there is some truncation of the 0x1257DADE value that
> produces the 5714534 value you're seeing -- something related to a
> pg_logical/replorigin_checkpoint file being written partially while the
> backup is being taken.
307747550 = 0x1257DADE
0001 0010 0101 0111 1101 1010 1101 1110
5714534 = 0x573266 = "W2f" in ASCII
0000 0000 0101 0111 0011 0010 0110 0110
I see no patterns here.
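
The conversions above can be re-checked with a short script. This is only a sketch: the function name describe() is made up for illustration, and the little-endian byte view assumes an x86-like host, which the report does not state.

```python
import struct

REPLICATION_STATE_MAGIC = 307747550  # expected value from the PANIC message
OBSERVED_MAGIC = 5714534             # value that was actually read

def describe(value):
    """Return (hex string, nibble-grouped binary, little-endian bytes)
    for an unsigned 32-bit value."""
    bits = format(value, "032b")
    grouped = " ".join(bits[i:i + 4] for i in range(0, 32, 4))
    # Assumption: little-endian layout, i.e. how the value would sit on
    # disk if written by an x86-like machine.
    raw_le = struct.pack("<I", value)
    return hex(value), grouped, raw_le

for name, value in (("expected", REPLICATION_STATE_MAGIC),
                    ("observed", OBSERVED_MAGIC)):
    hex_str, grouped, raw_le = describe(value)
    print(name, hex_str, grouped, raw_le)
```

Reading the three non-zero bytes of the observed value from most to least significant gives 0x57, 0x32, 0x66, which are the ASCII characters W, 2 and f.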
What is interesting is that 0x573266 is actually mentioned in relcache.c:
#define RELCACHE_INIT_FILENAME "pg_internal.init"
#define RELCACHE_INIT_FILEMAGIC 0x573266 /* version ID value */
It's the file magic for the relcache init files, but given that the copy
is performed by just compressing and decompressing the original files, I
don't see how those two could be confused by software.
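
One way to test whether a file in the restored backup really starts with the relcache magic is to read its first four bytes and compare them against both constants. The sketch below is hypothetical: the helper names identify_magic() and identify_file() are invented for illustration, and it assumes both magics are stored as 4-byte integers in the host's byte order (little-endian by default here).

```python
import struct

# Magic values taken from the PANIC message and from relcache.c.
KNOWN_MAGICS = {
    0x1257DADE: "replication origin checkpoint",
    0x573266: "relcache init file (pg_internal.init)",
}

def identify_magic(header, byteorder="<"):
    """Classify a file by its first four bytes.

    byteorder is a struct prefix: "<" for little-endian (assumed
    default), ">" for big-endian.
    """
    if len(header) < 4:
        return "too short to contain a magic number"
    (magic,) = struct.unpack(byteorder + "I", header[:4])
    return KNOWN_MAGICS.get(magic, "unknown (0x%08X)" % magic)

def identify_file(path, byteorder="<"):
    """Read the first four bytes of the file at path and classify them."""
    with open(path, "rb") as f:
        return identify_magic(f.read(4), byteorder)
```

For example, calling identify_file() on pg_logical/replorigin_checkpoint inside the restored data directory would show whether that path holds relcache init contents rather than a truncated or partially written checkpoint.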
>
> Another point towards not including pg_logical/ contents when taking a
> base backup, I guess ...
In our case, wouldn't that just mask the real issue?
--
Sincerely,
Alex