Re: 'replication checkpoint has wrong magic' on the newly cloned replicas

From: Alex Kliukin
Subject: Re: 'replication checkpoint has wrong magic' on the newly cloned replicas
Msg-id: 1512038669.1366070.1189170512.3D900EE2@webmail.messagingengine.com
In reply to: Re: 'replication checkpoint has wrong magic' on the newly cloned replicas  (Andres Freund <andres@anarazel.de>)
Replies: Re: 'replication checkpoint has wrong magic' on the newly cloned replicas
List: pgsql-admin

On Thu, Nov 30, 2017, at 01:41, Andres Freund wrote:
> 
> > It is part of replication origins feature, which is fairly new stuff
> > (see src/backend/replication/logical/origin.c).  I'd bet this problem
> > is related to a bug in logical replication "origins" code rather than
> > any procedural problems in your base-backup taking setup ...
> 
> Possible.
> 
> What's the max_replication_origins setting? Is the system receiving
> logical replication data? Could you describe the setup a bit? Any chance
> the system's partially been running without fsync? Could you attach both
> a corrupt and a non-corrupt state file?

max_replication_slots is 5, and logical replication is not used there
at all. fsync is always turned on; the other configuration settings
from the master are attached.

The replica configuration is almost identical to the master's (we
decreased random_page_cost for systems running on SSDs).

diff /tmp/settings_master.txt /tmp/settings_replica.txt
115c115
< krb_server_keyfile    FILE:/server/postgres/9.6.5/etc/krb5.keytab
---
> krb_server_keyfile    FILE:/server/postgres/9.6.6/etc/krb5.keytab
186c186
< random_page_cost      3
---
> random_page_cost    1.5
194,195c194,195
< server_version        9.6.5
< server_version_num    90605
---
> server_version    9.6.6
> server_version_num    90606
222c222
< tcp_keepalives_interval       75
---
> tcp_keepalives_interval    90
239c239
< transaction_read_only off
---
> transaction_read_only    on
273c273

The system is a typical OLTP one; the master normally has a single
streaming physical replica and one delayed replica. At the time the
issue happened, the replica in question was the second physical replica;
after it was created, the previous replica was decommissioned.

Unfortunately, I don't have a 'corrupt' file from the replica, as the
data was reinitialized afterwards.  I will try to reproduce the
issue by cloning it a couple more times. The replorigin_checkpoint from
the master is attached, but its magic seems to be fine:

od -x replorigin_checkpoint
0000000 dade 1257 b236 6a00
0000010

The same file from the current replica is identical.
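
For anyone who wants to repeat this check on their own files, here is a
minimal Python sketch of what the od output above is being read for. It
rests on several assumptions that are not stated in this thread: that the
expected magic is 0x1257DADE (the value of REPLICATION_STATE_MAGIC in
origin.c, as far as I can tell), that the file was written on a
little-endian host, and that the last four bytes are a CRC-32C over
everything before them. The function names are made up for illustration.

```python
import struct

# Assumption: value of REPLICATION_STATE_MAGIC in
# src/backend/replication/logical/origin.c.
REPLICATION_STATE_MAGIC = 0x1257DADE


def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def inspect_checkpoint(data: bytes) -> dict:
    """Split a replorigin_checkpoint image into magic, payload and CRC.

    Assumed layout: 4-byte magic, zero or more origin entries,
    4-byte CRC trailer, all little-endian.
    """
    if len(data) < 8:
        raise ValueError("file too short to hold magic and CRC")
    (magic,) = struct.unpack_from("<I", data, 0)
    (stored_crc,) = struct.unpack_from("<I", data, len(data) - 4)
    return {
        "magic": magic,
        "magic_ok": magic == REPLICATION_STATE_MAGIC,
        "payload_len": len(data) - 8,  # bytes between magic and CRC
        "stored_crc": stored_crc,
        "computed_crc": crc32c(data[:-4]),
    }


if __name__ == "__main__":
    # The eight bytes from the od -x dump, in on-disk order. od -x prints
    # 16-bit words in host order, so "dade 1257" is de da 57 12 on disk.
    sample = bytes.fromhex("deda571236b2006a")
    info = inspect_checkpoint(sample)
    print(f"magic    {info['magic']:#010x}  ok={info['magic_ok']}")
    print(f"payload  {info['payload_len']} bytes")
    print(f"crc      stored={info['stored_crc']:#010x} "
          f"computed={info['computed_crc']:#010x}")
```

An eight-byte file with a zero-length payload would be consistent with no
replication origins being configured, which matches the statement above
that logical replication is not in use. To check a real file, read it
with open(path, "rb").read() and pass the bytes to inspect_checkpoint.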

-- 
Sincerely,
Alex


