Re: [GENERAL] BDR node removal and rejoin


From	Zhu, Joshua
Subject	Re: [GENERAL] BDR node removal and rejoin
Date	13 July 2017, 22:09:22
Msg-id	f5ac30a5684548e5ba672810e528c80b@EXUSDAGORL01.INTERNAL.ROOT.TES
In reply to	Re: [GENERAL] BDR node removal and rejoin (Craig Ringer <craig@2ndquadrant.com>)
Responses	Re: [GENERAL] BDR node removal and rejoin
List	pgsql-general


Found these log entries on one of the other nodes:

t=2017-07-13 08:35:34 PDT p=27292 a=DEBUG: 00000: found valid replication identifier 15

t=2017-07-13 08:35:34 PDT p=27292 a=LOCATION: bdr_establish_connection_and_slot, bdr.c:604

t=2017-07-13 08:35:34 PDT p=27292 a=ERROR: 53400: no free replication state could be found for 15, increase max_replication_slots

Increased max_replication_slots, things are looking good now, thanks.
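
For anyone who wants to check their own logs for the same condition, here is a minimal sketch that scans log lines for SQLSTATE 53400 (configuration_limit_exceeded) errors. It assumes the same line prefix as the three sample lines above (`t=<timestamp> p=<pid> a=`); the regular expression and the function name `find_slot_exhaustion` are illustrative only and would need adjusting for a different log_line_prefix.

```python
import re

# Assumed layout, taken only from the sample lines above:
#   t=<timestamp> p=<pid> a=<LEVEL>: [<SQLSTATE>: ]<message>
LINE_RE = re.compile(
    r"^t=(?P<ts>.+?) p=(?P<pid>\d+) a=(?P<level>[A-Z]+): +"
    r"(?:(?P<sqlstate>[0-9A-Z]{5}): +)?(?P<msg>.*)$"
)


def find_slot_exhaustion(lines):
    """Return (timestamp, pid, message) for each ERROR line with SQLSTATE 53400."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") == "ERROR" and m.group("sqlstate") == "53400":
            hits.append((m.group("ts"), int(m.group("pid")), m.group("msg")))
    return hits


# The three log lines quoted in this message.
sample = [
    "t=2017-07-13 08:35:34 PDT p=27292 a=DEBUG: 00000: found valid replication identifier 15",
    "t=2017-07-13 08:35:34 PDT p=27292 a=LOCATION: bdr_establish_connection_and_slot, bdr.c:604",
    "t=2017-07-13 08:35:34 PDT p=27292 a=ERROR: 53400: no free replication state could be found for 15, increase max_replication_slots",
]

hits = find_slot_exhaustion(sample)
for ts, pid, msg in hits:
    print(ts, pid, msg)
```

To scan a real log file, pass an open file object instead of the `sample` list.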

This does bring up a couple of questions:

Given that there is no real increase in the number of nodes in this repeated removal/rejoining exercise, yet it still caused the replication slots to be used up, wouldn’t removal of a node also automatically free up the replication slot allocated for that node? Or is there a way to manually free up slots that are no longer needed? (They don’t seem to show up in the pg_replication_slots view; I made sure to use pg_drop_replication_slot when they do show up there.)
If there is such a thing, what is the rule of thumb for the best value of max_replication_slots (is it somehow related to the value of max_wal_senders as well) with respect to, say, the maximum number of nodes intended to be supported?

Thanks

From: Craig Ringer [mailto:craig@2ndquadrant.com]
Sent: Wednesday, July 12, 2017 11:59 PM
To: Zhu, Joshua <jzhu@thalesesec.net>
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] BDR node removal and rejoin

On 13 July 2017 at 01:56, Zhu, Joshua <jzhu@vormetric.com> wrote:

Thanks for the clarification.

Looks like I am running into a different issue: while trying to pin down precisely the steps (and the order in which to perform them) needed to remove/rejoin a node, the removal/rejoining exercise was repeated a number of times, and it got stuck again:

The status of the re-joining node (node4) on other nodes is “I”
The status of the re-joining node on the node4 itself started at “I”, changed to “o”, then stuck there
From the log file for node4, the following entries are constantly being generated:

2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]DEBUG: 00000: received replication command: IDENTIFY_SYSTEM
2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]LOCATION: exec_replication_command, walsender.c:1309
2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]DEBUG: 08003: unexpected EOF on client connection
2017-07-12 10:37:46 PDT [24943:bdr (6334686800251932108,1,43865,):receive:::1(33883)]LOCATION: SocketBackend, postgres.c:355
2017-07-12 10:37:46 PDT [24944:bdr (6408408103171110238,1,24713,):receive:::1(33884)]DEBUG: 00000: received replication command: IDENTIFY_SYSTEM
2017-07-12 10:37:46 PDT [24944:bdr (6408408103171110238,1,24713,):receive:::1(33884)]LOCATION: exec_replication_command, walsender.c:1309
2017-07-12 10:37:46 PDT [24944:bdr (6408408103171110238,1,24713,):receive:::1(33884)]DEBUG: 08003: unexpected EOF on client connection

Check the logs on the other end.

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
