Error promoting slave on cascading replication using replication slots

From	Alvaro Melo
Subject	Error promoting slave on cascading replication using replication slots
Date	December 17, 2015, 15:55:53
Msg-id	5672DAFE.4070003@atua.com.br
Responses	Re: Error promoting slave on cascading replication using replication slots
List	pgsql-general

Hi,

I'm configuring a cascading replication environment, with replication
slots, but I'm having a problem when the master goes down and I promote
a slave. All servers start from a cluster created from scratch, with
default config options. The process I'm using to set up the
cascading replication is:

1 - On master:
wal_level = hot_standby
max_wal_senders = 3
max_replication_slots = 3
hot_standby = on

2 - On slave1:
Stop Server
Apply the same configuration from above
Erase the old cluster
Run pg_basebackup -v -P -R -X stream -c fast -h IP -U postgres -D PGDATA

3 - On master:
Run SELECT pg_create_physical_replication_slot('NAME');

4 - On slave1:
Add the primary_slot_name to recovery.conf
Start cluster
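
For reference, the recovery.conf that results from steps 2 and 4 can be sketched as below. This is only an illustration, not the exact file from the servers above: the helper name make_recovery_conf, the address 192.0.2.11 and the slot name slave2_slot are made-up example values, and pg_basebackup -R may write additional connection options into primary_conninfo.

```shell
#!/bin/sh
# Print a recovery.conf for a 9.4 standby that streams through a replication slot.
# Arguments (hypothetical example values): upstream host, replication user, slot name.
make_recovery_conf() {
    upstream_host=$1
    repl_user=$2
    slot_name=$3
    cat <<EOF
standby_mode = 'on'
primary_conninfo = 'host=${upstream_host} user=${repl_user}'
primary_slot_name = '${slot_name}'
EOF
}

# Example: slave2 pointing at slave1 (address and slot name are made up).
make_recovery_conf 192.0.2.11 postgres slave2_slot
```

The slot named in primary_slot_name has to exist on the server the standby connects to, so for slave2 it would be created on slave1.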

Everything runs smoothly, according to "SELECT * FROM
pg_stat_replication" and "SELECT * FROM pg_replication_slots". Steps
2, 3 and 4 are repeated on slave2, which points to slave1. The problem
happens when I stop the master and run

pg_ctl -D /var/lib/postgresql/9.4/main promote

on slave1. At this point, slave2 throws the following log, and stops
receiving WAL through the replication slot:

2015-12-17 11:23:06 BRST [944-2] LOG:  replication terminated by primary server
2015-12-17 11:23:06 BRST [944-3] DETAIL:  End of WAL reached on timeline 1 at 0/30001A0.
2015-12-17 11:23:06 BRST [944-4] LOG:  fetching timeline history file for timeline 2 from primary server
2015-12-17 11:23:06 BRST [937-7] LOG:  record with zero length at 0/30001A0
2015-12-17 11:23:06 BRST [944-5] LOG:  restarted WAL streaming at 0/3000000 on timeline 1
2015-12-17 11:23:06 BRST [944-6] LOG:  replication terminated by primary server
2015-12-17 11:23:06 BRST [944-7] DETAIL:  End of WAL reached on timeline 1 at 0/30001A0.
2015-12-17 11:23:11 BRST [944-8] LOG:  restarted WAL streaming at 0/3000000 on timeline 1
2015-12-17 11:23:11 BRST [944-9] LOG:  replication terminated by primary server

I found an instruction to add the following line to recovery.conf:
recovery_target_timeline = 'latest'
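
As background: in 9.4, recovery_target_timeline defaults to the timeline that was current when the base backup was taken, while 'latest' tells the standby to follow the newest timeline it finds, which is what a promotion upstream creates. A minimal sketch of adding the line without duplicating it follows. The function name ensure_latest_timeline is made up for illustration, and the demo runs against a temporary file rather than a real recovery.conf.

```shell
#!/bin/sh
# Append recovery_target_timeline = 'latest' to the given recovery.conf,
# unless some recovery_target_timeline setting is already present.
# ensure_latest_timeline is a hypothetical helper name.
ensure_latest_timeline() {
    conf=$1
    if grep -Eq "^[[:space:]]*recovery_target_timeline[[:space:]]*=" "$conf"; then
        echo "recovery_target_timeline already set in $conf"
    else
        printf "recovery_target_timeline = 'latest'\n" >> "$conf"
        echo "added recovery_target_timeline = 'latest' to $conf"
    fi
}

# Demo on a scratch file, so no real cluster is touched.
demo_conf=$(mktemp)
printf "standby_mode = 'on'\n" > "$demo_conf"
ensure_latest_timeline "$demo_conf"
ensure_latest_timeline "$demo_conf"   # second call leaves the file unchanged
cat "$demo_conf"
rm -f "$demo_conf"
```

recovery.conf is only read when the server starts, so the standby has to be restarted for the setting to take effect.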

When this line is added, slave2 keeps replicating from slave1:
2015-12-17 13:37:54 BRST [868-2] LOG:  replication terminated by primary server
2015-12-17 13:37:54 BRST [868-3] DETAIL:  End of WAL reached on timeline 1 at 0/3001358.
2015-12-17 13:37:54 BRST [868-4] LOG:  fetching timeline history file for timeline 2 from primary server
2015-12-17 13:37:54 BRST [863-7] LOG:  new target timeline is 2
2015-12-17 13:37:54 BRST [863-8] LOG:  record with zero length at 0/3001358
2015-12-17 13:37:54 BRST [868-5] LOG:  restarted WAL streaming at 0/3000000 on timeline 2

My question is: is this the right procedure, or am I missing something?

Best regards,

--
Álvaro Nunes Melo    Atua Sistemas de Informação
alvaro@atua.com.br   http://www.atua.com.br
(54) 9976-0106       (54) 3045-8100
