Re: Broken replication in 3 node setup, master halt,standby 1 promoted, standby 2 - replication not working

Поиск

Список

Период

Сортировка

От	Rui DeSousa
Тема	Re: Broken replication in 3 node setup, master halt,standby 1 promoted, standby 2 - replication not working
Дата	13 января 2018 г. 01:04:07
Msg-id	45FEB64D-CE15-40CA-9BF3-DECEAC21B268@icloud.com обсуждение исходный текст
Ответ на	Re: Broken replication in 3 node setup, master halt, standby 1promoted, standby 2 - replication not working (Pritam Barhate <pritambarhate@gmail.com>)
Список	pgsql-admin

Дерево обсуждения

You might want to consider sync replication instead to avoid split brain syndrome. It readily depending on how things are promoted and managed.

On Jan 12, 2018, at 10:50 AM, Pritam Barhate <pritambarhate@gmail.com> wrote:

I am using async replication.

As both of you said:

recovery_target_timeline=latest

Was missing. Adding it to the recovery.conf did the trick.

Thanks a lot.

Regards,

Pritam.

On Thu, Jan 11, 2018 at 4:09 AM, Rui DeSousa <rui.desousa@icloud.com> wrote:
It appears that your missing is following in the recovery.conf file. It instructs the replica to follow a new timeline when presented; without it will refuse to follow the promoted node.

recovery_target_timeline=latest

Are you using sync replication?

> On Jan 10, 2018, at 2:06 PM, Pritam Barhate <pritambarhate@gmail.com> wrote:
>
> Hi,
>
> I am trying to test the failover scenario on a brand new 3 node setup.
>
> pg1 is master
> pg2 and pg3 are standby hosts in streaming replication, both following pg1.
>
> When I halt pg1, promote pg2 and update recovery.conf on pg3 to point to pg2, the replication is broken. Can you please help me identify the correct procedure to achieve this failover?
>
> Here are the details:
>
> Software Versions:
>
> PostgreSQL: 9.6
> Ubuntu 16.04
> PgBackRest 1.25
>
> Here is the scenario:
>
> Set up 3 nodes in streaming replication:
>
> master (pg1), standby 1 (pg2), standby 2 (pg3)
>
> Also, I have set up pgbackrest, to back up from standby. It's working correctly.
>
> Run the following on pg1:
>
> sudo -u postgres psql -x -c "select * from pg_stat_replication;"
>
> -[ RECORD 1 ]----+------------------------------
> pid | 20714
> usesysid | 16384
> usename | replicator
> application_name | walreceiver
> client_addr | 10.0.15.11
> client_hostname |
> client_port | 48640
> backend_start | 2018-01-10 18:17:46.105694+00
> backend_xmin |
> state | streaming
> sent_location | 0/6000060
> write_location | 0/6000060
> flush_location | 0/6000060
> replay_location | 0/6000060
> sync_priority | 0
> sync_state | async
> -[ RECORD 2 ]----+------------------------------
> pid | 20715
> usesysid | 16384
> usename | replicator
> application_name | walreceiver
> client_addr | 10.0.15.12
> client_hostname |
> client_port | 42198
> backend_start | 2018-01-10 18:17:48.712575+00
> backend_xmin |
> state | streaming
> sent_location | 0/6000060
> write_location | 0/6000060
> flush_location | 0/6000060
> replay_location | 0/6000060
> sync_priority | 0
> sync_state | async
>
>
> It reports both the standbys working correctly.
>
> Halt pg1.
>
> On pg2:
>
> sudo -u postgres pg_ctlcluster 9.6 main promote
>
> On pg3:
>
> sudo service postgresql stop
>
> sudo -u postgres vim /var/lib/postgresql/9.6/main/recovery.conf
>
> modify it's contents are as follows:
>
> ```
> primary_conninfo = 'host=pg2 port=5432 user=replicator'
> standby_mode = 'on'
> restore_command = '/usr/bin/pgbackrest --stanza=main archive-get %f "%p"'
> ```
>
> sudo service postgresql start
>
> On pg2:
>
> sudo -u postgres psql -x -c "select * from pg_stat_replication;"
>
> -[ RECORD 1 ]----+------------------------------
> pid | 21238
> usesysid | 16384
> usename | replicator
> application_name | walreceiver
> client_addr | 10.0.15.12
> client_hostname |
> client_port | 44852
> backend_start | 2018-01-10 18:22:20.980701+00
> backend_xmin |
> state | startup
> sent_location |
> write_location |
> flush_location |
> replay_location |
> sync_priority | 0
> sync_state | async
>
> As you can see sent_location, write_location, flush_location all are empty.
>
> On pg3:
>
> tail -f /var/log/postgresql/postgresql-9.6-main.log
> 2018-01-10 18:22:19.490 UTC [21229] LOG: restored log file "000000010000000000000005" from archive
> 2018-01-10 18:22:19.497 UTC [21229] LOG: redo starts at 0/5000028
> 2018-01-10 18:22:19.757 UTC [21243] postgres@postgres FATAL: the database system is starting up
> 2018-01-10 18:22:20.188 UTC [21229] LOG: restored log file "000000010000000000000006" from archive
> 2018-01-10 18:22:20.195 UTC [21229] LOG: consistent recovery state reached at 0/7000000
> 2018-01-10 18:22:20.195 UTC [21228] LOG: database system is ready to accept read only connections
> 2018-01-10 18:22:20.812 UTC [21229] LOG: invalid magic number 0000 in log segment 000000010000000000000007, offset 0
> 2018-01-10 18:22:20.824 UTC [21255] LOG: fetching timeline history file for timeline 2 from primary server
> 2018-01-10 18:22:20.828 UTC [21255] LOG: primary server contains no more WAL on requested timeline 1
> 2018-01-10 18:22:21.417 UTC [21255] LOG: primary server contains no more WAL on requested timeline 1
>
>
> At this point is I insert data in pg2, it doesn't reflect in pg3.
>
> Please help.
>
> Regards,
>
> Pritam.
>

В списке pgsql-admin по дате отправления:

Предыдущее

От: Keith
Дата: 13 января 2018 г., 00:46:53
Сообщение: Re: Rolling forward to 10.1, new issue...

Следующее

От: Karl Denninger
Дата: 13 января 2018 г., 01:27:46
Сообщение: Re: Rolling forward to 10.1, new issue...

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Broken replication in 3 node setup, master halt,standby 1 promoted, standby 2 - replication not working

Предыдущее

Следующее