Fwd: how to decrease the promotion time when performing a multiple failovers.....

From	Shay Cohavi
Subject	Fwd: how to decrease the promotion time when performing a multiple failovers.....
Date	January 1, 2016, 06:29:31
Msg-id	CAMFxmHGXLNhM_Ny75tJFfLrwRNh2VKpakov1xrtVg6gOeKSjvg@mail.gmail.com
Replies	Re: how to decrease the promotion time when performing a multiple failovers.....
List	pgsql-admin

Hi,

I have a PostgreSQL 9.3 setup with 2 nodes (active/standby with streaming replication & continuous archiving).

I have created 2 scripts, failover and failback, in order to perform a switchover between the DB servers:

1. failover - create a trigger file in order to promote the new primary.

2. failback - perform a base backup as mentioned in :

a. start backup on the primary.

b. stop the failed node.

(I did not delete the DB directory on the failed node.)

c. perform rsync between the nodes.

d. stop the backup on the primary.

e. perform rsync on the pg_xlog.

f. create a recovery.conf:

standby_mode = 'on'

primary_conninfo = 'host=10.50.1.153 port=5432 user=usr password=pass'

restore_command = 'scp 10.50.1.153:/home/postgres/archive/%f %p'

trigger_file = '/home/postgres/databases/fabrix/trigger'

archive_cleanup_command = 'ssh 10.50.1.153 /home/postgres/pg_utils/archive_cleanup.sh %r'

g. start the failed node as secondary.
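
The failback steps a-g above can be sketched as a dry run that only prints the commands for review. This is a sketch under assumptions, not the actual script: the data directory (guessed from the trigger_file location), the backup label, the rsync exclusions, and the function name failback_plan are all hypothetical, and connection options for psql are left out.

```shell
#!/bin/sh
# Dry-run sketch of failback steps a-g: it prints commands, it does not run them.
# PRIMARY comes from primary_conninfo; PGDATA is an ASSUMED data directory.
PRIMARY="${PRIMARY:-10.50.1.153}"
PGDATA="${PGDATA:-/home/postgres/databases/fabrix}"

failback_plan() {
    cat <<EOF
# a. start the backup on the primary
psql -h $PRIMARY -c "SELECT pg_start_backup('failback', true);"
# b. stop the failed node (its data directory is kept)
pg_ctl -D $PGDATA stop -m fast
# c. rsync the data directory from the primary, without pg_xlog
rsync -a --exclude=pg_xlog --exclude=postmaster.pid $PRIMARY:$PGDATA/ $PGDATA/
# d. stop the backup on the primary
psql -h $PRIMARY -c "SELECT pg_stop_backup();"
# e. rsync pg_xlog
rsync -a $PRIMARY:$PGDATA/pg_xlog/ $PGDATA/pg_xlog/
# f. write $PGDATA/recovery.conf with the settings listed above
# g. start the failed node as secondary
pg_ctl -D $PGDATA start
EOF
}
```

Calling failback_plan on the failed node prints the seven steps in order, so the plan can be checked before anything is executed.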

The switchover method:

1. stop the primary node.

2. promote the secondary node (failover.sh).

3. perform failback on the failed node.

4. start the failed node.
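
Step 2 of the switchover could look roughly like the following. It is only a sketch of what a failover.sh-style script might do, assuming the trigger path from recovery.conf; the function name promote_standby and the directory check are hypothetical additions.

```shell
#!/bin/sh
# Sketch of a failover.sh-style promotion: create the file that trigger_file
# in recovery.conf points at, so the standby ends recovery and becomes primary.
TRIGGER_FILE="${TRIGGER_FILE:-/home/postgres/databases/fabrix/trigger}"

promote_standby() {
    # Fail loudly if the directory is missing, so a wrong path is noticed.
    dir=$(dirname "$TRIGGER_FILE")
    if [ ! -d "$dir" ]; then
        echo "trigger directory not found: $dir" >&2
        return 1
    fi
    touch "$TRIGGER_FILE"
}
```

Running promote_standby on the standby creates the trigger file; the promotion itself then shows up in the server log as "trigger file found".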

This method works great!

But if I perform multiple switchovers (>20), each promotion of the new primary (via the trigger file) takes longer, because it searches the timelines in the archive.

Is there any way to prevent the multiple 'scp' archive commands, which make the promotion longer?

For example:

[2015-12-12 20:35:10.769 IST] LOG: trigger file found: /home/postgres/databases/fabrix/trigger

[2015-12-12 20:35:10.769 IST] FATAL: terminating walreceiver process due to administrator command

scp: /home/postgres/archive/0000009400000002000000DC: No such file or directory

[2015-12-12 20:35:10.893 IST] LOG: record with zero length at 2/DC000168

[2015-12-12 20:35:10.893 IST] LOG: redo done at 2/DC000100

scp: /home/postgres/archive/0000009400000002000000DC: No such file or directory

scp: /home/postgres/archive/0000009300000002000000DC: No such file or directory

scp: /home/postgres/archive/0000009200000002000000DC: No such file or directory

scp: /home/postgres/archive/0000009100000002000000DC: No such file or directory

scp: /home/postgres/archive/0000009000000002000000DC: No such file or directory

scp: /home/postgres/archive/00000095.history: No such file or directory

[2015-12-12 20:35:11.801 IST] LOG: selected new timeline ID: 149

[2015-12-12 20:35:11.931 IST] LOG: restored log file "00000094.history" from archive

[2015-12-12 20:35:12.173 IST] LOG: archive recovery complete

[2015-12-12 20:35:12.181 IST] LOG: database system is ready to accept connections

[2015-12-12 20:35:12.181 IST] LOG: autovacuum launcher started

This can take at least 1 minute, or more.

Is there any way to skip the timeline search in order to decrease the promotion time?
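
To read the log above: the file names requested through restore_command carry the timeline ID in hexadecimal, while "selected new timeline ID: 149" is decimal (149 = 0x95). The sketch below rebuilds those names from a timeline ID and an LSN. It assumes the default 16 MB WAL segment size; the function names wal_file_name and timeline_history_name are made up for illustration.

```shell
#!/bin/sh
# Build a WAL segment file name from a decimal timeline ID and an LSN "X/Y",
# ASSUMING the default 16 MB segment size.
wal_file_name() {
    tli=$1                          # timeline ID, decimal
    hi=$(( 0x${2%%/*} ))            # high 32 bits of the LSN
    lo=$(( 0x${2##*/} ))            # low 32 bits of the LSN
    seg=$(( lo / 0x1000000 ))       # 16 MB segment index within the high part
    printf '%08X%08X%08X\n' "$tli" "$hi" "$seg"
}

# Build the timeline history file name for a decimal timeline ID.
timeline_history_name() {
    printf '%08X.history\n' "$1"
}
```

For example, wal_file_name 148 2/DC000168 gives 0000009400000002000000DC, the first file scp reports as missing, and timeline_history_name 149 gives 00000095.history. Each such name in the log corresponds to one scp round trip to the archive host.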

Thanks,

ShayC
