Timeline switch problem with streaming replication with 3 nodes

From Mads.Tandrup@schneider-electric.com
Subject Timeline switch problem with streaming replication with 3 nodes
Msg-id OF80BBB332.B495F5C6-ONC1257A83.00430216-C1257A83.00455AFC@apcc.com
Responses Re: Timeline switch problem with streaming replication with 3 nodes  (Stuart Bishop <stuart@stuartbishop.net>)
List pgsql-general
Hi All

I've set up 3 PostgreSQL nodes: 1 master and 2 slaves. They are configured
for streaming replication with synchronous replication on. I've also set up
a virtual IP that points to the current master node.

When I kill the master node, the slave that was synchronous gets promoted
to master and takes over the shared virtual IP.

But sometimes the other slave doesn't accept the switch, and instead its
log says:

2012-09-24 10:45:06 GMT 4663  FATAL:  replication terminated by primary server
2012-09-24 10:45:06 GMT 4662  LOG:  record with zero length at 0/200009E8
2012-09-24 10:45:06 GMT 10209  FATAL:  could not connect to the primary server: could not connect to server: Connection refused
                Is the server running on host "10.216.73.60" and accepting
                TCP/IP connections on port 5432?

2012-09-24 10:45:11 GMT 10272  FATAL:  could not connect to the primary server: FATAL:  recovery is still in progress, can't accept WAL streaming connections

2012-09-24 10:45:16 GMT 10326  FATAL:  timeline 10 of the primary does not match recovery target timeline 9
2012-09-24 10:45:21 GMT 10388  FATAL:  timeline 10 of the primary does not match recovery target timeline 9
2012-09-24 10:45:26 GMT 10451  FATAL:  timeline 10 of the primary does not match recovery target timeline 9
...

And it continues to repeat the last line.

The new master says:
2012-09-24 10:45:06 GMT 8394  FATAL:  replication terminated by primary server
2012-09-24 10:45:06 GMT 8393  LOG:  record with zero length at 0/200009E8
2012-09-24 10:45:11 GMT 8393  LOG:  trigger file found: /tmp/postgresql_trigger
2012-09-24 10:45:11 GMT 8393  LOG:  redo done at 0/20000990
2012-09-24 10:45:11 GMT 8393  LOG:  last completed transaction was at log time 2012-09-24 10:45:01.917175+00
2012-09-24 10:45:11 GMT 8393  LOG:  selected new timeline ID: 10
2012-09-24 10:45:11 GMT 10741 [unknown] FATAL:  recovery is still in progress, can't accept WAL streaming connections
2012-09-24 10:45:12 GMT 8393  LOG:  archive recovery complete
2012-09-24 10:45:12 GMT 8391  LOG:  database system is ready to accept connections
2012-09-24 10:45:12 GMT 10743  LOG:  autovacuum launcher started

The recovery.conf is:
standby_mode = 'on'
primary_conninfo = 'host=10.216.73.60  port=5432 user=root password=onyx application_name=10.216.73.195'
recovery_target_timeline = 'latest'
trigger_file = '/tmp/postgresql_trigger'

I found an earlier discussion of a similar issue
(http://archives.postgresql.org/pgsql-general/2011-12/msg00553.php). The
solution proposed there is to share the WAL files between the nodes. But I
thought the whole idea of streaming replication was that I would not need
shared storage.
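
For reference, my understanding of that approach is roughly the following.
This is only a sketch and not something I have running: the directory
/mnt/wal_archive is a hypothetical location that all three nodes can
reach, and the cp commands are the generic examples from the PostgreSQL
documentation rather than anything taken from that thread.

```
# postgresql.conf on every node, since any of them may become master
# (/mnt/wal_archive is an assumed shared directory, not part of my setup)
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'

# recovery.conf on each slave, alongside the settings shown above
# restore_command lets the slave fetch WAL segments and timeline history
# files that the newly promoted master has archived
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_timeline = 'latest'
```

As far as I can tell, the point is that the promoted node archives a
history file for timeline 10, and the other slave needs to read that file
before it can follow the switch from timeline 9.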

Is that the only solution or is there another way?

Best regards,
Mads


