Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

From	Michael Paquier
Subject	Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave
Date	22 January 2013, 03:25:10
Msg-id	CAB7nPqTGnonaydRDx2KQoLAt+AM_nMFqeR6inYZZAo8EeHKwfw@mail.gmail.com
In response to	Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave (Michael Paquier <michael.paquier@gmail.com>)
List	pgsql-hackers

On Tue, Jan 22, 2013 at 9:06 AM, Michael Paquier <michael.paquier@gmail.com> wrote:

On Fri, Jan 18, 2013 at 6:20 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
Hmm, so it's the same issue I thought I fixed yesterday. My patch only fixed it for the case that the timeline switch is in the first page of the segment. When it's not, you still get two calls for a WAL record, first one for the first page in the segment, to verify that, and then the page that actually contains the record. The first call leads XLogPageRead to think it needs to read from the old timeline.

We didn't have this problem before the xlogreader refactoring because XLogPageRead() was always called with the RecPtr of the record, even when we actually read the segment header from the file first. We'll have to somehow get that same information, the RecPtr of the record we're actually interested in, to XLogPageRead(). We could add a new argument to the callback for that, or we could keep xlogreader.c as it is and pass it through from ReadRecord to XLogPageRead() in the private struct.

An explicit argument to the callback is probably best. That's straightforward, and it might be useful for the callback to know the actual WAL position that xlogreader.c is interested in anyway. See attached.
Just to let you know that I am still getting the error even after commit 2ff6555.
With the same scenario:
1) Start a master with 2 slaves
2) Kill/Stop slave
3) Promote slave 1, it switches to timeline 2
Log on slave 1

LOG: selected new timeline ID: 2
4) Reconnect slave 2 to slave 1; slave 2 remains stuck on timeline 1 even if recovery_target_timeline is set to latest
Log on slave 1 at this moment:
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: TIMELINE_HISTORY 2
DEBUG: received replication command: START_REPLICATION 0/5000000 TIMELINE 1
Slave 1 receives a command to start replication on timeline 1, while it is in sync with timeline 2.
Log on slave 2 at this moment:
LOG: restarted WAL streaming at 0/5000000 on timeline 1

LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1 at 0/5014200
DEBUG: walreceiver ended streaming and awaits new instructions

The timeline history file is the same for both nodes:
$ cat 00000002.history
1 0/5014200 no recovery target specified
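
For reference, each line of the history file above appears to carry three whitespace-separated fields: a parent timeline ID, the WAL position where that timeline ended, and a free-text reason. As far as I understand it, the file for timeline 2 records only the ancestor timeline and its switch point. Below is a minimal Python sketch of reading such a line, assuming exactly that layout; the names `TimelineSwitch`, `parse_lsn` and `parse_history` are hypothetical helpers, not part of PostgreSQL.

```python
from typing import List, NamedTuple


class TimelineSwitch(NamedTuple):
    parent_tli: int   # timeline that was left behind
    switch_lsn: int   # WAL position of the switch, as a 64-bit integer
    reason: str       # free-text reason recorded at promotion


def parse_lsn(text: str) -> int:
    """Convert an 'X/Y' WAL position (two hex halves) to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def parse_history(content: str) -> List[TimelineSwitch]:
    """Parse history-file text, skipping blank lines and '#' comments."""
    entries = []
    for line in content.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumed layout: parent TLI, switch point, then an optional reason.
        tli, lsn, *reason = line.split(None, 2)
        entries.append(
            TimelineSwitch(int(tli), parse_lsn(lsn), reason[0] if reason else "")
        )
    return entries


history = parse_history("1 0/5014200 no recovery target specified\n")
```

With the line shown above, `history` holds a single entry for parent timeline 1, ending at 0/5014200.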

I might be wrong, but shouldn't there also be an entry for timeline 2 in this file?

Am I missing something?

Sorry, there are no problems...
I simply forgot to set recovery_target_timeline to 'latest' in recovery.conf...
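
A minimal recovery.conf sketch for slave 2 with the missing setting in place. The host and port in primary_conninfo are placeholders chosen for illustration, not values from this setup.

```
# recovery.conf on slave 2 (connection values are hypothetical)
standby_mode = 'on'
primary_conninfo = 'host=slave1.example.com port=5432'
# Follow the newest timeline found in the history files instead of
# staying on the timeline the standby started from.
recovery_target_timeline = 'latest'
```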

--
Michael Paquier
http://michael.otacoo.com
