Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave
Дата
Msg-id 50F871E2.7090504@vmware.com
обсуждение исходный текст
Ответ на Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave  (Andres Freund <andres@2ndquadrant.com>)
Список pgsql-hackers
On 17.01.2013 21:57, Heikki Linnakangas wrote:
> On 17.01.2013 20:08, Andres Freund wrote:
>> On 2013-01-18 03:05:47 +0900, Fujii Masao wrote:
>>> I encountered the problem that the timeline switch is not performed
>>> expectedly.
>>> I set up one master, one standby and one cascade standby. All the
>>> servers
>>> share the archive directory. restore_command is specified in the
>>> recovery.conf
>>> in those two standbys.
>>>
>>> I shut down the master, and then promoted the standby. In this case, the
>>> cascade standby should switch to new timeline and replication should be
>>> successfully restarted. But the timeline was never changed, and the
>>> following
>>> log messages were kept outputting.
>>>
>>> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
>>> sby2 LOG: replication terminated by primary server
>>> sby2 DETAIL: End of WAL reached on timeline 1
>>> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
>>> sby2 LOG: replication terminated by primary server
>>> sby2 DETAIL: End of WAL reached on timeline 1
>>> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
>>> sby2 LOG: replication terminated by primary server
>>> sby2 DETAIL: End of WAL reached on timeline 1
>>> ....
>>
>> That's after the commit or before? Because in passing I think I
>> noticed/fixed a bug that could cause exactly that problem...
>
> I think I broke that with the "teach pg_receivexlog to cross timelines"
> patch. Will take a look...

Ok, there was a couple of issues. First, as I suspected above, I added a 
new result set after copy has ended in START_STREAMING command, but 
forgot to teach walreceiver about it. Fixed that.

Secondly, there's an interesting interaction between the new xlogreader 
code and streaming replication and timeline switches:

The first call to the page_read callback in xlogreader is always made 
with size SizeOfXLogRecord, counting from the beginning of the page. 
That's bogus; there can be no WAL record in the very beginning of page, 
because of the page header, so I think that was supposed to be 
SizeXLogShortPHD. But that won't fix the issue.

The problem is that XLogPageRead callback uses the page address and 
requested length to decide what timeline to stream from. For example, 
imagine that there's a timeline switch at offset 1000 in a page, and we 
have already read all the WAL up to that point. When xlogreader first 
asks to fetch the first 32 bytes from the page (the bogus 
SizeOfXLogRecord), XLogPageRead deduces that that byte range is still on 
the old timeline, and starts streaming from the old timeline. Next, 
xlogreader needs the rest of the page, up to 1000 + SizeOfPageHeader, to 
read the first record it's actually interested in, XLogPageRead will 
return an error because that range is not on the timeline that's 
currently streamed. And we loop back to retry, and run into the same 
problem again.

This interaction is a bit too subtle for my taste, but the 
straightforward fix is to just modify xlogreader so that the first 
read_page call requests all the bytes up to record we're actually 
interested in. That seems like a smart thing to do anyway.

I committed a patch for that second issue too, but please take a look in 
case you come up with a better idea.

- Heikki



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: HS locking broken in HEAD
Следующее
От: Dimitri Fontaine
Дата:
Сообщение: Re: Event Triggers: adding information