Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?

Поиск
Список
Период
Сортировка
От Kyotaro HORIGUCHI
Тема Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?
Дата
Msg-id 20170907.123347.101584520.horiguchi.kyotaro@lab.ntt.co.jp
обсуждение исходный текст
Ответ на Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least9.5)?  (Andres Freund <andres@anarazel.de>)
Ответы Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
Hello,

At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund <andres@anarazel.de> wrote in
<20170906192353.ufp2dq7wm5fd6qa7@alap3.anarazel.de>
> On 2017-09-06 17:36:02 +0900, Kyotaro HORIGUCHI wrote:
> > The problem is that the current ReadRecord needs the first one of
> > a series of continuation records from the same source with the
> > other part, the master in the case.
> 
> What's the problem with that?  We can easily keep track of the beginning
> of a record, and only confirm the address before that.

After failure while reading a record locally, ReadRecored tries
streaming to read from the beginning of a record, which is not on
the master, then retry locally and.. This loops forever.

> > A (or the) solution closed in the standby side is allowing to
> > read a seris of continuation records from muliple sources.
> 
> I'm not following. All we need to use is the beginning of the relevant
> records, that's easy enough to keep track of. We don't need to read the
> WAL or anything.

The beginning is already tracked and nothing more to do. 

I reconsider that way and found that it doesn't need such
destructive refactoring.

The first *problem* was WaitForWALToBecomeAvaialble requests the
beginning of a record, which is not on the page the function has
been told to fetch. Still tliRecPtr is required to determine the
TLI to request, it should request RecPtr to be streamed.

The rest to do is let XLogPageRead retry other sources
immediately. To do this I made ValidXLogPageHeader@xlogreader.c
public (and renamed to XLogReaderValidatePageHeader).

The patch attached fixes the problem and passes recovery
tests. However, the test for this problem is not added. It needs
to go to the last page in a segment then put a record continues
to the next segment, then kill the standby after receiving the
previous segment but before receiving the whole record.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Chapman Flack
Дата:
Сообщение: Re: [HACKERS] Replication vs. float timestamps is a disaster
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Setting pd_lower in GIN metapage