Re: Start Walreceiver completely before shut down it on standby server.

From Kyotaro Horiguchi
Subject Re: Start Walreceiver completely before shut down it on standby server.
Date
Msg-id 20191211.143737.1611770954720850726.horikyota.ntt@gmail.com
In response to Re: Start Walreceiver completely before shut down it on standby server.  (Ashwin Agrawal <aagrawal@pivotal.io>)
Responses Re: Start Walreceiver completely before shut down it on standby server.  (jiankang liu <liujk1994@gmail.com>)
List pgsql-hackers
At Tue, 10 Dec 2019 10:40:53 -0800, Ashwin Agrawal <aagrawal@pivotal.io> wrote in 
> On Tue, Dec 10, 2019 at 3:06 AM jiankang liu <liujk1994@gmail.com> wrote:
> 
> > Start Walreceiver completely before shut down it on standby server.
> >
> > The walreceiver will be shut down, when read an invalid record in the
> > WAL streaming from master.And then, we retry from archive/pg_wal again.
> >
> > After that, we start walreceiver in RequestXLogStreaming(), and read
> > record from the WAL streaming. But before walreceiver starts, we read
> > data from file which be streamed over and present in pg_wal by last
> > time, because of walrcv->receivedUpto > RecPtr and the wal is actually
> > flush on disk. Now, we read the invalid record again, what the next to
> > do? Shut down the walreceiver and do it again.
> >
> 
> I am missing something here, if walrcv->receivedUpto > RecPtr, why are we
> getting / reading invalid record?

My bet is that the standby is connecting to the wrong master. For
example, something like this happens when the master has been
reinitialized from a backup and has since gone through a different
history, and the standby was then initialized from the reborn master
while the stale archive files on the standby were left in place.

Anyway, that cannot happen in a correctly running replication setup,
and the thing to do in that case is to start over from a new base
backup of the master, making sure to erase any stale archive files.

As for the proposed fix, it doesn't seem to make the startup process
rewind WAL to that LSN. Even if it did, the result would be no better
than a broken database.
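
For reference, the retry cycle described in the quoted report can be
pictured with a small toy model. This is only a sketch under stated
assumptions: ToyStandby, toy_read_record and toy_count_retries are
hypothetical names made up for illustration and are not PostgreSQL
code. The only thing taken from the report is the comparison of
receivedUpto against RecPtr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: these names are hypothetical, not PostgreSQL's. */
typedef uint64_t ToyLSN;

typedef struct ToyStandby
{
    ToyLSN received_upto;   /* WAL already streamed and flushed to pg_wal */
    ToyLSN bad_record_at;   /* LSN of the record that fails validation */
    bool   walreceiver_up;  /* whether streaming is currently requested */
} ToyStandby;

/* Returns true when the record at rec_ptr passes validation. */
static bool
toy_read_record(const ToyStandby *s, ToyLSN rec_ptr)
{
    return rec_ptr != s->bad_record_at;
}

/*
 * Runs up to max_attempts passes of the cycle from the report and
 * returns how many of them ended by shutting the walreceiver down.
 */
int
toy_count_retries(ToyStandby *s, ToyLSN rec_ptr, int max_attempts)
{
    int shutdowns = 0;

    assert(max_attempts >= 0);

    for (int i = 0; i < max_attempts; i++)
    {
        /* Streaming is requested at the start of every pass. */
        s->walreceiver_up = true;

        if (s->received_upto <= rec_ptr)
            break;          /* nothing on disk yet: wait for the stream */

        /* Data is already on disk, so it is read from the file. */
        if (toy_read_record(s, rec_ptr))
            break;          /* valid record: replay can continue */

        /* Invalid record again: stop the walreceiver and retry. */
        s->walreceiver_up = false;
        shutdowns++;
    }
    return shutdowns;
}
```

With received_upto past a bad record, every pass re-reads the same
bytes from disk, so the count simply equals max_attempts. Nothing in
the model repairs the stale file, which is why a fresh base backup is
the way out.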

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


