Thank you, Alvaro, for the comment (on my comment).

At Fri, 13 Dec 2019 18:33:44 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
> On 2019-Dec-13, Kyotaro Horiguchi wrote:
>
> > At Thu, 12 Dec 2019 22:50:20 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in
>
> > > The crux of the issue seems to be that XLogWrite() does not wait for
> > > the entire record to be written to disk before creating the ".ready"
> > > file. Instead, it just waits for the last page of the segment to be
> > > written before notifying the archiver. If PostgreSQL crashes before
> > > it is able to write the rest of the record, it will end up reusing the
> > > ".ready" segment at the end of crash recovery. In the meantime, the
> > > archiver process may have already processed the old version of the
> > > segment.
> >
> > Year, that can happen if the server restarted after the crash.
>
> ... which is the normal way to run things, no?

Yes. In older versions (< 10), the default value for wal_level was
minimal. In 10, only the default for wal_level was changed, to
replica. Still, I'm not sure whether restart_after_crash can be
recommended for streaming replication...

> Why is it bad? It's the default value.

I reconsidered it more deeply, and concluded that it doesn't harm
replication the way I thought.

A WAL-buffer overflow may write out a partial continuation record, and
that write can be flushed immediately. That made me mistakenly think
that a standby can receive only the first half of a continuation
record. Actually, that write doesn't advance LogwrtResult.Flush, so
the standby doesn't receive a record split at a page boundary. (The
cases where a crashed master is used as a new standby as-is might have
contaminated my thinking..)

Sorry for the bogus comment. My conclusion here is that
restart_after_crash doesn't seem to harm the standby immediately.
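
To make the reasoning above concrete, here is a toy model (not
PostgreSQL code) of the write and flush pointers. It assumes, as
described above, that a buffer-full write advances only the write
pointer and that the standby is sent WAL only up to the flush pointer.
The helper names buffer_full_write, flush and sendable_upto are made
up for illustration; only LogwrtResult is a real name.

```python
from dataclasses import dataclass

PAGE = 8192  # WAL page size used by this toy model


@dataclass
class LogwrtResult:
    write: int = 0  # WAL position written out so far
    flush: int = 0  # WAL position the model treats as flushed


def buffer_full_write(res: LogwrtResult, upto: int) -> None:
    """Assumed behaviour: pages are written to free buffer space,
    but the flush pointer is not advanced."""
    res.write = max(res.write, upto)


def flush(res: LogwrtResult, upto: int) -> None:
    """A regular flush advances both pointers."""
    res.write = max(res.write, upto)
    res.flush = max(res.flush, upto)


def sendable_upto(res: LogwrtResult) -> int:
    """Assumed behaviour: the standby receives WAL only up to flush."""
    return res.flush


# A continuation record that starts on page 0 and ends on page 1.
record_start = PAGE - 100
record_end = PAGE + 200

res = LogwrtResult()
flush(res, record_start)             # everything before the record is flushed
buffer_full_write(res, PAGE)         # first half goes out with page 0
sent_while_split = sendable_upto(res)  # still at record_start, not PAGE
flush(res, record_end)               # the whole record is flushed later
sent_after_flush = sendable_upto(res)
```

In this model the standby never sees a position between record_start
and record_end, which is the point of the paragraph above.
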
> > The standby can be incosistent at the time of master crash, so it
> > should be fixed using pg_rewind or should be recreated from a base
> > backup.
>
> Surely the master will just come up and replay its WAL, and there should
> be no inconsistency.
>
> You seem to be thinking that a standby is promoted immediately on crash
> of the master, but this is not a given.

Basically no, but I might have mixed things up a bit. Anyway,
returning to the proposal: I think that XLogWrite can be called when
the WAL buffer is full, and that write can reach the last page of a
segment. The proposed patch doesn't work in that case, since that
XLogWrite call doesn't write the whole continuation record. But I'm
not sure that corner case is worth amending..
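
The corner case can be sketched with another toy model (again not
PostgreSQL code). It assumes the rule quoted at the top of this mail:
a segment is marked ".ready" as soon as its last page has been
written. segments_marked_ready is a made-up helper, and 16MB is the
default segment size.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size


def segments_marked_ready(write_from: int, write_upto: int) -> list:
    """Toy rule: every segment whose last page falls inside this write
    gets its .ready file, whether or not a record continues past it."""
    first = write_from // SEGMENT_SIZE
    last = write_upto // SEGMENT_SIZE
    return list(range(first, last))


# A continuation record that crosses the boundary between segments 0 and 1.
record_start = SEGMENT_SIZE - 100
record_end = SEGMENT_SIZE + 200

# A buffer-full write that happens to stop exactly at the segment boundary.
write_upto = SEGMENT_SIZE
ready = segments_marked_ready(0, write_upto)

# Segment 0 is already marked ready, but the record is only half written.
record_fully_written = write_upto >= record_end
```

Here segment 0 is in ready while record_fully_written is still false,
which is the window I am worried about.
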
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center