Re: archive status ".ready" files may be created too early

From Kyotaro Horiguchi
Subject Re: archive status ".ready" files may be created too early
Date
Msg-id 20191217.192724.1673213777530336030.horikyota.ntt@gmail.com
In response to Re: archive status ".ready" files may be created too early  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
Thank you Alvaro for the comment (on my comment).

At Fri, 13 Dec 2019 18:33:44 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> On 2019-Dec-13, Kyotaro Horiguchi wrote:
> 
> > At Thu, 12 Dec 2019 22:50:20 +0000, "Bossart, Nathan" <bossartn@amazon.com> wrote in 
> 
> > > The crux of the issue seems to be that XLogWrite() does not wait for
> > > the entire record to be written to disk before creating the ".ready"
> > > file.  Instead, it just waits for the last page of the segment to be
> > > written before notifying the archiver.  If PostgreSQL crashes before
> > > it is able to write the rest of the record, it will end up reusing the
> > > ".ready" segment at the end of crash recovery.  In the meantime, the
> > > archiver process may have already processed the old version of the
> > > segment.
> > 
> > Yeah, that can happen if the server restarted after the crash.
> 
> ... which is the normal way to run things, no?

Yes. In older versions (< 10), the default value for wal_level was
minimal. In 10, only the default for wal_level was changed, to
replica. Still, I'm not sure whether restart_after_crash can be
recommended for streaming replication...

> Why is it bad?  It's the default value.

I reconsidered it more deeply and concluded that it doesn't harm
replication as I had thought.

A WAL-buffer overflow may write a partial continuation record, and it
can be flushed immediately. That made me mistakenly think that a
standby can receive only the first half of a continuation record.
Actually, that write doesn't advance LogwrtResult.Flush, so the standby
doesn't receive a record split at a page boundary. (The cases where a
crashed master is used as a new standby as-is might have contaminated
my thinking.)
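
To make the write/flush distinction above concrete, here is a minimal
toy model. It is not PostgreSQL code: the class and method names are
hypothetical, the byte positions are made up, and it simply assumes, as
described above, that a write forced by a full WAL buffer leaves the
flush pointer alone and that the standby is sent WAL only up to the
flush pointer.

```python
from dataclasses import dataclass


@dataclass
class ToyWal:
    """Toy model: byte positions only, no real I/O."""
    write_ptr: int = 0   # stands in for LogwrtResult.Write
    flush_ptr: int = 0   # stands in for LogwrtResult.Flush

    def write_on_buffer_full(self, upto: int) -> None:
        # A write forced by a full WAL buffer: the data reaches the file,
        # but in this model the flush pointer is not advanced.
        self.write_ptr = max(self.write_ptr, upto)

    def flush(self, upto: int) -> None:
        # An explicit flush request advances both pointers.
        self.write_ptr = max(self.write_ptr, upto)
        self.flush_ptr = max(self.flush_ptr, upto)

    def sendable_upto(self) -> int:
        # Assumption of this sketch: the standby is sent WAL only up to
        # the flush pointer.
        return self.flush_ptr


def demo():
    wal = ToyWal()
    wal.flush(8000)                 # earlier records are fully flushed
    wal.write_on_buffer_full(8192)  # first half of a continuation record
    return wal.write_ptr, wal.sendable_upto()
```

In this model the write pointer moves past the page boundary while the
position visible to the standby stays where the last flush left it.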

Sorry for the bogus comment.  My conclusion here is that
restart_after_crash doesn't seem to harm the standby immediately.

> > The standby can be inconsistent at the time of master crash, so it
> > should be fixed using pg_rewind or should be recreated from a base
> > backup.
> 
> Surely the master will just come up and replay its WAL, and there should
> be no inconsistency.
> 
> You seem to be thinking that a standby is promoted immediately on crash
> of the master, but this is not a given.

Basically no, but my thinking might have mixed the two a bit. Anyway,
returning to the proposal, I think that XLogWrite can be called when
the WAL buffer is full, and that write can reach the last page of a
segment. The proposed patch doesn't work in that case, since that
XLogWrite call doesn't write the whole continuation record. But I'm
not sure that corner case is worth amending..
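
The corner case can be stated as a small predicate. This is only a
sketch under assumptions: the function names are hypothetical, positions
are plain byte offsets, and the notification rule ("a segment is marked
.ready once its last page has been written") is taken from the
description quoted earlier in this thread rather than from the actual
XLogWrite code. The 16 MB value is the default WAL segment size.

```python
SEG_SIZE = 16 * 1024 * 1024  # default WAL segment size in bytes


def segment_of(pos: int) -> int:
    """Segment number containing byte position pos."""
    return pos // SEG_SIZE


def notified_too_early(record_start: int, record_len: int,
                       written_upto: int) -> bool:
    """True if the segment holding the start of a record would be marked
    ".ready" while the record's tail in the next segment is unwritten."""
    record_end = record_start + record_len
    seg_end = (segment_of(record_start) + 1) * SEG_SIZE
    crosses_boundary = record_end > seg_end
    # Toy rule: the segment is notified once everything up to its end
    # has been written.
    segment_fully_written = written_upto >= seg_end
    record_fully_written = written_upto >= record_end
    return (crosses_boundary and segment_fully_written
            and not record_fully_written)
```

For example, a 300-byte record starting 100 bytes before a segment
boundary, with the write stopping exactly at that boundary, satisfies
the predicate; once the write reaches the end of the record it no
longer does.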

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


