Re: archive status ".ready" files may be created too early

Поиск
Список
Период
Сортировка
От Bossart, Nathan
Тема Re: archive status ".ready" files may be created too early
Дата
Msg-id F133B7B1-451E-41E9-9C07-B5019F641CBF@amazon.com
обсуждение исходный текст
Ответ на Re: archive status ".ready" files may be created too early  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: archive status ".ready" files may be created too early  ("Bossart, Nathan" <bossartn@amazon.com>)
Список pgsql-hackers
On 12/23/19, 6:09 PM, "Kyotaro Horiguchi" <horikyota.ntt@gmail.com> wrote:
> You're right. That doesn't seem to work. Another thing I had in my
> mind was that XLogWrite had an additional flag so that
> AdvanceXLInsertBuffer can tell not to mark .ready. The function is
> called while it *is* writing a complete record. So even if
> AdvanceXLInsertBuffer inhibit marking .ready the succeeding bytes
> comes soon and we can mark the old segment as .ready at the time.
>
> ..
> + * If record_write == false, we don't mark the last segment as .ready
> + * if the caller requested to write up to segment boundary.
> ..
>   static void
> - XLogWrite(XLogwrtRqst WriteRqst, bool flexible)
> + XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool record_write)
>
> When XLogWrite is called with record_write = false, we don't mark
> .ready and don't advance lastSegSwitchTime/LSN. At the next time
> XLogWrite is called with record_write=true, if lastSegSwitchLSN is
> behind the latest segment boundary before or equal to
> LogwrtResult.Write, mark the skipped segments as .ready and update
> lastSegSwitchTime/LSN.

Thanks for the suggestion.  I explored this proposal a bit today.
It looks like there are three places where XLogWrite() is called:
AdvanceXLInsertBuffer(), XLogFlush(), and XLogBackgroundFlush().  IIUC
while XLogFlush() generally seems to be used to write complete records
to disk, this might not be true for XLogBackgroundFlush(), and we're
reasonably sure we cannot make such an assumption for
AdvanceXLInsertBuffer().  Therefore, we would likely only set
record_write to true for XLogFlush() and for certain calls to
XLogBackgroundFlush (e.g. flushing asynchronous commits).

I'm worried that this approach could be fragile and that we could end
up waiting an arbitrarily long time before marking segments as ready
for archival.  Even if we pay very close attention to the latest
flushed LSN, it seems possible that a non-record_write call to
XLogWrite() advances things such that we avoid ever calling it with
record_write = true.  For example, XLogBackgroundFlush() may have
flushed the completed blocks, which we cannot assume are complete
records.  Then, XLogFlush() would skip calling XLogWrite() if
LogwrtResult.Flush is sufficiently far ahead.  In this scenario, I
don't think we would mark any eligible segments as ".ready" until the
next call to XLogWrite() with record_write = true, which may never
happen.

The next approach I'm going to try is having the callers of
XLogWrite() manage marking segments ready.  That might make it easier
to mitigate some of my concerns above, but I'm not tremendously
optimistic that this approach will fare any better.

Nathan


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: [PATCH] Increase the maximum value track_activity_query_size
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: [HACKERS] Block level parallel vacuum