Re: Incorrect handling of OOM in WAL replay leading to data loss

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Incorrect handling of OOM in WAL replay leading to data loss
Дата
Msg-id 20230801.135113.1095735354684995020.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Incorrect handling of OOM in WAL replay leading to data loss  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: Incorrect handling of OOM in WAL replay leading to data loss  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
At Tue, 1 Aug 2023 12:43:21 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> A colleague, Ethan Mertz (in CC), has discovered that we don't handle
> correctly WAL records that are failing because of an OOM when
> allocating their required space.  In the case of Ethan, we have bumped
> on the failure after an allocation failure on XLogReadRecordAlloc():
> "out of memory while trying to decode a record of length"

I believe a database server is not supposed to be executed under such
a memory-constrained environment.

> In crash recovery, any records after the OOM would not be replayed.
> At quick glance, it seems to me that this can also impact standbys,
> where recovery could stop earlier than it should once a consistent
> point has been reached.

Actually the code is assuming that OOM happens solely due to a broken
record length field. I believe that we intentionally put that
assumption.

> A patch is registered in the commit fest to improve the error
> detection handling, but as far as I can see it fails to handle the OOM
> case and replaces ReadRecord() to use a WARNING in the redo loop:
> https://www.postgresql.org/message-id/20200228.160100.2210969269596489579.horikyota.ntt%40gmail.com

It doesn't change behavior unrelated to the case where the last record
is followed by zeroed trailing bytes.

> On top of my mind, any solution I can think of needs to add more
> information to XLogReaderState, where we'd either track the type of
> error that happened close to errormsg_buf which is where these errors
> are tracked, but any of that cannot be backpatched, unfortunately.

One issue on changing that behavior is that there's not a simple way
to detect a broken record before loading it into memory. We might be
able to implement a fallback mechanism for example that loads the
record into an already-allocated buffer (which is smaller than the
specified length) just to verify if it's corrupted. However, I
question whether it's worth the additional complexity. And I'm not
sure what if the first allocation failed.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andy Fan
Дата:
Сообщение: Extract numeric filed in JSONB more effectively
Следующее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: Fix compilation warnings when CFLAGS -Og is specified