Re: Incorrect handling of OOM in WAL replay leading to data loss

From Kyotaro Horiguchi
Subject Re: Incorrect handling of OOM in WAL replay leading to data loss
Date
Msg-id 20230801.152854.605125182959292988.horikyota.ntt@gmail.com
In response to Re: Incorrect handling of OOM in WAL replay leading to data loss  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Incorrect handling of OOM in WAL replay leading to data loss
Re: Incorrect handling of OOM in WAL replay leading to data loss
List pgsql-hackers
At Tue, 1 Aug 2023 14:03:36 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Tue, Aug 01, 2023 at 01:51:13PM +0900, Kyotaro Horiguchi wrote:
> > I believe a database server is not supposed to be executed under such
> > a memory-constrained environment.
> 
> I don't really follow this argument.  The backend and the frontends
> are reliable on OOM, where we generate ERRORs or even FATALs depending
> on the code path involved.  A memory bounded environment is something
> that can easily happen if one's not careful enough with the sizing of

I didn't mean that OOM should not happen. I was referring to an
environment where allocation failures can happen during crash recovery.
Anyway, I didn't mean that we shouldn't "fix" it.

> the instance.  For example, this error can be triggered on a standby
> with read-only queries that put pressure on the host's memory.

I thought that a failure on a standby results in continuing to retry
reading the next record. However, I found that there's a case where
the startup process stops in response to OOM [1].

> > One issue on changing that behavior is that there's not a simple way
> > to detect a broken record before loading it into memory. We might be
> > able to implement a fallback mechanism for example that loads the
> > record into an already-allocated buffer (which is smaller than the
> > specified length) just to verify if it's corrupted. However, I
> > question whether it's worth the additional complexity. And I'm not
> > sure what if the first allocation failed.
> 
> Perhaps we could rely more on a fallback memory, especially if it is
> possible to use that for the header validation.  That seems like a
> separate thing, still.

Once a record has been read, that amount of memory is already
allocated.
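
For illustration only, the fallback idea quoted above (checking a record
through an already-allocated buffer that is smaller than the record)
could look roughly like the sketch below. It is not PostgreSQL code:
FALLBACK_BUF_SIZE, checksum_update() and verify_record_chunked() are
hypothetical names, the toy checksum merely stands in for the real
record CRC, and memcpy() stands in for reading from WAL storage.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size of the buffer allocated once at startup. */
#define FALLBACK_BUF_SIZE 64

/* Toy rolling checksum; a stand-in for the real record CRC. */
static uint32_t
checksum_update(uint32_t sum, const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        sum = sum * 31u + data[i];
    return sum;
}

/*
 * Check a record of total_len bytes against an expected checksum while
 * holding at most FALLBACK_BUF_SIZE bytes of it in memory at a time.
 * "src" is an assumption standing in for the WAL being read.
 */
static bool
verify_record_chunked(const unsigned char *src, size_t total_len,
                      uint32_t expected)
{
    static unsigned char fallback_buf[FALLBACK_BUF_SIZE];
    uint32_t    sum = 0;
    size_t      off = 0;

    while (off < total_len)
    {
        size_t      n = total_len - off;

        if (n > FALLBACK_BUF_SIZE)
            n = FALLBACK_BUF_SIZE;

        /* Stand-in for reading the next chunk from WAL storage. */
        memcpy(fallback_buf, src + off, n);
        sum = checksum_update(sum, fallback_buf, n);
        off += n;
    }

    return sum == expected;
}
```

Such a check would only tell a corrupted record from a valid one; the
record would still have to be loaded in full before it can be replayed.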

Although we may not reach agreement, we could establish a default
behavior where an OOM during recovery immediately triggers an ERROR.
Then, we could introduce a *GUC* that causes recovery to regard OOM as
an end-of-recovery error.
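
As a rough sketch of that proposal (not a patch): the GUC name
recovery_oom_is_end_of_wal and the ReadResult/ReplayAction types below
are made up for illustration and do not exist in PostgreSQL.

```c
#include <stdbool.h>

/* Hypothetical result of trying to read the next WAL record. */
typedef enum
{
    READ_OK,                /* record read and validated */
    READ_INVALID_RECORD,    /* record failed validation */
    READ_OOM                /* allocation for the record failed */
} ReadResult;

/* Hypothetical decision taken by the recovery loop. */
typedef enum
{
    REPLAY_CONTINUE,        /* apply the record and move on */
    REPLAY_RAISE_ERROR,     /* stop with an ERROR; recovery does not end */
    REPLAY_END_OF_RECOVERY  /* treat the failure as the end of WAL */
} ReplayAction;

/* Hypothetical GUC; off by default, so OOM raises an ERROR. */
static bool recovery_oom_is_end_of_wal = false;

static ReplayAction
decide_replay_action(ReadResult res)
{
    if (res == READ_OK)
        return REPLAY_CONTINUE;

    /* Default: never let an OOM silently terminate recovery. */
    if (res == READ_OOM && !recovery_oom_is_end_of_wal)
        return REPLAY_RAISE_ERROR;

    /* Invalid record, or OOM with the GUC enabled. */
    return REPLAY_END_OF_RECOVERY;
}
```

With the default setting an OOM stops recovery with an ERROR, so no
record that might still be valid gets discarded.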

regards.

[1] https://www.postgresql.org/message-id/17928-aa92416a70ff44a2%40postgresql.org

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


