Re: Incorrect handling of OOM in WAL replay leading to data loss

From: Kyotaro Horiguchi
Subject: Re: Incorrect handling of OOM in WAL replay leading to data loss
Date:
Msg-id: 20230809.161353.1075613419067875207.horikyota.ntt@gmail.com
In response to: Re: Incorrect handling of OOM in WAL replay leading to data loss  (Michael Paquier <michael@paquier.xyz>)
Responses: Re: Incorrect handling of OOM in WAL replay leading to data loss  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers
At Wed, 9 Aug 2023 15:03:21 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> I have spent more time on 0001, polishing it and fixing a few bugs
> that I have found while reviewing the whole.  Most of them were
> related to mistakes in resetting the error state when expected.  I
> have also expanded DecodeXLogRecord() to use an error structure
> instead of only an errmsg, giving more consistency.  The error state
> now relies on two structures:
> +typedef enum XLogReaderErrorCode
> +{
> +   XLOG_READER_NONE = 0,
> +   XLOG_READER_OOM,            /* out-of-memory */
> +   XLOG_READER_INVALID_DATA,   /* record data */
> +} XLogReaderErrorCode;
> +typedef struct XLogReaderError
> +{
> +   /* Buffer to hold error message */
> +   char       *message;
> +   bool        message_deferred;
> +   /* Error code when filling *message */
> +   XLogReaderErrorCode code;
> +} XLogReaderError;
> 
> I'm kind of happy with this layer, now.

I'm not certain that message_deferred belongs in the error struct;
callers don't seem to need that information.

The name "XLOG_READER_NONE" seems too generic; XLOG_READER_NOERROR
would be clearer.
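
For clarity, here is a sketch of the structures with those two
suggestions applied (illustration only, not a counter-patch):

typedef enum XLogReaderErrorCode
{
    XLOG_READER_NOERROR = 0,    /* no error pending */
    XLOG_READER_OOM,            /* out-of-memory */
    XLOG_READER_INVALID_DATA,   /* record data */
} XLogReaderErrorCode;

typedef struct XLogReaderError
{
    /* Buffer to hold error message */
    char       *message;
    /* Error code when filling *message */
    XLogReaderErrorCode code;
} XLogReaderError;

(message_deferred would stay private to xlogreader.c, if it is needed
at all.)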

> I have also spent some time on finding a more elegant solution for the
> WAL replay, relying on the new facility from 0001.  And it happens
> that it is easy enough to loop if facing an out-of-memory failure when
> reading a record when we are in crash recovery, as the state is
> actually close to what a standby does.  The trick is that we should
> not change the state and avoid tracking a continuation record.  This
> is done in 0002, making replay more robust.  With the addition of the

0002 changes the behavior for the OOM case from ending recovery to
retrying the same record.  If the last record is really corrupted (for
example, a bogus record length that makes the allocation fail), the
server will never be able to finish recovery.  I doubt we are fine
with this behavior change.
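
To make the concern concrete, this is roughly how I read the 0002
retry loop; the names (an XLogReadRecord variant taking the new error
struct, errordata) are placeholders, not the actual patch:

for (;;)
{
    record = XLogReadRecord(xlogreader, &errordata);

    if (record != NULL)
        break;                  /* got a record, go replay it */

    if (errordata.code == XLOG_READER_OOM)
    {
        /*
         * Treated as transient: leave the reader state untouched (no
         * continuation record tracked) and retry the same record,
         * similarly to a standby waiting for more WAL.
         */
        pg_usleep(1000000L);    /* arbitrary 1s back-off */
        continue;
    }

    /* XLOG_READER_INVALID_DATA: end of recoverable WAL, as before */
    break;
}

If the allocation failure comes from garbage rather than real memory
pressure, crash recovery keeps spinning in the OOM branch above.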

> error injection tweak in 0003, I am able to finish recovery while the
> startup process loops if under memory pressure.  As mentioned
> previously, there are more code paths to consider, but that's a start
> to fix the data loss problems.

(The file name "xlogreader_oom" is a bit trickier to type than "hoge"
 or "foo". X( )

> Comments are welcome.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


