Re: BUG #17928: Standby fails to decode WAL on termination of primary

From: Thomas Munro
Subject: Re: BUG #17928: Standby fails to decode WAL on termination of primary
Msg-id: CA+hUKG+jFN64gKVGcmfNTekqWn3cemRx99-B8DDBaDyzWnfpkw@mail.gmail.com
In reply to: Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Sergei Kornilov <sk@zsrv.org>)
Replies: Re: BUG #17928: Standby fails to decode WAL on termination of primary  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-bugs
Thanks for looking/testing, Sergei.  Thanks for the changes, Michael,
these all look good.  I've squashed them and added you as co-author.

A couple more small comment/text changes:

1.  In the place where we fail to allocate memory for an oversized
record, I copied the comment about treating that as a "bogus data"
condition.  I suspect that we will soon be converting that to a FATAL
error[1], and that'll need to be done in both places.

2.  In this version of the commit message I said we'd only back-patch
to 15 for now.  After sleeping on this for a week, I realised that the
reason I keep vacillating on that point is that I am not sure what
your plan is for the malloc-failure-means-end-of-wal policy ([1],
ancient code from 0ffe11abd3a).  If we're going to fix that in master
only but let sleeping dogs lie in the back-branches, then it becomes
less important to go back further than 15 with THIS patch.  But if you
want to be able to distinguish garbage from out-of-memory, and thereby
end-of-WAL from a FATAL please-insert-more-RAM condition, I think
you'd really need this industrial-strength validation in all affected
branches, and I'd have more work to do, right?  The weak validation we
are fixing here is the *real* underlying problem going back many
years, right?
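The distinction drawn in point 2 can be illustrated with a small C sketch. It is a hypothetical model, not PostgreSQL's xlogreader code. The names `decode_record`, `RecordHeader`, `DecodeResult` and `MAX_RECORD_LEN` are all invented for illustration. The checksum is FNV-1a, standing in for the CRC a real WAL reader would use. The sketch also makes one simplifying assumption: the whole candidate record is already in one contiguous buffer, which a real reader cannot assume for records that span pages. The idea it shows is to validate the untrusted length field (and the checksum) *before* using it to size an allocation. Once that holds, a failed allocation can be reported as a genuine out-of-memory error rather than being read as "garbage, so this must be the end of WAL".

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes (not a PostgreSQL API). */
typedef enum {
    DECODE_OK,
    DECODE_END_OF_WAL,    /* bytes failed validation: treat as garbage/end of WAL */
    DECODE_OUT_OF_MEMORY  /* record validated, but allocation failed: hard error */
} DecodeResult;

/* Hypothetical simplified record header. */
typedef struct {
    uint32_t tot_len;   /* header + payload length in bytes; untrusted until checked */
    uint32_t checksum;  /* checksum over the payload bytes */
} RecordHeader;

#define MAX_RECORD_LEN (64u * 1024u * 1024u)  /* assumed sanity limit */

/* FNV-1a, used here only as a stand-in for a real CRC. */
static uint32_t payload_checksum(const unsigned char *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Allocator is injectable so that malloc failure can be simulated. */
typedef void *(*AllocFn)(size_t);

DecodeResult decode_record(const unsigned char *buf, size_t avail, AllocFn alloc,
                           unsigned char **out, size_t *out_len)
{
    RecordHeader hdr;

    *out = NULL;
    *out_len = 0;
    if (avail < sizeof(hdr))
        return DECODE_END_OF_WAL;
    memcpy(&hdr, buf, sizeof(hdr));

    /* Cheap sanity checks on the untrusted length before it sizes anything. */
    if (hdr.tot_len < sizeof(hdr) || hdr.tot_len > MAX_RECORD_LEN ||
        hdr.tot_len > avail)
        return DECODE_END_OF_WAL;

    size_t payload_len = hdr.tot_len - sizeof(hdr);
    const unsigned char *payload = buf + sizeof(hdr);

    /* Verify the checksum in place, still without allocating. */
    if (payload_checksum(payload, payload_len) != hdr.checksum)
        return DECODE_END_OF_WAL;

    /* Only now does an allocation failure mean "out of memory", not "garbage". */
    unsigned char *copy = alloc(payload_len > 0 ? payload_len : 1);
    if (copy == NULL)
        return DECODE_OUT_OF_MEMORY;
    memcpy(copy, payload, payload_len);
    *out = copy;
    *out_len = payload_len;
    return DECODE_OK;
}
```

Checking the length against independent evidence before allocating is what lets the caller map the two failure kinds to different policies: quietly stop replay at a bogus record, or raise a FATAL error when memory really is exhausted.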

I also wondered about strengthening the validation of various things
like redo begin/end LSNs etc in these tests.  But we can always
continue to improve all this later...

Here also is a version for 15 (and a CI run[2]), since we tweaked many
of the error messages between 15 and 16.

[1] https://www.postgresql.org/message-id/flat/ZMh/WV%2BCuknqePQQ%40paquier.xyz
[2] https://cirrus-ci.com/task/4533280897236992 (failed on some
unrelated pgbench test, reported in another thread)

Attachments
