On Mon, Jul 17, 2023 at 09:53:32AM +0200, Laurenz Albe wrote:
> On Mon, 2023-07-17 at 05:03 +0000, PG Bug reporting form wrote:
>> Scenario is like, there was checkpoint operation failures going on the DB
>> server since last 8 hours which means no successful checkpoint happened in
>> the DB server since last 8 hours. Then DB server went into the crash mode
>> due to the exhausted disk space and did not came up as part of crash
>> recovery.
>
> Mistake #1: you did not monitor disk space.
max_wal_size is a very critical piece to adjust. It is usually
recommended to split pg_wal/ into its own partition so as the space
allocated for WAL records is predictable across checkpoints. This is
not a perfect science as max_wal_size is a soft limit so usually one
needs an extra margin with a WAL partition. There have been some
patches floating around to make that a hard limit, as well, but I
don't think we've ever agreed on the semantics that would be
acceptable when reaching the upper limit authorized.
--
Michael