Achilleas Mantzios <achill@matrix.gatewaynet.com> writes:
> Remember: the PostgreSQL checkpointer decided to remove 5000+ files before shutdown. If any conditions were keeping
> those files afloat, they should also hold at this point, right?
> The question is why PostgreSQL didn't remove them earlier.

WAL files get removed/recycled after completion of a checkpoint. So
apparently, checkpoints were not finishing during normal operation,
but the shutdown checkpoint managed to terminate normally. That
eliminates a lot of the usual theories about why checkpoints might
not be succeeding (like a dirty buffer that always fails to be
written, say as a result of broken permissions on its file).
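
One way to notice this condition before thousands of files pile up is to watch the number of segment files in the WAL directory over time. What follows is only a minimal sketch in Python, assuming the directory is pg_wal under the data directory (pg_xlog before version 10) and relying on the fact that WAL segment file names are 24 uppercase hex digits. The function name count_wal_segments and the path in the usage note are hypothetical, not anything PostgreSQL ships.

```python
import os
import re

# WAL segment file names consist of exactly 24 uppercase hexadecimal digits.
# Other entries in the directory (archive_status, *.history, *.backup) are skipped.
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def count_wal_segments(wal_dir):
    """Return the number of WAL segment files found directly in wal_dir."""
    return sum(1 for name in os.listdir(wal_dir) if WAL_SEGMENT_RE.match(name))
```

Calling something like count_wal_segments("/path/to/data/pg_wal") periodically (the path is a placeholder) and seeing a count that only ever grows across several checkpoint_timeout intervals would be consistent with checkpoints not completing, though it does not prove it.
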

The only theory that comes to mind is that the checkpointer process
was stuck somehow, but just "soft" stuck, in a way that allowed the
postmaster's time-to-shut-down-please signal to unstick it. No,
I have no idea how that could happen exactly. If it happens again,
it'd be really interesting to attach to the checkpointer with a
debugger and collect a stack trace.
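
For anyone wanting to have that ready in advance, here is a rough sketch of collecting such a stack trace with gdb, written in Python. It assumes gdb is installed, that the script runs as a user allowed to attach to the process, and that the checkpointer shows up in ps output with a title beginning "postgres:" and containing "checkpointer". The helper names are hypothetical, and debug symbols for the server are needed for the trace to be readable.

```python
import subprocess


def find_checkpointer_pids(ps_output):
    """Extract checkpointer PIDs from the text output of `ps -eo pid,args`."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, args = fields
        # Assumption: the process title starts with "postgres:" and
        # mentions "checkpointer" somewhere after it.
        if args.startswith("postgres:") and "checkpointer" in args:
            pids.append(int(pid))
    return pids


def gdb_backtrace_command(pid):
    """Build a gdb command that attaches, prints a backtrace, and detaches."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def collect_backtrace(pid):
    """Run gdb against pid and return whatever it printed on stdout."""
    result = subprocess.run(gdb_backtrace_command(pid),
                            capture_output=True, text=True)
    return result.stdout
```

In use, one would feed the output of `ps -eo pid,args` to find_checkpointer_pids and pass each PID to collect_backtrace. If more than one cluster runs on the host, several PIDs can come back, so check which postmaster each one belongs to before attaching.
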

			regards, tom lane