On Tue, Jul 12, 2022 at 05:48:53PM -0700, Andres Freund wrote:
> On 2022-07-11 05:31:35 -0700, David G. Johnston wrote:
> > On Saturday, July 9, 2022, PG Bug reporting form <noreply@postgresql.org> wrote:
> > > Postgresql server with csvlog log_destination enabled will have malformed
> > > CSV upon a disk space error. This causes any loading of the malformed *.csv
> > > log file to error
> > >
> > > ASK: Can the CSV file be written to in a safer way which ensures proper
> > > format even upon disk error?
> >
> > I’d have to say that there is little interest in sacrificing performance
> > for safety here, which seems like an unavoidable proposition.
>
> I agree in general, but this specific issue seems easy enough to address. We
> could just track whether the last write failed, and if so, emit an additional
> newline.
>
> But that just fixes the simple case - if the last successful write contained
> the start of an escaped string, the newline won't necessarily be recognized as
> the end of a "row".

Here's one approach that avoids that problem. After ENOSPC leaves the logfile
ending with a prefix of a message, issue ftruncate(logfile, logfile_length -
written_bytes_of_message_prefix) to remove the partial record.

An alternative would be to periodically posix_fallocate() substantial space in
the logfile, and write messages only to already-allocated space. At rotation,
clean shutdown, or startup, ftruncate() away trailing NUL bytes. I figure
this is inferior to the other approach, because the trailing NUL bytes will be
user-visible after OS crashes and when tailing active logs.
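
A rough sketch of the preallocation variant, under the same caveats. The
PreallocLog struct, the function names, and the 64 kB chunk size are all
hypothetical. ENOSPC surfaces from posix_fallocate() before any byte of the
message is written, so a record is either fully inside allocated space or
absent.

```c
#define _POSIX_C_SOURCE 200809L	/* for posix_fallocate(), pwrite() */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define PREALLOC_CHUNK	(64 * 1024)	/* hypothetical allocation step */

typedef struct PreallocLog
{
	int			fd;
	off_t		used;			/* bytes of real log data */
	off_t		allocated;		/* file length, including NUL padding */
} PreallocLog;

/* Append one record, extending the allocation first if it would not fit. */
int
prealloc_log_append(PreallocLog *log, const char *buf, size_t len)
{
	size_t		done = 0;

	while (log->used + (off_t) len > log->allocated)
	{
		/* posix_fallocate() returns an error number; it does not set errno */
		int			rc = posix_fallocate(log->fd, log->allocated,
										 PREALLOC_CHUNK);

		if (rc != 0)
		{
			errno = rc;			/* e.g. ENOSPC; nothing has been written */
			return -1;
		}
		log->allocated += PREALLOC_CHUNK;
	}

	while (done < len)
	{
		ssize_t		n = pwrite(log->fd, buf + done, len - done,
							   log->used + (off_t) done);

		if (n < 0)
		{
			if (errno == EINTR)
				continue;
			return -1;
		}
		done += (size_t) n;
	}
	log->used += (off_t) len;
	return 0;
}

/* At rotation or clean shutdown: remove the trailing NUL padding. */
int
prealloc_log_trim(PreallocLog *log)
{
	if (ftruncate(log->fd, log->used) != 0)
		return -1;
	log->allocated = log->used;
	return 0;
}
```

The startup case is not shown: after a crash the in-memory "used" counter is
gone, so the trailing NUL bytes would have to be found by scanning back from
the end of the file.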

(Neither approach prevents CSV corruption if the OS crashes in the middle of
syslogger's processing of one record. I don't know a low-cost, general fix
for that. One tough case is a field that should have been "foo""bar" getting
truncated to "foo".)
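
To illustrate why that last case is tough, here is a small hypothetical
checker (not taken from any CSV library) that reports whether a byte range is
exactly one complete double-quoted CSV field. The first five bytes of
"foo""bar" pass the check just as the full field does, so a reader cannot tell
from the bytes alone that anything was cut off.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true if buf[0..len) is exactly one complete
 * double-quoted CSV field, where "" inside the quotes is an escaped quote.
 */
bool
csv_quoted_field_is_complete(const char *buf, size_t len)
{
	size_t		i = 1;

	if (len < 2 || buf[0] != '"')
		return false;

	while (i < len)
	{
		if (buf[i] != '"')
		{
			i++;				/* ordinary character */
			continue;
		}
		if (i + 1 < len && buf[i + 1] == '"')
		{
			i += 2;				/* escaped quote, still inside the field */
			continue;
		}
		return i + 1 == len;	/* closing quote must be the last byte */
	}
	return false;				/* ran out of bytes before a closing quote */
}
```

Truncating at four or six bytes is detectable, since both leave the field
unterminated; only the cut that lands between the two quote characters
produces something that looks valid.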