Hi,
On 2023-01-11 22:39:49 -0600, Justin Pryzby wrote:
> Thomas raised a good question, which was how the tests were passing when
> SnapBuildSerialize() was raising an error, which is what it would've
> been doing when I used data_sync_retry=no.
Presumably some test not checking for failures in a part of the test.
> So .. why is wal_sync_method being used to control fsync for things
> other than WAL?
Historical raisins, I think. The problem is that macOS lies about fsync, and
one needs special magic to make it behave like a real fsync. Somebody thought
that instead of inventing a separate GUC to control whether the "real" fsync
is used for other subsystems, it'd be better to reuse wal_sync_method.
Note that this isn't the function that is actually used for WAL (that's
issue_xlog_fsync()), and that pg_fsync() only uses the GUC to know whether to
use pg_fsync_writethrough() or pg_fsync_no_writethrough(fd).
> See 6dc7760ac (c. 2005) which added wal_fsync_writethrough, at which
> point (since 9b178555f, c. 2004) wal_sync_method was already being used
> for SLOG.
>
> Now, it's also being used for logical decoding (since b89e1510 and
> 858ec1185, c. 2014) in rewriteheap.c/snapbuild.c. And pidfiles (since
> ee0e525bf, 2010). And the control file (8b938d36f7, 2019). Note that
> data_sync_retry wasn't added until 9ccdd7f66 (c. 2018)
>
> It looks like logical decoding may be the "most wrong" place that
> wal_sync_method is being used, so maybe my change is reasonable to
> consider, and not just a workaround.
I don't follow. What does using fsync_fname() vs fsync_fname_ext() have to do
with pg_fsync() using wal_sync_method? fsync_fname() is just a wrapper around
fsync_fname_ext(). Both end up in pg_fsync().
Are you actually proposing that we don't PANIC after an fsync for the category
of files that you list here, even with data_sync_retry set?
Greetings,
Andres Freund