pgsql@mohawksoft.com writes:
> After delving into this a little, it seems to me that if you are going to
> do this:
> write(file, buffer, size);
> f[data]sync(file);
> Opening with O_SYNC seems to be an optimization specifically to this
> methodology.
What you are missing is that we don't necessarily do that. Writes and
flushes of xlog don't always occur together: we may write out a buffer
to make room in shared memory even though we do not yet need it flushed
to disk. In this situation it is better *not* to have O_SYNC on because
we don't need to force (and wait for) a write just then. With a little
luck the kernel will write the buffer before we actually need a flush
to occur, and so the eventual flush will not have to wait at all.
In particular, this scenario applies to bulk-update transactions that
create vast amounts of WAL traffic but don't need an fsync until the very
end.
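
To make the distinction concrete, here is a minimal sketch in C of the
"many writes, one flush" pattern. It is not the actual xlog code: the
helper names log_write, log_flush and bulk_log, the 8192-byte buffer and
the file handling are all made up for illustration. The point is only
that the file is opened without O_SYNC, so each write() can return as
soon as the kernel has the data, and the single fsync() at the end is
the only place we wait for the disk.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: hand a whole buffer to the kernel without
 * forcing it to disk.  Loops because write() may be partial. */
static int log_write(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;          /* error handling kept minimal */
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Hypothetical helper: wait until everything written so far is on disk. */
static int log_flush(int fd)
{
    return fsync(fd);
}

/* Bulk case: write nbufs buffers, then flush exactly once at the end.
 * Note the absence of O_SYNC in the open() flags. */
static int bulk_log(const char *path, int nbufs)
{
    char buf[8192] = {0};       /* stand-in for one log buffer */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    for (int i = 0; i < nbufs; i++) {
        if (log_write(fd, buf, sizeof buf) != 0) {
            close(fd);
            return -1;
        }
    }
    if (log_flush(fd) != 0) {   /* the only forced wait */
        close(fd);
        return -1;
    }
    return close(fd);
}
```

With O_SYNC in the open() flags, every one of those write() calls would
have to wait for the disk, which is exactly the cost we want to avoid
when no flush is needed yet.
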
regards, tom lane