On Thu, Dec 28, 2006 at 09:28:48PM +0000, Heikki Linnakangas wrote:
> Tom Lane wrote:
> >To my mind the problem with fsync is not that it gives us too little
> >control but that it gives too much: we have to specify a particular
> >order of writing out files. What we'd really like is a version of
> >sync(2) that tells us when it's done but doesn't constrain the I/O
> >scheduler's choices at all. Unfortunately there's no such API ...
>
> The problem I see with fsync is that it causes an immediate I/O storm as
> the OS tries to flush everything out as quickly as possible. But we're
> not in a hurry. What we'd need is a lazy fsync, that would tell the
> operating system "let me know when all these dirty buffers are written
> to disk, but I'm not in a hurry, take your time". It wouldn't change the
> scheduling of the writes, just inform the caller when they're done.
>
> If we wanted more precise control of the flushing, we could use
> sync_file_range on Linux, but that's not portable. Nevertheless, I think
> it would be OK to have an ifdef and use it on platforms that support
> it, if it gave a benefit.

I believe there's something similar for OS X as well. The question is
whether it would be better to do that, or simply to delay calling fsync
until the OS has had a chance to write things out on its own.
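
For illustration only, here is a minimal sketch of the "start writeback
early, fsync later" idea in C. It assumes Linux's sync_file_range(2)
(added in kernel 2.6.17 and exposed by glibc under _GNU_SOURCE), guarded
by an #ifdef as suggested above. The helper names (start_writeback,
finish_writeback, demo_two_phase_flush) are hypothetical and this is not
PostgreSQL code. Note that SYNC_FILE_RANGE_WRITE only initiates writeback
of dirty pages and gives no durability guarantee (no metadata, no disk
cache flush), so the later fsync is still required; the hope is just that
it has less left to do.

```c
/* Hypothetical two-phase flush helpers; a sketch, not PostgreSQL code. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Phase 1: ask the kernel to start writing back the file's dirty pages
 * without waiting for the I/O to complete. Only Linux provides
 * sync_file_range(); elsewhere this is a no-op and we rely on the
 * kernel's own background writeback.
 */
static int start_writeback(int fd)
{
#ifdef SYNC_FILE_RANGE_WRITE
    /* offset 0, nbytes 0: the whole file, from the start through EOF */
    return sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
#else
    (void) fd;
    return 0;
#endif
}

/*
 * Phase 2: called later (say, near the end of a checkpoint). This is what
 * actually guarantees durability; if phase 1 did its job, most pages are
 * already on disk and this fsync causes a much smaller burst of I/O.
 */
static int finish_writeback(int fd)
{
    return fsync(fd);
}

/* Demo: write a temp file, start writeback, then fsync. Returns 0 on success. */
int demo_two_phase_flush(void)
{
    char path[] = "/tmp/lazyflushXXXXXX";
    const char *data = "dirty page contents\n";
    size_t len = strlen(data);
    int fd = mkstemp(path);
    int rc = -1;

    if (fd < 0)
        return -1;

    if (write(fd, data, len) == (ssize_t) len
        && start_writeback(fd) == 0
        /* ... other checkpoint work would happen between the two phases ... */
        && finish_writeback(fd) == 0)
        rc = 0;

    close(fd);
    unlink(path);
    return rc;
}
```

The design choice is simply to split "please write this" from "tell me
it's written", which is roughly the lazy behaviour being asked for, at
the cost of being Linux-only for the first half.
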
> As a side note, with full_page_writes on, a checkpoint wouldn't actually
> need to fsync those pages that have been written to WAL after the
> checkpoint started. Doesn't make much difference in most cases, but we
> could take that into account if we start taking more control of the
> flushing.

Hrm, interesting point, but I suspect the window involved there is too
small to be worth worrying about.

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)