Discussion: mmap for zeroing WAL log
[ redirected to pgsql-hackers instead of -patches ]
Matthew Kirkwood <matthew@hairy.beasts.org> writes:
> On Sat, 24 Feb 2001, Bruce Momjian wrote:
>> I am confused why mmap() is better than writing to a real file.
> It isn't, except that it allows to initialise the logfile in
> one syscall, without first allocating and zeroing (and hence
> dirtying) 16Mb of memory.
Uh, the existing code does not zero 16Mb of memory... it zeroes
8K and then writes that block repeatedly. It's possible that the
overhead of a syscall for each 8K block is significant, but on the
other hand writing a block at a time is a heavily used and heavily
optimized path in all Unixen. It's at least as plausible that the
mmap-as-source-of-zeroes path will be slower!
I think this is worth looking into, but I'm very far from being
sold on it...
regards, tom lane
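For readers following along, the block-at-a-time approach described in this message can be sketched roughly as below. This is a minimal illustration and not the actual PostgreSQL code: the function name `zero_fill_blockwise` and the constants `BLOCK_SIZE` and `SEG_SIZE` are assumptions chosen to match the 8K and 16Mb figures mentioned in the thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192                 /* 8K block, as mentioned in the thread */
#define SEG_SIZE   (16 * 1024 * 1024)   /* 16Mb log file, as mentioned in the thread */

/*
 * Hypothetical helper (not from the PostgreSQL source): create `path`
 * and fill it with SEG_SIZE zero bytes by writing a single zeroed
 * BLOCK_SIZE buffer over and over.  Returns 0 on success, -1 on failure.
 */
int
zero_fill_blockwise(const char *path)
{
    char    block[BLOCK_SIZE];
    size_t  written = 0;
    int     fd;

    /* the loop below assumes the file is a whole number of blocks */
    assert(SEG_SIZE % BLOCK_SIZE == 0);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* only 8K of memory is zeroed, not the full 16Mb */
    memset(block, 0, sizeof(block));

    while (written < SEG_SIZE)
    {
        ssize_t n = write(fd, block, sizeof(block));

        /* for simplicity, treat errors and short writes as failure */
        if (n != (ssize_t) sizeof(block))
        {
            close(fd);
            return -1;
        }
        written += (size_t) n;
    }

    return close(fd);
}
```

With these constants the loop issues 2048 `write` calls, which is the per-block syscall overhead being weighed against the mmap approach.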
Matthew Kirkwood <matthew@hairy.beasts.org> writes:
> I had assumed that the overhead would come from synchronous
> metadata incurring writes of at least the inode, block bitmap
> and probably an indirect block for each syscall.
No Unix that I've ever heard of forces metadata to disk after each
"write" call; anyone who tried it would have abysmal performance.
That's what fsync and the syncer daemon are for.
regards, tom lane
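The distinction drawn here, between handing data to the kernel with `write` and explicitly requesting that it reach disk with `fsync`, can be illustrated with a small sketch. The helper name `write_blocks_then_fsync` and its parameters are hypothetical, and the code says nothing about how any particular filesystem schedules its metadata writes.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192     /* 8K block, as mentioned in the thread */

/*
 * Hypothetical helper: write `nblocks` zeroed blocks to `path`, then
 * ask the kernel once, via fsync(), to flush the file to stable storage.
 * Returns 0 on success, -1 on failure.
 */
int
write_blocks_then_fsync(const char *path, int nblocks)
{
    char    block[BLOCK_SIZE];
    int     fd;
    int     i;

    assert(nblocks >= 0);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    memset(block, 0, sizeof(block));

    /* each write() may return before the data has reached the disk */
    for (i = 0; i < nblocks; i++)
    {
        if (write(fd, block, sizeof(block)) != (ssize_t) sizeof(block))
        {
            close(fd);
            return -1;
        }
    }

    /* a single explicit durability request after all the writes */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }

    return close(fd);
}
```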
On Sat, 24 Feb 2001, Tom Lane wrote:

> >> I am confused why mmap() is better than writing to a real file.
>
> > It isn't, except that it allows to initialise the logfile in
> > one syscall, without first allocating and zeroing (and hence
> > dirtying) 16Mb of memory.
>
> Uh, the existing code does not zero 16Mb of memory... it zeroes
> 8K and then writes that block repeatedly.

See the "one syscall" bit above.

> It's possible that the overhead of a syscall for each 8K block is
> significant,

I had assumed that the overhead would come from synchronous
metadata incurring writes of at least the inode, block bitmap
and probably an indirect block for each syscall.

> but on the other hand writing a block at a time is a heavily used and
> heavily optimized path in all Unixen. It's at least as plausible that
> the mmap-as-source-of-zeroes path will be slower!

Results:

On Linux/ext2, it appears good for a gain of 3-5% for log
creations (via a fairly minimal test program).

On FreeBSD 4.1-RELEASE/ffs (with all of sync/async/softupdates)
it is a couple of percent worse in elapsed time, but consumes
around a third more system CPU time (12sec vs 9sec on one test
system).

I am awaiting numbers from reiserfs but, for now, it looks like
I am far from vindicated.

Matthew.
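The test program and patch are not included in this thread, so the following is only a guess at what the "mmap-as-source-of-zeroes" idea looks like in practice. It assumes the zeroes come from a read-only private mapping of `/dev/zero`; the function name `zero_fill_mmap` and the constant `SEG_SIZE` are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SEG_SIZE (16 * 1024 * 1024)     /* 16Mb log file, as mentioned in the thread */

/*
 * Hypothetical helper: create `path` and fill it with SEG_SIZE zero
 * bytes, using a mapping of /dev/zero as the source buffer so that,
 * in the common case, one write() call covers the whole file.
 * Returns 0 on success, -1 on failure.
 */
int
zero_fill_mmap(const char *path)
{
    int     zfd;
    int     fd;
    char   *zeroes;
    size_t  written = 0;
    int     rc = 0;

    zfd = open("/dev/zero", O_RDONLY);
    if (zfd < 0)
        return -1;

    /* read-only private mapping: the pages are never dirtied by us */
    zeroes = (char *) mmap(NULL, SEG_SIZE, PROT_READ, MAP_PRIVATE, zfd, 0);
    close(zfd);                 /* the mapping remains valid after close */
    if (zeroes == (char *) MAP_FAILED)
        return -1;

    /* sanity check: a /dev/zero mapping reads back as zero bytes */
    assert(zeroes[0] == 0 && zeroes[SEG_SIZE - 1] == 0);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
    {
        munmap(zeroes, SEG_SIZE);
        return -1;
    }

    /* normally a single write(); the loop only handles short writes */
    while (written < SEG_SIZE)
    {
        ssize_t n = write(fd, zeroes + written, SEG_SIZE - written);

        if (n <= 0)
        {
            rc = -1;
            break;
        }
        written += (size_t) n;
    }

    munmap(zeroes, SEG_SIZE);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Timing a function like this against a block-at-a-time loop on the same filesystem is the kind of comparison the results above report; the numbers themselves depend on the kernel and filesystem, as the Linux and FreeBSD figures show.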
On Tue, 27 Feb 2001, Tom Lane wrote:

> Matthew Kirkwood <matthew@hairy.beasts.org> writes:
> > I had assumed that the overhead would come from synchronous
> > metadata incurring writes of at least the inode, block bitmap
> > and probably an indirect block for each syscall.
>
> No Unix that I've ever heard of forces metadata to disk after each
> "write" call; anyone who tried it would have abysmal performance.
> That's what fsync and the syncer daemon are for.

My understanding was that that's exactly what ffs' synchronous
metadata writes do.

Am I missing something here? Do they just schedule I/O, but
return without waiting for its completion?

Matthew.