Robert Haas <robertmhaas@gmail.com> writes:
> I guess the word "run" is misleading (I wrote the program in 5
> minutes); it's just zeroing the same chunk twice and measuring the
> times. The difference is presumably the page fault overhead, which
> implies that faulting is two-thirds of the overhead on MacOS X and
> three-quarters of the overhead on Linux.
Ah, cute solution to the measurement problem. I replicated the
experiment just as a cross-check:
Fedora 13 on x86_64 (recent Nehalem):
    first run: 346767
    second run: 103143

Darwin on x86_64 (not-so-recent Penryn):
    first run: 341289
    second run: 64535

HPUX on HPPA:
    first run: 2191136
    second run: 1199879

(On the last two machines I had to cut the array size to 256MB to avoid
swapping.) All builds with "gcc -O2".
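
For anyone who wants to repeat this, here is a minimal sketch of the kind of test being described: zero one freshly allocated chunk twice and time each pass. It is not the program that was actually run. The function names, the use of clock_gettime(), and the microsecond units are all assumptions made for illustration; the units of the figures above are not stated.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Call memset through a volatile pointer so that gcc -O2 cannot fold
 * malloc+memset into calloc or discard a store it considers dead. */
static void *(*volatile zero_fn)(void *, int, size_t) = memset;

/* Zero nbytes at buf; return elapsed wall-clock time in microseconds.
 * (Hypothetical helper, not from the original program.) */
double time_zeroing(char *buf, size_t nbytes)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    zero_fn(buf, 0, nbytes);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (end.tv_sec - start.tv_sec) * 1e6 +
           (end.tv_nsec - start.tv_nsec) / 1e3;
}

/* Zero the same chunk twice. The first pass may have to fault the pages
 * in; the second pass touches pages that are already resident.
 * Returns 0 on success, -1 if the allocation fails. */
int run_experiment(size_t nbytes, double *first_us, double *second_us)
{
    char *buf = malloc(nbytes);

    if (buf == NULL)
        return -1;
    *first_us = time_zeroing(buf, nbytes);
    *second_us = time_zeroing(buf, nbytes);
    free(buf);
    return 0;
}

/* Share of the first pass not explained by the zeroing itself,
 * taking the second pass as the cost of pure zeroing. */
double fault_fraction(double first, double second)
{
    assert(first > 0.0);
    return (first - second) / first;
}
```

The chunk needs to be large (hundreds of MB) for the allocator to hand back untouched pages; a small request may be served from memory that has already been faulted in, in which case the two passes will look alike. As a check on the arithmetic, fault_fraction(346767, 103143) for the Fedora figures comes to roughly 0.70.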
> This makes me pretty
> pessimistic about the chances of a meaningful speedup here.
Yeah, this confirms that what you are seeing in the original test is
mostly the cost of faulting pages in, not the cost of the zeroing. I
think it would still be interesting to revisit the micro-optimization
of MemSet(), but it doesn't look like a massive restructuring to avoid
the zeroing altogether is going to be worthwhile.
regards, tom lane