On 12/20/2016 02:19 PM, Andres Freund wrote:
> On 2016-12-20 08:10:29 -0500, Robert Haas wrote:
>> We could use the GUC assign hook to compute a mask and a shift, so
>> that this could be written as (CurrPos & mask_variable) == 0. That
>> would avoid the division instruction, though not the memory access.
>
> I suspect that'd be fine.
>
>
>> I hope this is all in the noise, though.
>
> Could very well be.
>
>
>> I know this code is hot but I think it'll be hard to construct a
>> test case where the bottleneck is anything other than the speed at
>> which the disk can absorb bytes.
>
> I don't think that's really true. Heikki's WAL changes made a *BIG*
> difference. And pretty small changes in xlog.c can make noticeable
> throughput differences both in single and multi-threaded
> workloads. E.g. witnessed by the fact that the crc computation used to
> be a major bottleneck (and the crc32c instruction still shows up
> noticeably in profiles). SSDs have become fast enough that it's
> increasingly hard to saturate them.
>
It's not just SSDs. RAID controllers with write cache (which is
typically just DDR3 memory anyway) have about the same effect even with
spinning rust.
So yes, this might make a measurable difference.
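
For reference, here is a minimal sketch of the mask idea Robert describes
above, assuming the configurable size is always a power of two. The names
(seg_size, seg_mask, assign_seg_size) are hypothetical stand-ins, not the
actual variables or hook in xlog.c; the point is only that the mask is
computed once when the setting changes, so the hot path does an AND
instead of a division.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the GUC-controlled size and its derived mask. */
static uint64_t seg_size = UINT64_C(16) * 1024 * 1024;
static uint64_t seg_mask = UINT64_C(16) * 1024 * 1024 - 1;

/*
 * Would be called from the GUC assign hook whenever the size changes.
 * The mask trick is only valid for power-of-two sizes, so check that.
 */
static void
assign_seg_size(uint64_t new_size)
{
    assert(new_size != 0 && (new_size & (new_size - 1)) == 0);
    seg_size = new_size;
    seg_mask = new_size - 1;
}

/* Boundary test using a division (modulo by a runtime variable). */
static bool
at_boundary_div(uint64_t pos)
{
    return pos % seg_size == 0;
}

/* Same test using the precomputed mask: one AND, no division. */
static bool
at_boundary_mask(uint64_t pos)
{
    return (pos & seg_mask) == 0;
}
```

Both variants still have to load a variable from memory, which matches
Robert's point that this avoids the division but not the memory access.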
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services