On 11/21/2014 01:06 PM, Andres Freund wrote:
> On 2014-11-21 13:01:50 +0200, Heikki Linnakangas wrote:
>> On 11/21/2014 12:11 PM, Abhijit Menon-Sen wrote:
>>> At 2014-11-20 13:47:00 +0530, ams@2ndQuadrant.com wrote:
>>>>
>>>>> Suggestions for how to address (b) are welcome.
>>>
>>> With help from Andres, I set up a workload where XLogInsert* was at the
>>> top of my profiles: server with fsync and synchronous_commit off, and
>>> pgbench running a multiple-row insert into a single-text-column table
>>> with -M prepared -c 25 -t 250000 -f script.
>>
>> How wide is the row, in terms of bytes? You should see bigger improvement
>> with wider rows, as you get longer contiguous chunks of data to CRC that
>> way. With very narrow rows, you might not see much difference because the
>> chunks are so small.
>
> The primary goal, as I understood it, was to test very short records to
> test whether there's a regression due to slice-by-8. There doesn't seem
> to be any.

Ah, OK. Mission accomplished, then.

> It's, IIRC, trivial to reproduce significant performance benefits in
> workloads with lots of FPWs or large records (like COPY), but those
> weren't what you and Tom were concerned about...

Yeah.

>> If that's the problem, it might be beneficial to memcpy() all the data to a
>> temporary buffer, and calculate the CRC over the whole, instead of CRC'ing
>> each XLogRecData chunk separately. XLogRecordAssemble() uses a scratch area,
>> hdr_scratch, for building all the headers. You could check how much
>> rmgr-specific data there is, and if there isn't much, just append the data
>> to that scratch area too.
>
> I think that might very well make sense - but it's imo something better
> done separately from the smarter CRC algorithm.

Agreed.
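
For the archives, here is a rough sketch of the "copy small records into
one buffer, then CRC once" idea quoted above. It is only an illustration,
not code from any patch: Chunk is a made-up stand-in for an XLogRecData
chain, SCRATCH_SIZE is an arbitrary threshold I picked, and crc32_update()
is a plain bitwise CRC-32 rather than the real WAL CRC macros. The bitwise
loop will not show any speedup by itself; the point is only that both paths
produce the same checksum, so a slice-by-8 routine could be handed one long
contiguous buffer instead of many short ones.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an XLogRecData chain; not the real struct. */
typedef struct Chunk
{
    const void   *data;
    size_t        len;
    struct Chunk *next;
} Chunk;

/* Assumed cutoff for "there isn't much data"; the value is arbitrary. */
#define SCRATCH_SIZE 512

/*
 * Plain bitwise CRC-32 (reflected polynomial 0xEDB88320), used here only
 * as a stand-in. Pass the running value back in to continue a checksum.
 */
static uint32_t
crc32_update(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len--)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc;
}

/* One CRC call per chunk, however short each chunk is. */
static uint32_t
crc_per_chunk(const Chunk *c)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (; c != NULL; c = c->next)
        crc = crc32_update(crc, c->data, c->len);
    return crc ^ 0xFFFFFFFFu;
}

/* Copy small records into one contiguous buffer and CRC it in one call. */
static uint32_t
crc_coalesced(const Chunk *head)
{
    unsigned char scratch[SCRATCH_SIZE];
    size_t        total = 0;

    for (const Chunk *c = head; c != NULL; c = c->next)
    {
        /* Record does not fit in the scratch area: use the chunked path. */
        if (c->len > SCRATCH_SIZE - total)
            return crc_per_chunk(head);
        memcpy(scratch + total, c->data, c->len);
        total += c->len;
    }
    return crc32_update(0xFFFFFFFFu, scratch, total) ^ 0xFFFFFFFFu;
}
```

Whether the extra memcpy() pays for itself would have to be measured on
the short-record workload described upthread.
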
- Heikki