On Sat, Jan 21, 2017 at 09:02:25PM +0200, Ants Aasma wrote:
> > It might be worth looking into using the CRC CPU instruction to reduce this
> > overhead, like we do for the WAL checksums. Since that is a different
> > algorithm it would be a compatibility break and we would need to support the
> > old algorithm for upgraded clusters..
>
> We looked at that when picking the algorithm. At that point it seemed
> that CRC CPU instructions were not universal enough to rely on them.
> The algorithm we ended up on was designed to be fast on SIMD hardware.
> Unfortunately on x86-64 that required SSE4.1 integer instructions, so
> with default compiles there is a lot of performance left on table. A
> low hanging fruit would be to do CPU detection like the CRC case and
> enable a SSE4.1 optimized variant when those instructions are
> available. IIRC it was actually a lot faster than the naive hardware
> CRC that is used for WAL and about on par with interleaved CRC.
Uh, I thought already did compile-time testing for SSE4.1 and used them
if present. Why do you say "with default compiles there is a lot of
performance left on table?"
-- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB
http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +