On Fri, Feb 24, 2017 at 10:02 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Uh, as far as I know, the best you are going to get from llvm is
> standard assembly, while the SSE4.1 instructions use special assembly
> instructions, so they would be faster, and in a way they are a GPU built
> into CPUs.
Both LLVM and GCC are capable of compiling the code that we already have
into a vectorized loop that uses SSE4.1 or AVX2 instructions, given the
proper compilation flags. That is exactly where the speedup came from in
the test I showed in my earlier e-mail.
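
As a rough illustration of the kind of loop a compiler can auto-vectorize, here is a minimal sketch. It is a simplified, hypothetical example and not the actual checksum code: the names lane_checksum, N_LANES and MIX_PRIME are made up for this e-mail, and the mixing step is only meant to show the shape of the loop (independent 32-bit lanes, each updated with a multiply and a shift).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants, chosen only for illustration. */
#define N_LANES   32            /* number of independent partial sums */
#define MIX_PRIME 16777619u     /* FNV prime used as the multiplier */

/*
 * Hypothetical sketch: fold an array of 32-bit words into N_LANES
 * independent partial sums, then XOR the partial sums together.
 * Trailing words that do not fill a whole row of N_LANES are ignored.
 */
uint32_t
lane_checksum(const uint32_t *data, size_t nwords)
{
    uint32_t sums[N_LANES] = {0};
    uint32_t result = 0;
    size_t   i, j;

    for (i = 0; i + N_LANES <= nwords; i += N_LANES)
    {
        /*
         * No lane depends on any other lane, so the compiler is free to
         * process several lanes per instruction (4 with 128-bit SSE
         * registers, 8 with 256-bit AVX2 registers).
         */
        for (j = 0; j < N_LANES; j++)
        {
            uint32_t tmp = sums[j] ^ data[i + j];

            sums[j] = (tmp * MIX_PRIME) ^ (tmp >> 17);
        }
    }

    /* Combine the partial sums into a single value. */
    for (j = 0; j < N_LANES; j++)
        result ^= sums[j];

    return result;
}
```

The inner loop needs a packed 32-bit multiply, which on x86 first appears in SSE4.1, so the target flags matter. Something like "gcc -O3 -msse4.1" or "gcc -O3 -mavx2" (or the same -m flags with clang) should be enough to try it. Whether a given compiler version actually vectorizes this loop is worth checking rather than assuming, for example with GCC's -fopt-info-vec or clang's -Rpass=loop-vectorize.
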
Regards,
Ants Aasma