Re: Improve CRC32C performance on SSE4.2
From | Soumyadeep Chakraborty |
---|---|
Subject | Re: Improve CRC32C performance on SSE4.2 |
Date | |
Msg-id | CAE-ML+-X8mnx-AsD-9QtB7rkWvCmcb4+VJWOrg0KPu5K2mucSA@mail.gmail.com |
In response to | Re: Improve CRC32C performance on SSE4.2 (Andy Fan <zhihuifan1213@163.com>) |
List | pgsql-hackers |
On Tue, Jun 17, 2025 at 1:55 AM John Naylor <johncnaylorls@gmail.com> wrote:
I took the minimal repro from [1] and compared the code that clang 17
generates at -O0 [2] and at -O3 [3]. At -O3 (and in fact at -O1 and -O2
as well), this source:

    castval = _mm512_castsi128_si512(_mm_cvtsi32_si128(crc0));
    x0 = _mm512_xor_si512(castval, x0);

compiles to:

    vinserti128 ymm0, ymm0, xmmword ptr [rip + .LCPI1_0], 0
    vpxorq zmm0, zmm0, zmmword ptr [rdi]

Reading vpxorq's pseudocode [4], it seems that it zeroes out the upper
bits of the destination:

    DEST[MAXVL-1:VL] := 0

The same holds for clang 17 at -O0 if we use _mm512_zextsi128_si512
instead [5]: vpxor and vbroadcast128 are emitted, which also seem to
zero out the upper bits.

So -O1..-O3 were indeed emitting instructions that zero-extend, thus
avoiding the undefined behavior.
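
To spell out why the zero-extension matters, here is a scalar model in
plain C. It is only my sketch, not code from the patch or from the repro
in [1]: the type vec512_model and all function names are made up for
illustration, and the "junk" argument merely stands in for whatever
happens to be in the upper bits when _mm512_castsi128_si512 leaves them
undefined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 16 /* one 512-bit register modeled as 16 x 32-bit lanes */

typedef struct { uint32_t lane[LANES]; } vec512_model;

/* Models _mm512_zextsi128_si512(_mm_cvtsi32_si128(crc)):
 * lane 0 holds crc, every other lane is zero. */
static vec512_model widen_zext(uint32_t crc)
{
    vec512_model v;
    memset(&v, 0, sizeof(v));
    v.lane[0] = crc;
    return v;
}

/* Models _mm512_castsi128_si512(_mm_cvtsi32_si128(crc)):
 * the low 128 bits (lanes 0..3) are defined, the upper 384 bits are not.
 * "junk" is a hypothetical stand-in for the leftover register contents. */
static vec512_model widen_cast(uint32_t crc, uint32_t junk)
{
    vec512_model v = widen_zext(crc);
    for (int i = 4; i < LANES; i++)
        v.lane[i] = junk;
    return v;
}

/* Models _mm512_xor_si512: lane-wise XOR. */
static vec512_model xor512(vec512_model a, vec512_model b)
{
    vec512_model r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = a.lane[i] ^ b.lane[i];
    return r;
}

/* Counts how many lanes above the low 128 bits differ. */
static int upper_lanes_changed(vec512_model before, vec512_model after)
{
    int n = 0;
    for (int i = 4; i < LANES; i++)
        if (before.lane[i] != after.lane[i])
            n++;
    return n;
}

/* Small self-check that exercises both widening variants. */
static void model_self_check(void)
{
    vec512_model x0;
    for (int i = 0; i < LANES; i++)
        x0.lane[i] = 0x1000u + (uint32_t) i;

    /* Zero-extended: only lane 0 of the input data is modified. */
    vec512_model good = xor512(widen_zext(0xDEADBEEFu), x0);
    assert(good.lane[0] == (0xDEADBEEFu ^ 0x1000u));
    assert(upper_lanes_changed(x0, good) == 0);

    /* Cast with nonzero junk: the upper lanes get corrupted. */
    vec512_model bad = xor512(widen_cast(0xDEADBEEFu, 0xFFFFFFFFu), x0);
    assert(bad.lane[0] == good.lane[0]);
    assert(upper_lanes_changed(x0, bad) == LANES - 4);
}
```

Calling model_self_check() from a main() runs both cases. In the model,
the cast variant only gives the right answer when the junk happens to be
zero, which is what the instructions emitted at -O1..-O3 seem to
guarantee in practice.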
[1] https://www.postgresql.org/message-id/PH8PR11MB8286A89AF2B104044187E54DFB70A%40PH8PR11MB8286.namprd11.prod.outlook.com
[2] https://godbolt.org/z/ahx9PePYr
[3] https://godbolt.org/z/W4WPzjnbb
[4] https://www.felixcloutier.com/x86/pxor#vpxorq--evex-encoded-versions-
[5] https://godbolt.org/z/46brvrnnv
Regards,
Deep (VMware)