RE: Improve CRC32C performance on SSE4.2
От | Devulapalli, Raghuveer |
---|---|
Тема | RE: Improve CRC32C performance on SSE4.2 |
Дата | |
Msg-id | PH8PR11MB8286F38A75AE71EE3A856E64FBC92@PH8PR11MB8286.namprd11.prod.outlook.com обсуждение исходный текст |
Ответ на | Re: Improve CRC32C performance on SSE4.2 (John Naylor <johncnaylorls@gmail.com>) |
Ответы |
Re: Improve CRC32C performance on SSE4.2
|
Список | pgsql-hackers |
Hi John, > You raised some interesting points, which deserve a thoughtful response. After > sleeping on it, however I came to the conclusion that a sweeping change in > runtime checks, with either of our approaches, has downsides and unresolved > questions. Perhaps we can come back to it at a later time. Sounds good to me. I did get derailed into something beyond the scope of this patch. I will make a separate proposal oncethis is merged. > Here's what I came up with in v11: Some feedback on v11: if ((exx[2] & (1 << 20)) != 0) /* SSE 4.2 */ { pg_comp_crc32c = pg_comp_crc32c_sse42; #ifdef USE_PCLMUL_WITH_RUNTIME_CHECK if ((exx[2] & (1 << 1)) != 0) /* PCLMUL */ pg_comp_crc32c = pg_comp_crc32c_pclmul; #endif } #ifdef USE_SSE42_CRC32C_WITH_RUNTIME_CHECK else pg_comp_crc32c = pg_comp_crc32c_sb8; #endif Is the #ifdef USE_SSE42_CRC32C_WITH_RUNTIME_CHECK at the right place? Shouldn’t it guard SSE4.2 function pointer assignment? /* WIP: configure checks */ #ifdef __x86_64__ #define USE_PCLMUL_WITH_RUNTIME_CHECK #endif Minor consideration: gcc 4.3 (released in 2011) is the only compiler that supports -msse4.2 and not -mpclmul. gcc >= 4.4supports both. If you are okay causing a regression on gcc4.3, we could combine USE_PCLMUL_WITH_RUNTIME_CHECK with USE_SSE42_CRC32C_WITH_RUNTIME_CHECKinto a single macro to reduce the number of #ifdef's in the codebase and simplify configure/mesoncompiler checks. > 0001: same benchmark module as before > 0002: For SSE4.2 builds, arrange so that constant input uses an inlined path so > that the compiler can emit unrolled loops anywhere. When building with meson, it looks like we build with -O2 and that is not enough for the compiler to unroll the SSE42 CRC32Cloop. It requires -O3 or -O2 with -funroll-loops (see https://gcc.godbolt.org/z/4Eaq981aT). Perhaps we should checkdisassembly to see if the unroll is really happening on constant input? Also, the reason you have pg_comp_crc32c_sse42_inline defined separately in a header file is because you want to (a) inlinethe function and (b) unroll for constant inputs. Couldn't both of these be achieved by adding function __attribute__((always_inline))on the pg_comp_crc32c_sse42 function with the added -funroll-loops compiler flag? Raghuveer
В списке pgsql-hackers по дате отправления: