Re: Improve CRC32C performance on SSE4.2
От | John Naylor |
---|---|
Тема | Re: Improve CRC32C performance on SSE4.2 |
Дата | |
Msg-id | CANWCAZZSB12uW8r7GCqnZRdEcO55OYO4smKJvkvWdgEyeMVeBQ@mail.gmail.com обсуждение исходный текст |
Ответ на | RE: Improve CRC32C performance on SSE4.2 ("Devulapalli, Raghuveer" <raghuveer.devulapalli@intel.com>) |
Ответы |
RE: Improve CRC32C performance on SSE4.2
|
Список | pgsql-hackers |
On Wed, Feb 19, 2025 at 1:28 AM Devulapalli, Raghuveer <raghuveer.devulapalli@intel.com> wrote: > The runtime detection code could also be appended with function __attribute__((constructor)) so that it gets executed beforemain. Hmm! I didn't know about that, thanks! It works on old gcc/clang, so that's good. I can't verify online if Arm etc. executes properly since the execute button is greyed out, but I don't see why it wouldn't: https://godbolt.org/z/3as9MGr3G Then we could have: #ifdef FRONTEND pg_attribute_constructor() #endif void pg_cpucap_initialize(void) { ... } Now, there is no equivalent on MSVC and workarounds are fragile [1]. Maybe we could only assert initialization happened for the backend and for frontend either - add a couple manual initializations to to the frontend programs where we don't want to lose performance for non-gcc/clang. - require CRC on x86-64 MSVC since Windows 10 is EOL soon, going by Thomas M.'s earlier findings on popcount (also SSE4.2) [2] The first is less risky but less tidy. > > I think the runtime dispatch is fairly simple for this case. > + pg_comp_crc32c_sse42(pg_crc32c crc, const void *data, size_t len) > + { > + .... > + #ifdef HAVE_PCLMUL_RUNTIME > + if (len >= PCLMUL_THRESHOLD && (pg_cpucap & PGCPUCAP_CLMUL)) > > IMO, the pclmul and sse4.2 versions should be dispatched independently and the dispatch logic should be moved out of thepg_crc32c.h to it own pg_crc32c_dispatch.c file. That makes sense, but it does close the door on future inlining. > Also, thank you for taking lead on developing this patch. Since the latest patch seems to be in a good shape, I'm happyto provide feedback and review your work, or can continue development work based on any feedback. Please let me knowwhich you'd prefer. Sure. Depending on any feedback on the above constructor technique, I'll work on the following, since I've prototyped most of the first and the second is mostly copy-and-paste from your earlier work: > > pg_cpucap_crc32c > > I like the idea of moving all CPU runtime detection to a single module outside of implementations. This makes it easy toextend in the future and keeps the functionalities separate. > > > - Rename the CRC choose*.c files to pg_cpucap_{x86,arm}.c and build them > > unconditionally for each platform > > +1. Sounds perfect. We should also move the avx512 runtime detection of popcount here. [1] https://stackoverflow.com/questions/1113409/attribute-constructor-equivalent-in-vc [2] https://www.postgresql.org/message-id/CA+hUKGKS64zJezV9y9mPcB-J0i+fLGiv3FAdwSH_3SCaVdrjyQ@mail.gmail.com -- John Naylor Amazon Web Services
В списке pgsql-hackers по дате отправления: