Re: Popcount optimization using AVX512
От | Nathan Bossart |
---|---|
Тема | Re: Popcount optimization using AVX512 |
Дата | |
Msg-id | 20240418210158.GA3776258@nathanxps13 обсуждение исходный текст |
Ответ на | Re: Popcount optimization using AVX512 (David Rowley <dgrowleyml@gmail.com>) |
Ответы |
RE: Popcount optimization using AVX512
|
Список | pgsql-hackers |
On Thu, Apr 18, 2024 at 08:24:03PM +0000, Devulapalli, Raghuveer wrote: >> This seems to contradict the note about doing step 3 at any point, and >> given step 1 is the OSXSAVE check, I'm not following what this means, >> anyway. > > It is recommended that you run the xgetbv code before you check for cpu > features avx512-popcnt and avx512-bw. The way it is written now is the > opposite order. I would also recommend splitting the cpuid feature check > for avx512popcnt/avx512bw and xgetbv section into separate functions to > make them modular. Something like: > > static inline > int check_os_avx512_support(void) > { > // (1) run cpuid leaf 1 to check for xgetbv instruction support: > unsigned int exx[4] = {0, 0, 0, 0}; > __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]); > if ((exx[2] & (1 << 27)) == 0) /* xsave */ > return false; > > /* Does XGETBV say the ZMM/YMM/XMM registers are enabled? */ > return (_xgetbv(0) & 0xe0) == 0xe0; > } > >> I'm also wondering if we need to check that (_xgetbv(0) & 0xe6) == 0xe6 >> instead of just (_xgetbv(0) & 0xe0) != 0, as the status of the lower >> half of some of the ZMM registers is stored in the SSE and AVX state >> [0]. I don't know how likely it is that 0xe0 would succeed but 0xe6 >> wouldn't, but we might as well make it correct. > > This is correct. It needs to check all the 3 bits (XMM/YMM and ZMM). The > way it is written is now is in-correct. Thanks for the feedback. I've attached an updated patch. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Вложения
В списке pgsql-hackers по дате отправления: