RE: Popcount optimization using AVX512

From: Amonson, Paul D
Subject: RE: Popcount optimization using AVX512
Date:
Msg-id: BL1PR11MB5304E51336123CE6F041A920DC3B2@BL1PR11MB5304.namprd11.prod.outlook.com
In response to: Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
Responses: Re: Popcount optimization using AVX512  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
RE: Popcount optimization using AVX512  ("Amonson, Paul D" <paul.d.amonson@intel.com>)
Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
List: pgsql-hackers
> -----Original Message-----
> From: Nathan Bossart <nathandbossart@gmail.com>
> Sent: Thursday, March 28, 2024 2:39 PM
> To: Amonson, Paul D <paul.d.amonson@intel.com>
>
> * The latest patch set from Paul Amonson appeared to support MSVC in the
>   meson build, but not the autoconf one.  I don't have much expertise here,
>   so the v14 patch doesn't have any autoconf/meson support for MSVC, which
>   I thought might be okay for now.  IIUC we assume that 64-bit/MSVC builds
>   can always compile the x86_64 popcount code, but I don't know whether
>   that's safe for AVX512.

I do not know how to integrate MSVC with Autoconf either. The CI uses MSVC+Meson+Ninja, so I stuck with that.

> * I think we need to verify there isn't a huge performance regression for
>   smaller arrays.  IIUC those will still require an AVX512 instruction or
>   two as well as a function call, which might add some noticeable overhead.

Not considering your changes, I had already tested small buffers. Below 512 bytes there was no measurable
regression (there was one extra condition check), and between 512 and 4096 bytes the results moved from no regression
to some gains. Assuming you introduced no extra function calls, it should be the same.
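
To illustrate what "one extra condition check" costs on the small-buffer path, here is a minimal sketch of a size-threshold dispatch. It is not the code from the patch: the names `popcount_dispatch`, `popcount_bytewise`, `popcount_wide_standin` and the constant `WIDE_PATH_THRESHOLD` are hypothetical, the 512-byte cutoff is simply the boundary from the measurements above, and the "wide" routine is a plain 64-bit loop standing in for an AVX-512 kernel. It assumes GCC or Clang for the `__builtin_popcount*` builtins.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cutoff, taken from the 512-byte boundary in the measurements. */
#define WIDE_PATH_THRESHOLD 512

/* Stand-in for the existing small-buffer path: count bits one byte at a time. */
static uint64_t
popcount_bytewise(const unsigned char *buf, size_t len)
{
    uint64_t count = 0;

    for (size_t i = 0; i < len; i++)
        count += (uint64_t) __builtin_popcount(buf[i]);
    return count;
}

/*
 * Stand-in for the wide (AVX-512) path: consume 8 bytes per iteration,
 * then finish the tail bytewise. A real kernel would use 64-byte vectors.
 */
static uint64_t
popcount_wide_standin(const unsigned char *buf, size_t len)
{
    uint64_t count = 0;

    while (len >= sizeof(uint64_t))
    {
        uint64_t word;

        memcpy(&word, buf, sizeof(word));   /* avoids unaligned loads */
        count += (uint64_t) __builtin_popcountll(word);
        buf += sizeof(word);
        len -= sizeof(word);
    }
    return count + popcount_bytewise(buf, len);
}

/* The only cost added for small inputs is this single length comparison. */
static uint64_t
popcount_dispatch(const unsigned char *buf, size_t len)
{
    if (len < WIDE_PATH_THRESHOLD)
        return popcount_bytewise(buf, len);
    return popcount_wide_standin(buf, len);
}
```

Both paths return the same count for any input, so the threshold only affects speed. Where the crossover actually lies would have to be measured per platform.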

> I forgot to mention that I also want to understand whether we can actually assume availability of XGETBV when CPUID
> says we support AVX512:

You cannot assume that. There are edge cases where AVX-512 was detected on one system at compile time but is not
actually available under the kernel of a second system at runtime, even though that CPU has the hardware feature.
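
A runtime check along these lines is one way to cover that case. This is only a sketch, assuming GCC or Clang on x86 (it uses `<cpuid.h>` and inline assembly, so it does not apply to MSVC as written). The function names `xcr0_has_zmm_state` and `avx512_popcount_usable` and the macro `XCR0_ZMM_MASK` are hypothetical. The order matters: confirm via CPUID that the OS has enabled XSAVE (OSXSAVE) before executing XGETBV, then confirm that the OS saves the opmask and ZMM register state, and only then trust the AVX-512 feature bits.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* XCR0 bits 1, 2, 5, 6, 7: SSE, AVX, opmask, ZMM_Hi256, Hi16_ZMM state. */
#define XCR0_ZMM_MASK UINT64_C(0xe6)

/* True when the OS has enabled saving of all state needed for ZMM use. */
static int
xcr0_has_zmm_state(uint64_t xcr0)
{
    return (xcr0 & XCR0_ZMM_MASK) == XCR0_ZMM_MASK;
}

/* Returns 1 if AVX-512 popcount is usable on this machine right now, else 0. */
static int
avx512_popcount_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    /* CPUID leaf 1, ECX bit 27 (OSXSAVE): XGETBV is safe to execute. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!(ecx & (1u << 27)))
        return 0;

    /* Read XCR0 to see which register state the OS actually manages. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    if (!xcr0_has_zmm_state(((uint64_t) hi << 32) | lo))
        return 0;

    /* CPUID leaf 7: EBX bit 16 = AVX512F, ECX bit 14 = AVX512_VPOPCNTDQ. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ebx & (1u << 16)) && (ecx & (1u << 14));
#else
    return 0;                   /* not an x86 build */
#endif
}
```

A caller would evaluate `avx512_popcount_usable()` once at startup and choose the popcount implementation from the result, rather than relying on what the build machine supported.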

I will review the new patch to see if anything jumps out at me.

Thanks,
Paul



