Re: Popcount optimization using AVX512

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: Popcount optimization using AVX512
Дата
Msg-id 202401250949.xrwkd46bgbpl@alvherre.pgsql
обсуждение исходный текст
Ответ на RE: Popcount optimization using AVX512  ("Shankaran, Akash" <akash.shankaran@intel.com>)
Ответы Re: Popcount optimization using AVX512  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
RE: Popcount optimization using AVX512  ("Amonson, Paul D" <paul.d.amonson@intel.com>)
Список pgsql-hackers
On 2024-Jan-25, Shankaran, Akash wrote:

> With the updated patch, we observed significant improvements and
> handily beat the previous popcount algorithm performance. No
> regressions in any scenario are observed:
> Platform: Intel Xeon Platinum 8360Y (Icelake) for data sizes 1kb - 64kb.
> Microbenchmark: 2x - 3x gains presently vs 19% previously, on the same
> microbenchmark described initially in this thread. 

These are great results.

However, it would be much better if the improved code were available for
all relevant builds and activated if a CPUID test determines that the
relevant instructions are available, instead of requiring a compile-time
flag -- which most builds are not going to use, thus wasting the
opportunity for running the optimized code.

I suppose this would require patching pg_popcount64_choose() to be more
specific.  Looking at the existing code, I would also consider renaming
the "_fast" variants to something like pg_popcount32_asml/
pg_popcount64_asmq so that you can name the new one pg_popcount64_asmdq
or such.  (Or maybe leave the 32-bit version alone as "fast/slow", since
there's no third option for that one -- or do I misread?)

I also think this needs to move the CFLAGS-decision-making elsewhere;
asking the user to get it right is too much of a burden.  Is it workable
to simply verify compiler support for the additional flags needed, and
if so add them to a new CFLAGS_BITUTILS variable or such?  We already
have the CFLAGS_CRC model that should be easy to follow.  Should be easy
enough to mostly copy what's in configure.ac and meson.build, right?

Finally, the matter of using ifunc as proposed by Noah seems to be still
in the air, with no patches offered for the popcount family.  Given that
Nathan reports [1] a performance decrease, maybe we should set that
thought aside for now and continue to use function pointers.  It's worth
keeping in mind that popcount is already using function pointers (at
least in the case where we try to use POPCNT directly), so patching to
select between three options instead of between two wouldn't be a
regression.

[1] https://postgr.es/m/20231107201441.GA898662@nathanxps13

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)



В списке pgsql-hackers по дате отправления:

Предыдущее
От: vignesh C
Дата:
Сообщение: Re: Documentation to upgrade logical replication cluster
Следующее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: speed up a logical replica setup