Re: Popcount optimization using AVX512

Поиск
Список
Период
Сортировка
От Nathan Bossart
Тема Re: Popcount optimization using AVX512
Дата
Msg-id 20240313163904.GA2560056@nathanxps13
обсуждение исходный текст
Ответ на Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
Ответы RE: Popcount optimization using AVX512  ("Amonson, Paul D" <paul.d.amonson@intel.com>)
Список pgsql-hackers
A couple of thoughts on v7-0001:

+extern int  pg_popcount32_slow(uint32 word);
+extern int  pg_popcount64_slow(uint64 word);

+/* In pg_popcnt_*_accel source file. */
+extern int  pg_popcount32_fast(uint32 word);
+extern int  pg_popcount64_fast(uint64 word);

Can these prototypes be moved to a header file (maybe pg_bitutils.h)?  It
looks like these are defined twice in the patch, and while I'm not positive
that it's against project policy to declare extern function prototypes in
.c files, it appears to be pretty rare.

+  'pg_popcnt_choose.c',
+  'pg_popcnt_x86_64_accel.c',

I think we want these to be architecture-specific, i.e., only built for
x86_64 if the compiler knows how to use the relevant instructions.  There
is a good chance that we'll want to add similar support for other systems.
The CRC32C files are probably a good reference point for how to do this.

+#ifdef TRY_POPCNT_FAST

IIUC this macro can be set if either 1) the popcntq test in the
autoconf/meson scripts passes or 2) we're building with MSVC on x86_64.  I
wonder if it would be better to move the MSVC/x86_64 check to the
autoconf/meson scripts so that we could avoid surrounding large portions of
the popcount code with this macro.  This might even be a necessary step
towards building these files in an architecture-specific fashion.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Introduce XID age and inactive timeout based replication slot invalidation
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Introduce XID age and inactive timeout based replication slot invalidation