Обсуждение: refactor architecture-specific popcount code
Right now, the organization of this code is weird. All AArch64-specific implementations live in an AArch64-specific file, the AVX-512 implementations live in their own file, and the architecture-agnostic and SSE4.2 implementations live in pg_bitutils.c. The attached patches move the SSE4.2 implementations to the AVX-512 file (which is renamed appropriately), and they update some function names to be more descriptive, i.e., "fast" is replaced with "sse42" and "slow" is replaced with "generic". I probably should've done this a while ago... -- nathan
Вложения
On Thu, Jan 15, 2026 at 3:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > Right now, the organization of this code is weird. All AArch64-specific > implementations live in an AArch64-specific file, the AVX-512 > implementations live in their own file, and the architecture-agnostic and > SSE4.2 implementations live in pg_bitutils.c. The attached patches move > the SSE4.2 implementations to the AVX-512 file (which is renamed > appropriately), and they update some function names to be more descriptive, > i.e., "fast" is replaced with "sse42" and "slow" is replaced with > "generic". Thanks for taking on some technical debt! 0001 --- a/src/port/pg_popcount_avx512.c +++ b/src/port/pg_popcount_x86_64.c Can we get away with just "x86" for brevity? We generally don't target 32-bit CPUs for this kind of work, so there's no chance of confusion. 0002 ``` -#ifdef USE_AVX512_POPCNT_WITH_RUNTIME_CHECK +#include "port/pg_bitutils.h" + +#ifdef TRY_POPCNT_X86_64 #if defined(HAVE__GET_CPUID) || defined(HAVE__GET_CPUID_COUNT) #include <cpuid.h> #endif ``` With the above in the x86 .c file, I wonder we can get rid of this stanza and the "try" symbol and gate only on HAVE_X86_64_POPCNTQ: #ifdef HAVE_X86_64_POPCNTQ #if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID) #define TRY_POPCNT_X86_64 1 #endif #endif If we have to be cautious, we could just turn the #error on no CPUID symbol into "return false". 0003 s/fast/sse42/: Seems okay in this file, but this isn't the best name, either. Maybe a comment to head off future "corrections", something like: "Technically, POPCNT is not part of SSE 4.2, and is not even a vector operation, but many compilers emit the popcnt instruction with -msse4.2 anyway." s/slow/generic/: I'm ambivalent about this. The "slow" designation is flat-out wrong since at least Power and aarch64 can emit a single instruction here without prodding the compiler. On the other hand, "generic" seems wrong too, since e.g. pg_popcount64_slow() has three configure symbols and two compiler builtins. :-D A possible future project would be to have a truly generic simple fallback in pure C and put all the fancy stuff in the header for architectures that have unconditional hardware support. It would make more sense to revisit the name then. -- John Naylor Amazon Web Services
On 15/01/2026 11:07, John Naylor wrote: > 0003 > > s/fast/sse42/: > > Seems okay in this file, but this isn't the best name, either. Maybe a > comment to head off future "corrections", something like: > "Technically, POPCNT is not part of SSE 4.2, and is not even a vector > operation, but many compilers emit the popcnt instruction with > -msse4.2 anyway." > > s/slow/generic/: > > I'm ambivalent about this. The "slow" designation is flat-out wrong > since at least Power and aarch64 can emit a single instruction here > without prodding the compiler. On the other hand, "generic" seems > wrong too, since e.g. pg_popcount64_slow() has three configure symbols > and two compiler builtins. :-D "fallback", or "portable" ? > A possible future project would be to have a truly generic simple > fallback in pure C and put all the fancy stuff in the header for > architectures that have unconditional hardware support. It would make > more sense to revisit the name then. Yeah, I noticed that on x86_64, pg_popcount_optimized is always a function pointer with runtime check, even if you use compiler flags to target a CPU where the special instructions are available unconditionally. - Heikki
On Thu, Jan 15, 2026 at 04:07:51PM +0700, John Naylor wrote:
> Thanks for taking on some technical debt!
Thanks for reviewing.
> --- a/src/port/pg_popcount_avx512.c
> +++ b/src/port/pg_popcount_x86_64.c
>
> Can we get away with just "x86" for brevity? We generally don't target
> 32-bit CPUs for this kind of work, so there's no chance of confusion.
WFM.
> ```
> -#ifdef USE_AVX512_POPCNT_WITH_RUNTIME_CHECK
> +#include "port/pg_bitutils.h"
> +
> +#ifdef TRY_POPCNT_X86_64
>
> #if defined(HAVE__GET_CPUID) || defined(HAVE__GET_CPUID_COUNT)
> #include <cpuid.h>
> #endif
> ```
>
> With the above in the x86 .c file, I wonder we can get rid of this
> stanza and the "try" symbol and gate only on HAVE_X86_64_POPCNTQ:
>
> #ifdef HAVE_X86_64_POPCNTQ
> #if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
> #define TRY_POPCNT_X86_64 1
> #endif
> #endif
>
> If we have to be cautious, we could just turn the #error on no CPUID
> symbol into "return false".
Yeah, the CPUID macro checks do seem overly cautious to me, especially
since we've just #error'd when the CPUID intrinsics are missing in
pg_crc32c_sse42_choose.c since 2015. That seems to suggest that nobody is
trying to build Postgres with a compiler that knows about SSE4.2/POPCNT but
not CPUID. For reference, CPUID was introduced in 1993.
I bet we could also convert this bit into a configuration-time check:
#if defined(_MSC_VER) && defined(_M_AMD64)
#define HAVE_X86_64_POPCNTQ
#endif
> s/fast/sse42/:
>
> Seems okay in this file, but this isn't the best name, either. Maybe a
> comment to head off future "corrections", something like:
> "Technically, POPCNT is not part of SSE 4.2, and is not even a vector
> operation, but many compilers emit the popcnt instruction with
> -msse4.2 anyway."
Makes sense.
--
nathan
On Thu, Jan 15, 2026 at 11:42:14AM +0200, Heikki Linnakangas wrote: > On 15/01/2026 11:07, John Naylor wrote: >> s/slow/generic/: >> >> I'm ambivalent about this. The "slow" designation is flat-out wrong >> since at least Power and aarch64 can emit a single instruction here >> without prodding the compiler. On the other hand, "generic" seems >> wrong too, since e.g. pg_popcount64_slow() has three configure symbols >> and two compiler builtins. :-D > > "fallback", or "portable" ? I've no strong opinions, but "portable" seems reasonable to me. > Yeah, I noticed that on x86_64, pg_popcount_optimized is always a function > pointer with runtime check, even if you use compiler flags to target a CPU > where the special instructions are available unconditionally. I wonder how close we are to being able to just require SSE4.2/POPCNT for x86-64 builds. I suppose there's always a chance that someone will try to run Postgres 19 on a CPU from the aughts... In any case, avoiding the function pointer when possible seems like a good follow-up. -- nathan
On 15/01/2026 18:08, Nathan Bossart wrote: > On Thu, Jan 15, 2026 at 11:42:14AM +0200, Heikki Linnakangas wrote: >> Yeah, I noticed that on x86_64, pg_popcount_optimized is always a function >> pointer with runtime check, even if you use compiler flags to target a CPU >> where the special instructions are available unconditionally. > > I wonder how close we are to being able to just require SSE4.2/POPCNT for > x86-64 builds. I suppose there's always a chance that someone will try to > run Postgres 19 on a CPU from the aughts... In any case, avoiding the > function pointer when possible seems like a good follow-up. It's not really our decision. Packagers choose which architecture to target and which compiler options to build with. We ought to just respect those choices. - Heikki
Here is a new patch set. Notably, I've added a 0004 that does the
following:
* Removes TRY_POPCNT_X86_64. We now assume that the required CPUID
intrinsics are available, as we have long done in some of the CRC-32C code.
* Moves the MSVC check for HAVE_X86_64_POPCNTQ to configuration-time. This
way, we set it for all relevant platforms in one place.
* Moves the #defines for USE_SSE2 and USE_NEON to c.h so that they can be
used elsewhere without simd.h. Consequently, we can remove POPCNT_AARCH64.
* Moves the #includes for pg_bitutils.h to below the system headers in
pg_popcount_{aarch64,x86}.c (since we no longer depend on macros defined in
pg_bitutils.h).
--
nathan
Вложения
On Fri, Jan 16, 2026 at 2:07 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Here is a new patch set. Notably, I've added a 0004 that does the
> following:
>
> * Removes TRY_POPCNT_X86_64. We now assume that the required CPUID
> intrinsics are available, as we have long done in some of the CRC-32C code.
>
> * Moves the MSVC check for HAVE_X86_64_POPCNTQ to configuration-time. This
> way, we set it for all relevant platforms in one place.
LGTM.
> * Moves the #defines for USE_SSE2 and USE_NEON to c.h so that they can be
> used elsewhere without simd.h. Consequently, we can remove POPCNT_AARCH64.
Seems reasonable.
> * Moves the #includes for pg_bitutils.h to below the system headers in
> pg_popcount_{aarch64,x86}.c (since we no longer depend on macros defined in
> pg_bitutils.h).
Good.
v2-0003
+ * XXX Technically, POPCNT is not part of SSE4.2, and isn't even a vector
+ * operation, but in practice this is close enough, and "sse42" seems easier to
+ * follow than "popcnt" for these names.
Quibble -- XXX is a bit loud for a side note.
On Thu, Jan 15, 2026 at 11:08 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Thu, Jan 15, 2026 at 11:42:14AM +0200, Heikki Linnakangas wrote:
> > "fallback", or "portable" ?
>
> I've no strong opinions, but "portable" seems reasonable to me.
I have a mild preference for "fallback" since it's a noun and we
already have comments in src/(include/)port referring to fallbacks. No
objection to "portable", though.
--
John Naylor
Amazon Web Services