Обсуждение: refactor architecture-specific popcount code

Поиск
Список
Период
Сортировка

refactor architecture-specific popcount code

От
Nathan Bossart
Дата:
Right now, the organization of this code is weird.  All AArch64-specific
implementations live in an AArch64-specific file, the AVX-512
implementations live in their own file, and the architecture-agnostic and
SSE4.2 implementations live in pg_bitutils.c.  The attached patches move
the SSE4.2 implementations to the AVX-512 file (which is renamed
appropriately), and they update some function names to be more descriptive,
i.e., "fast" is replaced with "sse42" and "slow" is replaced with
"generic".

I probably should've done this a while ago...

-- 
nathan

Вложения

Re: refactor architecture-specific popcount code

От
John Naylor
Дата:
On Thu, Jan 15, 2026 at 3:40 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Right now, the organization of this code is weird.  All AArch64-specific
> implementations live in an AArch64-specific file, the AVX-512
> implementations live in their own file, and the architecture-agnostic and
> SSE4.2 implementations live in pg_bitutils.c.  The attached patches move
> the SSE4.2 implementations to the AVX-512 file (which is renamed
> appropriately), and they update some function names to be more descriptive,
> i.e., "fast" is replaced with "sse42" and "slow" is replaced with
> "generic".

Thanks for taking on some technical debt!

0001

--- a/src/port/pg_popcount_avx512.c
+++ b/src/port/pg_popcount_x86_64.c

Can we get away with just "x86" for brevity? We generally don't target
32-bit CPUs for this kind of work, so there's no chance of confusion.

0002

```
-#ifdef USE_AVX512_POPCNT_WITH_RUNTIME_CHECK
+#include "port/pg_bitutils.h"
+
+#ifdef TRY_POPCNT_X86_64

 #if defined(HAVE__GET_CPUID) || defined(HAVE__GET_CPUID_COUNT)
 #include <cpuid.h>
 #endif
```

With the above in the x86 .c file, I wonder we can get rid of this
stanza and the "try" symbol and gate only on HAVE_X86_64_POPCNTQ:

#ifdef HAVE_X86_64_POPCNTQ
#if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
#define TRY_POPCNT_X86_64 1
#endif
#endif

If we have to be cautious, we could just turn the #error on no CPUID
symbol into "return false".

0003

s/fast/sse42/:

Seems okay in this file, but this isn't the best name, either. Maybe a
comment to head off future "corrections", something like:
"Technically, POPCNT is not part of SSE 4.2, and is not even a vector
operation, but many compilers emit the popcnt instruction with
-msse4.2 anyway."

s/slow/generic/:

I'm ambivalent about this. The "slow" designation is flat-out wrong
since at least Power and aarch64 can emit a single instruction here
without prodding the compiler. On the other hand, "generic" seems
wrong too, since e.g. pg_popcount64_slow() has three configure symbols
and two compiler builtins. :-D

A possible future project would be to have a truly generic simple
fallback in pure C and put all the fancy stuff in the header for
architectures that have unconditional hardware support. It would make
more sense to revisit the name then.

--
John Naylor
Amazon Web Services



Re: refactor architecture-specific popcount code

От
Heikki Linnakangas
Дата:
On 15/01/2026 11:07, John Naylor wrote:
> 0003
> 
> s/fast/sse42/:
> 
> Seems okay in this file, but this isn't the best name, either. Maybe a
> comment to head off future "corrections", something like:
> "Technically, POPCNT is not part of SSE 4.2, and is not even a vector
> operation, but many compilers emit the popcnt instruction with
> -msse4.2 anyway."
> 
> s/slow/generic/:
> 
> I'm ambivalent about this. The "slow" designation is flat-out wrong
> since at least Power and aarch64 can emit a single instruction here
> without prodding the compiler. On the other hand, "generic" seems
> wrong too, since e.g. pg_popcount64_slow() has three configure symbols
> and two compiler builtins. :-D

"fallback", or "portable" ?

> A possible future project would be to have a truly generic simple
> fallback in pure C and put all the fancy stuff in the header for
> architectures that have unconditional hardware support. It would make
> more sense to revisit the name then.
Yeah, I noticed that on x86_64, pg_popcount_optimized is always a 
function pointer with runtime check, even if you use compiler flags to 
target a CPU where the special instructions are available unconditionally.

- Heikki




Re: refactor architecture-specific popcount code

От
Nathan Bossart
Дата:
On Thu, Jan 15, 2026 at 04:07:51PM +0700, John Naylor wrote:
> Thanks for taking on some technical debt!

Thanks for reviewing.

> --- a/src/port/pg_popcount_avx512.c
> +++ b/src/port/pg_popcount_x86_64.c
> 
> Can we get away with just "x86" for brevity? We generally don't target
> 32-bit CPUs for this kind of work, so there's no chance of confusion.

WFM.

> ```
> -#ifdef USE_AVX512_POPCNT_WITH_RUNTIME_CHECK
> +#include "port/pg_bitutils.h"
> +
> +#ifdef TRY_POPCNT_X86_64
> 
>  #if defined(HAVE__GET_CPUID) || defined(HAVE__GET_CPUID_COUNT)
>  #include <cpuid.h>
>  #endif
> ```
> 
> With the above in the x86 .c file, I wonder we can get rid of this
> stanza and the "try" symbol and gate only on HAVE_X86_64_POPCNTQ:
> 
> #ifdef HAVE_X86_64_POPCNTQ
> #if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
> #define TRY_POPCNT_X86_64 1
> #endif
> #endif
> 
> If we have to be cautious, we could just turn the #error on no CPUID
> symbol into "return false".

Yeah, the CPUID macro checks do seem overly cautious to me, especially
since we've just #error'd when the CPUID intrinsics are missing in
pg_crc32c_sse42_choose.c since 2015.  That seems to suggest that nobody is
trying to build Postgres with a compiler that knows about SSE4.2/POPCNT but
not CPUID.  For reference, CPUID was introduced in 1993.

I bet we could also convert this bit into a configuration-time check:

    #if defined(_MSC_VER) && defined(_M_AMD64)
    #define HAVE_X86_64_POPCNTQ
    #endif

> s/fast/sse42/:
> 
> Seems okay in this file, but this isn't the best name, either. Maybe a
> comment to head off future "corrections", something like:
> "Technically, POPCNT is not part of SSE 4.2, and is not even a vector
> operation, but many compilers emit the popcnt instruction with
> -msse4.2 anyway."

Makes sense.

-- 
nathan



Re: refactor architecture-specific popcount code

От
Nathan Bossart
Дата:
On Thu, Jan 15, 2026 at 11:42:14AM +0200, Heikki Linnakangas wrote:
> On 15/01/2026 11:07, John Naylor wrote:
>> s/slow/generic/:
>> 
>> I'm ambivalent about this. The "slow" designation is flat-out wrong
>> since at least Power and aarch64 can emit a single instruction here
>> without prodding the compiler. On the other hand, "generic" seems
>> wrong too, since e.g. pg_popcount64_slow() has three configure symbols
>> and two compiler builtins. :-D
> 
> "fallback", or "portable" ?

I've no strong opinions, but "portable" seems reasonable to me.

> Yeah, I noticed that on x86_64, pg_popcount_optimized is always a function
> pointer with runtime check, even if you use compiler flags to target a CPU
> where the special instructions are available unconditionally.

I wonder how close we are to being able to just require SSE4.2/POPCNT for
x86-64 builds.  I suppose there's always a chance that someone will try to
run Postgres 19 on a CPU from the aughts...  In any case, avoiding the
function pointer when possible seems like a good follow-up.

-- 
nathan



Re: refactor architecture-specific popcount code

От
Heikki Linnakangas
Дата:
On 15/01/2026 18:08, Nathan Bossart wrote:
> On Thu, Jan 15, 2026 at 11:42:14AM +0200, Heikki Linnakangas wrote:
>> Yeah, I noticed that on x86_64, pg_popcount_optimized is always a function
>> pointer with runtime check, even if you use compiler flags to target a CPU
>> where the special instructions are available unconditionally.
> 
> I wonder how close we are to being able to just require SSE4.2/POPCNT for
> x86-64 builds.  I suppose there's always a chance that someone will try to
> run Postgres 19 on a CPU from the aughts...  In any case, avoiding the
> function pointer when possible seems like a good follow-up.

It's not really our decision. Packagers choose which architecture to 
target and which compiler options to build with. We ought to just 
respect those choices.

- Heikki




Re: refactor architecture-specific popcount code

От
Nathan Bossart
Дата:
Here is a new patch set.  Notably, I've added a 0004 that does the
following:

* Removes TRY_POPCNT_X86_64.  We now assume that the required CPUID
intrinsics are available, as we have long done in some of the CRC-32C code.

* Moves the MSVC check for HAVE_X86_64_POPCNTQ to configuration-time.  This
way, we set it for all relevant platforms in one place.

* Moves the #defines for USE_SSE2 and USE_NEON to c.h so that they can be
used elsewhere without simd.h.  Consequently, we can remove POPCNT_AARCH64.

* Moves the #includes for pg_bitutils.h to below the system headers in
pg_popcount_{aarch64,x86}.c (since we no longer depend on macros defined in
pg_bitutils.h).

-- 
nathan

Вложения

Re: refactor architecture-specific popcount code

От
John Naylor
Дата:
On Fri, Jan 16, 2026 at 2:07 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Here is a new patch set. Notably, I've added a 0004 that does the
> following:
>
> * Removes TRY_POPCNT_X86_64.  We now assume that the required CPUID
> intrinsics are available, as we have long done in some of the CRC-32C code.
>
> * Moves the MSVC check for HAVE_X86_64_POPCNTQ to configuration-time.  This
> way, we set it for all relevant platforms in one place.

LGTM.

> * Moves the #defines for USE_SSE2 and USE_NEON to c.h so that they can be
> used elsewhere without simd.h.  Consequently, we can remove POPCNT_AARCH64.

Seems reasonable.

> * Moves the #includes for pg_bitutils.h to below the system headers in
> pg_popcount_{aarch64,x86}.c (since we no longer depend on macros defined in
> pg_bitutils.h).

Good.

v2-0003

+ * XXX Technically, POPCNT is not part of SSE4.2, and isn't even a vector
+ * operation, but in practice this is close enough, and "sse42" seems easier to
+ * follow than "popcnt" for these names.

Quibble -- XXX is a bit loud for a side note.

On Thu, Jan 15, 2026 at 11:08 PM Nathan Bossart
<nathandbossart@gmail.com> wrote:
>
> On Thu, Jan 15, 2026 at 11:42:14AM +0200, Heikki Linnakangas wrote:

> > "fallback", or "portable" ?
>
> I've no strong opinions, but "portable" seems reasonable to me.

I have a mild preference for "fallback" since it's a noun and we
already have comments in src/(include/)port referring to fallbacks. No
objection to "portable", though.

--
John Naylor
Amazon Web Services