Обсуждение: Re: vectorized CRC on ARM64

Поиск

Список

Период

Сортировка

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

12 января, 12:27:00

On Wed, May 14, 2025 I wrote:
>
> We did something similar for x86 for v18, and here is some progress
> towards Arm support.

Coming back to this, since there's been recent interest in Arm support.

v2 is a rebase, with a few changes.

- I simplified it by leaving out the inlining for "assume CRC" builds,
since I wanted to avoid alignment considerations if I can. I think
always indirecting through a pointer will have less risk of
regressions in a realistic setting than for x86 since Arm chips
typically have low latency for carryless multiplication instructions.
With just a bit of code we can still use the direct call for small
constant inputs, so I did that to avoid regressions under WAL insert
lock.

- One coding idiom for a vector literal in the generated code was
giving pgindent indigestion, I so rewrote it using Neon intrinsics and
verified it in Godbolt.

> 0002: Like 3c6e8c12389 and in fact uses the same program to generate
> the code, by specifying Neon instructions with the Arm "crypto"
> extension instead. There are some interesting differences from x86
> here as well:
> - The upstream implementation chose to use inline assembly instead of
> intrinsics for some reason. I initially thought that was a way to get
> broader compiler support, but it turns out you still need to pass the
> relevant flags to get the assembly to link.

To follow-up for curiosity's sake, [1] says that Apple chips can issue
PMULL + EOR as a single uop if they are next to each other in the
instruction stream.

> - I only have Meson support for now, since I used MacOS on CI to test.
> That OS and compiler combination apparently targets the CRC extension,
> but the PMULL instruction runtime check uses Linux-only headers, I
> believe, so previously I hacked the choose function to return true for
> testing. The choose function in 0002 is untested in this form.

This is still true, but now the CI hack lives in a separate
not-for-commit patch for clarity.

autoconf support is a WIP, and I will share that after I do some
testing on an Arm Linux instance.

[1] https://dougallj.github.io/applecpu/firestorm.html

--
John Naylor
Amazon Web Services

Вложения

Re: vectorized CRC on ARM64

От

Haibo Yan

Дата:

18 марта, 06:34:40

Hi John

Thank yo for working on this. I had one question about the mixed use of intrinsics and inline asm here.

On Jan 12, 2026, at 1:27 AM, John Naylor <johncnaylorls@gmail.com> wrote:

On Wed, May 14, 2025 I wrote:

We did something similar for x86 for v18, and here is some progress
towards Arm support.

Coming back to this, since there's been recent interest in Arm support.

v2 is a rebase, with a few changes.

- I simplified it by leaving out the inlining for "assume CRC" builds,
since I wanted to avoid alignment considerations if I can. I think
always indirecting through a pointer will have less risk of
regressions in a realistic setting than for x86 since Arm chips
typically have low latency for carryless multiplication instructions.
With just a bit of code we can still use the direct call for small
constant inputs, so I did that to avoid regressions under WAL insert
lock.

- One coding idiom for a vector literal in the generated code was
giving pgindent indigestion, I so rewrote it using Neon intrinsics and
verified it in Godbolt.

0002: Like 3c6e8c12389 and in fact uses the same program to generate
the code, by specifying Neon instructions with the Arm "crypto"
extension instead. There are some interesting differences from x86
here as well:
- The upstream implementation chose to use inline assembly instead of
intrinsics for some reason. I initially thought that was a way to get
broader compiler support, but it turns out you still need to pass the
relevant flags to get the assembly to link.

Since the implementation already uses NEON intrinsics such as vld1q_u64, I was wondering why the pmull / pmull2 + eor helpers still need to be inline asm rather than intrinsics.

Is that due to compiler/toolchain support, or because the intrinsic-based version produced noticeably worse code?

To follow-up for curiosity's sake, [1] says that Apple chips can issue
PMULL + EOR as a single uop if they are next to each other in the
instruction stream.

- I only have Meson support for now, since I used MacOS on CI to test.
That OS and compiler combination apparently targets the CRC extension,
but the PMULL instruction runtime check uses Linux-only headers, I
believe, so previously I hacked the choose function to return true for
testing. The choose function in 0002 is untested in this form.

This is still true, but now the CI hack lives in a separate
not-for-commit patch for clarity.

autoconf support is a WIP, and I will share that after I do some
testing on an Arm Linux instance.

[1] https://dougallj.github.io/applecpu/firestorm.html

--
John Naylor
Amazon Web Services
<v2-0001-Compute-CRC32C-on-ARM-using-the-Crypto-Extension-.patch><v2-0002-Force-testing-on-MacOS-CI-XXX-not-for-commit.patch>

Regards

Haibo

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

18 марта, 09:52:37

On Wed, Mar 18, 2026 at 10:34 AM Haibo Yan <tristan.yim@gmail.com> wrote:
>
> Hi John
>
> Thank yo for working on this. I had one question about the mixed use of intrinsics and inline asm here.

> Since the implementation already uses NEON intrinsics such as vld1q_u64, I was wondering why the pmull / pmull2 + eor
helpersstill need to be inline asm rather than intrinsics. 
>
> Is that due to compiler/toolchain support, or because the intrinsic-based version produced noticeably worse code?

I answered that in the email you replied to, re-quoted here:

> To follow-up for curiosity's sake, [1] says that Apple chips can issue
> PMULL + EOR as a single uop if they are next to each other in the
> instruction stream.
> [1] https://dougallj.github.io/applecpu/firestorm.html

I don't know if that's relevant for current server hardware, so it
could be pointless. I'm personally not a fan of inline assembly, but I
also didn't yet want to put in the effort to alter generated code. I
don't think it would be very hard to do, however.

--
John Naylor
Amazon Web Services

Re: vectorized CRC on ARM64

От

Haibo Yan

Дата:

18 марта, 20:16:53

On Tue, Mar 17, 2026 at 11:52 PM John Naylor <johncnaylorls@gmail.com> wrote:

On Wed, Mar 18, 2026 at 10:34 AM Haibo Yan <tristan.yim@gmail.com> wrote:
>
> Hi John
>
> Thank yo for working on this. I had one question about the mixed use of intrinsics and inline asm here.

> Since the implementation already uses NEON intrinsics such as vld1q_u64, I was wondering why the pmull / pmull2 + eor helpers still need to be inline asm rather than intrinsics.
>
> Is that due to compiler/toolchain support, or because the intrinsic-based version produced noticeably worse code?

I answered that in the email you replied to, re-quoted here:

> To follow-up for curiosity's sake, [1] says that Apple chips can issue
> PMULL + EOR as a single uop if they are next to each other in the
> instruction stream.
> [1] https://dougallj.github.io/applecpu/firestorm.html

I don't know if that's relevant for current server hardware, so it
could be pointless. I'm personally not a fan of inline assembly, but I
also didn't yet want to put in the effort to alter generated code. I
don't think it would be very hard to do, however.

Thanks, that makes sense as an explanation for why the inline asm is there today. But it also sounds like this is more of a temporary implementation choice than a conclusion that intrinsics are unsuitable. If so, I wonder whether it would be better to treat an intrinsics-based version as the preferred end state unless benchmarks show a clear regression.

Regards

Haibo

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

22 марта, 08:26:24

On Thu, Mar 19, 2026 at 12:17 AM Haibo Yan <tristan.yim@gmail.com> wrote:
>
> On Tue, Mar 17, 2026 at 11:52 PM John Naylor <johncnaylorls@gmail.com> wrote:

>> I don't know if that's relevant for current server hardware, so it
>> could be pointless. I'm personally not a fan of inline assembly, but I
>> also didn't yet want to put in the effort to alter generated code. I
>> don't think it would be very hard to do, however.
>
>
> Thanks, that makes sense as an explanation for why the inline asm is there today. But it also sounds like this is
moreof a temporary implementation choice than a conclusion that intrinsics are unsuitable. 

I can see how my words imply that, but after a moment's thought I
still don't want to put in that effort without a good reason. For
starters, what I said above about "not very hard" may be wishful
thinking.

> If so, I wonder whether it would be better to treat an intrinsics-based version as the preferred end state unless
benchmarksshow a clear regression. 

To meet your criterion, we'd not only have to rewrite it correctly,
we'd have to test on multiple vendors of non-Apple hardware and
multiple compiler vendors/versions (at least where the binary output
is different) to prove we haven't caused a regression. I wouldn't
recommend anyone to accept that challenge as stated, since the
risk/reward ratio is just not favorable. Especially considering we're
2 1/2 weeks away from feature freeze.

--
John Naylor
Amazon Web Services

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

31 марта, 14:19:49

I wrote:
> autoconf support is a WIP, and I will share that after I do some
> testing on an Arm Linux instance.

I've only checked paths with objdump and debugging printouts (no perf
testing), but this seems to work in v3. My main concern now is whether
it's a maintenance hazard to overwrite CFLAGS_CRC in a separate check.

In master, we can have one of:

CFLAGS_CRC=""
CFLAGS_CRC="-march=armv8-a+crc+simd"
CFLAGS_CRC="-march=armv8-a+crc"

...and then based on that we set either USE_ARMV8_CRC32C or
USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK, and set PG_CRC32C_OBJS.

But below that, v3 runs a new test for pmull instructions with the
flag "-march=armv8-a+crc+simd+crypto" and if it links, it will reset
CFLAGS_CRC to that set of flags. That doesn't seem like the right
thing to do, but I don't see a good alternative. I suppose I could
sidestep that with function attributes, but that's not as well
supported. Another idea would be to turn the relevant line here

if test x"$Ac_cachevar" = x"yes"; then
  CFLAGS_CRC="$1"
  pgac_arm_pmull_intrinsics=yes
fi

...into CFLAGS_CRC="CFLAGS_CRC$1", where in this case $1 is just
"+crypto". That seems even more fragile, though.

--
John Naylor
Amazon Web Services

Вложения

v3-0001-Compute-CRC32C-on-ARM-using-the-Crypto-Extension-.patch

Re: vectorized CRC on ARM64

От

Nathan Bossart

Дата:

31 марта, 21:21:26

On Tue, Mar 31, 2026 at 06:19:49PM +0700, John Naylor wrote:
> I've only checked paths with objdump and debugging printouts (no perf
> testing), but this seems to work in v3. My main concern now is whether
> it's a maintenance hazard to overwrite CFLAGS_CRC in a separate check.
> 
> In master, we can have one of:
> 
> CFLAGS_CRC=""
> CFLAGS_CRC="-march=armv8-a+crc+simd"
> CFLAGS_CRC="-march=armv8-a+crc"
> 
> ...and then based on that we set either USE_ARMV8_CRC32C or
> USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK, and set PG_CRC32C_OBJS.
> 
> But below that, v3 runs a new test for pmull instructions with the
> flag "-march=armv8-a+crc+simd+crypto" and if it links, it will reset
> CFLAGS_CRC to that set of flags. That doesn't seem like the right
> thing to do, but I don't see a good alternative. I suppose I could
> sidestep that with function attributes, but that's not as well
> supported. Another idea would be to turn the relevant line here
> 
> if test x"$Ac_cachevar" = x"yes"; then
>   CFLAGS_CRC="$1"
>   pgac_arm_pmull_intrinsics=yes
> fi
> 
> ...into CFLAGS_CRC="CFLAGS_CRC$1", where in this case $1 is just
> "+crypto". That seems even more fragile, though.

Appending +crypto to the current CFLAGS_CRC seems like the right thing to
do to me.  I'm trying to understand why you are concerned about fragility.
I suppose someone could add something else between the existing checks and
the one you're adding that appends a different option or something, but
besides that, I'd think merely appending +crypto to the -march value
wouldn't invalidate previous tests or anything like that.

-- 
nathan

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

01 апреля, 14:48:10

On Wed, Apr 1, 2026 at 1:21 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Mar 31, 2026 at 06:19:49PM +0700, John Naylor wrote:

> > In master, we can have one of:
> >
> > CFLAGS_CRC=""
> > CFLAGS_CRC="-march=armv8-a+crc+simd"
> > CFLAGS_CRC="-march=armv8-a+crc"
> >
> > ...and then based on that we set either USE_ARMV8_CRC32C or
> > USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK, and set PG_CRC32C_OBJS.
> >
> > But below that, v3 runs a new test for pmull instructions with the
> > flag "-march=armv8-a+crc+simd+crypto" and if it links, it will reset
> > CFLAGS_CRC to that set of flags. That doesn't seem like the right
> > thing to do, but I don't see a good alternative. I suppose I could
> > sidestep that with function attributes, but that's not as well
> > supported. Another idea would be to turn the relevant line here
> >
> > if test x"$Ac_cachevar" = x"yes"; then
> >   CFLAGS_CRC="$1"
> >   pgac_arm_pmull_intrinsics=yes
> > fi
> >
> > ...into CFLAGS_CRC="CFLAGS_CRC$1", where in this case $1 is just
> > "+crypto". That seems even more fragile, though.
>
> Appending +crypto to the current CFLAGS_CRC seems like the right thing to
> do to me.  I'm trying to understand why you are concerned about fragility.
> I suppose someone could add something else between the existing checks and
> the one you're adding that appends a different option or something, but
> besides that, I'd think merely appending +crypto to the -march value
> wouldn't invalidate previous tests or anything like that.

Maybe it's a low risk, but this stuff is awfully difficult to debug
when it goes wrong.

I don't think appending +crypto would work everywhere IIUC -- if the
packager set +crc in the CFLAGS, then CFLAGS_CRC="" so there is no
existing -march to put it on, and the PMULL check would fail. Maybe
that's okay if we call that out in the release notes, since that's
probably rare. Then we could check both with and without +crypto
tacked on.

I tried appending the new -march value, and that works since last one
wins. But that might have the same problem as above if the packager
put something special in CFLAGS for -march, that would get wiped out
by our new one.

--
John Naylor
Amazon Web Services

Re: vectorized CRC on ARM64

От

Nathan Bossart

Дата:

01 апреля, 18:24:40

On Wed, Apr 01, 2026 at 06:48:10PM +0700, John Naylor wrote:
> I don't think appending +crypto would work everywhere IIUC -- if the
> packager set +crc in the CFLAGS, then CFLAGS_CRC="" so there is no
> existing -march to put it on, and the PMULL check would fail. Maybe
> that's okay if we call that out in the release notes, since that's
> probably rare. Then we could check both with and without +crypto
> tacked on.
> 
> I tried appending the new -march value, and that works since last one
> wins. But that might have the same problem as above if the packager
> put something special in CFLAGS for -march, that would get wiped out
> by our new one.

The other idea I had was to always add +crypto in the existing tests
(unless we're not setting CFLAGS_CRC), and then to just do the PMULL check
with whatever CFLAGS_CRC is set to, not bothering to try different values.
That doesn't fix the problem you mentioned in the quoted text, but maybe
it's a little sturdier.

... or maybe we should just use __attribute__((target("..."))) for the
PMULL stuff.  That wouldn't work well for clang versions before 16, but it
at least wouldn't regress anything.  They just wouldn't get PMULL support.

-- 
nathan

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

02 апреля, 16:16:27

On Wed, Apr 1, 2026 at 10:24 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> ... or maybe we should just use __attribute__((target("..."))) for the
> PMULL stuff.  That wouldn't work well for clang versions before 16, but it
> at least wouldn't regress anything.  They just wouldn't get PMULL support.

Okay, that works as far back as gcc 6.3, so v4 does it that way. The
attribute doesn't seem to be necessary on the inline helpers for
production builds, but they're needed to work with -O0.

Also
- removed the term 'intrinsics' from config variables, since we're not
checking those
- removed the crc intrinsic from the pmull tests
- fixed a failure to restore CFLAGS
- fixed it to work with +crc CFLAGS

For some reason, my CI builds with MacOS are failing on v3 (v2 skipped
the runtime check to get some exposure on CI) with the following, and
running CI from my Github account fails as well, so it's not a
temporary glitch. Adding a __linux__ guard to the runtime check didn't
help, so not yet sure what to make of it.

3/382 setup - postgresql:initdb_cache  TIMEOUT 300.51s   killed by
signal 15 SIGTERM

--
John Naylor
Amazon Web Services

Вложения

v4-0001-Compute-CRC32C-on-ARM-using-the-Crypto-Extension-.patch

Re: vectorized CRC on ARM64

От

Nathan Bossart

Дата:

02 апреля, 18:53:24

On Thu, Apr 02, 2026 at 08:16:27PM +0700, John Naylor wrote:
> For some reason, my CI builds with MacOS are failing on v3 (v2 skipped
> the runtime check to get some exposure on CI) with the following, and
> running CI from my Github account fails as well, so it's not a
> temporary glitch. Adding a __linux__ guard to the runtime check didn't
> help, so not yet sure what to make of it.

I think the new pg_comp_crc32_choose() is infinitely recursing on macOS
because USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK is not defined but
pg_crc32c_armv8_available() returns false.  If I trace through that
function, I see that it's going straight to the

    #else
        return false;
    #endif

at the end.  And sure enough, both HAVE_ELF_AUX_INFO and HAVE_GETAUXVAL
aren't defined in pg_config.h.  I think we might need to use sysctlbyname()
to determine PMULL support on macOS, but at this stage of the development
cycle, I would probably lean towards just compiling in the sb8
implementation.

-- 
nathan

Re: vectorized CRC on ARM64

От

Nathan Bossart

Дата:

02 апреля, 19:17:00

On Thu, Apr 02, 2026 at 10:53:24AM -0500, Nathan Bossart wrote:
> I think the new pg_comp_crc32_choose() is infinitely recursing on macOS
> because USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK is not defined but
> pg_crc32c_armv8_available() returns false.  If I trace through that
> function, I see that it's going straight to the
> 
>     #else
>         return false;
>     #endif
> 
> at the end.  And sure enough, both HAVE_ELF_AUX_INFO and HAVE_GETAUXVAL
> aren't defined in pg_config.h.  I think we might need to use sysctlbyname()
> to determine PMULL support on macOS, but at this stage of the development
> cycle, I would probably lean towards just compiling in the sb8
> implementation.

Hm.  On second thought, that probably regresses macOS builds because it was
presumably using the armv8 path without runtime checks before...

-- 
nathan

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

03 апреля, 11:22:59

On Thu, Apr 2, 2026 at 11:17 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Thu, Apr 02, 2026 at 10:53:24AM -0500, Nathan Bossart wrote:
> > I think the new pg_comp_crc32_choose() is infinitely recursing on macOS
> > because USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK is not defined but
> > pg_crc32c_armv8_available() returns false.  If I trace through that
> > function, I see that it's going straight to the
> >
> >       #else
> >               return false;
> >       #endif
> >
> > at the end.  And sure enough, both HAVE_ELF_AUX_INFO and HAVE_GETAUXVAL

Ah of course.

> > aren't defined in pg_config.h.  I think we might need to use sysctlbyname()
> > to determine PMULL support on macOS, but at this stage of the development
> > cycle, I would probably lean towards just compiling in the sb8
> > implementation.
>
> Hm.  On second thought, that probably regresses macOS builds because it was
> presumably using the armv8 path without runtime checks before...

I went with the following for v5, and it passes MacOS on my Github CI:

+  /* set fallbacks */
+#ifdef USE_ARMV8_CRC32C
+  /* On e.g. MacOS, our runtime feature detection doesn't work */
+  pg_comp_crc32c = pg_comp_crc32c_armv8;
+#else
+  pg_comp_crc32c = pg_comp_crc32c_sb8;
+#endif
+ [...crc and pmull checks]

That should keep scalar hardware support working, but now it'll only
use direct calls for constant inputs.

I also did some benchmarking on an ARM Neoverse N1 / gcc 8.3
(attached). There the vector loop still works well all the way down to
the minimum input size of 64 bytes, and on long inputs it's almost
twice as fast as scalar. For reproduceability, I slightly modified the
benchmark we used last year, to make sure the input is aligned
(attached but not for CI). In the end, I want to add a length check so
that inputs smaller than 80 bytes go straight to the scalar path.
Above 80, after alignment adjustments in the preamble, that still
guarantees at least one loop iteration in the vector path.

--
John Naylor
Amazon Web Services

Вложения

Re: vectorized CRC on ARM64

От

Nathan Bossart

Дата:

03 апреля, 16:54:36

On Fri, Apr 03, 2026 at 03:22:59PM +0700, John Naylor wrote:
> I went with the following for v5, and it passes MacOS on my Github CI:
> 
> +  /* set fallbacks */
> +#ifdef USE_ARMV8_CRC32C
> +  /* On e.g. MacOS, our runtime feature detection doesn't work */
> +  pg_comp_crc32c = pg_comp_crc32c_armv8;
> +#else
> +  pg_comp_crc32c = pg_comp_crc32c_sb8;
> +#endif
> + [...crc and pmull checks]
> 
> That should keep scalar hardware support working, but now it'll only
> use direct calls for constant inputs.

v5 LGTM

-- 
nathan

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

04 апреля, 16:52:34

On Fri, Apr 3, 2026 at 8:54 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> v5 LGTM

Thanks for looking! Pushed with a minor comment tweak and removal of
the CFLAGS save-and-restore dance since we don't need it anymore.
Let's see what the buildfarm thinks.

--
John Naylor
Amazon Web Services

Re: vectorized CRC on ARM64

От

Nathan Bossart

Дата:

04 апреля, 17:36:58

On Sat, Apr 04, 2026 at 08:52:34PM +0700, John Naylor wrote:
> Thanks for looking! Pushed with a minor comment tweak and removal of
> the CFLAGS save-and-restore dance since we don't need it anymore.
> Let's see what the buildfarm thinks.

Ha, I think koel is going to complain about your comment that talks about
pgindent...

diff --git a/src/port/pg_crc32c_armv8.c b/src/port/pg_crc32c_armv8.c
index b404e6c373e..5fa57fb4927 100644
--- a/src/port/pg_crc32c_armv8.c
+++ b/src/port/pg_crc32c_armv8.c
@@ -162,8 +162,8 @@ pg_comp_crc32c_pmull(pg_crc32c crc, const void *data, size_t len)
         }

         /*
-         * pgindent complained of unmatched parens, so the following has
-         * been re-written with intrinsics:
+         * pgindent complained of unmatched parens, so the following has been
+         * re-written with intrinsics:
          *
          * x0 = veorq_u64((uint64x2_t) {crc0, 0}, x0);
          */

-- 
nathan

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

04 апреля, 17:53:12

On Sat, Apr 4, 2026 at 9:37 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Sat, Apr 04, 2026 at 08:52:34PM +0700, John Naylor wrote:
> > Thanks for looking! Pushed with a minor comment tweak and removal of
> > the CFLAGS save-and-restore dance since we don't need it anymore.
> > Let's see what the buildfarm thinks.
>
> Ha, I think koel is going to complain about your comment that talks about
> pgindent...

:-)

--
John Naylor
Amazon Web Services

Re: vectorized CRC on ARM64

От

Tomas Vondra

Дата:

04 апреля, 21:36:53

Hi,

I happened to do some testing on rpi5 with a 32-bit user space, and when
building with this commit I get these warnings from clang:

config.status: linking src/makefiles/Makefile.linux to src/Makefile.port
pg_crc32c_armv8_choose.c:112:1: warning: unused function
'pg_pmull_available' [-Wunused-function]
  112 | pg_pmull_available(void)
      | ^~~~~~~~~~~~~~~~~~
1 warning generated.
pg_crc32c_armv8_choose.c:112:1: warning: unused function
'pg_pmull_available' [-Wunused-function]
  112 | pg_pmull_available(void)
      | ^~~~~~~~~~~~~~~~~~
1 warning generated.
pg_crc32c_armv8_choose.c:112:1: warning: unused function
'pg_pmull_available' [-Wunused-function]
  112 | pg_pmull_available(void)
      | ^~~~~~~~~~~~~~~~~~
1 warning generated.

I suppose the pg_pmull_available() needs to be if-defed with
USE_PMULL_CRC32C_WITH_RUNTIME_CHECK. That removes the warning for me, at
least.


regards

-- 
Tomas Vondra

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

05 апреля, 04:56:05

On Sun, Apr 5, 2026 at 1:36 AM Tomas Vondra <tomas@vondra.me> wrote:
> I happened to do some testing on rpi5 with a 32-bit user space, and when
> building with this commit I get these warnings from clang:

> pg_crc32c_armv8_choose.c:112:1: warning: unused function
> 'pg_pmull_available' [-Wunused-function]
>   112 | pg_pmull_available(void)
>       | ^~~~~~~~~~~~~~~~~~
> 1 warning generated.
>
> I suppose the pg_pmull_available() needs to be if-defed with
> USE_PMULL_CRC32C_WITH_RUNTIME_CHECK. That removes the warning for me, at
> least.

Hmm, it looks like gcc is different in that it won't warn on unused
static inlines, only unused statics. The fix is right, so done that
way, thanks for the report!

--
John Naylor
Amazon Web Services

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

06 апреля, 05:38:45

On Sat, Apr 4, 2026 at 8:52 PM John Naylor <johncnaylorls@gmail.com> wrote:
> Let's see what the buildfarm thinks.

hoatzin is failing with:

[1481/2135] "link" @src/backend/postgres.exe.rsp
FAILED: src/backend/postgres.exe src/backend/postgres.pdb
"link" @src/backend/postgres.exe.rsp
   Creating library src\\backend\\postgres.lib
utils_hash_pg_crc.c.obj : error LNK2001: unresolved external symbol
__builtin_constant_p

That should have been pg_integer_constant_p to work on MSVC since
length in bytes is an integer. That symbol is new in the PG19 cycle
and I didn't know about it until now. I've pushed a fix using that,
and appropriate guards. The earlier use of __builtin_constant_p in the
x86 case didn't happen to cause any failures on MSVC for some reason.
That path needs a guard as well, but since that's backpatchable I
handle it later. It's unlikely anyone passing special CFLAGS is using
a non-mainstream compiler, but that's still a potential trap waiting
for someone.

--
John Naylor
Amazon Web Services

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

07 апреля, 13:02:04

I wrote:
> In the end, I want to add a length check so
> that inputs smaller than 80 bytes go straight to the scalar path.
> Above 80, after alignment adjustments in the preamble, that still
> guarantees at least one loop iteration in the vector path.

Attached is how that would look. The idea is that small inputs will
encounter fewer branches. It'd be tricky to prove a difference with a
benchmark, and I see this as just making the small-input path more
similar to PG 18, as a risk-avoidance maneuver.

-- 
John Naylor
Amazon Web Services

Вложения

add-len-check.patch

Re: vectorized CRC on ARM64

От

Nathan Bossart

Дата:

07 апреля, 16:55:11

On Tue, Apr 07, 2026 at 05:02:04PM +0700, John Naylor wrote:
> Attached is how that would look. The idea is that small inputs will
> encounter fewer branches. It'd be tricky to prove a difference with a
> benchmark, and I see this as just making the small-input path more
> similar to PG 18, as a risk-avoidance maneuver.

Seems fine to me.  I believe we do similar things elsewhere.

-- 
nathan

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

08 апреля, 10:22:27

On Tue, Apr 7, 2026 at 8:55 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Apr 07, 2026 at 05:02:04PM +0700, John Naylor wrote:
> > Attached is how that would look. The idea is that small inputs will
> > encounter fewer branches. It'd be tricky to prove a difference with a
> > benchmark, and I see this as just making the small-input path more
> > similar to PG 18, as a risk-avoidance maneuver.
>
> Seems fine to me.  I believe we do similar things elsewhere.

Pushed, thanks for looking!

--
John Naylor
Amazon Web Services

Re: vectorized CRC on ARM64

От

John Naylor

Дата:

05 мая, 14:56:37

On Mon, Apr 6, 2026 at 9:38 AM John Naylor <johncnaylorls@gmail.com> wrote:
> and appropriate guards. The earlier use of __builtin_constant_p in the
> x86 case didn't happen to cause any failures on MSVC for some reason.
> That path needs a guard as well, but since that's backpatchable I
> handle it later. It's unlikely anyone passing special CFLAGS is using
> a non-mainstream compiler, but that's still a potential trap waiting
> for someone.

Done in 648818ba3.

--
John Naylor
Amazon Web Services

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Re: vectorized CRC on ARM64

Вложения

Вложения

Вложения

Вложения

Вложения