Thread: Add RISC-V Zbb popcount optimization

Add RISC-V Zbb popcount optimization

From
"Greg Burd"
Date:
Hello.

Attached is a small patch that enables hardware popcount on RISC-V when available and also sets the arch flag to
'rv64gc_zbb' flag when appropriate.

best.

-greg
Attachments

Re: Add RISC-V Zbb popcount optimization

From
Andres Freund
Date:
Hi,

On 2026-03-21 12:54:10 -0400, Greg Burd wrote:
> Attached is a small patch that enables hardware popcount on RISC-V when
> available and also sets the arch flag to 'rv64gc_zbb' flag when appropriate.

Maybe I'm missing something: How is the latter approach safe without a runtime
check?  Just because it compiled on the build machine with -march=rv64gc_zbb
added doesn't mean it runs on either the build machine or any other machine?

If this worked, the compiler could just always specify -march=rv64gc_zbb, no?

Greetings,

Andres Freund



Re: Add RISC-V Zbb popcount optimization

From
"Greg Burd"
Date:
On Sat, Mar 21, 2026, at 2:36 PM, Andres Freund wrote:
> Hi,
>
> On 2026-03-21 12:54:10 -0400, Greg Burd wrote:
>> Attached is a small patch that enables hardware popcount on RISC-V when
>> available and also sets the arch flag to 'rv64gc_zbb' flag when appropriate.
>
> Maybe I'm missing something: How is the latter approach safe without a runtime
> check?  Just because it compiled on the build machine with -march=rv64gc_zbb
> added doesn't mean it runs on either the build machine or any other machine?
>
> If this worked, the compiler could just always specify -march=rv64gc_zbb, no?

Hey Andres, thanks for taking a look.

You are correct, mea culpa for not catching this before I sent it out.  If the second test succeeds the patch will add
`-march=rv64gc_zbb` to `CFLAGS` globally, which means without the runtime check the binary will crash with SIGILL on
systems without Zbb.

I'll rework... :)
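
A minimal sketch of the kind of runtime check being asked for here, not the code from the patch. It assumes Linux and its riscv_hwprobe interface for detection. The function names (popcount64, cpu_has_zbb, and so on) are hypothetical, and popcount64_fast is only a stand-in for code that would be built separately with -march=rv64gc_zbb. On any other platform the probe reports false and the portable routine is used.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pull in the Linux hwprobe definitions only where they can exist. */
#if defined(__riscv) && defined(__linux__) && defined(__has_include)
#if __has_include(<asm/hwprobe.h>)
#include <asm/hwprobe.h>
#include <sys/syscall.h>
#include <unistd.h>
#endif
#endif

/* Portable fallback: clear the lowest set bit until none remain. */
static int popcount64_portable(uint64_t x)
{
    int n = 0;

    while (x)
    {
        x &= x - 1;
        n++;
    }
    return n;
}

/*
 * Stand-in for the Zbb variant (assumption: in a real build this would live
 * in its own file compiled with -march=rv64gc_zbb, so that only this code
 * may contain cpop instructions).
 */
static int popcount64_fast(uint64_t x)
{
    return __builtin_popcountll(x);
}

/* Ask the OS whether Zbb is usable; report false when we cannot tell. */
static bool cpu_has_zbb(void)
{
#if defined(RISCV_HWPROBE_EXT_ZBB) && defined(__NR_riscv_hwprobe)
    struct riscv_hwprobe pair = {.key = RISCV_HWPROBE_KEY_IMA_EXT_0};

    if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) == 0)
        return (pair.value & RISCV_HWPROBE_EXT_ZBB) != 0;
#endif
    return false;
}

static int popcount64_choose(uint64_t x);

/* Starts out pointing at the chooser, which replaces itself on first use. */
static int (*popcount64_impl) (uint64_t) = popcount64_choose;

static int popcount64_choose(uint64_t x)
{
    popcount64_impl = cpu_has_zbb() ? popcount64_fast : popcount64_portable;
    return popcount64_impl(x);
}

/* Public entry point: one indirect call per invocation. */
int popcount64(uint64_t x)
{
    return popcount64_impl(x);
}
```

With this shape, everything except the fast variant is built for the baseline ISA, so the binary still starts on hardware without Zbb; the price is the indirect call on every use.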

> Greetings,
>
> Andres Freund

best.

-greg



Re: Add RISC-V Zbb popcount optimization

From
John Naylor
Date:
On Sat, Mar 21, 2026 at 11:56 PM Greg Burd <greg@burd.me> wrote:
> Attached is a small patch that enables hardware popcount on RISC-V when available and also sets the arch flag to
> 'rv64gc_zbb' flag when appropriate.

I have to ask what the point is -- isn't that like putting a 4-inch
exhaust tip on a go-kart?

--
John Naylor
Amazon Web Services



Re: Add RISC-V Zbb popcount optimization

From
"Greg Burd"
Date:
On Sat, Mar 21, 2026, at 10:14 PM, John Naylor wrote:
> On Sat, Mar 21, 2026 at 11:56 PM Greg Burd <greg@burd.me> wrote:
>> Attached is a small patch that enables hardware popcount on RISC-V when available and also sets the arch flag to
>> 'rv64gc_zbb' flag when appropriate.
>
> I have to ask what the point is -- isn't that like putting a 4-inch
> exhaust tip on a go-kart?

Hey John,

The point is to go fast, right? And to look cool (with awesome 4-inch exhaust tips) if possible! ;-P

gburd@rv:~/ws/postgres$ gcc -O2 -o popcnt-wo-zbb riscv-popcnt.c
gburd@rv:~/ws/postgres$ gcc -O2 -march=rv64gc_zbb -o popcnt-zbb riscv-popcnt.c
gburd@rv:~/ws/postgres$ ./popcnt-wo-zbb && ./popcnt-zbb
sw popcount:    0.196 sec  (    510.08 MB/s)
hw popcount:    0.293 sec  (    341.48 MB/s)

diff: 0.67x
match: 406261900 bits counted
sw popcount:    0.182 sec  (    548.86 MB/s)
hw popcount:    0.044 sec  (   2279.89 MB/s)

diff: 4.15x
match: 406261900 bits counted

But my first email/patch was incomplete/rushed, I should have followed the pattern used for similar ARM-specific logic.
v2 attached along with a test program. 
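
For readers without the attachment, here is a hedged sketch of the two kinds of routine such a benchmark compares. It is not riscv-popcnt.c itself, and the names are hypothetical. The "sw" side is shown as a classic SWAR bit count (an assumption about what the software path looks like), and the "hw" side is the compiler builtin, which a compiler targeting Zbb can lower to a single cpop instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Branch-free SWAR bit count for one 64-bit word. */
static inline int popcount64_swar(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);                           /* 2-bit sums */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL); /* 4-bit sums */
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* 8-bit sums */
    return (int) ((x * 0x0101010101010101ULL) >> 56);                     /* add the bytes */
}

/* Compiler builtin; what it expands to depends on the -march in effect. */
static inline int popcount64_builtin(uint64_t x)
{
    return __builtin_popcountll(x);
}

typedef int (*popcount64_fn) (uint64_t);

/* Count the set bits in a byte buffer using the given per-word routine. */
uint64_t popcount_buffer(const unsigned char *buf, size_t len, popcount64_fn fn)
{
    uint64_t total = 0;

    while (len >= 8)
    {
        uint64_t w;

        memcpy(&w, buf, sizeof(w));     /* avoids unaligned loads */
        total += (uint64_t) fn(w);
        buf += 8;
        len -= 8;
    }
    while (len--)                       /* tail bytes, one at a time */
        total += (uint64_t) fn(*buf++);
    return total;
}
```

Running both routines over the same buffer and comparing the totals gives the same kind of cross-check as the "match:" line in the output above.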

> --
> John Naylor
> Amazon Web Services

best.

-greg

Attachments

Re: Add RISC-V Zbb popcount optimization

From
Andres Freund
Date:
Hi,

On 2026-03-22 13:43:43 -0400, Greg Burd wrote:
> On Sat, Mar 21, 2026, at 10:14 PM, John Naylor wrote:
> > On Sat, Mar 21, 2026 at 11:56 PM Greg Burd <greg@burd.me> wrote:
> > >> Attached is a small patch that enables hardware popcount on RISC-V when
> > >> available and also sets the arch flag to 'rv64gc_zbb' flag when appropriate.
> > >
> > I have to ask what the point is -- isn't that like putting a 4-inch
> > exhaust tip on a go-kart?
> The point is to go fast, right? And to look cool (with awesome 4-inch exhaust tips) if possible! ;-P
>
> gburd@rv:~/ws/postgres$ gcc -O2 -o popcnt-wo-zbb riscv-popcnt.c
> gburd@rv:~/ws/postgres$ gcc -O2 -march=rv64gc_zbb -o popcnt-zbb riscv-popcnt.c
> gburd@rv:~/ws/postgres$ ./popcnt-wo-zbb && ./popcnt-zbb
> sw popcount:    0.196 sec  (    510.08 MB/s)
> hw popcount:    0.293 sec  (    341.48 MB/s)
>
> diff: 0.67x
> match: 406261900 bits counted
> sw popcount:    0.182 sec  (    548.86 MB/s)
> hw popcount:    0.044 sec  (   2279.89 MB/s)
>
> diff: 4.15x
> match: 406261900 bits counted
>
> But my first email/patch was incomplete/rushed, I should have followed the
> pattern used for similar ARM-specific logic. v2 attached along with a test program.

Sure, but what PG workloads are actually affected to a meaningful degree by
this? And are those, on riscv, actually most bottlenecked by popcount
performance?

I'm also pretty doubtful all the effort to e.g. add AVX 512 popcount was spent
all that effectively - hard to believe there's any real world workloads where
that gain is worth the squeeze. At least for aarch64 and x86-64 there's real
world use of those platforms, making niche-y perf improvements somewhat
worthwhile. Whereas there's afaict not yet a whole lot of riscv production
adoption.

Once you add CPU dispatch to the cost it gets a heck of a lot less clearly
worthwhile. You need heuristics to decide when the dispatch cost is worth it
and even then it's going to slow down your non-worthwhile case somewhat.
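
One common shape for such a heuristic, sketched under assumptions: short inputs take an inlinable loop and never pay for the indirect call, while longer inputs go through a function pointer that CPU detection would set at startup. The names and the 32-byte cutoff are hypothetical; a real threshold would have to be measured.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DISPATCH_THRESHOLD 32   /* hypothetical cutoff in bytes */

/* Small inputs: a simple loop the compiler can inline at the call site. */
static inline uint64_t popcount_small(const unsigned char *buf, size_t len)
{
    uint64_t total = 0;

    while (len--)
    {
        unsigned char b = *buf++;

        while (b)
        {
            b &= (unsigned char) (b - 1);   /* clear lowest set bit */
            total++;
        }
    }
    return total;
}

/* Large inputs: word-at-a-time; stands in for an optimized variant. */
static uint64_t popcount_large_portable(const unsigned char *buf, size_t len)
{
    uint64_t total = 0;

    for (; len >= 8; buf += 8, len -= 8)
    {
        uint64_t w;

        memcpy(&w, buf, sizeof(w));
        total += (uint64_t) __builtin_popcountll(w);
    }
    return total + popcount_small(buf, len);
}

/* Would be repointed once at startup by CPU detection (not shown). */
static uint64_t (*popcount_large) (const unsigned char *, size_t) = popcount_large_portable;

uint64_t popcount_bytes(const unsigned char *buf, size_t len)
{
    if (len < DISPATCH_THRESHOLD)
        return popcount_small(buf, len);    /* no indirect call */
    return popcount_large(buf, len);        /* pays the dispatch cost */
}
```

Even here the short path carries an extra length comparison, which is the slowdown of the non-worthwhile case described above.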

That's one of the things that makes riscv's decision to put so many crucial
features into optional extensions so annoying for people that write
non-embedded software.

- Andres



Re: Add RISC-V Zbb popcount optimization

From
"Greg Burd"
Date:
On Sun, Mar 22, 2026, at 2:01 PM, Andres Freund wrote:
> Hi,
>
> On 2026-03-22 13:43:43 -0400, Greg Burd wrote:
>> On Sat, Mar 21, 2026, at 10:14 PM, John Naylor wrote:
>> > On Sat, Mar 21, 2026 at 11:56 PM Greg Burd <greg@burd.me> wrote:
>> >> Attached is a small patch that enables hardware popcount on RISC-V when available and also sets the arch flag to
>> >> 'rv64gc_zbb' flag when appropriate.
>> >
>> > I have to ask what the point is -- isn't that like putting a 4-inch
>> > exhaust tip on a go-kart?
>> The point is to go fast, right? And to look cool (with awesome 4-inch exhaust tips) if possible! ;-P
>>
>> gburd@rv:~/ws/postgres$ gcc -O2 -o popcnt-wo-zbb riscv-popcnt.c
>> gburd@rv:~/ws/postgres$ gcc -O2 -march=rv64gc_zbb -o popcnt-zbb riscv-popcnt.c
>> gburd@rv:~/ws/postgres$ ./popcnt-wo-zbb && ./popcnt-zbb
>> sw popcount:    0.196 sec  (    510.08 MB/s)
>> hw popcount:    0.293 sec  (    341.48 MB/s)
>>
>> diff: 0.67x
>> match: 406261900 bits counted
>> sw popcount:    0.182 sec  (    548.86 MB/s)
>> hw popcount:    0.044 sec  (   2279.89 MB/s)
>>
>> diff: 4.15x
>> match: 406261900 bits counted
>>
>> But my first email/patch was incomplete/rushed, I should have followed the pattern used for similar ARM-specific
>> logic. v2 attached along with a test program.
>
> Sure, but what PG workloads are actually affected to a meaningful degree by
> this? And are those, on riscv, actually most bottlenecked by popcount
> performance?
>
> I'm also pretty doubtful all the effort to e.g. add AVX 512 popcount was spent
> all that effectively - hard to believe there's any real world workloads where
> that gain is worth the squeeze. At least for aarch64 and x86-64 there's real
> world use of those platforms, making niche-y perf improvements somewhat
> worthwhile. Whereas there's afaict not yet a whole lot of riscv production
> adoption.
>
> Once you add CPU dispatch to the cost it gets a heck of a lot less clearly
> worthwhile. You need heuristics to decide when the dispatch cost is worth it
> and even then it's going to slow down your non-worthwhile case somewhat.
>
> That's one of the things that makes riscv's decision to put so many crucial
> features into optional extensions so annoying for people that write
> non-embedded software.

Hey Andres,

All fair points.  RISC-V is annoying, the idea of CPU extensions is just one reason.  To be honest, I'm not sure it is
worth it either!  That said, this patch isn't a huge "squeeze" (or unprecedented) and it does provide some "juice" (4x
faster). It has the shape of the ARM equivalent, so to me it fell into that category of things we'd commit.

But I get it, as I said to start - all fair points.

> - Andres

best.

-greg



Re: Add RISC-V Zbb popcount optimization

From
Nathan Bossart
Date:
On Sun, Mar 22, 2026 at 02:01:50PM -0400, Andres Freund wrote:
> I'm also pretty doubtful all the effort to e.g. add AVX 512 popcount was spent
> all that effectively - hard to believe there's any real world workloads where
> that gain is worth the squeeze. At least for aarch64 and x86-64 there's real
> world use of those platforms, making niche-y perf improvements somewhat
> worthwhile. Whereas there's afaict not yet a whole lot of riscv production
> adoption.

That work was partially motivated by vector stuff that used popcount
functions pretty heavily, but yeah, the complexity compared to the gains is
the main reason I've been pushing to just use simd.h elsewhere (i.e., SSE2
and Neon).  I'd still consider using AVX-512, etc. for things if the impact
on real-world workloads was huge, though. 

-- 
nathan



Re: Add RISC-V Zbb popcount optimization

From
"Greg Burd"
Date:
On Mon, Mar 23, 2026, at 11:09 AM, Nathan Bossart wrote:
> On Sun, Mar 22, 2026 at 02:01:50PM -0400, Andres Freund wrote:
>> I'm also pretty doubtful all the effort to e.g. add AVX 512 popcount was spent
>> all that effectively - hard to believe there's any real world workloads where
>> that gain is worth the squeeze. At least for aarch64 and x86-64 there's real
>> world use of those platforms, making niche-y perf improvements somewhat
>> worthwhile. Whereas there's afaict not yet a whole lot of riscv production
>> adoption.

Hey Nathan,

> That work was partially motivated by vector stuff that used popcount
> functions pretty heavily, but yeah, the complexity compared to the gains is
> the main reason I've been pushing to just use simd.h elsewhere (i.e., SSE2
> and Neon).  I'd still consider using AVX-512, etc. for things if the impact
> on real-world workloads was huge, though. 

Yes, that and by research done while trying to understand why my RISC-V build farm animal "greenfly" (OrangePi RV2 with
a VisionFive 2 CPU: RISC-V RV64GC + Zba/Zbb/Zbc/Zbs) is failing consistently.

> -- 
> nathan

Forgive me, while $subject only mentions popcount I couldn't help myself so I added a few more RISC-V patches including
a bug fix that I hope makes greenfly happy again.


0001 - This is a bug fix for DES/RISC-V/Clang DES initialization.

------> Join me in "the rabbit hole" on this issue if you care to...

The existing software DES (as shown by the build-farm animal "greenfly" [1]) fails because Clang 20 has an
auto-vectorization bug that we trigger in the DES initialization code (des_init() function), not the DES encryption
algorithm itself.

I searched the LLVM issue tracker, here are the issues that caught my eye:
  1. Issue #176001 - "RISC-V Wrong code at -O1"
    - Vector peephole optimization with vmerge folding
    - Fixed by PR #176077 (merged Jan 2024)
    - Link: https://github.com/llvm/llvm-project/issues/176001
  2. Issue #187458 - "Wrong code for vector.extract.last.active"
    - Large index issues with zvl1024b
    - Partially fixed, still work ongoing
    - Link: https://github.com/llvm/llvm-project/issues/187458
  3. Issue #171978 - "RISC-V Wrong code at -O2/O3"
    - Illegal instruction from mismatched EEW
    - Under investigation
    - Link: https://github.com/llvm/llvm-project/issues/171978
  4. PR #176105 - "Fix i64 gather/scatter cost on rv32"
    - Cost model fixes for scatter/gather (merged Jan 2026)
    - Link: https://github.com/llvm/llvm-project/pull/176105

My fix in 0001 is simply adding this in a few places in crypt-des.c:

  #if defined(__riscv) && defined(__clang__)
      pg_memory_barrier();
  #endif
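
To make the failure mode concrete, here is a hedged sketch of the kind of table-inversion loop involved. It is not the crypt-des.c source: the table is the standard DES P permutation written zero-based, the barrier macro is a hypothetical stand-in for pg_memory_barrier(), and placing it inside the loop body is an assumption about where "a few places" might be.

```c
#include <stdint.h>

/* Hypothetical stand-in for pg_memory_barrier(): a compiler-level barrier. */
#if defined(__GNUC__) || defined(__clang__)
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#define COMPILER_BARRIER() ((void) 0)
#endif

/* Standard DES P-box permutation, zero-based. */
static const uint8_t pbox[32] = {
    15,  6, 19, 20, 28, 11, 27, 16,
     0, 14, 22, 25,  4, 17, 30,  9,
     1,  7, 23, 13, 31, 26,  2,  8,
    18, 12, 29,  5, 21, 10,  3, 24
};

static uint8_t un_pbox[32];

/*
 * Invert the permutation: un_pbox[pbox[i]] = i.  This is a scatter store,
 * the sort of loop an auto-vectorizer may try to transform.  The barrier in
 * the body stops the compiler from reordering or combining the stores.
 */
void build_un_pbox(void)
{
    for (int i = 0; i < 32; i++)
    {
        un_pbox[pbox[i]] = (uint8_t) i;
        COMPILER_BARRIER();
    }
}
```

A correct build yields un_pbox[0] = 8 because pbox[8] = 0, which agrees with the expected values printed by the test program at the end of this email. The failing value 15 is simply pbox[0], as if the table had been copied rather than inverted.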

While searching I ran across a different solution: adding `-mllvm -riscv-v-vector-bits-min=0` sets the minimum vector
bit width for the RISC-V vector extension in LLVM to 0, which disables all vectorization and forces scalar code
generation, so no RVV instructions are emitted.  This would prevent the DES bug at the cost of any vectorization
anywhere in the binary.

While that might also fix the other intermittent bug we'd been seeing on greenfly (not tested), disabling all RVV
optimizations seems too heavy-handed to me.


------> Moving on.

0002 - (was "0001" in v2) this is unchanged, it implements popcount using Zbb extension on RISC-V

0003 - is a small patch adapted from the Google Abseil project's RISC-V CRC32C implementation [1].  It is *a lot
faster* than the software crc32c we fall back to now (see: riscv-crc32c.c).  This algorithm requires the Zbc (or Zbkc)
extension (for clmul) so the patch tests for that at build time and adds the '-march' flag when it is available.
However, as is the case for Zbb and popcnt, the presence of Zbc (or Zbkc) must be detected at runtime.  That's done
following the pre-existing pattern used for ARM features.  This does introduce some runtime overhead and complexity,
not more than required I hope.
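
For context, a small sketch of the two building blocks involved. It does not reproduce the Abseil folding algorithm or the patch. clmul_soft shows in plain C what the Zbc clmul instruction computes (the low 64 bits of a carry-less product), and crc32c_bitwise is a slow reference CRC-32C of the kind one would use to validate any faster version. Both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Carry-less multiply: like ordinary multiplication, but partial products
 * are combined with XOR instead of addition.  Returns the low 64 bits.
 */
uint64_t clmul_soft(uint64_t a, uint64_t b)
{
    uint64_t r = 0;

    for (int i = 0; i < 64; i++)
    {
        if ((b >> i) & 1)
            r ^= a << i;
    }
    return r;
}

/* Bitwise reference CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
uint32_t crc32c_bitwise(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *) data;
    uint32_t    crc = 0xFFFFFFFFu;

    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The reference routine returns 0xE3069283 for the ASCII string "123456789", the same check value the test program prints in its validation line.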
 

I attached test code, and results at the end of this email:
* riscv-popcnt.c - unchanged
* riscv-crc32c.c - new, based on work in the Google Abseil project
* riscv-des.c    - highlights the fix for DES using Clang on RISC-V 

I guess the question for 0002 and/or 0003 is if the "juice" is worth the "squeeze" or not.  There is a lot of performance
juice to be had IMO.  But some might argue that RISC-V isn't widely adopted yet, and they'd be right.  Others might
point out that RISC-V is currently showing up in embedded systems more than server/desktop/laptop/cloud, also true.
However, there is some evidence that is changing as there are RISC-V servers [2][3], and there is a hosted (cloud)
solution from Scaleway [4].  There exists a 64 core RISC-V desktop [5] and a Framework laptop mainboard [6] sporting a
RISC-V CPU.  And there is the OrangePi RV2 [7] I have that is "greenfly".
 

Is it early days?  Certainly!  But too early?  That's up for debate. :)

If nothing else, these patches can be a durable record, useful later when RISC-V is a critical platform for Postgres
or as information for other projects.

best.

-greg

[1] https://github.com/abseil/abseil-cpp/pull/1986 absl/crc/internal/crc_riscv.cc
[2] https://www.firefly.store/products/rs-sra120-risc-v-server-2u-computing-server-cloud-storage-large-model-sg2042
[3] https://edgeaicomputer.com/our-products/servers/risc-v-compute-server-sra1-20/
[4] https://www.scaleway.com/en/news/scaleway-launches-its-risc-v-servers-in-the-cloud-a-world-first-and-a-firm-commitment-to-technological-independence/
[5] https://milkv.io/pioneer and https://www.crowdsupply.com/milk-v/milk-v-pioneer/updates/current-status-of-production
[6] https://deepcomputing.io/product/dc-roma-risc-v-mainboard/
[7] http://www.orangepi.org/html/hardWare/computerAndMicrocontrollers/details/Orange-Pi-RV2.html


---- TEST PROGRAM OUTPUT:

gburd@rv:~/ws/postgres$ make -f Makefile.RISCV
gcc -O2 riscv-des.c -o des-gcc-sw
gcc -O2 riscv-des.c -march=rv64gcv -o des-gcc-hw
clang-20 -O1 riscv-des.c -o des-clang-o1-sw
clang-20 -O1 -march=rv64gcv riscv-des.c -o des-clang-o1-hw
clang-20 -O2 riscv-des.c -o des-clang-o2-sw
clang-20 -O2 -march=rv64gcv riscv-des.c -o des-clang-o2-hw
gcc -O2 -o popcnt-gcc-o2-sw riscv-popcnt.c
gcc -O2 -march=rv64gc_zbb -o popcnt-gcc-o2-hw riscv-popcnt.c
clang-20 -O2 -o popcnt-clang-o2-sw riscv-popcnt.c
clang-20 -O2 -march=rv64gc_zbb -o popcnt-clang-o2-hw riscv-popcnt.c
gcc -O2 -o crc32c-gcc-o2-sw riscv-crc32c.c
gcc -O2 -march=rv64gc_zbc -o crc32c-gcc-o2-hw riscv-crc32c.c
clang-20 -O2 -o crc32c-clang-o2-sw riscv-crc32c.c
clang-20 -O2 -march=rv64gc_zbc -o crc32c-clang-o2-hw riscv-crc32c.c
gburd@rv:~/ws/postgres$ make -f Makefile.RISCV test
./des-gcc-sw
Compiler: GCC 13.3.0
Target: RISC-V 64-bit
Vector extension: Not enabled

Testing WITHOUT compiler barriers:
PASS: Permutation tables are correct

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.409 seconds (409 ns/iter)
With barriers:    0.416 seconds (416 ns/iter)
Overhead: 1.6%
./des-gcc-hw
Compiler: GCC 13.3.0
Target: RISC-V 64-bit
Vector extension: Enabled (RVV)

Testing WITHOUT compiler barriers:
PASS: Permutation tables are correct

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.410 seconds (410 ns/iter)
With barriers:    0.410 seconds (410 ns/iter)
Overhead: Negligible
./des-clang-o1-sw
Compiler: Clang 20.1.2
Target: RISC-V 64-bit
Vector extension: Not enabled

Testing WITHOUT compiler barriers:
PASS: Permutation tables are correct

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.517 seconds (517 ns/iter)
With barriers:    0.516 seconds (516 ns/iter)
Overhead: Negligible
./des-clang-o1-hw
Compiler: Clang 20.1.2
Target: RISC-V 64-bit
Vector extension: Enabled (RVV)

Testing WITHOUT compiler barriers:
PASS: Permutation tables are correct

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.405 seconds (405 ns/iter)
With barriers:    0.405 seconds (405 ns/iter)
Overhead: Negligible
./des-clang-o2-sw
Compiler: Clang 20.1.2
Target: RISC-V 64-bit
Vector extension: Not enabled

Testing WITHOUT compiler barriers:
PASS: Permutation tables are correct

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.517 seconds (517 ns/iter)
With barriers:    0.518 seconds (518 ns/iter)
Overhead: Negligible
./des-clang-o2-hw
Compiler: Clang 20.1.2
Target: RISC-V 64-bit
Vector extension: Enabled (RVV)

Testing WITHOUT compiler barriers:
ERROR: un_pbox mismatch:
    un_pbox[0] = 15, expected 8
    un_pbox[1] = 6, expected 16
    un_pbox[2] = 19, expected 22
    un_pbox[3] = 20, expected 30
    un_pbox[4] = 28, expected 12
  ... and 27 more errors
FAIL: Permutation tables are incorrect

Testing WITH compiler barriers:
PASS: Permutation tables are correct

Performance Comparison (1000000 iterations):
Without barriers: 0.093 seconds (93 ns/iter)
With barriers:    0.407 seconds (407 ns/iter)
Overhead: 335.5%
./popcnt-gcc-o2-sw
sw popcount:    0.183 sec  (    547.89 MB/s)
hw popcount:    0.274 sec  (    365.40 MB/s)

diff: 0.67x
match: 406261900 bits counted
./popcnt-gcc-o2-hw
sw popcount:    0.182 sec  (    548.17 MB/s)
hw popcount:    0.044 sec  (   2287.82 MB/s)

diff: 4.17x
match: 406261900 bits counted
./popcnt-clang-o2-sw
sw popcount:    0.188 sec  (    531.96 MB/s)
hw popcount:    0.207 sec  (    482.84 MB/s)

diff: 0.91x
match: 406261900 bits counted
./popcnt-clang-o2-hw
sw popcount:    0.224 sec  (    446.46 MB/s)
hw popcount:    0.056 sec  (   1794.83 MB/s)

diff: 4.02x
match: 406261900 bits counted
./crc32c-gcc-o2-sw
sw crc32c:    0.651 sec  (    153.68 MB/s)
hw crc32c:    0.651 sec  (    153.72 MB/s)

diff: 1.00x
match: 0x0B141F2D

validation: CRC32C("123456789") = 0xE3069283 (correct)
./crc32c-gcc-o2-hw
sw crc32c:    0.651 sec  (    153.70 MB/s)
hw crc32c:    0.000 sec  ( 308052.33 MB/s)

diff: 2004.21x
match: 0x0B141F2D

validation: CRC32C("123456789") = 0xE3069283 (correct)
./crc32c-clang-o2-sw
sw crc32c:    0.584 sec  (    171.10 MB/s)
hw crc32c:    0.584 sec  (    171.17 MB/s)

diff: 1.00x
match: 0x0B141F2D

validation: CRC32C("123456789") = 0xE3069283 (correct)
./crc32c-clang-o2-hw
sw crc32c:    0.584 sec  (    171.15 MB/s)
hw crc32c:    0.000 sec  ( 309282.38 MB/s)

diff: 1807.08x
match: 0x0B141F2D

validation: CRC32C("123456789") = 0xE3069283 (correct)
Attachments

clang bug affecting greenfly

From
John Naylor
Date:
[new subject]

On Sat, Mar 28, 2026 at 3:22 AM Greg Burd <greg@burd.me> wrote:

> 0001 - This is a bug fix for DES/RISC-V/Clang DES initialization.
>
> ------> Join me in "the rabbit hole" on this issue if you care to...
>
> The existing software DES (as shown by the build-farm animal "greenfly" [1]) fails because Clang 20 has an
> auto-vectorization bug that we trigger in the DES initialization code (des_init() function), not the DES encryption
> algorithm itself.

> [disable vectorization entirely]
> While that might also fix the other intermittent bug we'd been seeing on greenfly (not tested) disabling all RVV
> optimizations seems too heavy handed to me.

The first thing I notice is that not very long ago the buildfarm had 3
gcc RISC-V members, but not anymore. If you care about having coverage
for this hardware, I'd suggest picking up gcc again if that's still
working, and wait and see about clang. Clang has shipped broken code
generation for obscure platforms in the past, and it seems here we're
not even sure of the extent of the breakage.

--
John Naylor
Amazon Web Services



Re: clang bug affecting greenfly

From
"Greg Burd"
Date:
On Mon, Mar 30, 2026, at 2:39 AM, John Naylor wrote:
> [new subject]
>
> On Sat, Mar 28, 2026 at 3:22 AM Greg Burd <greg@burd.me> wrote:
>
>> 0001 - This is a bug fix for DES/RISC-V/Clang DES initialization.
>>
>> ------> Join me in "the rabbit hole" on this issue if you care to...
>>
>> The existing software DES (as shown by the build-farm animal "greenfly" [1]) fails because Clang 20 has an
>> auto-vectorization bug that we trigger in the DES initialization code (des_init() function), not the DES encryption
>> algorithm itself.
>
>> [disable vectorization entirely]
>> While that might also fix the other intermittent bug we'd been seeing on greenfly (not tested) disabling all RVV
>> optimizations seems too heavy handed to me.
>
> The first thing I notice is that not very long ago the buildfarm had 3
> gcc RISC-V members, but not anymore. If you care about having coverage
> for this hardware, I'd suggest picking up gcc again if that's still
> working, and wait and see about clang. Clang has shipped broken code
> generation for obscure platforms in the past, and it seems here we're
> not even sure of the extent of the breakage.

Hey John,

All fair points.  I've changed greenfly to use GCC 13.3.0, thanks for the suggestion.

> --
> John Naylor
> Amazon Web Services

best.

-greg