On Thu, Feb 06, 2025 at 08:44:35AM +0000, Chiranmoy.Bhattacharya@fujitsu.com wrote:
>> Does this hand-rolled loop unrolling offer any particular advantage?  What
>> do the numbers look like if we don't do this or if we process, say, 4
>> vectors at a time?
> 
> The unrolled version performs better than the non-unrolled one, but
> processing four vectors provides no additional benefit. The numbers
> and code used are given below.

Hm.  Any idea why that is?  I wonder if the compiler isn't using as many
SVE registers as it could for this.
-- 
nathan