On Fri, Sep 27, 2024 at 02:50:13PM +1200, David Rowley wrote:
> I had been looking at [1] (which I've added your version to now). I
> had been surprised to see gcc emitting different code for the first 3
> versions. Clang does a better job at figuring out they all do the same
> thing and emitting the same code for each.
Interesting.
> I played around with the attached (hacked up) qsort.c to see if there
> was any difference.  Likely function call overhead kills the
> performance anyway. There does not seem to be much difference between
> them. I've not tested with an inlined comparison function.
I'd expect worse performance with the branchless routines for the inlined
case.  However, I recall that clang was able to optimize med3() just as well
with the branchless routines as with the branching ones, so that may not
always be true.
> Looking at your version, it doesn't look like there's any sort of
> improvement in terms of the instructions. Certainly, for clang, it's
> worse as it adds a shift left instruction and an additional compare.
> No jumps, at least.
I think I may have forgotten to add -O2 when I was inspecting this code
with godbolt.org earlier.  *facepalm*  The different versions look pretty
comparable with that added.
> What's your reasoning for returning INT_MIN and INT_MAX?
That's just for the compile option added by commit c87cb5f, which IIUC is
intended to test that we correctly handle comparisons that return INT_MIN.
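
To illustrate why an INT_MIN return value is worth testing for, here is a
rough sketch.  The names are hypothetical and this is not the code from that
commit.  The hazard I have in mind is a caller that reverses the sort order
by negating the comparator's result: -INT_MIN overflows, which is undefined
behavior for signed int in C.

```c
#include <limits.h>

/* Hypothetical comparator that returns the extreme values on purpose. */
static int
cmp_extreme(int a, int b)
{
	if (a < b)
		return INT_MIN;
	if (a > b)
		return INT_MAX;
	return 0;
}

/*
 * Safe way to reverse the order: swap the arguments rather than
 * negating the result, since -INT_MIN is not representable.
 */
static int
cmp_extreme_reversed(int a, int b)
{
	return cmp_extreme(b, a);
}

/* Reduce any comparator result to -1, 0, or 1 without negation. */
static int
cmp_sign(int result)
{
	return (result > 0) - (result < 0);
}
```

A caller that only inspects the sign of the result, or that inverts by
swapping arguments as above, behaves the same whether the comparator returns
-1 or INT_MIN.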
-- 
nathan