I wrote:
> Evidently, asin() is returning an 80-bit result, and that's not
> getting rounded to double width before we divide by asin_0_5.
I found that I could reproduce this type of assembly-code behavior
on dromedary (gcc version 4.2.1) by using -mfpmath=387. That box
doesn't show the visible regression-test failure, but that must be
down to its version of asin() not producing the same low-order bits
that yours does. It's clear from the assembly output that the code
*would* misbehave given an appropriate asin() library function.
With that version of gcc, just casting the function output to double
changes nothing. The local-variable solution likewise. I do get
a useful fix if I declare the asin_x local variable volatile:
.loc 1 1873 0 fstpl (%esp) call _asin
+ fstpl -16(%ebp)
+ LVL107:
+ fldl -16(%ebp) fdivl _asin_0_5-"L00000000019$pb"(%ebx) fmuls LC21-"L00000000019$pb"(%ebx)
Interestingly, declaring asin_x as global is not enough to fix it,
because it stores into the global but doesn't fetch back:
.loc 1 1873 0 fstpl (%esp) call _asin
+ movl L_asin_x$non_lazy_ptr-"L00000000019$pb"(%ebx), %eax
+ fstl (%eax) fdivl _asin_0_5-"L00000000019$pb"(%ebx) fmuls LC21-"L00000000019$pb"(%ebx)
Declaring asin_x as a volatile global does fix it:
.loc 1 1873 0 fstpl (%esp) call _asin
+ movl L_asin_x$non_lazy_ptr-"L00000000019$pb"(%ebx), %eax
+ fstpl (%eax)
+ fldl (%eax) fdivl _asin_0_5-"L00000000019$pb"(%ebx) fmuls LC21-"L00000000019$pb"(%ebx)
but that seems like just being uglier without much redeeming value.
In short, these tests suggest that we need a coding pattern like
this:
volatile float8 asin_x = asin(x);return (asin_x / asin_0_5) * 30.0;
We could drop the "volatile" by adopting -ffloat-store, but that
doesn't seem like a better answer, because -ffloat-store would
pessimize a lot of code elsewhere. (It causes a whole lot of
changes in float.c, for sure.)
regards, tom lane