At Thu, 14 Feb 2019 16:45:38 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote in <822.1550180738@sss.pgh.pa.us>
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-02-14 15:47:13 -0300, Alvaro Herrera wrote:
> >> Hah, I just realized you have to add -mlzcnt in order for these builtins
> >> to use the lzcnt instructions. It goes from something like
> >>
> >> bsrq %rax, %rax
> >> xorq $63, %rax
>
> > I'm confused how this is a general count leading zero operation? Did you
> > use constants or something that allowed it to infer a range in the test? If
> > so the compiler probably did some optimizations allowing it to do the
> > above.
>
> No. If you compile
>
> int myclz(unsigned long long x)
> {
>     return __builtin_clzll(x);
> }
>
> at -O2, on just about any x86_64 gcc, you will get
>
> myclz:
> .LFB1:
>         .cfi_startproc
>         bsrq    %rdi, %rax
>         xorq    $63, %rax
>         ret
>         .cfi_endproc
>
As I understand it, the result of __builtin_c[tl]z(0) is undefined
for exactly this reason: the builtins compile to bs[rf], which
leaves its destination undefined when the source is zero. So if we
use these builtins, an additional check for zero is required.

https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
> Built-in Function: int __builtin_clz (unsigned int x)
> Returns the number of leading 0-bits in x, starting at the most
> significant bit position. If x is 0, the result is undefined.
>
> Built-in Function: int __builtin_ctz (unsigned int x)
> Returns the number of trailing 0-bits in x, starting at the
> least significant bit position. If x is 0, the result is
> undefined.
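
For illustration, the additional check could look roughly like the
sketch below. The names safe_clz64 and safe_ctz64 are hypothetical
(they are not taken from any patch in this thread), and returning the
full bit width for a zero input is just one possible convention.
It assumes gcc or clang.

```c
#include <limits.h>

/*
 * Hypothetical wrappers; the names and the "return the bit width
 * for zero" convention are assumptions made for this sketch.
 */

/* Number of leading zero bits; defined for x == 0 as the bit width. */
static inline int
safe_clz64(unsigned long long x)
{
	if (x == 0)
		return (int) (sizeof(x) * CHAR_BIT);
	return __builtin_clzll(x);	/* only reached with x != 0 */
}

/* Number of trailing zero bits; defined for x == 0 as the bit width. */
static inline int
safe_ctz64(unsigned long long x)
{
	if (x == 0)
		return (int) (sizeof(x) * CHAR_BIT);
	return __builtin_ctzll(x);	/* only reached with x != 0 */
}
```

Callers that can already prove the argument is nonzero could keep
using the raw builtins and skip the branch.
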
Even worse, on CPUs that do not support lzcnt, lzcntx is silently
executed as bsrx, which gives bogus results: there
__builtin_clzll(8) = 3, where it should be 60.

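
To make the two numbers concrete, here is a small portable sketch of
the relation between them. The function names are hypothetical and the
loop only models what bsrq yields for a nonzero input; it assumes
unsigned long long is 64 bits wide.

```c
/*
 * Hypothetical helpers for illustration only; assumes a 64-bit
 * unsigned long long and a nonzero argument.
 */

/* Index of the highest set bit, i.e. what bsrq yields for x != 0. */
static int
highest_bit_index(unsigned long long x)
{
	int		idx = 0;

	while (x >>= 1)
		idx++;
	return idx;
}

/*
 * Leading zero count derived from that index.  For idx in 0..63,
 * 63 ^ idx equals 63 - idx, which is the xorq $63 step in the
 * gcc output quoted above.
 */
static int
clz_from_index(unsigned long long x)
{
	return 63 ^ highest_bit_index(x);
}
```

For x = 8 the index is 3 and the leading zero count is 63 ^ 3 = 60, so
a bare bsr result used in place of lzcnt gives 3 instead of 60.
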
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center