Re: use ARM intrinsics in pg_lfind32() where available

From Thomas Munro
Subject Re: use ARM intrinsics in pg_lfind32() where available
Date
Msg-id CA+hUKGLcDi5L-+QXXDTEEEd0ynYse7Fd6dXdP6W9e+XVpa-i4Q@mail.gmail.com
In response to Re: use ARM intrinsics in pg_lfind32() where available  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: use ARM intrinsics in pg_lfind32() where available  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Sun, Aug 28, 2022 at 10:12 AM Nathan Bossart
<nathandbossart@gmail.com> wrote:
> Yup.  The problem is that AFAICT there's no equivalent to
> _mm_movemask_epi8() on aarch64, so you end up with something like
>
>         vmaxvq_u8(vandq_u8(v, vector8_broadcast(0x80))) != 0
>
> But for pg_lfind32(), we really just want to know if any lane is set, which
> only requires a call to vmaxvq_u32().  I haven't had a chance to look too
> closely, but my guess is that this ultimately results in an extra AND
> operation in the aarch64 path, so maybe it doesn't impact performance too
> much.  The other option would be to open-code the intrinsic function calls
> into pg_lfind.h.  I'm trying to avoid the latter, but maybe it's the right
> thing to do for now...  What do you think?

Ahh, this gives me a flashback to John's UTF-8 validation thread[1]
(the beginner NEON hackery in there was just a learning exercise,
sadly not followed up with real patches...).  He had
_mm_movemask_epi8(v) != 0 which I first translated to
to_bool(bitwise_and(v, vmovq_n_u8(0x80))) and he pointed out that
vmaxvq_u8(v) > 0x7F has the right effect without the AND.
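
To make the two formulations concrete, here is a minimal sketch of both "is the high bit set in any byte?" checks. The function names are hypothetical and are not taken from PostgreSQL or from the UTF-8 thread; the scalar branches are only an assumed fallback so the sketch also compiles on non-aarch64 machines.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON 1
#endif

/*
 * Hypothetical helper: mask each byte with 0x80 first, then reduce.
 * This mirrors the "AND, then test" translation of
 * _mm_movemask_epi8(v) != 0.  p must point to at least 16 bytes.
 */
static bool
any_high_bit_with_and(const uint8_t *p)
{
#ifdef SKETCH_HAVE_NEON
	uint8x16_t	v = vld1q_u8(p);

	return vmaxvq_u8(vandq_u8(v, vdupq_n_u8(0x80))) != 0;
#else
	/* scalar fallback (assumption, for portability of the sketch) */
	uint8_t		acc = 0;

	for (int i = 0; i < 16; i++)
		acc |= (uint8_t) (p[i] & 0x80);
	return acc != 0;
#endif
}

/*
 * Hypothetical helper: skip the AND.  The largest byte exceeds 0x7F
 * exactly when at least one byte has its high bit set.
 */
static bool
any_high_bit_max_only(const uint8_t *p)
{
#ifdef SKETCH_HAVE_NEON
	return vmaxvq_u8(vld1q_u8(p)) > 0x7F;
#else
	/* scalar fallback (assumption, for portability of the sketch) */
	uint8_t		max = 0;

	for (int i = 0; i < 16; i++)
		if (p[i] > max)
			max = p[i];
	return max > 0x7F;
#endif
}
```

Both helpers should agree for every 16-byte input; the second just folds the mask into the comparison against the horizontal maximum.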

[1] https://www.postgresql.org/message-id/CA%2BhUKGJjyXvS6W05kRVpH6Kng50%3DuOGxyiyjgPKm707JxQYHCg%40mail.gmail.com


