Re: use ARM intrinsics in pg_lfind32() where available
От | Nathan Bossart |
---|---|
Тема | Re: use ARM intrinsics in pg_lfind32() where available |
Дата | |
Msg-id | 20220829054449.GA399604@nathanxps13 обсуждение исходный текст |
Ответ на | Re: use ARM intrinsics in pg_lfind32() where available (John Naylor <john.naylor@enterprisedb.com>) |
Ответы |
Re: use ARM intrinsics in pg_lfind32() where available
|
Список | pgsql-hackers |
On Mon, Aug 29, 2022 at 11:25:50AM +0700, John Naylor wrote: > + uint32 nelem_per_vector = sizeof(Vector32) / sizeof(uint32); > + uint32 nelem_per_iteration = 4 * nelem_per_vector; > > Using local #defines would be my style. I don't have a reason to > object to this way, but adding const makes these vars more clear. I added const. > Speaking of const: > > - const __m128i tmp1 = _mm_or_si128(result1, result2); > - const __m128i tmp2 = _mm_or_si128(result3, result4); > - const __m128i result = _mm_or_si128(tmp1, tmp2); > + tmp1 = vector32_or(result1, result2); > + tmp2 = vector32_or(result3, result4); > + result = vector32_or(tmp1, tmp2); > > Any reason to throw away the const declarations? The only reason is because I had to move the declarations to before the vector32_load() calls. > +static inline bool > +vector32_is_highbit_set(const Vector32 v) > +{ > +#ifdef USE_SSE2 > + return (_mm_movemask_epi8(v) & 0x8888) != 0; > +#endif > +} > > I'm not sure why we need this function -- AFAICS it just adds more > work on x86 for zero benefit. For our present application, can we just > cast to Vector8 (for Arm's sake) and call the 8-bit version? Good idea. > - * operations using bitwise operations on unsigned integers. > + * operations using bitwise operations on unsigned integers. Note that many > + * of the functions in this file presently do not have non-SIMD > + * implementations. > > It's unclear to the reader whether this is a matter of 'round-to-it's. > I'd like to document what I asserted in this thread, that it's likely > not worthwhile to do anything with a uint64 representing two 32-bit > ints. (It *is* demonstrably worth it for handling 8 byte-values at a > time) Done. > * Use saturating subtraction to find bytes <= c, which will present as > - * NUL bytes in 'sub'. > + * NUL bytes. > > I'd like to to point out that the reason to do it this way is to > workaround SIMD architectures frequent lack of unsigned comparison. Done. > + * Return the result of subtracting the respective elements of the input > + * vectors using saturation. > > I wonder if we should explain briefly what saturating arithmetic is. I > had never encountered it outside of a SIMD programming context. Done. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Вложения
В списке pgsql-hackers по дате отправления: