Re: add AVX2 support to simd.h

From Nathan Bossart
Subject Re: add AVX2 support to simd.h
Date
Msg-id 20240109162009.GA3033933@nathanxps13
In reply to Re: add AVX2 support to simd.h  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: add AVX2 support to simd.h  (Ants Aasma <ants.aasma@cybertec.at>)
Re: add AVX2 support to simd.h  (John Naylor <johncnaylorls@gmail.com>)
List pgsql-hackers
On Tue, Jan 09, 2024 at 09:20:09AM +0700, John Naylor wrote:
> On Tue, Jan 9, 2024 at 12:37 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>>
>> > I suspect that there could be a regression lurking for some inputs
>> > that the benchmark doesn't look at: pg_lfind32() currently needs to be
>> > able to read 4 vector registers worth of elements before taking the
>> > fast path. There is then a tail of up to 15 elements that are now
>> > checked one-by-one, but AVX2 would increase that to 31. That's getting
>> > big enough to be noticeable, I suspect. It would be good to understand
>> > that case (n*32 + 31), because it may also be relevant now. It's also
>> > easy to improve for SSE2/NEON for v17.
>>
>> Good idea.  If it is indeed noticeable, we might be able to "fix" it by
>> processing some of the tail with shorter vectors.  But that probably means
>> finding a way to support multiple vector sizes on the same build, which
>> would require some work.
> 
> What I had in mind was an overlapping pattern I've seen in various
> places: do one iteration at the beginning, then subtract the
> aligned-down length from the end and do all those iterations. And
> one-by-one is only used if the total length is small.

Sorry, I'm not sure I understood this.  Do you mean processing the first
several elements individually or with SSE2 until the number of remaining
elements can be processed with just the AVX2 instructions (a bit like how
pg_comp_crc32c_armv8() is structured for memory alignment)?
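
To make the question concrete, here is a minimal sketch of one possible
reading of the overlapping pattern quoted above.  It is an assumption, not
code from the patch or from pg_lfind32(): the names lfind32_overlap(),
block_contains(), and BLOCK_ELEMS are hypothetical, and a scalar loop stands
in for the vector comparisons so that the sketch stays portable.  The block
size of 32 corresponds to 4 AVX2 registers holding 8 uint32 values each.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block size: 4 AVX2 registers x 8 uint32 lanes each. */
#define BLOCK_ELEMS 32

/*
 * Stand-in for the vectorized comparison of one full block.  A real
 * implementation would use SIMD compares; this loop is only a placeholder.
 */
static bool
block_contains(uint32_t key, const uint32_t *base)
{
	bool		found = false;

	for (size_t i = 0; i < BLOCK_ELEMS; i++)
		found |= (base[i] == key);
	return found;
}

/* Overlapping search (hypothetical; not PostgreSQL's pg_lfind32()). */
static bool
lfind32_overlap(uint32_t key, const uint32_t *base, size_t nelem)
{
	size_t		start;

	/* One-by-one is used only if the total length is small. */
	if (nelem < BLOCK_ELEMS)
	{
		for (size_t i = 0; i < nelem; i++)
		{
			if (base[i] == key)
				return true;
		}
		return false;
	}

	/* Do one iteration at the beginning... */
	if (block_contains(key, base))
		return true;

	/*
	 * ...then lay out the remaining blocks so that the last one ends exactly
	 * at the end of the array.  The first of these may overlap the block
	 * checked above; rechecking elements is harmless for a membership test.
	 */
	start = nelem % BLOCK_ELEMS;
	if (start == 0)
		start = BLOCK_ELEMS;	/* no tail, so nothing to overlap */

	for (size_t i = start; i + BLOCK_ELEMS <= nelem; i += BLOCK_ELEMS)
	{
		if (block_contains(key, base + i))
			return true;
	}
	return false;
}
```

Under this reading, an input of n*32 + 31 elements never reaches the
one-by-one loop: the 31-element tail is covered by one extra block that
overlaps a single element of the first block.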

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com


