Re: add AVX2 support to simd.h
От | Nathan Bossart |
---|---|
Тема | Re: add AVX2 support to simd.h |
Дата | |
Msg-id | 20240319163033.GA952199@nathanxps13 обсуждение исходный текст |
Ответ на | Re: add AVX2 support to simd.h (John Naylor <johncnaylorls@gmail.com>) |
Ответы |
Re: add AVX2 support to simd.h
|
Список | pgsql-hackers |
On Tue, Mar 19, 2024 at 04:53:04PM +0700, John Naylor wrote: > On Tue, Mar 19, 2024 at 10:16 AM Nathan Bossart > <nathandbossart@gmail.com> wrote: >> 0002 does the opposite of this. That is, after we've completed as many >> blocks as possible, we move the iterator variable back to "end - >> block_size" and do one final iteration to cover all the remaining elements. > > Sounds similar in principle, but it looks really complicated. I don't > think the additional loops and branches are a good way to go, either > for readability or for branch prediction. My sketch has one branch for > which loop to do, and then performs only one loop. Let's do the > simplest thing that could work. (I think we might need a helper > function to do the block, but the rest should be easy) I tried to trim some of the branches, and came up with the attached patch. I don't think this is exactly what you were suggesting, but I think it's relatively close. My testing showed decent benefits from using 2 vectors when there aren't enough elements for 4, so I've tried to keep that part intact. This changes pg_lfind32() to something like: if not many elements process one by one while enough elements for 4 registers remain process with 4 registers if no elements remain return false if more than 2-registers-worth of elements remain do one iteration with 2 registers do another iteration on last 2-registers-worth of elements -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Вложения
В списке pgsql-hackers по дате отправления: