Re: Popcount optimization using AVX512

From: Noah Misch
Subject: Re: Popcount optimization using AVX512
Date:
Msg-id: 20231107055315.8e@rfd.leadboat.com
In reply to: Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
Responses: Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
List: pgsql-hackers
On Mon, Nov 06, 2023 at 09:59:26PM -0600, Nathan Bossart wrote:
> On Mon, Nov 06, 2023 at 07:15:01PM -0800, Noah Misch wrote:
> > On Mon, Nov 06, 2023 at 09:52:58PM -0500, Tom Lane wrote:
> >> Nathan Bossart <nathandbossart@gmail.com> writes:
> >> > Like I said, I don't have any proposals yet, but assuming we do want to
> >> > support newer intrinsics, either open-coded or via auto-vectorization, I
> >> > suspect we'll need to gather consensus for a new policy/strategy.
> >> 
> >> Yeah.  The function-pointer solution kind of sucks, because for the
> >> sort of operation we're considering here, adding a call and return
> >> is probably order-of-100% overhead.  Worse, it adds similar overhead
> >> for everyone who doesn't get the benefit of the optimization.
> > 
> > The glibc/gcc "ifunc" mechanism was designed to solve this problem of choosing
> > a function implementation based on the runtime CPU, without incurring function
> > pointer overhead.  I would not attempt to use AVX512 on non-glibc systems, and
> > I would use ifunc to select the desired popcount implementation on glibc:
> > https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Function-Attributes.html
> 
> Thanks, that seems promising for the function pointer cases.  I'll plan on
> trying to convert one of the existing ones to use it.  BTW it looks like
> LLVM has something similar [0].
> 
> IIUC this unfortunately wouldn't help for cases where we wanted to keep
> stuff inlined, such as is_valid_ascii() and the functions in pg_lfind.h,
> unless we applied it to the calling functions, but that doesn't sound
> particularly maintainable.

Agreed, it doesn't solve inline cases.  If the gains are big enough, we should
move toward packages containing N CPU-specialized copies of the postgres
binary, with bin/postgres just exec'ing the right one.
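For comparison, here is a minimal sketch of the function-pointer dispatch Tom describes above. It assumes x86-64 with GCC or Clang, and all names are hypothetical. POPCNT stands in for AVX-512 to keep it short. Every caller goes through the pointer, so the compiler cannot inline the call, even on CPUs that end up running the portable fallback.

```c
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static int popcount64_slow(uint64_t x)
{
    int n = 0;

    while (x != 0)
    {
        x &= x - 1;
        n++;
    }
    return n;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Same operation, compiled so the builtin can become a POPCNT instruction. */
__attribute__((target("popcnt")))
static int popcount64_fast(uint64_t x)
{
    return __builtin_popcountll(x);
}
#endif

static int popcount64_choose(uint64_t x);

/*
 * Every caller goes through this pointer.  It starts out pointing at the
 * chooser, which overwrites it on the first call.
 */
int (*popcount64)(uint64_t x) = popcount64_choose;

static int popcount64_choose(uint64_t x)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    popcount64 = __builtin_cpu_supports("popcnt") ? popcount64_fast
                                                  : popcount64_slow;
#else
    popcount64 = popcount64_slow;
#endif
    return popcount64(x);
}
```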

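The ifunc approach Noah links to above can be sketched as follows. This is a minimal sketch that assumes x86-64 Linux with glibc and GCC or Clang; other targets fall back to the portable version. The names are hypothetical, and POPCNT again stands in for AVX-512. The resolver runs once when the binary is loaded, and the dynamic linker then binds `popcount64` to whichever implementation it returned. Callers therefore never read or update a function-pointer variable.

```c
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static int popcount64_slow(uint64_t x)
{
    int n = 0;

    while (x != 0)
    {
        x &= x - 1;
        n++;
    }
    return n;
}

#if defined(__x86_64__) && defined(__GNUC__) && defined(__linux__)
/* Version compiled with the POPCNT instruction enabled. */
__attribute__((target("popcnt")))
static int popcount64_fast(uint64_t x)
{
    return __builtin_popcountll(x);
}

/*
 * Resolver: the dynamic linker calls this once at load time.  It may run
 * before constructors, so initialize the CPU feature data explicitly.
 */
static int (*resolve_popcount64(void))(uint64_t)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("popcnt") ? popcount64_fast
                                            : popcount64_slow;
}

int popcount64(uint64_t x) __attribute__((ifunc("resolve_popcount64")));
#else
/* Without ifunc support, just use the portable version. */
int popcount64(uint64_t x)
{
    return popcount64_slow(x);
}
#endif
```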

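Nathan mentions applying the dispatch to the calling functions rather than to the inlined helper. One way to sketch that is GCC's `target_clones` attribute. It compiles a function once per listed target and generates the ifunc resolver automatically. This assumes GCC 6 or later (or a recent Clang) on x86-64 Linux, and `count_bits` and `bitcount_word` are hypothetical names. Because the helper is inlined separately into each clone, the POPCNT clone needs no call per word. The maintainability cost Nathan raises remains, since every such caller has to be annotated.

```c
#include <stddef.h>
#include <stdint.h>

/* Small helper that we want inlined into its callers. */
static inline int bitcount_word(uint64_t w)
{
    return __builtin_popcountll(w);
}

#if defined(__x86_64__) && defined(__GNUC__) && defined(__linux__)
/*
 * Compile count_bits twice: once with POPCNT enabled and once for the
 * baseline ISA.  The compiler emits an ifunc resolver that picks a clone
 * at load time, and bitcount_word is inlined into each clone separately.
 */
__attribute__((target_clones("popcnt", "default")))
#endif
uint64_t count_bits(const uint64_t *words, size_t nwords)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nwords; i++)
        total += (uint64_t) bitcount_word(words[i]);
    return total;
}
```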

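Noah's last suggestion is to ship N CPU-specialized builds and have bin/postgres exec the right one. That launcher could look roughly like this. The file names `postgres.avx512`, `postgres.avx2`, and `postgres.generic` and the function names are hypothetical, and the feature checks assume GCC or Clang on x86-64. The path selection is a separate function so that it can be checked without replacing the process. The launcher then hands that path to `execv()`.

```c
#include <stdio.h>
#include <unistd.h>

/* Pick the most capable build this CPU can run (hypothetical suffixes). */
static const char *postgres_variant(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return "avx512";
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
#endif
    return "generic";
}

/* Build "<bindir>/postgres.<variant>"; returns 0 on success, -1 if truncated. */
int postgres_binary_path(const char *bindir, char *buf, size_t buflen)
{
    int n = snprintf(buf, buflen, "%s/postgres.%s", bindir, postgres_variant());

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/* Replace the current process with the chosen build; returns only on failure. */
int exec_specialized_postgres(const char *bindir, char *const argv[])
{
    char path[4096];

    if (postgres_binary_path(bindir, path, sizeof(path)) != 0)
        return -1;
    execv(path, argv);
    perror(path);               /* execv returns only if it failed */
    return -1;
}
```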