call popcount32/64 directly on non-x86 platforms

From: John Naylor
Subject: call popcount32/64 directly on non-x86 platforms
Msg-id: CAFBsxsE7otwnfA36Ly44zZO+b7AEWHRFANxR1h1kxveEV=ghLQ@mail.gmail.com
Responses: Re: call popcount32/64 directly on non-x86 platforms  (David Rowley <dgrowleyml@gmail.com>)
List: pgsql-hackers
Currently, all platforms must indirect through a function pointer to call popcount on a word-sized input, even though we don't arrange for a fast implementation on non-x86 to make it worthwhile.

0001 moves some declarations around so that "slow" popcount functions are called directly on non-x86 platforms.
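
To illustrate the kind of arrangement 0001 aims for, here is a minimal sketch. It is not the patch itself: every demo_ name is hypothetical, and the platform test is an assumption about how one might select the x86-64 case. The point is only that the function pointer is kept where a runtime check could install something faster, while everywhere else the call resolves directly to the portable function.

```c
#include <stdint.h>

/* Portable fallback (hypothetical name, not the patch's code). */
static int
demo_popcount32_slow(uint32_t word)
{
    int count = 0;

    while (word != 0)
    {
        word &= word - 1;       /* clear the lowest set bit */
        count++;
    }
    return count;
}

#if defined(__x86_64__) || defined(_M_AMD64)
/*
 * x86-64: keep the indirection so that a runtime CPU check could replace
 * the pointer with a faster implementation. Here it just starts out
 * pointing at the portable version.
 */
static int (*demo_popcount32_ptr) (uint32_t word) = demo_popcount32_slow;

static inline int
demo_popcount32(uint32_t word)
{
    return demo_popcount32_ptr(word);
}
#else
/* Everywhere else: call the portable version directly, no pointer. */
static inline int
demo_popcount32(uint32_t word)
{
    return demo_popcount32_slow(word);
}
#endif
```

Callers use demo_popcount32() on every platform; only the x86-64 build pays for the indirect call.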

0002 was an idea to simplify and unify the coding for the slow functions.
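
As one possible shape for a unified portable implementation, the sketch below uses the well-known parallel bit-counting (SWAR) technique, with the 32-bit variant reusing the 64-bit one. This is an assumption for illustration and not necessarily what 0002 contains; the demo_ names are hypothetical.

```c
#include <stdint.h>

/* Count set bits in a 64-bit word without relying on compiler builtins. */
static inline int
demo_popcount64_swar(uint64_t x)
{
    /* sum adjacent bits into 2-bit fields */
    x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
    /* sum adjacent 2-bit fields into 4-bit fields */
    x = (x & UINT64_C(0x3333333333333333)) +
        ((x >> 2) & UINT64_C(0x3333333333333333));
    /* sum adjacent 4-bit fields into bytes */
    x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
    /* add all bytes together; the total ends up in the top byte */
    return (int) ((x * UINT64_C(0x0101010101010101)) >> 56);
}

/* The 32-bit variant zero-extends and shares the same code path. */
static inline int
demo_popcount32_swar(uint32_t x)
{
    return demo_popcount64_swar((uint64_t) x);
}
```

Because this is plain integer arithmetic, the generated code does not depend on whether the target CPU has a popcount instruction.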

Also attached is a test module for building microbenchmarks.

On a Power8 machine using gcc 4.8, and running
time ./inst/bin/psql -c 'select drive_popcount(100000, 1024)'

I get

master: 647ms
0001: 183ms
0002: 228ms

So 0001 is a clear winner on that platform, at roughly 3.5x faster than master. 0002 is still good (about 2.8x faster than master), but slower than 0001 for some reason. It also turns out that on master, gcc does emit a popcnt instruction from the intrinsic:

0000000000000000 <pg_popcount32_slow>:
   0:   f4 02 63 7c     popcntw r3,r3
   4:   b4 07 63 7c     extsw   r3,r3
   8:   20 00 80 4e     blr
        ...
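
For readers unfamiliar with the term, the "intrinsic" here is gcc's __builtin_popcount. The sketch below shows a fallback function written that way. The demo_ name is hypothetical and the non-gcc branch is only there to keep the example self-contained. Depending on the target and flags, the compiler may lower the builtin to a single instruction (as in the disassembly above) or to a call into its runtime library.

```c
#include <stdint.h>

/* A "slow" popcount that still delegates to the compiler when it can. */
static int
demo_popcount32_builtin(uint32_t word)
{
#if defined(__GNUC__) || defined(__clang__)
    /* The compiler decides how to implement this for the target CPU. */
    return __builtin_popcount(word);
#else
    /* Plain bit-by-bit loop for compilers without the builtin. */
    int count = 0;

    for (; word != 0; word >>= 1)
        count += (int) (word & 1);
    return count;
#endif
}
```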

The gcc docs mention a flag for this, but I'm not sure why the build doesn't seem to need it:

https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html#RS_002f6000-and-PowerPC-Options

Maybe that's because the machine I used was ppc64le, but I'm not sure a ppc binary built like this is portable to other hardware. For that reason, maybe 0002 is a good idea. 
