Re: pgbench - add pseudo-random permutation function

Поиск
Список
Период
Сортировка
От Fabien COELHO
Тема Re: pgbench - add pseudo-random permutation function
Дата
Msg-id alpine.DEB.2.21.1809260717020.22248@lancre
обсуждение исходный текст
Ответ на Re: pgbench - add pseudo-random permutation function  (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы Re: pgbench - add pseudo-random permutation function  (Thomas Munro <thomas.munro@enterprisedb.com>)
Список pgsql-hackers
Hello Thomas,

> modular_multiply() is an interesting device.  I will leave this to
> committers with a stronger mathematical background than me, but I have
> a small comment in passing:

For testing this function, I have manually commented out the various 
shortcuts so that the manual multiplication code is always used, and the 
tests passed. I just did it again.

> +#ifdef HAVE__BUILTIN_POPCOUNTLL
> + return __builtin_popcountll(n);
> +#else /* use clever no branching bitwise operator version */
>
> I think it is not enough for the compiler to have 
> __builtin_popcountll().  The CPU that this is eventually executed on 
> must actually have the POPCNT instruction[1] (or equivalent on ARM, 
> POWER etc), or the program will die with SIGILL.

Hmmm, I'd be pretty surprised: The point of the builtin is to delegate to 
the compiler the hassle of selecting the best option available, depending 
on target hardware class: whether it issues a cpu/sse4 instruction is 
not/should not be our problem.

> There are CPUs in circulation produced in this decade that don't have 
> it.

Then the compiler, when generating code that is expected to run for a 
large class of hardware which include such old ones, should not use a 
possibly unavailable instruction, or the runtime should take 
responsability for dynamically selecting a viable option.

My understanding is that it should always be safe to call __builtin_XYZ 
functions when available. Then if you compile saying that you want code 
specific to cpu X and then run it on cpu Y and it fails, basically you 
just shot yourself in the foot.

> I have previously considered something like this[2], but realised you 
> would therefore need a runtime check.  There are a couple of ways to do 
> that: see commit a7a73875 for one example, also 
> __builtin_cpu_supports(), and direct CPU ID bit checks (some platforms). 
> There is also the GCC "multiversion" system that takes care of that for 
> you but that worked only for C++ last time I checked.

ISTM that the purpose of a dynamic check would be to have the better 
hardware benefit even when compiling for a generic class of hardware which 
may or may not have the better option.

This approach is fine for reaching a better performance/portability 
compromise, but I do not think that it is that useful for pgbench to go to 
this level of optimization, unless someone else takes care, hence the 
compiler builtin.

An interesting side effect of your mail is that while researching a bit on 
the subject I noticed __builtin_clzll() which helps improve the nbits code 
compared to using popcount. Patch attached uses CLZ insted of POPCOUNT.

-- 
Fabien.
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Chris Travers
Дата:
Сообщение: Re: Proposal for Signal Detection Refactoring
Следующее
От: Fabien COELHO
Дата:
Сообщение: Re: Progress reporting for pg_verify_checksums