Re: pgbench - add pseudo-random permutation function
От | Fabien COELHO |
---|---|
Тема | Re: pgbench - add pseudo-random permutation function |
Дата | |
Msg-id | alpine.DEB.2.21.1809260717020.22248@lancre обсуждение исходный текст |
Ответ на | Re: pgbench - add pseudo-random permutation function (Thomas Munro <thomas.munro@enterprisedb.com>) |
Ответы |
Re: pgbench - add pseudo-random permutation function
|
Список | pgsql-hackers |
Hello Thomas, > modular_multiply() is an interesting device. I will leave this to > committers with a stronger mathematical background than me, but I have > a small comment in passing: For testing this function, I have manually commented out the various shortcuts so that the manual multiplication code is always used, and the tests passed. I just did it again. > +#ifdef HAVE__BUILTIN_POPCOUNTLL > + return __builtin_popcountll(n); > +#else /* use clever no branching bitwise operator version */ > > I think it is not enough for the compiler to have > __builtin_popcountll(). The CPU that this is eventually executed on > must actually have the POPCNT instruction[1] (or equivalent on ARM, > POWER etc), or the program will die with SIGILL. Hmmm, I'd be pretty surprised: The point of the builtin is to delegate to the compiler the hassle of selecting the best option available, depending on target hardware class: whether it issues a cpu/sse4 instruction is not/should not be our problem. > There are CPUs in circulation produced in this decade that don't have > it. Then the compiler, when generating code that is expected to run for a large class of hardware which include such old ones, should not use a possibly unavailable instruction, or the runtime should take responsability for dynamically selecting a viable option. My understanding is that it should always be safe to call __builtin_XYZ functions when available. Then if you compile saying that you want code specific to cpu X and then run it on cpu Y and it fails, basically you just shot yourself in the foot. > I have previously considered something like this[2], but realised you > would therefore need a runtime check. There are a couple of ways to do > that: see commit a7a73875 for one example, also > __builtin_cpu_supports(), and direct CPU ID bit checks (some platforms). > There is also the GCC "multiversion" system that takes care of that for > you but that worked only for C++ last time I checked. ISTM that the purpose of a dynamic check would be to have the better hardware benefit even when compiling for a generic class of hardware which may or may not have the better option. This approach is fine for reaching a better performance/portability compromise, but I do not think that it is that useful for pgbench to go to this level of optimization, unless someone else takes care, hence the compiler builtin. An interesting side effect of your mail is that while researching a bit on the subject I noticed __builtin_clzll() which helps improve the nbits code compared to using popcount. Patch attached uses CLZ insted of POPCOUNT. -- Fabien.
Вложения
В списке pgsql-hackers по дате отправления: