On 20/12/18 6:53, David Rowley wrote:
> Back in 2016 [1] there was some discussion about using the POPCNT
> instruction to improve the performance of counting the number of bits
> set in a word. Improving this helps various cases, such as
> bms_num_members and also things like counting the allvisible and
> frozen pages in the visibility map.
>
> [snip]
>
> I've put together a very rough patch to implement using POPCNT and the
> leading and trailing 0-bit instructions to improve the performance of
> bms_next_member() and bms_prev_member(). The correct function should
> be determined on the first call to each function by way of setting a
> function pointer variable to the most suitable supported
> implementation. I've not yet gone through and removed all the
> number_of_ones[] arrays to replace with a pg_popcount*() call.
IMVHO: Please do not disregard the optimizations the compiler could
otherwise apply around those calls. o_0 An indirect call through a
function pointer usually cannot be inlined, which might explain the
reduced performance improvement observed.
Not that I can see any obvious alternative to your implementation right
away ...
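
For reference, a minimal sketch of the first-call dispatch scheme described
above, assuming GCC or Clang. All names here (sketch_popcount64,
popcount64_impl and so on) are made up for illustration and are not taken
from the patch; __builtin_cpu_supports() and the target attribute are
compiler extensions, so the fast path is only compiled on x86 with those
compilers.

```c
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
static int
popcount64_slow(uint64_t word)
{
    int count = 0;

    while (word != 0)
    {
        word &= word - 1;
        count++;
    }
    return count;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_GNUC 1

/* Built with POPCNT enabled for this one function only (assumption:
 * the rest of the file is compiled without -mpopcnt). */
__attribute__((target("popcnt")))
static int
popcount64_fast(uint64_t word)
{
    return __builtin_popcountll(word);
}
#endif

static int popcount64_choose(uint64_t word);

/* Starts out pointing at the chooser; replaced during the first call. */
static int (*popcount64_impl)(uint64_t word) = popcount64_choose;

static int
popcount64_choose(uint64_t word)
{
    popcount64_impl = popcount64_slow;
#ifdef SKETCH_X86_GNUC
    /* Runtime CPU check, so the binary still runs on pre-POPCNT CPUs. */
    if (__builtin_cpu_supports("popcnt"))
        popcount64_impl = popcount64_fast;
#endif
    return popcount64_impl(word);
}

/* Hypothetical public entry point: every call goes through the pointer. */
int
sketch_popcount64(uint64_t word)
{
    return popcount64_impl(word);
}
```

Every call to sketch_popcount64() is an indirect call the compiler
generally cannot inline or fold, whereas a build with -mpopcnt could emit a
single instruction at the call site. That per-call cost is the trade-off
for keeping one binary that runs on CPUs without POPCNT.
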
> That
> seems to have mostly been done in Thomas' patch [3], part of which
> I've used for the visibilitymap.c code changes. If this patch proves
> to be possible, then I'll look at including the other changes Thomas
> made in his patch too.
>
> What I'm really looking for by posting now are reasons why we can't do
> this. I'm also interested in getting some testing done on older
> machines, particularly machines with processors that are from before
> 2007, both AMD and Intel.
I can offer a 2005-vintage Opteron 2216 rev3 (bought late 2007) to test
on. Feel free to toss me some test code.
cpuinfo flags: fpu de tsc msr pae mce cx8 apic mca cmov pat clflush
mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow
rep_good nopl extd_apicid eagerfpu pni cx16 hypervisor lahf_lm
cmp_legacy 3dnowprefetch vmmcall
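
The flag list above contains neither popcnt nor abm (the name Linux uses
for LZCNT support). As a rough sketch of how test code could check this
directly instead of relying on /proc/cpuinfo, assuming GCC or Clang with
<cpuid.h> on x86; the function names are hypothetical, while the bit
positions are the documented CPUID ones (leaf 1 ECX bit 23 for POPCNT, leaf
0x80000001 ECX bit 5 for LZCNT):

```c
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define SKETCH_HAVE_CPUID 1
#endif

/* Returns true if the given bit is set in a CPUID output register. */
static bool
sketch_feature_bit(unsigned int reg, int bit)
{
    return ((reg >> bit) & 1u) != 0;
}

/* POPCNT: CPUID leaf 1, ECX bit 23. Reports false on non-x86 builds. */
bool
sketch_cpu_has_popcnt(void)
{
#ifdef SKETCH_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return sketch_feature_bit(ecx, 23);
#endif
    return false;
}

/* LZCNT (ABM): CPUID leaf 0x80000001, ECX bit 5. */
bool
sketch_cpu_has_lzcnt(void)
{
#ifdef SKETCH_HAVE_CPUID
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return sketch_feature_bit(ecx, 5);
#endif
    return false;
}
```

On a machine whose flags match the list above, both functions would be
expected to return false, so the fallback implementations are what would
get exercised there.
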
> 2007-2008 seems to be around the time both
> AMD and Intel added support for POPCNT and LZCNT, going by [4].
Thanks