Обсуждение: A population of population counts

Поиск
Список
Период
Сортировка

A population of population counts

От
Thomas Munro
Дата:
Hi

I noticed that we have three "number_of_ones" tables under contrib and
two under src, and some new specially masked variants for visibility
maps.

Would it be an improvement if we just defined one table with external
linkage, and accessed it via a macros/functions popcount_uint8, and
wider versions _uint32, popcount_array(data, len) that sum the
popcounts of their component bytes?

Then there would be less duplication, and future opportunities to use
fancy built-ins/assembler instructions/vectorisation in one central
place, and to work in larger sizes than bytes.

Perhaps we could also get rid of the new special masked popcount
tables by masking the value we look up instead, eg walk through the
page calling popcount_uint64(value & FROZEN_BITMASK_64).

As for the rightmost_one_pos table in bitmapset.c, couldn't the
various bms_XXX functions just use ffs(n) - 1 and work word-at-a-time?That generates a bsf instruction at -O2 on this
machine.

The micro-optimisation opportunities probably don't matter, but I
wondered if it might at least be interesting to delete a bunch of
code, and re-use a standard interface for something that apparently
several modules need to do.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: A population of population counts

От
David Rowley
Дата:
On 7 May 2016 at 12:41, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Hi
>
> I noticed that we have three "number_of_ones" tables under contrib and
> two under src, and some new specially masked variants for visibility
> maps.
>
> Would it be an improvement if we just defined one table with external
> linkage, and accessed it via a macros/functions popcount_uint8, and
> wider versions _uint32, popcount_array(data, len) that sum the
> popcounts of their component bytes?
>
> Then there would be less duplication, and future opportunities to use
> fancy built-ins/assembler instructions/vectorisation in one central
> place, and to work in larger sizes than bytes.
>
> Perhaps we could also get rid of the new special masked popcount
> tables by masking the value we look up instead, eg walk through the
> page calling popcount_uint64(value & FROZEN_BITMASK_64).
>
> As for the rightmost_one_pos table in bitmapset.c, couldn't the
> various bms_XXX functions just use ffs(n) - 1 and work word-at-a-time?
>  That generates a bsf instruction at -O2 on this machine.
>
> The micro-optimisation opportunities probably don't matter, but I
> wondered if it might at least be interesting to delete a bunch of
> code, and re-use a standard interface for something that apparently
> several modules need to do.

I remember looking at GCC's __builtin_popcount() about 6 months ago
with thoughts about using it for bms_num_members(). I did see
performance improvements from micro-benchmarks to compare it to the
number_of_ones[] array, and saw a small improvement. Likely the
improvements I did see with those would actually have been better in a
real world case, since not having a number_of_ones[] array in a CPU
cache might be more useful for a workload with a more contended cache.

I'd like to see us using those functions, when they're available and
falling back on the array when they're not. Sounds like that would
just be a new configure test. Perhaps a good home for some shared code
would be numutils.c. I see the functions there declared in builtins.h
which I see used in contrib/spi. I think it could all be made to work
quite nicely with types like bitmapword just with some preprocessor
trickery.

I'd say go for it. Sounds like a like these would be some nice
reusable functions that we already have a suitable home for.

-- David Rowley                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: A population of population counts

От
Tom Lane
Дата:
David Rowley <david.rowley@2ndquadrant.com> writes:
> I'd like to see us using those functions, when they're available and
> falling back on the array when they're not. Sounds like that would
> just be a new configure test. Perhaps a good home for some shared code
> would be numutils.c.

Meh --- numutils.c is about numbers.  Maybe "bitutils.c"?

Another point here is that there might easily be a use-case for such
functionality in frontend code, so I'd lean towards putting the support in
src/common if we do this.
        regards, tom lane



Re: A population of population counts

От
Thomas Munro
Дата:
On Sat, May 7, 2016 at 4:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <david.rowley@2ndquadrant.com> writes:
>> I'd like to see us using those functions, when they're available and
>> falling back on the array when they're not. Sounds like that would
>> just be a new configure test. Perhaps a good home for some shared code
>> would be numutils.c.
>
> Meh --- numutils.c is about numbers.  Maybe "bitutils.c"?
>
> Another point here is that there might easily be a use-case for such
> functionality in frontend code, so I'd lean towards putting the support in
> src/common if we do this.

I played around with this a bit on the weekend (see rough sketch
attached).  The trouble with __builtin_popcount and the POPCNT
instruction is that you only get them if you ask for -msse4.2 or
-mpopcnt, and then you get an illegal instruction trap instead of
sensible fallback behaviour on ancient hardware, so no packager would
be able to do that.  So I guess we'd have to use a runtime test like
we do for CRC32 hardware support (the test may actually be identical
since both depend on the SSE4.2 instruction set) and maybe inline
assembler rather the GCC builtin, and that seems a bit over the top
unless someone can show that it's worth it.

My aim with this thread was mainly reducing code duplication and
needless code: perhaps at least the other ideas in the attached
sketch, namely using ffs instead of the rightmost_one_pos table loop
and consolidation of popcount into a reusable API (without trying to
get hardware support) could be worth polishing for the next CF?
Annoyingly, it seems Windows doesn't have POSIX/SUSv2 ffs, though
apparently it can reach that instruction with MSVC intrinsic
_BitScanReverse or MingW __builtin_ffs.

--
Thomas Munro
http://www.enterprisedb.com

Вложения

Re: A population of population counts

От
Robert Haas
Дата:
On Sun, May 8, 2016 at 7:46 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> My aim with this thread was mainly reducing code duplication and
> needless code: perhaps at least the other ideas in the attached
> sketch, namely using ffs instead of the rightmost_one_pos table loop
> and consolidation of popcount into a reusable API (without trying to
> get hardware support) could be worth polishing for the next CF?
> Annoyingly, it seems Windows doesn't have POSIX/SUSv2 ffs, though
> apparently it can reach that instruction with MSVC intrinsic
> _BitScanReverse or MingW __builtin_ffs.

I think my_log2() is the same thing as one of ffs() and fls() - I can
never keep those straight.  It seems like it wouldn't he hard to clean
things up at least that much.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company