Re: use CLZ instruction in AllocSetFreeIndex()

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: use CLZ instruction in AllocSetFreeIndex()
Дата
Msg-id 2770.1577458475@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: use CLZ instruction in AllocSetFreeIndex()  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Ответы Re: use CLZ instruction in AllocSetFreeIndex()  (John Naylor <john.naylor@2ndquadrant.com>)
Re: use CLZ instruction in AllocSetFreeIndex()  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Dec-26, John Naylor wrote:
>> In commit ab5b4e2f9ed, we optimized AllocSetFreeIndex() using a lookup
>> table. At the time, using CLZ was rejected because compiler/platform
>> support was not widespread enough to justify it. For other reasons, we
>> recently added bitutils.h which uses __builtin_clz() where available,
>> so it makes sense to revisit this. I modified the test in [1] (C files
>> attached), using two separate functions to test CLZ versus the
>> open-coded algorithm of pg_leftmost_one_pos32().

> I can confirm these results on my Intel laptop.  I ran it with a
> repetition of 20, averages of 4 runs:

I tried this on a few other architectures --- ppc32, aarch64, and x86
(not 64).  The general contours of the result are the same on all,
eg here's the results on aarch64 (Fedora 28):

$ ./a.out 100
...
                 clz 22.713s
       bitutils func 59.462s
             current 30.630s

This kind of leads me to wonder if we don't need to expend more
effort on the non-CLZ version of pg_leftmost_one_pos32; it seems
like it shouldn't be losing this badly to the only-slightly-
improved logic that's currently in AllocSetFreeIndex.  On the
other hand, the buildfarm thinks that __builtin_clz is essentially
universal these days --- the only active non-MSVC critter that
reports not having it is anole.  So maybe it's not worth sweating
over that.  Perhaps what we really ought to be working on is
finding MSVC equivalents for __builtin_clz and friends.

Anyway, getting back to the presented patch, I find myself a bit
dissatisfied with it because it seems like it's leaving something
on the table.  Specifically, looking at the generated assembly
code on a couple of architectures, the setup logic generated by

        tsize = (size - 1) >> ALLOC_MINBITS;

looks like it costs as much or more as the clz proper.  I'm not
sure we can get rid of the subtract-one step, but couldn't the
right shift be elided in favor of changing the constant we
subtract clz's result from?  Shifting off those bits separately
made sense in the old implementation, but assuming that CLZ is
more or less constant-time, it doesn't with CLZ.

            regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: use CLZ instruction in AllocSetFreeIndex()
Следующее
От: John Naylor
Дата:
Сообщение: Re: use CLZ instruction in AllocSetFreeIndex()