Re: use CLZ instruction in AllocSetFreeIndex()

Поиск
Список
Период
Сортировка
От David Fetter
Тема Re: use CLZ instruction in AllocSetFreeIndex()
Дата
Msg-id 20191228021648.GK32763@fetter.org
обсуждение исходный текст
Ответ на Re: use CLZ instruction in AllocSetFreeIndex()  (John Naylor <john.naylor@2ndquadrant.com>)
Ответы Re: use CLZ instruction in AllocSetFreeIndex()  (John Naylor <john.naylor@2ndquadrant.com>)
Список pgsql-hackers
On Fri, Dec 27, 2019 at 07:02:02PM -0500, John Naylor wrote:
> On Fri, Dec 27, 2019 at 11:05 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > John Naylor <john.naylor@2ndquadrant.com> writes:
> > > On Fri, Dec 27, 2019 at 9:54 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> ... but couldn't the
> > >> right shift be elided in favor of changing the constant we
> > >> subtract clz's result from?  Shifting off those bits separately
> > >> made sense in the old implementation, but assuming that CLZ is
> > >> more or less constant-time, it doesn't with CLZ.
> >
> > > That makes sense -- I'll look into doing that.
> >
> > Actually, we could apply that insight to both code paths.
> > In the existing path, that requires assuming
> > ALLOCSET_NUM_FREELISTS+ALLOC_MINBITS <= 17, but that's OK.
> > (Nowadays I'd probably add a StaticAssert about that.)
> 
> I tried that in the attached files and got these results:
> 
>              current 6.14s
>                  clz 4.52s
>   clz no right shift 3.15s
>         lookup table 5.56s
> lookup table no right shift 7.34s
> 
> Here, "lookup table" refers to using the pg_leftmost_one_pos[] array
> and incrementing the result. Removing the shift operation from the CLZ
> case is clearly an improvement, and the main body goes from
> 
> movabsq $34359738367, %rax
> addq %rax, %rdi
> shrq $3, %rdi
> bsrl %edi, %eax
> xorl $-32, %eax
> addl $33, %eax
> 
> to
> 
> decl %edi
> bsrl %edi, %eax
> xorl $-32, %eax
> addl $30, %eax
> 
> The lookup table case is less clear. Removing the shift results in
> assembly that looks more like the C code and is slower for me. The
> standard lookup table code uses some magic constants and does its own
> constant folding by shifting 11 (8 + 3). In the absence of testing on
> platforms that will actually exercise this path, it seems the
> open-coded path should keep the shift for now. Thoughts?

It's probably worth doing the things you've found unambiguous gains
for as a patch, putting it on the next commitfest, and seeing what the
commitfest.cputube.org machinery has to say about it.

Maybe it'd be worth trying out a patch that enables CLZ for Windows,
but that seems like a separate issue.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Maciek Sakrejda
Дата:
Сообщение: Re: Duplicate Workers entries in some EXPLAIN plans
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Server crash with Master-Slave configuration.