Re: [HACKERS] Fix performance of generic atomics

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: [HACKERS] Fix performance of generic atomics
Дата	6 сентября 2017 г. 23:17:18
Msg-id	13707.1504718238@sss.pgh.pa.us обсуждение исходный текст
Ответ на	Re: [HACKERS] Fix performance of generic atomics (Robert Haas <robertmhaas@gmail.com>)
Ответы	Re: [HACKERS] Fix performance of generic atomics
Список	pgsql-hackers

Дерево обсуждения

Robert Haas <robertmhaas@gmail.com> writes:
> It's not a question of whether the return value is used, but of
> whether the updated value of *old is used.

Right, but if we re-read "old" in the loop, and if the primitive
doesn't return "old" (or does, but call site ignores it) then in
principle the CAS might be strength-reduced a bit.  Whether that
can win enough to be better than removing the unlocked read,
I dunno.

In the case at hand, looking at a loop like
while (count-- > 0){    (void) pg_atomic_fetch_or_u32(myptr, 1);}

with our HEAD code and RHEL6's gcc I get this for the inner loop:

.L9:movl    (%rdx), %eaxmovl    %eax, %ecxorl    $1, %ecxlock                cmpxchgl    %ecx,(%rdx)    setz        %cl
      testb    %cl, %clje    .L9subq    $1, %rbxtestq    %rbx, %rbxjg    .L9

Applying the proposed generic.h patch, it becomes

.L10:movl    (%rsi), %eax
.L9:movl    %eax, %ecxorl    $1, %ecxlock                cmpxchgl    %ecx,(%rdx)    setz        %cl        testb
%cl,%clje    .L9subq    $1, %rbxtestq    %rbx, %rbxjg    .L10

Note that in both cases the cmpxchgl is coming out of the asm construct in
pg_atomic_compare_exchange_u32_impl from atomics/arch-x86.h, so that even
if a simpler assembly instruction were possible given that we don't need
%eax to be updated, there's no chance of that actually happening.  This
gets back to the point I made in the other thread that maybe the
arch-x86.h asm sequences are not an improvement over the gcc intrinsics.

The code is the same if the loop is modified to use the result of
pg_atomic_fetch_or_u32, so I won't bother showing that.

Adding the proposed generic-gcc.h patch, the loop reduces to

.L11:lock orl    $1, (%rax)subq    $1, %rbxtestq    %rbx, %rbxjg    .L11

or if we make the loop do    junk += pg_atomic_fetch_or_u32(myptr, 1);
then we get

.L13:movl    (%rsi), %eax
.L10:movl    %eax, %edimovl    %eax, %ecxorl    $1, %ecxlock cmpxchgl    %ecx, (%rdx)jne    .L10addl    %edi, %r8dsubq
 $1, %rbxtestq    %rbx, %rbxjg    .L13

So that last is slightly better than the generic.h + asm CAS version,
perhaps not meaningfully so --- but it's definitely better when
the compiler can see the result isn't used.

In short then, given the facts on the ground here, in particular the
asm-based CAS primitive at the heart of these loops, it's clear that
taking the read out of the loop can't hurt.  But that should be read
as a rather narrow conclusion.  With a different compiler and/or a
different version of pg_atomic_compare_exchange_u32_impl, maybe the
answer would be different.  And of course it's moot once the
generic-gcc.h patch is applied.

Anyway, I don't have a big objection to applying this.  My concern
is more that we need to be taking a harder look at other parts of
the atomics infrastructure, because tweaks there are likely to buy
much more.
        regards, tom lane

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Simon Riggs
Дата: 06 сентября 2017 г., 22:59:20
Сообщение: Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

Следующее

От: Tom Lane
Дата: 06 сентября 2017 г., 23:27:36
Сообщение: Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] Fix performance of generic atomics

Предыдущее

Следующее