Re: Optimize WindowAgg's use of tuplestores

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Optimize WindowAgg's use of tuplestores
Дата
Msg-id CAApHDvpJyvaKpye+5CzE8=22zJ5UhEynCo-KY8-gteSHctMvyA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Optimize WindowAgg's use of tuplestores  (Andy Fan <zhihuifan1213@163.com>)
Список pgsql-hackers
On Thu, 18 Jul 2024 at 19:56, Andy Fan <zhihuifan1213@163.com> wrote:
> I'm curious about why a 'unlikey' change can cause noticeable change,
> especially there is just one function call in the 'if-statement' (I am
> thinking more instrucments in the if-statement body, more changes it can
> cause).
>
> +       if (unlikely(winstate->buffer == NULL))
> +               prepare_tuplestore(winstate);

This isn't the branch being discussed.  We've not talked about whether
the above one is useful or not. The branch we were discussing is the
"if (winstate->all_first)", of which has a large number of
instructions inside it.

However, for this one, my understanding of this particular case is,
it's a very predictable branch as, even if the first prediction gets
it wrong the first time, it's not going to be wrong for long as the
condition is false for all subsequent calls. So, from there, if you
assume that the instruction decoder is always going to fetch the
correct instructions according to the branch predictor's correct
prediction (the ones for branch not taken), then the only benefit
possible is that the next instructions to execute are the next
instructions in the code rather than instructions located far away,
possibly on another cacheline that needs to be loaded from RAM or
higher cache levels.  Adding the unlikely() should coax the compiler
into laying out the code so the branch's instructions at the end of
the function and that, in theory, should reduce the likelihood of
frontend stalls waiting for cachelines for further instructions to
arrive from RAM or higher cache levels. That's my theory, at least.  I
expected to see perf stat show me a lower "stalled-cycles-frontend"
with the v2 patch than v1, but didn't see that when I looked, so my
theory might be wrong.

For the case you're mentioning above, calling the function isn't very
many instructions, so the benefits are likely very small with a branch
this predictable.

David



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Следующее
От: Joe Conway
Дата:
Сообщение: Re: CI, macports, darwin version problems