Re: Auto-vectorization speeds up multiplication of large-precision numerics

From: Amit Khandekar
Subject: Re: Auto-vectorization speeds up multiplication of large-precision numerics
Msg-id: CAJ3gD9eEXJ2CHMSiOehvpTZu3Ap2GMi5jaXhoZuW=3XJLmZWpw@mail.gmail.com
In reply to: Re: Auto-vectorization speeds up multiplication of large-precision numerics (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Replies: Re: Auto-vectorization speeds up multiplication of large-precision numerics (Amit Khandekar <amitdkhan.pg@gmail.com>)
         Re: Auto-vectorization speeds up multiplication of large-precision numerics (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List: pgsql-hackers
On Wed, 10 Jun 2020 at 04:20, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-06-09 13:50, Amit Khandekar wrote:
> > Also, the regress/sql/numeric_big test itself speeds up by 80%
>
> That's nice.  I can confirm the speedup:
>
> -O3 without the patch:
>
>       numeric                      ... ok          737 ms
> test numeric_big                  ... ok         1014 ms
>
> -O3 with the patch:
>
>       numeric                      ... ok          680 ms
> test numeric_big                  ... ok          580 ms
>
> Also:
>
> -O2 without the patch:
>
>       numeric                      ... ok          693 ms
> test numeric_big                  ... ok         1160 ms
>
> -O2 with the patch:
>
>       numeric                      ... ok          677 ms
> test numeric_big                  ... ok          917 ms
>
> So the patch helps either way.

Oh, I hadn't noticed that the patch also speeds up numeric_big.sql to
some extent with -O2. For me, it takes 805 ms on HEAD and 732 ms with
the patch. One possible reason I can think of: because the loop now
traverses forward, prefetching might be helping. But this is just a
wild guess.
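To make "forward loop traversal" concrete, here is a minimal sketch of two equivalent accumulation loops. The names (accum_backward, accum_forward, start, var1digit) are hypothetical and only loosely modeled on the inner loop of numeric multiplication; int16_t stands in for NumericDigit. Both add var1digit * digits2[k] into dig[start + k]. The first walks memory downward with two decrementing indexes; the second walks upward from a base pointer with a single induction variable, which is the simpler shape for an auto-vectorizer. Whether either form actually vectorizes, or benefits from prefetching, depends on the compiler and hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Backward traversal: i and i2 both count down, so memory is walked
 * from higher to lower addresses. (Hypothetical names.) */
static void
accum_backward(int *dig, const int16_t *digits2, int n, int var1digit, int start)
{
    assert(n >= 0 && start >= 0);
    int i = start + n - 1;
    for (int i2 = n - 1; i2 >= 0; i2--)
    {
        dig[i] += var1digit * digits2[i2];
        i--;
    }
}

/* Forward traversal: one induction variable, unit-stride accesses on
 * both arrays, walked from lower to higher addresses. */
static void
accum_forward(int *dig, const int16_t *digits2, int n, int var1digit, int start)
{
    assert(n >= 0 && start >= 0);
    int *digptr = &dig[start];
    for (int i = 0; i < n; i++)
        digptr[i] += var1digit * digits2[i];
}
```

Both functions touch exactly dig[start] .. dig[start + n - 1] and produce identical results; only the traversal order and the loop shape differ.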

-O3, HEAD:
test numeric                      ... ok          102 ms
test numeric_big                  ... ok          803 ms

-O3, patched:
test numeric                      ... ok          100 ms
test numeric_big                  ... ok          450 ms

-O2, HEAD:
test numeric                      ... ok          100 ms
test numeric_big                  ... ok          805 ms

-O2, patched:
test numeric                      ... ok          103 ms
test numeric_big                  ... ok          732 ms

> But it also seems that without the patch, -O3 might
> be a bit slower in some cases. This might need more testing.

For me, on HEAD, there is no observable difference in the times between
-O2 and -O3. Are you consistently getting slower numeric*.sql tests with
-O3 on the current code? I'm not sure what the reason might be. In any
case, this is not related to the patch; or are you suggesting that it
needs more testing in the context of the patch? There might be existing
cases that run a bit slower with -O3, but that would be strange.
Perhaps in those cases the optimizations don't work out the way the
compiler expects, and actually hurt performance. That is probably one
of the reasons -O2 is the default choice.


>
> > For the for loop to be auto-vectorized, I had to simplify it to
> > something like this :
>
> Well, how do we make sure we keep it that way?  How do we prevent some
> random rearranging of the code or some random compiler change to break
> this again?

I believe the compiler reorders code segments relative to one another
only when they are independent of each other, and I guess it is able to
identify that. In our case, the unit in question is the for loop. It
uses some variables from the surrounding code, which would prevent the
compiler from moving it. Even if the compiler finds it safe to move the
loop relative to the surrounding code, it would not split up the loop
body itself, so the loop would remain intact, and so would the
vectorization (although the compiler also seems to keep an unrolled
version of the loop for smaller iteration counts). But yes, if someone
changes the for loop in the future, that could break the
auto-vectorization. So we should have appropriate comments (the patch
has those). Let me know if you can think of any reason why the compiler
might stop vectorizing the loop in the future even when the loop itself
is unchanged.

Would it be worth moving the for loop out into a separate inline
function, just to play it safe?
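A minimal sketch of what that could look like, assuming a plain-int accumulator array and int16_t digits in place of NumericDigit. The names mul_var_accum_row and mul_columns are hypothetical, not taken from the patch. The helper's comment records the constraint, so that later edits are less likely to silently change the loop's shape. The caller does schoolbook multiplication into unnormalized column sums and deliberately omits the carry propagation, overflow guarding and digit ordering that the real code needs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: digptr[i] += var1digit * digits2[i] for i in [0, count).
 *
 * Keep this a plain forward loop with a single induction variable and no
 * other statements in the body: that is the shape the compiler was found
 * to auto-vectorize. Rearranging it may silently disable vectorization.
 */
static inline void
mul_var_accum_row(int *digptr, const int16_t *digits2, int count, int var1digit)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++)
        digptr[i] += var1digit * digits2[i];
}

/*
 * Hypothetical caller: multiplies digit arrays d1 (n1 digits) and d2
 * (n2 digits) as polynomials, accumulating column i1 + i2 into dig[].
 * dig[] must hold n1 + n2 - 1 zero-initialized ints. No carries are
 * propagated, so large inputs could overflow int in this sketch.
 */
static void
mul_columns(int *dig, const int16_t *d1, int n1, const int16_t *d2, int n2)
{
    for (int i1 = 0; i1 < n1; i1++)
    {
        if (d1[i1] == 0)
            continue;           /* nothing to add for a zero digit */
        mul_var_accum_row(&dig[i1], d2, n2, d1[i1]);
    }
}
```

For example, multiplying {1, 2} by {3, 4} leaves dig = {3, 10, 8}, the column sums of (1 + 2x)(3 + 4x). With GCC, compiling with -fopt-info-vec-optimized reports which loops were vectorized, which is one way to check that the inlined loop still is.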


