Re: Auto-vectorization speeds up multiplication of large-precision numerics

From: Tom Lane
Subject: Re: Auto-vectorization speeds up multiplication of large-precision numerics
Msg-id: 1709987.1599511760@sss.pgh.pa.us
In reply to: Re: Auto-vectorization speeds up multiplication of large-precision numerics  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Auto-vectorization speeds up multiplication of large-precision numerics  (Amit Khandekar <amitdkhan.pg@gmail.com>)
List: pgsql-hackers
I wrote:
> I experimented with a few different ideas such as adding restrict
> decoration to the pointers, and eventually found that what works
> is to write the loop termination condition as "i2 < limit"
> rather than "i2 <= limit".  It took me a long time to think of
> trying that, because it seemed ridiculously stupid.  But it works.
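The quoted loop change can be illustrated with a hypothetical sketch; this is not the actual numeric.c code. It does schoolbook multiplication over base-10000 digit arrays, and its inner loop walks forward with an exclusive bound, `i2 < limit`. The function name `mul_digits`, the digit layout (most significant digit first), and the per-row carry propagation are assumptions made to keep the example self-contained. Whether a particular compiler actually vectorizes the loop depends on its version and flags (for instance gcc with `-O2 -ftree-vectorize`).

```c
#include <assert.h>

#define NBASE 10000				/* each digit holds 0..9999, as in numeric.c */

/*
 * mul_digits: hypothetical sketch, not PostgreSQL's mul_var().
 * Multiplies var1[0..n1-1] by var2[0..n2-1] (most significant digit
 * first) and stores the (n1 + n2)-digit product in res[].
 */
static void
mul_digits(const int *var1, int n1, const int *var2, int n2, int *res)
{
	int			ndig = n1 + n2;
	int			i;

	assert(n1 > 0 && n2 > 0);

	for (i = 0; i < ndig; i++)
		res[i] = 0;

	for (int i1 = n1 - 1; i1 >= 0; i1--)
	{
		int			var1digit = var1[i1];
		/* var1[i1] * var2[i2] lands in res[i1 + 1 + i2] */
		int		   *dig = &res[i1 + 1];
		int			limit = n2;
		int			carry = 0;

		if (var1digit == 0)
			continue;

		/*
		 * Forward loop with an exclusive bound ("i2 < limit"), the shape
		 * the thread reports gcc and clang are able to vectorize.
		 */
		for (int i2 = 0; i2 < limit; i2++)
			dig[i2] += var1digit * var2[i2];

		/*
		 * Propagate carries after every row so each res[] entry stays below
		 * NBASE; the next row then adds at most 9999 * 9999 per entry,
		 * which cannot overflow a 32-bit int.
		 */
		for (i = ndig - 1; i >= 0; i--)
		{
			int			newdig = res[i] + carry;

			carry = newdig / NBASE;
			res[i] = newdig % NBASE;
		}
		assert(carry == 0);		/* partial sums always fit in ndig digits */
	}
}
```

For example, multiplying {1234, 5678} by {8765, 4321} (that is, 12345678 × 87654321) fills `res` with {1082, 1520, 2237, 4638}.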

I've done more testing and confirmed that both gcc and clang can
vectorize the improved loop on aarch64 as well as x86_64.  (clang's
results can be confusing because -ftree-vectorize doesn't seem to
have any effect: its vectorizer is on by default.  But if you use
-fno-vectorize it'll go back to the old, slower code.)

The only buildfarm effect I've noticed is that locust and
prairiedog, which are using nearly the same ancient gcc version,
complain

cc1: warning: -ftree-vectorize enables strict aliasing. -fno-strict-aliasing is ignored when Auto Vectorization is used.

which is expected (they say the same for checksum.c), but then
there are a bunch of

warning: dereferencing type-punned pointer will break strict-aliasing rules

which seems worrisome.  (This sort of thing is the reason I'm
hesitant to apply higher optimization levels across the board.)
Both animals pass the regression tests anyway, but if any other
compilers treat -ftree-vectorize as an excuse to apply stricter
optimization assumptions, we could be in for trouble.

I looked closer and saw that all of those warnings are about
init_var(), and this change makes them go away:

-#define init_var(v)        MemSetAligned(v, 0, sizeof(NumericVar))
+#define init_var(v)        memset(v, 0, sizeof(NumericVar))
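To make the aliasing issue concrete, here is a hedged sketch using a hypothetical stand-in struct; `NumericVarSketch`, its fields, and `init_var_sketch` are illustrative and not the real `NumericVar` definition or macro. As I understand PostgreSQL's c.h, `MemSetAligned` zeroes small aligned buffers by storing through a `long *` cast of the target pointer. That is the kind of type-punned access the old gcc flags once strict aliasing is forced on. The sketch describes that pattern only in a comment, because executing it would rely on behavior the strict-aliasing rules leave undefined. `memset()` writes the object's bytes instead, so it carries no such hazard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for NumericVar; field names are illustrative only. */
typedef struct NumericVarSketch
{
	int			ndigits;
	int			weight;
	int			sign;
	int			dscale;
	short	   *buf;
	short	   *digits;
} NumericVarSketch;

/*
 * Old style (described, not executed): cast the struct pointer to
 * (long *) and store zeros word by word.  Those stores use type "long"
 * while the object is a NumericVarSketch, which is what triggers the
 * "type-punned pointer" warning under strict aliasing.
 *
 * New style: memset() operates on the raw bytes, so no aliasing rule is
 * involved.  All-bits-zero pointers read back as NULL on mainstream
 * platforms.
 */
#define init_var_sketch(v)	memset(v, 0, sizeof(NumericVarSketch))
```

For a small, fixed-size struct like this, compilers commonly expand the `memset()` call inline instead of calling the library routine.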

I'm a little inclined to commit that as future-proofing.  It's
essentially reversing out a micro-optimization I made in d72f6c750.
I doubt I had hard evidence that it made any noticeable difference;
and even if it did back then, modern compilers probably prefer the
memset approach.

            regards, tom lane
