Apply auto-vectorization to the inner loop of numeric multiplication.

Compile numeric.c with -ftree-vectorize where available, and adjust
the innermost loop of mul_var() so that it is amenable to being
auto-vectorized. (Mainly, that involves making it process the arrays
left-to-right, not right-to-left.)

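As a rough illustration of that loop change, here is a minimal sketch.
The function and parameter names are made up for the example, and this
is not the actual mul_var() code. It only contrasts a multiply-accumulate
loop that walks the arrays downward with one that walks them upward.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, named for this sketch only; they are not taken
 * from numeric.c.  Both add factor * digits[i] into dig[i] for i in [0, n).
 */

/* Right-to-left: the index counts down from the last element. */
void
accumulate_right_to_left(int32_t *dig, const int16_t *digits,
                         int n, int32_t factor)
{
    int         i;

    assert(n >= 0);
    for (i = n - 1; i >= 0; i--)
        dig[i] += factor * digits[i];
}

/*
 * Left-to-right: the same arithmetic with an ascending index, which is
 * the direction the commit message says the inner loop was changed to.
 */
void
accumulate_left_to_right(int32_t *dig, const int16_t *digits,
                         int n, int32_t factor)
{
    int         i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        dig[i] += factor * digits[i];
}
```

Each element is updated independently, so the two functions give the
same result; only the traversal order differs. With gcc, building such
a file with -O2 -ftree-vectorize -fopt-info-vec should report which
loops were vectorized, though the outcome depends on compiler version
and target.
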
Applying -ftree-vectorize actually makes numeric.o smaller, at least
with my compiler (gcc 8.3.1 on x86_64), and it's a little faster too.
Independently of that, fixing the inner loop to be vectorizable also
makes things a bit faster. But doing both is a huge win for
multiplications with lots of digits. For me, the numeric regression
test runs at the same speed to within measurement noise, but numeric_big
is a full 45% faster.

We also looked into applying -funroll-loops, but that makes numeric.o
bloat quite a bit, and the additional speed improvement is very
marginal.

Amit Khandekar, reviewed and edited a little by me

Discussion: https://postgr.es/m/CAJ3gD9evtA_vBo+WMYMyT-u=keHX7-r8p2w7OSRfXf42LTwCZQ@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/88709176236caf3cb9655acda6bad2df0323ac8f

Modified Files
--------------
src/backend/utils/adt/Makefile  |  3 +++
src/backend/utils/adt/numeric.c | 15 ++++++++++++---
2 files changed, 15 insertions(+), 3 deletions(-)