Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.

Поиск
Список
Период
Сортировка
От Joel Jacobson
Тема Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.
Дата
Msg-id 32510d71-8c00-4896-9c11-5ea4b3c01ffc@app.fastmail.com
обсуждение исходный текст
Ответ на Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Ответы Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.
Список pgsql-hackers
On Wed, Jul 3, 2024, at 20:57, Dean Rasheed wrote:
> Ah yes, I think I was looking at a newer version of the code where I'd
> already fixed that bug. Unless you think there are still bugs in any
> of the boundary checks, which is entirely possible.

Ah, that explains it.
And no, I can't find any other bugs in the boundary checks.

> Ah, I can see what's going on. It's perhaps best illustrated with a
> simple example. Suppose you are multiplying a 4-digit integer x by a
> 2-digit integer y (both with dscale=0). Then the terms of the full
> product computed by mul_var() or mul_var_small() would look something
> like this:
>
>             x0     x1     x2     x3
>                         * y0     y1
>   ---------------------------------
>   x0*y0  x1*y0  x2*y0  x3*y0
>          x0*y1  x1*y1  x2*y1  x3*y1
>
> In the reduced-rscale case, it might perform a truncated computation,
> computing just the first 3 columns (say), and discarding the last two
> columns. Therefore it would skip the 3 rightmost digit products.
>
> However, in mul_var_int(), y0 and y1 have been combined into a single
> integer equal to y0*NBASE+y1, and the terms of full product are
> computed as follows:
>
>   x0*(y0*NBASE+y1)  x1*(y0*NBASE+y1)  x2*(y0*NBASE+y1)  x3*(y0*NBASE+y1)
>
> In the full product, that gives the same result, but if you follow the
> same rule in the reduced-rscale case, skipping the last two terms, it
> would actually discard 4 digit products, making it less accurate.
>
> That could be avoided by increasing maxdigits by 1 in mul_var_int() so
> it would always be at least as accurate as it was before, but that
> might not really be necessary. However, if we implemented
> mul_var_int64() in the same way, it would be discarding much
> higher-order digit products, and so we probably would have to increase
> maxdigits to get sufficiently accurate results. But there's an even
> bigger problem: the results would be different between platforms that
> did and didn't have 128-bit integers, which I really don't like. We
> could avoid that by not using it in reduced-rscale cases, but that
> would involve another test condition.
>
> By contrast, mul_var_small() is intended to replicate the arithmetic
> in mul_var() exactly (but in a different order) for all rscales. So if
> you've found any cases where they give different results, that's a
> bug.
>
> In light of that, it might be that mul_var_small() is the better
> option, rather than mul_var_int(), but it'd be interesting to see how
> they compare in terms of performance first.

Thanks for explaining, very helpful.

I agree on your reasoning about the pros and cons.
Not sure yet which version I prefer. Let's see how it evolves.

I've done some benchmarks.
Haven't tested Intel and AMD yet, but this is what I get on my Apple M3 Max:

--
-- varndigits=1
--

-- HEAD
SELECT SUM(numeric_mul_head(var1,var2,0)) FROM bench_mul_var_var1ndigits_1;
Time: 2976.896 ms (00:02.977)
Time: 2984.759 ms (00:02.985)
Time: 2970.364 ms (00:02.970)

-- mul_var_int() patch
SELECT SUM(numeric_mul_patch_int(var1,var2,0)) FROM bench_mul_var_var1ndigits_1;
Time: 2790.227 ms (00:02.790)
Time: 2786.338 ms (00:02.786)
Time: 2784.957 ms (00:02.785)

-- mul_var_small() patch
SELECT SUM(numeric_mul_patch_small(var1,var2,0)) FROM bench_mul_var_var1ndigits_1;
Time: 2770.211 ms (00:02.770)
Time: 2760.685 ms (00:02.761)
Time: 2773.221 ms (00:02.773)

--
-- varndigits=2
--

-- HEAD
SELECT SUM(numeric_mul_head(var1,var2,0)) FROM bench_mul_var_var1ndigits_2;
Time: 3353.258 ms (00:03.353)
Time: 3273.055 ms (00:03.273)
Time: 3266.392 ms (00:03.266)

-- mul_var_int() patch
SELECT SUM(numeric_mul_patch_int(var1,var2,0)) FROM bench_mul_var_var1ndigits_2;
Time: 2694.169 ms (00:02.694)
Time: 2687.935 ms (00:02.688)
Time: 2692.398 ms (00:02.692)

-- mul_var_small() patch
SELECT SUM(numeric_mul_patch_small(var1,var2,0)) FROM bench_mul_var_var1ndigits_2;
Time: 2997.685 ms (00:02.998)
Time: 2984.418 ms (00:02.984)
Time: 2986.976 ms (00:02.987)

--
-- varndigits=3
--

-- HEAD
SELECT SUM(numeric_mul_head(var1,var2,0)) FROM bench_mul_var_var1ndigits_3;
Time: 3471.391 ms (00:03.471)
Time: 3384.114 ms (00:03.384)
Time: 3387.031 ms (00:03.387)

-- mul_var_int() patch
SELECT SUM(numeric_mul_patch_int(var1,var2,0)) FROM bench_mul_var_var1ndigits_3;
Time: 3384.428 ms (00:03.384)
Time: 3398.044 ms (00:03.398)
Time: 3393.727 ms (00:03.394)

-- mul_var_small() patch
SELECT SUM(numeric_mul_patch_small(var1,var2,0)) FROM bench_mul_var_var1ndigits_3;
Time: 3100.567 ms (00:03.101)
Time: 3114.225 ms (00:03.114)
Time: 3116.137 ms (00:03.116)

Interesting, mul_var_small() seems to be the winner for var1ndigits=3
and mul_var_int() to be the winner for var1ndigits=2,
and about the same for var1ndigits=1.

>> What can be said about mul_var()'s contract with regards to rscale?
>> It's the number of decimal digits requested by the caller, and if not
>> requesting full precision, then the decimal digits might not be accurate,
>> but can something be said about how far off they can be?
>>
>
> I wouldn't expect it to ever be off by more than 1, given that
> MUL_GUARD_DIGITS = 2, which corresponds to 8 decimal digits, and the
> number of digits in the smaller input (and hence the number of digit
> products in each column) is limited to something like 16,000 NBASE
> digits.

OK, so then the cases I found where it was off by 2 for the mul_var_int() patch
are unexpected?

Regards,
Joel



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: improve performance of pg_dump --binary-upgrade
Следующее
От: "Joel Jacobson"
Дата:
Сообщение: Re: Optimize numeric multiplication for one and two base-NBASE digit multiplicands.