Re: [HACKERS] Sum aggregate calculation for single precsion real

From: Konstantin Knizhnik
Subject: Re: [HACKERS] Sum aggregate calculation for single precsion real
Date:
Msg-id: fd9e577d-6f58-134b-79aa-61d96ee52183@postgrespro.ru
In reply to: Re: [HACKERS] Sum aggregate calculation for single precsion real  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [HACKERS] Sum aggregate calculation for single precsion real  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List: pgsql-hackers


On 13.02.2017 19:20, Tom Lane wrote:
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>> I wonder why the SUM aggregate is calculated for the real (float4) type
>> using a floating-point accumulator?
> If you can't deal with the vagaries of floating-point arithmetic, you
> shouldn't be storing your data in float format.  Use numeric.

4-byte floats are widely used, for example, in trading applications, simply because they are half the size of a double and the range of the stored data is relatively small (not many significant digits are needed). At the same time, the volume of stored data is very large, and switching from float4 to float8 would almost double it: it takes twice the storage and makes queries take almost twice as long to execute.
So this is not an acceptable answer.
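
As a quick check of the size claim above, the per-value footprint of the two types can be compared directly in psql (a minimal illustration; pg_column_size reports the size of a single datum, not counting per-row overhead):

-- float4 occupies 4 bytes per value, float8 (double precision) 8 bytes
select pg_column_size(1.0::float4) as float4_bytes,
       pg_column_size(1.0::float8) as float8_bytes;
-- expected: 4 | 8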


>> Are there any reasons for using the float4pl function for the SUM
>> aggregate instead of float4_accum?
> The latter is probably a good two orders of magnitude slower, and it
> wouldn't really do much to solve the inherent accuracy problems of
> adding float4 values that have a wide dynamic range.

That is not true: please compare the execution times of these two queries:

postgres=# select sum(l_quantity)  from lineitem where l_shipdate <= '1998-12-01';
     sum    
-------------
 1.52688e+09
(1 row)

Time: 2858.852 ms
postgres=# select sum(l_quantity+0.0)  from lineitem where l_shipdate <= '1998-12-01';
    sum    
------------
 1529738036
(1 row)

Time: 3174.529 ms


It looks like in Postgres the aggregate calculation itself is now not the bottleneck, compared with the tuple-deforming cost.
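
For what it's worth, a float4 sum with a wider accumulator can already be built at the SQL level from existing catalog functions. This is only a sketch, not the built-in definition: sum_f4_wide is a made-up name, and float84pl is the existing float8-plus-float4 addition function.

-- sketch: sum float4 input while accumulating in float8
-- an explicit initcond is required because the state type (float8) differs
-- from the input type (float4); as a side effect this returns 0 rather than
-- NULL for an empty input set
create aggregate sum_f4_wide (float4) (
    sfunc    = float84pl,   -- float84pl(float8, float4) -> float8
    stype    = float8,
    initcond = '0'
);

-- usage, same shape as the queries above:
-- select sum_f4_wide(l_quantity) from lineitem where l_shipdate <= '1998-12-01';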



> The expectation for SUM(float4) is that you want speed and are
> prepared to cope with the consequences.  It's easy enough to cast your
> input to float8 if you want a wider accumulator, or to numeric if
> you'd like more stable (not necessarily more accurate :-() results.
> I do not think it's the database's job to make those choices for you.
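
For reference, the two workarounds suggested here would look like this against the same table as above (a sketch; the numeric variant accumulates exactly but at extra CPU cost):

-- widen the accumulator by casting the input to float8 (double precision)
select sum(l_quantity::float8) from lineitem where l_shipdate <= '1998-12-01';

-- or accumulate exactly in numeric
select sum(l_quantity::numeric) from lineitem where l_shipdate <= '1998-12-01';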

From my point of view this is a strange and wrong expectation.
I choose the float4 type for a column simply because it is enough to represent the range of my data and I need to minimize the record size.
But when I calculate a sum, I expect to get a more or less precise result. Certainly I realize that even with double it is possible to lose precision during the calculation, and the result may depend on summation order (if we add very small and very large values). But in real use cases (for example, trading data) such a large spread in attribute values is very rare. If you store, for example, stock prices, it is very unlikely that one company has a value of 0.000001 and another 10000000.0.
At least in the TPC-H example (which certainly deals with synthetic generated data), the double type produces an almost precise result.
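
The accumulation problem under discussion is easy to reproduce without TPC-H data. A minimal, self-contained illustration (assuming stock behaviour, where SUM over a float4 column accumulates in float4, and a non-parallel plan): once the running float4 total reaches 2^24, adding further ones no longer changes it, while the float8 and numeric sums stay exact.

-- sum twenty million ones three ways
select sum(x)          as as_float4,   -- float4 accumulator: expected to stall near 16777216 (2^24)
       sum(x::float8)  as as_float8,   -- float8 accumulator: 20000000
       sum(x::numeric) as as_numeric   -- exact: 20000000
  from (select 1.0::float4 as x from generate_series(1, 20000000)) t;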

> 		regards, tom lane

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
