Re[2]: [HACKERS] Re: bug on aggregate function AVG()

Поиск
Список
Период
Сортировка
От Sferacarta Software
Тема Re[2]: [HACKERS] Re: bug on aggregate function AVG()
Дата
Msg-id 18641.981105@bo.nettuno.it
обсуждение исходный текст
Ответ на Re: [HACKERS] Re: bug on aggregate function AVG()  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
TL> "Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes:
>>> I see that AVG() and SUM() uses an accumulator not enough big to hold
>>> the result of calculation, but the point is: should we consider this
>>> thing a "terrible" bug or an acceptable feature ?
>>> What about to convert every accumulator to float8 ?

>> imho we can't do that because we lose the exact qualities of integers.
>> If you accumulate in float8, and if you take a sum over a very large
>> table, you might start ignoring values.

TL> I think that SUM() on an int column ought to produce an exact result.
TL> AVG() is a different story --- I think you could make a good case that
TL> it ought to produce a float result even when the input is integers,
TL> since the exact right answer would typically not be integral anyway.
TL> (A programmer who wants the average rounded to integer should have to
TL> write something like ROUND(AVG(x)), I think.)

TL> One way you could postpone the overflow problem for SUM() is to
TL> accumulate the running sum in a "long", or even better "long long" where
TL> available, even if the input datatype is a smaller flavor of int.
TL> You might still find that the end result overflows, but if the incoming
TL> values are not all the same sign then this might avoid an unnecessary
TL> intermediate overflow.
TL>                         regards, tom lane

I see there are four new built in functions:
int42pl,int42div, int84pl and int84div.
I created four new aggregate functions SUM(int2), SUM(int4), AVG(int2) and
AVG(int4) using these new functions, in this way the accumulator for int2
is int4 and the accumulator for int4 is int8, this reduce the probabilities
of overflow cases.
Remains just the overflow problem for int8.

Is there any reason for not use these functions on SUM() and AVG() on
official release ?
------------------------------------------------
drop aggregate sum2 int2;
DROP
create aggregate sum2(       sfunc1    = int42pl,       basetype  = int2,       stype1    = int4,       initcond1 =
'0'
);
CREATE
drop aggregate avg2 int2;
DROP
create aggregate avg2(       sfunc1    = int42pl,       basetype  = int2,       stype1    = int4,       initcond1 =
'0',      sfunc2    = int2inc,       stype2    = int2,       finalfunc = int42div,       initcond2 = '0'
 
);
CREATE
drop aggregate sum4 int4;
DROP
create aggregate sum4(       sfunc1    = int84pl,       basetype  = int4,       stype1    = int8,       initcond1 =
'0'
);
CREATE

drop aggregate avg4 int4;
DROP
create aggregate avg4(       sfunc1    = int84pl,       basetype  = int4,       stype1    = int8,       initcond1 =
'0',      sfunc2    = int4inc,       stype2    = int4,       finalfunc = int84div,       initcond2 = '0'
 
);
CREATE
drop table b;
DROP
create table b(       i2 int2,       i4 int4,       i8 int8
);
CREATE
insert into b values (32767,2147483647,9223372036620802086);
NOTICE:  Integer input '9223372036620802086' is out of range; promoted to float
^^^^^^^^^^^^^^^^^^^^                      what's happen here? seems a test for an int4                       not an
int8.
INSERT 246255 1
insert into b values (32767,2147483647,2147483647);
INSERT 246256 1
select * from b;  i2|        i4|                 i8
-----+----------+-------------------
32767|2147483647|9223372036620802048
32767|2147483647|         2147483647
(2 rows)

select sum2(i2) as new_sum2,      sum(i2) as old_sum2,      sum4(i4) as new_sum4,      sum(i4) as old_sum4,
sum(i8)as only_sum8
 
from b;
new_sum2|old_sum2|  new_sum4|old_sum4|           only_sum8
--------+--------+----------+--------+--------------------  65534|      -2|4294967294|      -2|-9223372034941265921
(1 row)

select avg2(i2) as new_avg2,      avg(i2) as old_avg2,      avg4(i4) as new_avg4,      avg(i4) as old_avg4,
avg(i8)as only_sum8
 
from b;
new_avg2|old_avg2|  new_avg4|old_avg4|           only_sum8
--------+--------+----------+--------+--------------------  32767|      -1|2147483647|      -1|-4611686017470632960
(1 row)

Is there any reason for not use these functions on SUM() and AVG() on
official release ?
- Jose' -




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Terry Mackintosh
Дата:
Сообщение: quoting problem?
Следующее
От: "Thomas G. Lockhart"
Дата:
Сообщение: Re: [HACKERS] Open the flood gates...v6.4 is tag'd...