Обсуждение: 6.5.0 - Overflow bug in AVG( )
% psql test1
Welcome to the POSTGRESQL interactive sql monitor: Please read the file COPYRIGHT for copyright terms of POSTGRESQL
[PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
type \? for help on slash commands type \q to quit type \g or terminate with semicolon to execute queryYou are
currentlyconnected to the database: test1
test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";count| max| min| avg
------+-------+-----+----
677719|3075717|61854|-251
(1 row)
Overflow, perhaps?
Gene Sokolov.
> [PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
> test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";
> count| max| min| avg
> ------+-------+-----+----
> 677719|3075717|61854|-251
> (1 row)
> Overflow, perhaps?
Of course. These are integer fields? I've been considering changing
all accumulators (and results) for integer aggregate functions to
float8, but have not done so yet. I was sort of waiting for a v7.0
release, but am not sure why...
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California
> > [PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
> > test1=> select count(*), max("ID"), min("ID"), avg("ID") from
"ItemsBars";
> > count| max| min| avg
> > ------+-------+-----+----
> > 677719|3075717|61854|-251
> > (1 row)
> > Overflow, perhaps?
>
> Of course. These are integer fields? I've been considering changing
Yes, the fields are int4
> all accumulators (and results) for integer aggregate functions to
> float8, but have not done so yet. I was sort of waiting for a v7.0
> release, but am not sure why...
Float8 accumulator seems to be a good solution if AVG is limited to
int/float types. I wonder if it could produce system dependency in AVG due
to rounding errors. Some broader solution should be considered though if you
want AVG to work on numeric/decimal as well.
Gene Sokolov.
> Float8 accumulator seems to be a good solution if AVG is limited to
> int/float types. I wonder if it could produce system dependency in AVG due
> to rounding errors. Some broader solution should be considered though if you
> want AVG to work on numeric/decimal as well.
The implementation can be specified for each datatype individually, so
that's not a problem. afaik the way numeric/decimal work it would be
fine to use those types as their own accumulators. It's mostly the
int2/int4/int8 types which are the problem, since they silently
overflow (on most machines?).
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California
Thomas Lockhart wrote:
>
> > [PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
> > test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";
> > count| max| min| avg
> > ------+-------+-----+----
> > 677719|3075717|61854|-251
> > (1 row)
> > Overflow, perhaps?
>
> Of course. These are integer fields? I've been considering changing
> all accumulators (and results) for integer aggregate functions to
> float8, but have not done so yet. I was sort of waiting for a v7.0
> release, but am not sure why...
Wouldn't it be better to use NUMERIC for the avg(int) state
values? It will never loose any significant digit.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #
> > Of course. These are integer fields? I've been considering changing
> > all accumulators (and results) for integer aggregate functions to
> > float8, but have not done so yet. I was sort of waiting for a v7.0
> > release, but am not sure why...
>
> Wouldn't it be better to use NUMERIC for the avg(int) state
> values? It will never loose any significant digit.
Sure. It would be fast, right? avg(int) is likely to be used a lot,
and should be as fast as possible.
- Thomas
--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California
"Gene Sokolov" <hook@aktrad.ru> writes:
> test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";
> count| max| min| avg
> ------+-------+-----+----
> 677719|3075717|61854|-251
> Overflow, perhaps?
sum() and avg() for int fields use int accumulators. You might want
to use avg(float8(field)) to get a less-likely-to-overflow result.
Someday it'd be a good idea to revise the sum() and avg() aggregates
to use float or numeric accumulators in all cases. This'd require
inventing a few more cross-data-type operators...
regards, tom lane
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> Some broader solution should be considered though if you
>> want AVG to work on numeric/decimal as well.
> The implementation can be specified for each datatype individually,
In the current implementation, each datatype does use its own type as
the accumulator --- and also as the counter. float8 and numeric are
fine, float4 is sort of OK (a float8 accumulator would be better for
accuracy reasons), int4 loses, int2 loses *bad*.
To fix it we'd need to invent operators that do the appropriate cross-
data-type operations. For example, int4 avg using float8 accumulator
would need "float8 + int4 yielding float8" and "float8 / int4 yielding
int4", neither of which are to be found in pg_proc at the moment. But
it's a straightforward thing to do.
int8 is the only integer type that I wouldn't want to use a float8
accumulator for. Maybe numeric would be the appropriate thing here,
slow though it be.
Note that switching over to float accumulation would *not* be real
palatable until we have fixed the memory-leak issue. avg() on int4
doesn't leak memory currently, but it would with a float accumulator...
regards, tom lane
Thomas Lockhart wrote:
>
> > > Of course. These are integer fields? I've been considering changing
> > > all accumulators (and results) for integer aggregate functions to
> > > float8, but have not done so yet. I was sort of waiting for a v7.0
> > > release, but am not sure why...
> >
> > Wouldn't it be better to use NUMERIC for the avg(int) state
> > values? It will never loose any significant digit.
>
> Sure. It would be fast, right? avg(int) is likely to be used a lot,
> and should be as fast as possible.
I think it would be fast enough, even if I have things in
mind how to speed it up. But that would result in a total
rewrite of NUMERIC from scratch.
The only math function of NUMERIC which is time critical for
AVG() is ADD. And even for int8 the number of digits it has
to perform is relatively small. I expect the time spent on
that is negligible compared to the heap scanning required to
get all the values.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #