Use int8 for int4/int2 aggregate accumulators?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Use int8 for int4/int2 aggregate accumulators?
Дата
Msg-id 24638.997133222@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: Use int8 for int4/int2 aggregate accumulators?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
This was discussed on pgsql-general a little bit on 21-July, but the
discussion died off without reaching a conclusion.  I'd like to
put out a concrete proposal and see if anyone has objections.

1. SUM() and AVG() for int2 and int4 inputs should accumulate the
running sum as an INT8, not a NUMERIC, for speed reasons.  INT8 seems
large enough to avoid overflow in practical situations.  The final
output datatype of AVG() will still be NUMERIC, but the final output
of SUM() will become INT8 for these two input types.

2. STDDEV() and VARIANCE() for int2 and int4 inputs will continue to
use NUMERIC for accuracy and overflow reasons (accumulating sum(x^2)
is much more prone to overflow than sum(x)).  So will all these
aggregates for INT8.

3. As a separate proposal, we could change COUNT()'s running counter
and output datatype from INT4 to INT8.  This would make it a little
slower but effectively overflow-proof.


All of these changes are within the latitude that the SQL92 spec
affords (it just says that the output values are exact numeric with
implementation-defined precision and scale).  Issues to consider are:

* On machines with no 8-byte-int C datatype, the accumulator would
effectively be int4.  This would make the behavior no worse than
currently for COUNT(), and no worse than it was in 7.0 for SUM() and
AVG(), so that doesn't bother me a whole lot.  But it would be a
new source of cross-platform behavioral differences.

* Changing the output datatype of these operations --- especially COUNT
--- might affect or even break applications.  We got a few complaints,
not many, about changing SUM() and AVG() from integer to NUMERIC output
in 7.1.  Changing SUM() to INT8 isn't likely to hurt anyone who survived
that transition.  But COUNT() is much more widely used and is more
likely to affect people.  Should we keep it at INT4 output to avoid
compatibility problems?
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Notes about int8 sequences
Следующее
От: mlw
Дата:
Сообщение: Re: Re: AW: Re: OID wraparound: summary and proposal