Re: Weird function behavior from Sept 11 snapshot

Поиск
Список
Период
Сортировка
От Thomas Lockhart
Тема Re: Weird function behavior from Sept 11 snapshot
Дата
Msg-id 39BE4484.C453B792@alumni.caltech.edu
обсуждение исходный текст
Ответ на Weird function behavior from Sept 11 snapshot  (Mike Mascari <mascarm@mascari.com>)
Ответы Re: Weird function behavior from Sept 11 snapshot
Список pgsql-hackers
> ... Does anyone want to check the time for sum() or avg() on an
> int4 column over a large table, under both 7.0.* and current?

For 262144 rows on the current tree, I get the following:

sum(int4):               12.0 seconds
sum(float8):              5.2 seconds
sum(cast(int4 as float8): 5.7 seconds

This includes startup costs, etc, and are the minimum times from several
runs (there is pretty wide variability, presumably due to disk caching,
swapping, etc on my laptop). It is a safe bet that the original int4
implementation was as fast or faster than the float8 result above (int4
does not require palloc() calls).

> Actually, using a float8 accumulator would work pretty well; assuming
> IEEE float8, you'd only start to get roundoff error when the running
> sum exceeds 2^52 or so.  However the SQL92 spec is insistent that sum()
> deliver an exact-numeric result when applied to exact-numeric data,
> and with a float accumulator we'd be at the mercy of the quality of the
> local implementation of floating point.

A problem with float8 is that it is possible to reach a point in the
accumulation where subsequent input values are ignored in the sum. This
is different than just roundoff error, since it degrades ungracefully
from that point on.

> I could see offering variant aggregates, say "sumf" and "avgf", that
> use float8 accumulation.  Right now the user can get the same result
> by writing "sum(foo::float8)" but it might be wise to formalize the
> idea ...

How about using int8 for the accumulator (on machines which support it
of course)? Falling back to float8 or numeric on other machines? Or
perhaps we could have an option (runtime??) to switch accumulator modes.

I like the idea of something like "sumf" to get alternative algorithms,
but it would be nice if basic sum() could be a bit more optimized than
currently.
                      - Thomas


В списке pgsql-hackers по дате отправления:

Предыдущее
От: devik@cdi.cz
Дата:
Сообщение: Re: Performance improvement hints
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Weird function behavior from Sept 11 snapshot