pgsql: Improve the accuracy of floating point statistical aggregates.

From: Dean Rasheed
Subject: pgsql: Improve the accuracy of floating point statistical aggregates.
Msg-id: E1g8joG-0007px-60@gemulon.postgresql.org
List: pgsql-committers
Improve the accuracy of floating point statistical aggregates.

When computing statistical aggregates like variance, the common
schoolbook algorithm, which computes the sum of the squares of the
values and subtracts the square of the mean, can lead to a large loss
of precision when using floating point arithmetic, because the
difference between the two terms is often very small relative to the
terms themselves.
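
To illustrate the cancellation described above, here is a minimal Python sketch. It is not code from this patch; the function names and the sample data are invented for the example. It compares the schoolbook formula with a two-pass computation on four values near 1e9 whose true sample variance is 30.

```python
def naive_variance(values):
    """Schoolbook form: (sum of squares - (sum)^2 / N) / (N - 1)."""
    n = len(values)
    sum_x = 0.0
    sum_x2 = 0.0
    for x in values:
        sum_x += x
        sum_x2 += x * x          # each square is about 1e18
    # Two huge, nearly equal terms are subtracted here.
    return (sum_x2 - sum_x * sum_x / n) / (n - 1)


def two_pass_variance(values):
    """Reference: compute the mean first, then sum squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical sample data: small spread around a large offset.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

print("schoolbook:", naive_variance(data))
print("two-pass:  ", two_pass_variance(data))
```

Near 1e18 adjacent doubles are 128 apart, so the squared terms cannot carry the information needed to recover a difference of 90, and the schoolbook result is far from 30. The two-pass version works with small deviations and does not have this problem, but it needs to read the data twice.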

To avoid this, re-work these aggregates to use the Youngs-Cramer
algorithm, which is a proven, numerically stable algorithm that
directly aggregates the sum of the squares of the differences of the
values from the mean in a single pass over the data.
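
The single-pass update can be sketched in Python as follows. This is an illustration of the Youngs-Cramer recurrence under my own naming (`yc_accumulate`, `yc_variance`, and the `(n, sx, sxx)` state tuple are assumptions for the example), not the C code in float.c. The state holds the count, the running sum, and the running sum of squared differences from the mean.

```python
def yc_accumulate(state, x):
    """Add one value to the running state (n, sx, sxx)."""
    n, sx, sxx = state
    n_new = n + 1.0
    sx_new = sx + x
    if n > 0.0:
        # n_new * x - sx_new is n_new times the deviation of x from the
        # updated mean, so only small quantities get squared.
        tmp = x * n_new - sx_new
        sxx += tmp * tmp / (n_new * n)
    return (n_new, sx_new, sxx)


def yc_variance(values):
    """Sample variance from a single pass over the data."""
    state = (0.0, 0.0, 0.0)
    for x in values:
        state = yc_accumulate(state, x)
    n, _, sxx = state
    return sxx / (n - 1.0)


# Hypothetical sample data with a large offset and a true variance of 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print(yc_variance(data))
```

For this data every intermediate value is exactly representable, so the sketch yields exactly 30, where the sum-of-squares formula would not.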

While at it, improve the test coverage to test the aggregate combine
functions used during parallel aggregation.
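
For parallel aggregation, partial states computed by separate workers have to be merged. A minimal Python sketch of such a merge is shown below, using the standard pairwise formula for combining sums of squared differences. The names (`accumulate`, `combine`, the `(n, sx, sxx)` tuple) are assumptions for this example and are not taken from the patch or its regression tests.

```python
from functools import reduce


def accumulate(state, x):
    """Single-pass update of (n, sx, sxx) with one new value."""
    n, sx, sxx = state
    n_new = n + 1.0
    sx_new = sx + x
    if n > 0.0:
        tmp = x * n_new - sx_new
        sxx += tmp * tmp / (n_new * n)
    return (n_new, sx_new, sxx)


def combine(a, b):
    """Merge two partial states, e.g. from two parallel workers."""
    n1, sx1, sxx1 = a
    n2, sx2, sxx2 = b
    if n1 == 0.0:      # an empty partial state contributes nothing
        return b
    if n2 == 0.0:
        return a
    n = n1 + n2
    tmp = sx1 / n1 - sx2 / n2          # difference of the two partial means
    sxx = sxx1 + sxx2 + n1 * n2 * tmp * tmp / n
    return (n, sx1 + sx2, sxx)


EMPTY = (0.0, 0.0, 0.0)
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]   # hypothetical sample data

# Simulate two workers, each aggregating half of the input.
left = reduce(accumulate, data[:2], EMPTY)
right = reduce(accumulate, data[2:], EMPTY)
merged = combine(left, right)
whole = reduce(accumulate, data, EMPTY)

print(merged, whole)
```

A test of a combine function along these lines would check that the merged state matches the state obtained by aggregating all rows in one worker, including the cases where one side is empty.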

Per report and suggested algorithm from Erich Schubert.

Patch by me, reviewed by Madeleine Thompson.

Discussion: https://postgr.es/m/153313051300.1397.9594490737341194671@wrigleys.postgresql.org

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/e954a727f0c8872bf5203186ad0f5312f6183746

Modified Files
--------------
src/backend/utils/adt/float.c            | 732 ++++++++++++++++++++-----------
src/test/regress/expected/aggregates.out | 128 ++++++
src/test/regress/sql/aggregates.sql      |  41 ++
3 files changed, 639 insertions(+), 262 deletions(-)

