Re: BUG #15307: Low numerical precision of (Co-) Variance

From: Dean Rasheed
Subject: Re: BUG #15307: Low numerical precision of (Co-) Variance
Msg-id: CAEZATCXRXn40RJjwfSz0FoJfHAX9Y6gwYB2HdL30KVE8ozwCyA@mail.gmail.com
In response to: Re: BUG #15307: Low numerical precision of (Co-) Variance  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List: pgsql-bugs
On 9 August 2018 at 12:02, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> ... the YC algorithm is probably preferable. For the record,
> attached are both versions that I tried.
>

Here is an updated, more complete patch for the September commitfest,
based on the YC algorithm, with updated regression tests.

All the existing tests pass unchanged, although I'm somewhat surprised
that the current tests pass with no platform variations. I've added
new tests to cover infinity/NaN handling and parallel aggregation, and
to confirm the improved accuracy with large offsets. The latter tests
operate well within the limits of double precision arithmetic, so I
wouldn't expect any platform variation, but that's difficult to
guarantee. If there are problems, it may be necessary to round the
test results.
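
As a rough illustration of the large-offset accuracy point (this is not
code or test data from the patch), the sketch below compares the textbook
sum-of-squares formula with a Youngs-Cramer style running update. The
function names and the sample values are assumptions chosen for the
example.

```python
def naive_variance(values):
    """Sample variance from N, sum(x) and sum(x^2) (textbook formula)."""
    n = 0.0
    sx = 0.0
    sx2 = 0.0
    for x in values:
        n += 1.0
        sx += x
        sx2 += x * x
    # Subtracts two huge, nearly equal numbers when the offset is large.
    return (n * sx2 - sx * sx) / (n * (n - 1.0))


def yc_variance(values):
    """Sample variance using a Youngs-Cramer style update of Sxx."""
    n = 0.0
    sx = 0.0
    sxx = 0.0  # running sum of squared deviations from the mean
    for x in values:
        n += 1.0
        sx += x
        if n > 1.0:
            tmp = x * n - sx
            sxx += tmp * tmp / (n * (n - 1.0))
    return sxx / (n - 1.0)


# Hypothetical data: the values 1..5 shifted by a large offset.
values = [1e9 + i for i in range(1, 6)]
print("naive:", naive_variance(values))
print("YC:   ", yc_variance(values))
```

With these inputs every intermediate value in the YC update is exactly
representable in double precision, so it returns 2.5, whereas the
sum-of-squares form loses the low-order digits in the final subtraction.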

Notable changes from the previous patch:

I have rewritten the overflow checks in the accum functions to be
clearer and more efficient which, if anything, makes these aggregates
now slightly faster than HEAD. More importantly though, I've added
explicit code to force Sxx to be NaN if any input is infinite, which
the previous coding didn't guarantee. I think NaN is the right result
for quantities like variance, if any input value is infinite, since it
logically involves 'infinity minus infinity'. That's also consistent
with the current behaviour.
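
A minimal sketch of the idea, assuming a single-variable transition state
(N, Sx, Sxx); this is not the patch's C code, and the name yc_accum is
made up for illustration. After the update, an infinite Sx or Sxx is
treated as an overflow error unless an infinite input explains it, in
which case Sxx is forced to NaN.

```python
import math


def yc_accum(state, x):
    """Add one input to a (N, Sx, Sxx) state; hypothetical helper."""
    n, sx, sxx = state
    old_sx = sx
    n += 1.0
    sx += x
    if n > 1.0:
        tmp = x * n - sx
        sxx += tmp * tmp / (n * (n - 1.0))
    if math.isinf(sx) or math.isinf(sxx):
        # Infinite result from finite inputs: genuine overflow.
        if not math.isinf(old_sx) and not math.isinf(x):
            raise OverflowError("value out of range: overflow")
        # Otherwise some input was infinite, so force Sxx to NaN.
        sxx = math.nan
    return (n, sx, sxx)


state = (0.0, 0.0, 0.0)
for value in (1.0, math.inf, 2.0):
    state = yc_accum(state, value)
print(state)  # Sxx is NaN once an infinite input has been seen
```

Without the explicit assignment, a state whose only input so far is
infinite would keep Sxx at zero, because the update step is skipped when
N is 1.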

I have also made the aggregate combine functions SQL-callable to make
testing easier -- there was a bug in the previous version due to a
typo which meant that float8_regr_combine() was incorrect when N1 was
non-zero and N2 was zero. That situation is unlikely to happen in
practice, and difficult to provoke deliberately without manually
calling the combine function, which is why I didn't spot it before.
The new tests cover all code branches, and make it easier to see that
the combine functions are producing the correct results.
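
To show the kind of branch those tests need to reach, here is a
simplified sketch of combining two partial states for the single-variable
case. float8_regr_combine() itself tracks more fields, and the names
yc_state and yc_combine are assumptions for this example, not functions
from the patch.

```python
def yc_state(values):
    """Build a (N, Sx, Sxx) state from a list of inputs."""
    n = 0.0
    sx = 0.0
    sxx = 0.0
    for x in values:
        n += 1.0
        sx += x
        if n > 1.0:
            tmp = x * n - sx
            sxx += tmp * tmp / (n * (n - 1.0))
    return (n, sx, sxx)


def yc_combine(s1, s2):
    """Merge two partial states, as a parallel aggregate would."""
    n1, sx1, sxx1 = s1
    n2, sx2, sxx2 = s2
    # An empty side contributes nothing; return the other side unchanged.
    # These are the branches that are hard to provoke without calling
    # the combine function directly.
    if n1 == 0.0:
        return s2
    if n2 == 0.0:
        return s1
    n = n1 + n2
    sx = sx1 + sx2
    tmp = sx1 / n1 - sx2 / n2
    sxx = sxx1 + sxx2 + n1 * n2 * tmp * tmp / n
    return (n, sx, sxx)


left = yc_state([1.0, 2.0])
right = yc_state([3.0, 4.0, 5.0])
empty = yc_state([])
print(yc_combine(left, right))  # same state as a single pass over 1..5
print(yc_combine(left, empty))  # N1 non-zero, N2 zero: left unchanged
```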

Regards,
Dean
