Re: BUG #19340: Wrong result from CORR() function

Поиск
Список
Период
Сортировка
От Dean Rasheed
Тема Re: BUG #19340: Wrong result from CORR() function
Дата
Msg-id CAEZATCU8rnyKN3z_5-osk3Bn8dtzWf9nKjTr2E16-ExXiESNrQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #19340: Wrong result from CORR() function  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: BUG #19340: Wrong result from CORR() function
Список pgsql-bugs
On Tue, 2 Dec 2025 at 17:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> It's fundamentally impossible to guarantee exact results with
> floating-point arithmetic, so if you are expecting that you need
> to readjust your expectations.  But having said that, it does
> seem a bit sad that we can't detect constant-input cases exactly.

Yes, indeed. I tested the following query:

SELECT n,
       (SELECT variance(1.3::float8) FROM generate_series(1, n)),
       (SELECT corr(1.3, 1.3) FROM generate_series(1, n))
  FROM generate_series(1, 10) g(n);

In v11 (with the old algorithm) this produces

 n  |       variance       | corr
----+----------------------+------
  1 |                      |
  2 |                    0 |
  3 |                    0 |
  4 |                    0 |
  5 |  3.5527136788005e-16 |    1
  6 |   2.368475785867e-16 |    1
  7 | 3.38353683695286e-16 |    1
  8 |                    0 |
  9 |                    0 |
 10 |                    0 |
(10 rows)

whereas in HEAD (with the Youngs-Cramer algorithm) it produces

 n  |        variance        | corr
----+------------------------+------
  1 |                        |
  2 |                      0 |
  3 |                      0 |
  4 |                      0 |
  5 |                      0 |
  6 |  5.259072701473412e-33 |    1
  7 |  4.382560584561177e-33 |    1
  8 |  3.756480501052437e-33 |    1
  9 | 3.2869204384208825e-33 |    1
 10 |  6.817316464872942e-33 |    1
(10 rows)

so the errors in the variance are smaller, but any non-zero error
makes the correlation completely wrong.

> I wonder whether it'd be worth carrying additional state to
> check that explicitly (instead of assuming that "if (Sxx == 0 ||
> Syy == 0)" will catch it).

I wondered the same thing. It's not nice to have to do that, but
clearly the existing test for constant inputs is no good. The question
is, do we really want to spend extra cycles on every query just to
catch this odd corner case?

Regards,
Dean



В списке pgsql-bugs по дате отправления: