Re: Aggregate Function corr does not always return the correct value

From: Ranier Vilela
Subject: Re: Aggregate Function corr does not always return the correct value
Date:
Msg-id: CAEudQAobLi=Kvk4KSw6PSgcgLWEDTGS7fr1aEEu_23V7MzFuwg@mail.gmail.com
In reply to: Re: Aggregate Function corr does not always return the correct value  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers


On Tue, Aug 26, 2025 at 14:34, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maxim Orlov <orlovmg@gmail.com> writes:
> One of the clients complained as to why the query for calculating the
> correlation coefficient with the CORR function yielded such weird
> results. After a little analysis, it was discovered that they were
> calculating the correlation coefficient for two sets, one of which is
> more or less random and the other of which is simply a set of constant
> values (0.09 if that matters). As a result, they were attaining
> unexpected results. However, as far as I am aware, they should have
> received NULL because it is impossible to calculate the standard
> deviation for such a set.

[ shrug... ]  Calculations with float8 are inherently inexact, so
it's unsurprising that we sometimes fail to detect that the input
is exactly a horizontal or vertical line.  I don't think there is
anything to be done here that wouldn't end in making things worse.

With the following check:

if (Sxx == 0.0 && Syy == 0.0)
   PG_RETURN_NULL();

this test returns NaN:
WITH dataset AS (SELECT x, 0.125 AS y FROM generate_series(0, 5) AS x) SELECT corr(x, y) FROM dataset;

However, I can't say whether this answer (NaN) makes things worse.

best regards,
Ranier Vilela
