Re: Wrong results with grouping sets

Поиск
Список
Период
Сортировка
От Richard Guo
Тема Re: Wrong results with grouping sets
Дата
Msg-id CAMbWs4_xhRAfy-i==nFMZGukw4M=OnkfwpfEfiGmAx6a3SYBKw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Wrong results with grouping sets  (Paul George <p.a.george19@gmail.com>)
Ответы Re: Wrong results with grouping sets
Список pgsql-hackers
On Wed, Jul 17, 2024 at 8:50 AM Paul George <p.a.george19@gmail.com> wrote:
> > Since a subquery is a volatile expression, each of its instances
> should be evaluated separately.

I don't think this conclusion is correct.  Look at:

select random(), random() from t group by random();
       random       |       random
--------------------+--------------------
 0.7972330769936766 | 0.7972330769936766
(1 row)

> This seems like a valid point, though "query 2" below which groups over a RANDOM() column and outputs an additional
RANDOM()column a potential, albeit contrived, counter-example? [NOTE: this was done on Postgres 16.3] I've included a
fewdifferent combinations of GROUP BYs. 

Interesting.  I looked into the scenarios with multiple instances of
the same volatile grouping expressions and here is what I observed.

create table t (a int, b int);
insert into t select 1,1;

-- on master, with plain volatile functions
select random() as c1,
       random() as c2,
       random() as c3
from t t1 group by c1;
        c1         |        c2         |        c3
-------------------+-------------------+-------------------
 0.567478050404431 | 0.567478050404431 | 0.567478050404431
(1 row)

So the random() function is evaluated only once, even though it
appears three times.

-- on master, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
       (select random() from t t2 where a = t1.a) as c2,
       (select random() from t t2 where a = t1.a) as c3
from t t1 group by c1;
         c1         |         c2         |         c3
--------------------+--------------------+--------------------
 0.8420177313766823 | 0.2969648209746336 | 0.3499675329093421
(1 row)

So on master the subquery is evaluated three times.  Why isn't this
consistent with the behavior of the first query?

-- on patched, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
       (select random() from t t2 where a = t1.a) as c2,
       (select random() from t t2 where a = t1.a) as c3
from t t1 group by c1;
         c1         |         c2         |         c3
--------------------+--------------------+--------------------
 0.5203586066423254 | 0.5203586066423254 | 0.5203586066423254
(1 row)

So on patched the subquery is evaluated only once, which is consistent
with the behavior of the first query.

Does this suggest that the patched version is more 'correct' for this
case?


Now let's look at the scenario with two grouping keys.

-- on master, with plain volatile functions
select random() as c1,
       random() as c2,
       random() as c3
from t t1 group by c1, c2;
         c1         |         c2         |         c3
--------------------+--------------------+--------------------
 0.9388558105069595 | 0.2900389441597979 | 0.9388558105069595
(1 row)

So the first two random() functions are evaluated independently, and
the third random() function references the result of the first one.

-- on master, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
       (select random() from t t2 where a = t1.a) as c2,
       (select random() from t t2 where a = t1.a) as c3
from t t1 group by c1, c2;
         c1          |         c2         |         c3
---------------------+--------------------+--------------------
 0.46275163300894073 | 0.5083760995112951 | 0.6752682696191123
(1 row)

So on master the subquery is evaluated three times.

-- on patched, with subqueries that are 'volatile'
select (select random() from t t2 where a = t1.a) as c1,
       (select random() from t t2 where a = t1.a) as c2,
       (select random() from t t2 where a = t1.a) as c3
from t t1 group by c1, c2;
         c1         |         c2         |         c3
--------------------+--------------------+--------------------
 0.9887848690744176 | 0.9887848690744176 | 0.9887848690744176
(1 row)

So on patched the subquery is evaluated only once.

It seems that in this scenario, neither the master nor the patched
version handles volatile subqueries in grouping expressions the same
way as it handles plain volatile functions.

I am confused.  Does the SQL standard explicitly define or standardize
the behavior of grouping by volatile expressions?  Does anyone know
about that?

Thanks
Richard



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Smith
Дата:
Сообщение: Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Следующее
От: Joseph Koshakow
Дата:
Сообщение: Re: Remove dependence on integer wrapping