Обсуждение: ARRAY(subquery) volatility
Why does the first query below return the same value for each row
while the second query returns random values? Planner optimization?
test=> SELECT ARRAY(SELECT random()) FROM generate_series(1, 5); ?column?
---------------------{0.269273371561092}{0.269273371561092}{0.269273371561092}{0.269273371561092}{0.269273371561092}
(5 rows)
test=> SELECT ARRAY(SELECT random() + x * 0) FROM generate_series(1, 5) AS g(x); ?column?
---------------------{0.826863945846848}{0.42534113182935}{0.36419924318986}{0.258920902972538}{0.843205466327819}
(5 rows)
--
Michael Fuhr
Michael Fuhr wrote:
> Why does the first query below return the same value for each row
> while the second query returns random values? Planner optimization?
I assume it is due to some kind of flattening in the planner, but it is
totally unrelated to ARRAY(subquery):
regression=# SELECT (SELECT random()) as f FROM generate_series(1, 5); f
------------------- 0.752416231088534 0.752416231088534 0.752416231088534 0.752416231088534 0.752416231088534
(5 rows)
regression=# SELECT (SELECT random() + 0 * f) as f FROM
generate_series(1, 5) as t(f); f
------------------- 0.176055165555354 0.608546747178094 0.55303416240636 0.127355110425202 0.21671894063089
(5 rows)
Here's another example:
regression=# create table t1(f text);
CREATE TABLE
regression=# insert into t1 values('0');
INSERT 17366 1
regression=# insert into t1 values('0');
INSERT 17367 1
regression=# insert into t1 values('0');
INSERT 17368 1
regression=# SELECT (SELECT timeofday()::timestamp + f::interval) as f
FROM t1; f
---------------------------- 2005-08-16 16:38:12.738215 2005-08-16 16:38:12.738292 2005-08-16 16:38:12.738315
(3 rows)
regression=# SELECT (SELECT timeofday()::timestamp) as f FROM t1; f
---------------------------- 2005-08-16 16:35:33.100791 2005-08-16 16:35:33.100791 2005-08-16 16:35:33.100791
(3 rows)
This shows that it isn't related to use of an SRF either.
Joe
Michael Fuhr <mike@fuhr.org> writes:
> Why does the first query below return the same value for each row
> while the second query returns random values? Planner optimization?
> test=> SELECT ARRAY(SELECT random()) FROM generate_series(1, 5);
> test=> SELECT ARRAY(SELECT random() + x * 0) FROM generate_series(1, 5) AS g(x);
The sub-SELECT in the first one is considered an uncorrelated subquery,
so you get a plan that evaluates the subquery just once:
Function Scan on generate_series (cost=0.01..12.51 rows=1000 width=0) InitPlan -> Result (cost=0.00..0.01 rows=1
width=0)
In the second case, x is an outer parameter to the subquery, so it has
to be re-evaluated for each row of the outer query:
Function Scan on generate_series g (cost=0.00..32.50 rows=1000 width=4) SubPlan -> Result (cost=0.00..0.02
rows=1width=0)
Note the "InitPlan" vs "SubPlan" labels --- they look similar, but the
evaluation rules are totally different.
The fact that there's a volatile function in the subquery isn't
considered while making this decision. I'm not sure if it should be.
regards, tom lane