Обсуждение: Volatile functions under Memoize node
Hi,
Excuse me if I made noise in vain. After discovering the limits of the
Memoize node, I realized that volatile functions are allowed under the
Memoize. Example:
DROP TABLE IF EXISTS base, other CASCADE;
CREATE TABLE base (x numeric, y text, x1 integer);
INSERT INTO base (x,y,x1) SELECT 1, 'abccccccccccccc'||1,1 FROM
generate_series(1,1E6) AS x;
CREATE TABLE other (x numeric, y text, x1 integer);
INSERT INTO other (x,y,x1) SELECT x, 'abccccccccccccc'||x,1 FROM
generate_series(1,1E4) AS x;
VACUUM ANALYZE base, other;
EXPLAIN (COSTS OFF)
SELECT * FROM base WHERE base.x IN (
SELECT o2.x FROM other o1 LEFT JOIN other o2 ON (o1.x=o2.x) LEFT JOIN
other o3 ON (o2.x=o3.x+random())
WHERE base.x = o1.x GROUP BY o2.x ORDER BY o2.x
);
/*
Nested Loop
-> Seq Scan on base
-> Memoize
Cache Key: base.x
Cache Mode: binary
-> Subquery Scan on "ANY_subquery"
Filter: (base.x = "ANY_subquery".x)
-> Group
Group Key: o2.x
-> Sort
Sort Key: o2.x
-> Nested Loop Left Join
Join Filter: ((o2.x)::double precision
= ((o3.x)::double precision + random()))
-> Nested Loop Left Join
-> Seq Scan on other o1
Filter: (base.x = x)
-> Seq Scan on other o2
Filter: (x = base.x)
-> Seq Scan on other o3
*/
In my mind, any volatile function in any expression should reject the
usage of Memoize, am I wrong? I haven't discovered this case deeply yet,
but maybe someone has a quick and short answer to this question.
--
regards, Andrei Lepikhov
On Fri, 20 Sept 2024 at 04:47, Andrei Lepikhov <lepihov@gmail.com> wrote: > Excuse me if I made noise in vain. After discovering the limits of the > Memoize node, I realized that volatile functions are allowed under the > Memoize. Example: I don't think we're particularly consistent about the number of evaluations of volatile functions in general. Here's something crafted up without Memoize: create function volatilefunc(p int) returns int as $$ begin raise notice '%', p; return p; end; $$ language plpgsql volatile; explain analyze select * from (values(1),(2)) t1(v) where t1.v in(select t2.v from (values(1),(2)) t2(v) inner join (values(1),(2)) t3(v) on t2.v = volatilefunc(t3.v)); Normally: NOTICE: 1 NOTICE: 2 NOTICE: 1 NOTICE: 2 But: set enable_hashagg = 0; set enable_hashjoin = 0; set enable_material = 0; set enable_sort = 0; gives: NOTICE: 1 NOTICE: 1 NOTICE: 2 NOTICE: 1 NOTICE: 2 I'm not sure if it's a good idea to penalise your case when we're not all that consistent to start with. Is this causing some sort of breakage? David
On 20/9/2024 04:36, David Rowley wrote: > On Fri, 20 Sept 2024 at 04:47, Andrei Lepikhov <lepihov@gmail.com> wrote: >> Excuse me if I made noise in vain. After discovering the limits of the >> Memoize node, I realized that volatile functions are allowed under the >> Memoize. > I'm not sure if it's a good idea to penalise your case when we're not > all that consistent to start with. Is this causing some sort of > breakage? As I've said before, I just discovered the feature's limits and realised that I don't actually understand the idea of volatility checking: even in the get_memoize_path, we spend cycles checking the target list and base restrictions for such functions. Is it necessary there? -- regards, Andrei Lepikhov
On 20/9/2024 04:36, David Rowley wrote: > On Fri, 20 Sept 2024 at 04:47, Andrei Lepikhov <lepihov@gmail.com> wrote: > I'm not sure if it's a good idea to penalise your case when we're not > all that consistent to start with. Is this causing some sort of > breakage? I skimmed the code entries with such checks and found out that the initial reason was to avoid index scans, with a reason that such a routine should be applied to each tuple of the table. The second reason - to postpone expression evaluation (9118d03) - is also reasonable for me. It was introduced to be consistent with the clause's syntactical level in the SQL. It seems to follow the same idea as disabling subquery pull-ups: to avoid multiple evaluations and change the syntactical level. At the same time, Material doesn't care about volatility. So, what was the idea behind the commit 990c365 you added? -- regards, Andrei Lepikhov
On Fri, 20 Sept 2024 at 20:46, Andrei Lepikhov <lepihov@gmail.com> wrote:
> I skimmed the code entries with such checks and found out that the
> initial reason was to avoid index scans, with a reason that such a
> routine should be applied to each tuple of the table.
> The second reason - to postpone expression evaluation (9118d03) - is
> also reasonable for me. It was introduced to be consistent with the
> clause's syntactical level in the SQL.
> It seems to follow the same idea as disabling subquery pull-ups: to
> avoid multiple evaluations and change the syntactical level.
> At the same time, Material doesn't care about volatility. So, what was
> the idea behind the commit 990c365 you added?
I don't recall. Likely to try and keep Memoize more in keeping with
what would happen prior to Memoize existing.
It seems customary around here to disable various optimisations in the
planner when there are volatile functions to try to avoid changing the
number of evaluations of the volatile function. However, I've yet to
see any sort of standard we're meant to be abiding by for it.
You only need to look at things like;
postgres=# select random(), random();
random | random
--------------------+--------------------
0.6097568694225706 | 0.5371689823343302
(1 row)
postgres=# select random(), random() order by random();
random | random
---------------------+---------------------
0.16673781514021058 | 0.16673781514021058
(1 row)
and you might be left scratching your head at why all the random()
calls randomly returned the same value. Of course, the above behaviour
is on purpose, but there are certainly reports of people questioning
it [1].
David
[1] https://www.postgresql.org/message-id/CALA8mJrDQhL-kntd%3DypBgwvogL8%3Dkspn5za1Mxv%2BmS%3DdinL5Sg%40mail.gmail.com