Discussion: Problem with volatile function
So my understanding of volatile functions is that volatile functions can
return different results given the same input.

I have a function random(int, int) that returns a random value between
$1 and $2. I want to use it in a query to generate values. But it only
evaluates once per query and not once per row like I need it to.

-- This always returns the same value
SELECT ts.sis_id, bldg_id, f_name.name, l_name.name
FROM tmp_students ts
JOIN names AS f_name ON
     ts.gender = f_name.gender
WHERE f_name.counter = random(1,300)

-- As does this
SELECT ts.sis_id, bldg_id, f_name.name, l_name.name
FROM tmp_students ts
JOIN names AS f_name ON
     ts.gender = f_name.gender
     AND ts.counter = random(1,100)

-- This generates different numbers
SELECT random(1,100), s.*
FROM usr_students s
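[The original post doesn't show how random(int, int) is defined. A minimal
sketch of such a wrapper — an assumption, not necessarily the poster's
actual version — would be:]

```sql
-- Sketch of a two-argument random(): returns an integer uniformly
-- distributed between $1 and $2 inclusive, built on the built-in
-- random(), which returns a double precision value in [0, 1).
CREATE OR REPLACE FUNCTION random(int, int) RETURNS int AS $$
    SELECT ($1 + floor(random() * ($2 - $1 + 1)))::int;
$$ LANGUAGE sql VOLATILE;
```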
Artacus <artacus@comcast.net> writes:
> I have a function random(int, int) that returns a random value between
> $1 and $2. I want to use it in a query to generate values. But it only
> evaluates once per query and not once per row like I need it to.

> -- This always returns the same value
> SELECT ts.sis_id, bldg_id, f_name.name, l_name.name
> FROM tmp_students ts
> JOIN names AS f_name ON
>      ts.gender = f_name.gender
> WHERE f_name.counter = random(1,300)

You haven't really explained what your problem is, but offhand I'd
expect random() to be evaluated once per f_name row here. What are you
looking for?

			regards, tom lane
Artacus wrote:
> So my understanding of volatile functions is that volatile functions can
> return different results given the same input.
>
> I have a function random(int, int) that returns a random value between
> $1 and $2. I want to use it in a query to generate values. But it only
> evaluates once per query and not once per row like I need it to.
>
> -- This always returns the same value
> SELECT ts.sis_id, bldg_id, f_name.name, l_name.name
> FROM tmp_students ts
> JOIN names AS f_name ON
>      ts.gender = f_name.gender
> WHERE f_name.counter = random(1,300)

Personally I'd expect that to only evaluate once. It's saying "where
f_name.counter in this row is equal to some single random value
generated at the start of the query". The parameters of the random()
function do not depend on the input, so Pg evaluates it once rather
than with each row. That's not to do with volatility; rather, it's the
semantics of your query.

It's like the difference between correlated and uncorrelated
subqueries. A subquery that doesn't reference outside context is only
evaluated once, ie it's uncorrelated. Only if the subquery contains
references to values in rows in the outside query is it evaluated once
for each row.

If Pg re-evaluated your random() function for every record, how would
you then write "fetch all records that have f_name.counter equal to
the same randomly selected value"?

You can force Pg to re-evaluate random() by adding a dummy parameter
that depends on the input record, or (probably better) by writing a
variant of it that tests the input against a randomly generated value
and returns a boolean. Eg:

SELECT ts.sis_id, bldg_id, f_name.name, l_name.name
FROM tmp_students ts
JOIN names AS f_name ON
     ts.gender = f_name.gender
WHERE random_equals(1,300,f_name.counter)

random_equals would be called once per input record.
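[Craig names random_equals but doesn't define it. A minimal sketch of
such a helper, following his description — the name and body are
assumptions, not code from the thread — could be:]

```sql
-- Hypothetical helper: draws a random integer in [$1, $2] and tests
-- it against the supplied value $3. Because its third argument comes
-- from the row being scanned, it is re-evaluated for every row.
CREATE OR REPLACE FUNCTION random_equals(int, int, int)
RETURNS boolean AS $$
    SELECT ($1 + floor(random() * ($2 - $1 + 1)))::int = $3;
$$ LANGUAGE sql VOLATILE;
```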
It would generate a random number between 1 and 300, and test the
result to see if it was equal to the argument value f_name.counter,
then return the result of the test.

Alternately, you could pair each result up with a random value and
re-check the results with a subquery (slower):

SELECT sis_id, bldg_id, fname, lname
FROM (SELECT ts.sis_id, bldg_id,
             f_name.name AS fname,
             l_name.name AS lname,
             f_name.counter AS counter,
             random(1,300) AS r
      FROM tmp_students ts
      JOIN names AS f_name ON
           ts.gender = f_name.gender
     ) AS x
WHERE r = counter;

... but that's pretty ugly.

> -- As does this
> SELECT ts.sis_id, bldg_id, f_name.name, l_name.name
> FROM tmp_students ts
> JOIN names AS f_name ON
>      ts.gender = f_name.gender
>      AND ts.counter = random(1,100)

Same deal as above.

> -- This generates different numbers
> SELECT random(1,100), s.*
> FROM usr_students s

... because you've used random() in the result list. That's where
VOLATILE and STABLE etc will make a difference.

--
Craig Ringer
Artacus wrote:
> So my understanding of volatile functions is that volatile functions can
> return different results given the same input.
>
> I have a function random(int, int) that returns a random value between
> $1 and $2. I want to use it in a query to generate values. But it only
> evaluates once per query and not once per row like I need it to.
>
> -- This always returns the same value
> SELECT ts.sis_id, bldg_id, f_name.name, l_name.name
> FROM tmp_students ts
> JOIN names AS f_name ON
>      ts.gender = f_name.gender
> WHERE f_name.counter = random(1,300)
>
> -- As does this
> SELECT ts.sis_id, bldg_id, f_name.name, l_name.name
> FROM tmp_students ts
> JOIN names AS f_name ON
>      ts.gender = f_name.gender
>      AND ts.counter = random(1,100)
>
> -- This generates different numbers
> SELECT random(1,100), s.*
> FROM usr_students s

Are you sure it's volatile? (as opposed to stable)

postgres=# create or replace function random(int,int) returns int as
           $$select round($2*random()*$1)::int;$$ language sql volatile;
CREATE FUNCTION
postgres=# select generate_series from generate_series(1,10,1)
           where random(1,100) > 50;
 generate_series
-----------------
               1
               3
               4
               6
               7
               9
(6 rows)

postgres=# select generate_series from generate_series(1,10,1)
           where random(1,100) > 50;
 generate_series
-----------------
               3
               7
               8
               9
(4 rows)

postgres=# select random(1,100), generate_series
           from generate_series(1,10,1);
 random | generate_series
--------+-----------------
     56 |               1
     23 |               2
     80 |               3
     57 |               4
     16 |               5
     99 |               6
      9 |               7
     41 |               8
     90 |               9
     88 |              10
(10 rows)

postgres=# create or replace function random(int,int) returns int as
           $$select round($2*random()*$1)::int;$$ language sql stable;
CREATE FUNCTION
postgres=# select generate_series from generate_series(1,10,1)
           where random(1,100) > 50;
 generate_series
-----------------
               1
               2
               3
               4
               5
               6
               7
               8
               9
              10
(10 rows)

postgres=# select generate_series from generate_series(1,10,1)
           where random(1,100) > 50;
 generate_series
-----------------
(0 rows)

postgres=# select random(1,100), generate_series
           from generate_series(1,10,1);
 random | generate_series
--------+-----------------
     72 |               1
     90 |               2
     53 |               3
     47 |               4
     53 |               5
     33 |               6
     10 |               7
     56 |               8
     78 |               9
     87 |              10
(10 rows)

postgres=#

--
Klint Gore
Database Manager
Sheep CRC
A.G.B.U.
University of New England
Armidale NSW 2350
Ph: 02 6773 3789
Fax: 02 6773 3266
EMail: kgore4@une.edu.au
Craig Ringer <craig@postnewspapers.com.au> writes:
> Personally I'd expect that to only evaluate once. It's saying "where
> f_name.counter in this row is equal to some single random value
> generated at the start of the query". The parameters of the random()
> function do not depend on the input, so Pg evaluates it once rather
> than with each row.

No, that's not the model that we use. In my view, the semantics
specified by the SQL standard for

	SELECT ... FROM a, b, c WHERE ...

are that you notionally form the Cartesian product of a X b X c and
then evaluate the WHERE condition at each row of the product to find
out which rows make it into the SELECT result. Obviously, the DBMS is
going to try like heck to optimize the execution of that model, but
its optimizations should not change the results.

We're not perfect about that, in a number of ways --- in particular,
we allow short-circuit evaluation of ANDed conditions, which means
that we might not bother to execute a volatile function for some row
of the Cartesian product if we're able to exclude that row from the
result based on other WHERE conditions.

The OP's complaint is a bit different from this, in that (if I
understand it correctly) he doesn't want the WHERE condition to be
dropped down to the individual relation scan, because that amounts to
improperly assuming that the volatile WHERE condition will produce the
same result at every row of the Cartesian product that included that
row from the base relation. This gripe isn't without foundation. But
I'm not seeing what properties you get from that that wouldn't be
broken by AND-shortcircuit evaluation. Dropping the AND-shortcircuit
optimization would mean that we revert to the brain-dead "form the
Cartesian product and then filter" approach if even one volatile
function appears anywhere in the WHERE conditions. I doubt that either
the OP or anyone else wants that.

			regards, tom lane
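[The AND-shortcircuit behavior Tom describes can be illustrated with a
query like the following — a hypothetical example, assuming the planner
chooses to filter on the cheap equality condition first:]

```sql
-- If the planner applies ts.gender = 'M' before the volatile
-- condition, random() is simply never executed for rows that the
-- gender test has already excluded, so the number of random() calls
-- depends on the plan, not on the size of the Cartesian product.
SELECT ts.sis_id
FROM tmp_students ts
WHERE ts.gender = 'M'
  AND random() < 0.5;
```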
> You can force Pg to re-evaluate random() by adding a dummy parameter
> that depends on the input record, or (probably better) by writing a
> variant of it that tests the input against a randomly generated value
> and returns a boolean.

Thanks all. So here's the situation. I added a dummy parameter and
passed the id like you suggested. That had no effect: I still got one
name for males and one name for females. So I used the 3rd param in a
trivial way:

select round($2*random()*$1+($3/10000000))::int;

And that actually forced it to execute for every row. However, it
returned unpredictable results. There should have been only one match
for first_name and last_name for each person, but it returned from 1
to 5 rows for each person.

 sis_id | gender |  name   |   name   | counter | counter
--------+--------+---------+----------+---------+---------
 105607 | M      | Denis   | Weber    |      19 |     671
 105666 | M      | Javier  | Custodio |     154 |     182
 105666 | M      | Javier  | Nelson   |     154 |     250
 105839 | M      | Johnnie | Whicker  |     295 |      32
 105847 | F      | Trina   | Garcia   |     259 |     155
 105847 | F      | Dione   | Freeman  |     103 |     651
 105847 | F      | Dione   | Harden   |     103 |     897
 105847 | F      | Cruz    | Brannen  |     249 |    1240

So what I actually had to do was get the sis_id and the two random
numbers in a subselect:

SELECT stu.sis_id, stu.gender,
       f_name.name AS first_name,
       l_name.name AS last_name
FROM usr_students stu
JOIN (
     SELECT sis_id, random(1,300) AS f_cnt, random(1,1700) AS l_cnt
     FROM usr_students s
) sub ON stu.sis_id = sub.sis_id
JOIN names f_name ON stu.gender = f_name.gender
     AND f_name.counter = sub.f_cnt
JOIN names l_name ON l_name.gender IS NULL
     AND l_name.counter = sub.l_cnt

So while that works, postgres isn't behaving how I'd expect (or how
Tom expects, from the sounds of it).
Tom Lane wrote:
> Craig Ringer <craig@postnewspapers.com.au> writes:
>> Personally I'd expect that to only evaluate once. It's saying "where
>> f_name.counter in this row is equal to some single random value
>> generated at the start of the query". The parameters of the random()
>> function do not depend on the input, so Pg evaluates it once rather
>> than with each row.
>
> No, that's not the model that we use. In my view, the semantics
> specified by the SQL standard for
>	SELECT ... FROM a, b, c WHERE ...
> are that you notionally form the Cartesian product of a X b X c and
> then evaluate the WHERE condition at each row of the product to find
> out which rows make it into the SELECT result. Obviously, the DBMS
> is going to try like heck to optimize the execution of that model,
> but its optimizations should not change the results.

Thanks for explaining that; I'd completely misunderstood the way Pg
handles evaluation of expressions that don't explicitly reference
fields in the records being scanned.

I think the workarounds I suggested should still work, despite the
incorrect reasoning on which they were based.

--
Craig Ringer