Обсуждение: is it a known issue or just a bug?
Folks, Last week one of my students confronted me with a nice little SQL statement which made me call gdb ... Consider the following scenario: [hs@fedora bug]$ cat q1.sql create temporary sequence seq_ab; select * from (Select nextval('seq_ab') as nv, * from ( select t_product.id,t_text.value,t_price.price from t_product,t_price,t_text where t_product.id = t_price.product_id and t_product.name = t_text.id and t_text.lang='de' and t_price.typ = 'default' order by price desc ) as t ) as u -- WHERE nv <= 1 ; [hs@fedora bug]$ psql test < q1.sql CREATE SEQUENCE nv | id | value | price ----+----+---------+------- 1 | 3 | Banane | 12 2 | 1 | T-Shirt | 10 3 | 2 | Apfel | 7 (3 rows) this query returns the right result. however, when uncommenting the WHERE clause things look different: [hs@fedora bug]$ cat q2.sql create temporary sequence seq_ab; select * from (Select nextval('seq_ab') as nv, * from ( select t_product.id,t_text.value,t_price.price from t_product,t_price,t_text where t_product.id = t_price.product_id and t_product.name = t_text.id and t_text.lang='de' and t_price.typ = 'default' order by price desc ) as t ) as u WHERE nv <= 1 ; [hs@fedora bug]$ psql test < q2.sql CREATE SEQUENCE nv | id | value | price ----+----+---------+------- 4 | 1 | T-Shirt | 10 (1 row) Obviously nv = 4 is wrong ... Looking at the execution plan of the second query the problem seems quite obvious: QUERY PLAN -------------------------------------------------------------------------------------------------------- Subquery Scan t (cost=69.24..69.26 rows=1 width=68) -> Sort (cost=69.24..69.25 rows=1 width=68) Sort Key: t_price.price -> Hash Join (cost=22.51..69.23 rows=1 width=68) Hash Cond: ("outer".name = "inner".id) Join Filter: (nextval('seq_ab'::text) <= 1) -> Nested Loop (cost=0.00..46.68 rows=5 width=40) -> Seq Scan on t_price (cost=0.00..22.50 rows=5 width=36) Filter: (typ = 'default'::text) -> Index Scan using t_product_pkeyon t_product (cost=0.00..4.82 rows=1 width=8) Index Cond: (t_product.id = "outer".product_id) -> Hash (cost=22.50..22.50 rows=5 width=36) -> Seq Scan on t_text (cost=0.00..22.50 rows=5 width=36) Filter: (lang = 'de'::text) (14 rows) nextval() is called again when processing the WHERE clause. this was fine if nextval() would return the same thing again and again (which is not the job of nextval). if the planner materialized the subquery things would materialize the subquery in case of unstable functions things would work in this case. I know I temp table would easily fix this query and it is certainly not the best query I have ever seen but still it seems like a bug and I just wanted to know whether it is a know issue or not. Looking at the code I did not quite know whether this is something which should / can be fixed or not. here is the data: ------------------------------------------------------ CREATE TABLE t_text ( id int4, lang text, value text ); CREATE TABLE t_group ( id int4, name int4, -- mehrsprachig in t_text valid boolean ); INSERT INTO t_group VALUES (1, 1, 't'); INSERT INTO t_text VALUES (1, 'de', 'Obst'); INSERT INTO t_text VALUES (1, 'en', 'Fruits'); INSERT INTO t_group VALUES (2, 2, 't'); INSERT INTO t_text VALUES (2, 'de', 'Kleidung'); INSERT INTO t_text VALUES (2, 'en', 'Clothes'); CREATE UNIQUE INDEX idx_group_id ON t_group (id); CREATE TABLE t_product ( id int4, name int4, -- mehrsprachig in t_text active boolean, PRIMARY KEY (id) ); INSERT INTO t_product VALUES (1, 3, 't'); INSERT INTO t_text VALUES (3, 'de', 'T-Shirt'); INSERT INTO t_text VALUES (3, 'en', 'T-Shirt'); INSERT INTO t_product VALUES (2, 4, 't'); INSERT INTO t_text VALUES (4, 'de', 'Apfel'); INSERT INTO t_text VALUES (4, 'en', 'Apple'); INSERT INTO t_product VALUES (3, 5, 't'); INSERT INTO t_text VALUES (5, 'de', 'Banane'); INSERT INTO t_text VALUES (5, 'en', 'Banana'); CREATE TABLE t_product_group ( product_id int4 REFERENCES t_product(id) ON UPDATE CASCADE ON DELETE CASCADE, group_id int4 REFERENCESt_group(id) ON UPDATE CASCADE ONDELETE CASCADE ); INSERT INTO t_product_group VALUES (2, 1); INSERT INTO t_product_group VALUES (3, 1); INSERT INTO t_product_group VALUES (1, 2); CREATE TABLE t_price ( id int4, typ text, price numeric, product_id int4 REFERENCES t_product(id) ON UPDATE CASCADE ON DELETE CASCADE, PRIMARY KEY (id) ); INSERT INTO t_price VALUES (1, 'default', '10', 1); INSERT INTO t_price VALUES (2, 'sonder', '20', 1); INSERT INTO t_price VALUES (3, 'spezial', '30', 1); INSERT INTO t_price VALUES (4, 'default', '7', 2); INSERT INTO t_price VALUES (5, 'default', '12', 3); Best regards, Hans -- Cybertec Geschwinde u Schoenig Schoengrabern 134, A-2020 Hollabrunn, Austria Tel: +43/720/10 1234567 or +43/660/816 40 77 www.cybertec.at, www.postgresql.at, kernel.cybertec.at
Hans-Jürgen Schönig <postgres@cybertec.at> writes: > Consider the following scenario: > select * from (Select nextval('seq_ab') as nv, > * from ( select > t_product.id,t_text.value,t_price.price > from t_product,t_price,t_text > where t_product.id = t_price.product_id > and t_product.name = t_text.id > and t_text.lang='de' > and t_price.typ = 'default' > order by price desc ) as t ) as u > WHERE nv <= 1 > ; I don't think there's any very clean way to fix this sort of problem in general. We could make this particular example work if (1) we prevented a subquery containing volatile functions in its targetlist from being flattened into the parent query, and (2) we prevented outer WHERE clauses from being pushed down into a subquery when they reference subquery outputs containing volatile functions. There has been some recent discussion about doing (1) but I think we forgot about the necessity to also do (2); otherwise you'd end up with select * from (Select nextval('seq_ab') as nv, ... WHERE nextval('seq_ab') <= 1 ) as u ; which is hardly any better. Now those things are both doable but where it really falls down is when you join the subselect to some other table. Short of materializing the subselect there'd be no way to guarantee single evaluation of any one row in the subselect. I'd be willing to do (1) and (2) but not to force materialization; the performance hit for that just seems unacceptable. regards, tom lane
Tom, > I don't think there's any very clean way to fix this sort of problem in > general. We could make this particular example work if Frankly, I don't think there *is* any safe way to use volatile functions in subqueries -- I certainly avoid it, except now() and random() which as discussed are special cases. Perhaps a WARNING is in order? -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus wrote: > Tom, > > >>I don't think there's any very clean way to fix this sort of problem in >>general. We could make this particular example work if > > > Frankly, I don't think there *is* any safe way to use volatile functions in > subqueries -- I certainly avoid it, except now() and random() which as > discussed are special cases. Perhaps a WARNING is in order? > Personally I like Josh's idea. A warning would be a nice thing. I have just looked at Oracle and SQL Server to find out what those systems are doing ... Oracle has a very interesting concept *g*: ORA-02287 sequence number not allowed here: The usage of a sequence is limited and it can be used only in few areas of PL/SQL and SQL coding. The following are the cases where you can't use a sequence: For a SELECT Statement: 1. In a WHERE clause 2. In a GROUP BY or ORDER BY clause 3. In a DISTINCT clause 4. Along witha UNION or INTERSECT or MINUS 5. In a sub-query -------------------------- SQL Server does not support sequences the way we know it so it is hard to compare. I did not have time to test DB2 yet. Thanks a lot, Hans -- Cybertec Geschwinde u Schoenig Schoengrabern 134, A-2020 Hollabrunn, Austria Tel: +43/720/10 1234567 or +43/660/816 40 77 www.cybertec.at, www.postgresql.at, kernel.cybertec.at
Hans-Jürgen Schönig <postgres@cybertec.at> writes: > Josh Berkus wrote: >> Frankly, I don't think there *is* any safe way to use volatile functions in >> subqueries -- I certainly avoid it, except now() and random() which as >> discussed are special cases. Perhaps a WARNING is in order? > Personally I like Josh's idea. A warning would be a nice thing. From the planner's perspective, it would have to warn about any volatile function, which would probably be overly chatty --- remember that the default marking for user-defined functions is "volatile". This default may also be a good reason not to put in the anti-flattening defenses I suggested before, because it would mean that even slight sloppiness in the definition of a user function could cripple subquery optimization. I'm not sure that that's a strong argument, but it's something to think about. It'd be easy enough to put in the anti-flattening defenses (checks (1) and (2) in my prior message) but I've got mixed emotions about whether this is really a good thing to do. Any opinions out there? regards, tom lane
Tom, > It'd be easy enough to put in the anti-flattening defenses (checks (1) > and (2) in my prior message) but I've got mixed emotions about whether > this is really a good thing to do. Any opinions out there? If my opinion wasn't clear, I was suggesting adding a WARNING and not doing anything about flattening. I can't say that, in 5 years of developing applications in Postgres, that this has ever been a problem for me personally -- from my perspective the persons reporting the issue needs to re-code their query, it's not what sequences were meant for. -- --Josh Josh Berkus Aglio Database Solutions San Francisco