Обсуждение: Nested CASE-WHEN scoping
While looking at fixing the multiple-evaluation issue in IN and BETWEEN discussed a while ago, I realized that the current assumption that only one CaseTestExpr placeholder needs to be valid at any given time is not true. Here's a bit contrived example: CREATE FUNCTION evileq (timestamptz, int4) returns boolean AS $$ SELECT case $2 WHEN length($1::text) THEN true ELSE falseEND; $$ language sql; CREATE OPERATOR = (procedure = evileq, leftarg = timestamptz, rightarg = int4); postgres=# SELECT now() = 29, CASE now() WHEN 29 THEN 'foo' ELSE 'bar' END; ?column? | case ----------+------ t | bar (1 row) Direct call to the operator, "now () = 29" returns true, but when used in CASE-WHEN, which implicitly does the same comparison, the result is false. Admittedly that's pretty far-fetched, but nevertheless it's a bug. As part of the BETWEEN/IN fix, I was going to refactor CaseTestExpr and CoerceToDomainValue placeholder node types into one generic placeholder node. BETWEEN needs three placeholder slots in the worst case [*], and now it seems that we need to handle an arbitrary number of simultaneous placeholders even for CASE-WHEN. So I'm going to put the BETWEEN/IN fix aside for now, and refactor the placeholder infrastructure to handle several simultaneous placeholders, and replace CaseTestExpr and CoerceToDomainValue with it. Actually AggRef and WindowFunc nodes look a lot like CaseTestExpr and CoerceToDomainValue too, but I'm a bit scared of touching those. PS. This is all 9.2 material, in case you're wondering. We're talking about pretty big patches. [*] a BETWEEN SYMMETRIC b AND c is handled as "(a <= b AND a >= c) OR (a >= b AND a <= c)", leading to multiple evaluationof all three operands if placeholders are not used -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 25.05.2011 14:57, Heikki Linnakangas wrote: > Here's a bit contrived example: > > CREATE FUNCTION evileq (timestamptz, int4) returns boolean AS $$ > SELECT case $2 WHEN length($1::text) THEN true ELSE false END; > $$ language sql; > CREATE OPERATOR = (procedure = evileq, leftarg = timestamptz, rightarg = > int4); > > postgres=# SELECT now() = 29, CASE now() WHEN 29 THEN 'foo' ELSE 'bar' END; > ?column? | case > ----------+------ > t | bar > (1 row) > > Direct call to the operator, "now () = 29" returns true, but when used > in CASE-WHEN, which implicitly does the same comparison, the result is > false. Admittedly that's pretty far-fetched, but nevertheless it's a bug. I should add that this works fine if the function is not an SQL function that gets inlined. But inlining is desirable, we don't want to give up on that, and inhibiting it in that case would need some extra bookkeeping anyway. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > While looking at fixing the multiple-evaluation issue in IN and BETWEEN > discussed a while ago, I realized that the current assumption that only > one CaseTestExpr placeholder needs to be valid at any given time is not > true. [ scratches head ... ] Why does the save/restore in ExecEvalCase not take care of this? > So I'm going to put the BETWEEN/IN fix aside for now, and refactor the > placeholder infrastructure to handle several simultaneous placeholders, That sounds like a mess, and I'm not even convinced it'll solve the problem ... regards, tom lane
On 25.05.2011 17:47, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> While looking at fixing the multiple-evaluation issue in IN and BETWEEN >> discussed a while ago, I realized that the current assumption that only >> one CaseTestExpr placeholder needs to be valid at any given time is not >> true. > > [ scratches head ... ] Why does the save/restore in ExecEvalCase not > take care of this? The mistake happens during planning, when the SQL function is inlined and pre-evaluated. It's a bit hard to see what happened once the planning is finished because the whole expression is folded into a constant, but here's goes: The original expression is: CASE now() WHEN 29 THEN 'foo' ELSE 'bar' END; In parse analysis, it is turned into this: CASE WHEN CaseTestExpr = 29 THEN 'foo' ELSE 'bar' END; where CaseTestExpr stands for the now(). Next the planner tries to simplify the WHEN condition, "CaseTestExpr = 29". The equality operator is implemented by the evileq(timestamptz, int4) SQL function, defined as: CASE $2 WHEN length($1::text) THEN true ELSE false END; That SQL-function is transformed at parse analysis into: CASE CaseTestExpr = length($1::text) THEN true ELSE false END; This CaseTestExpr stands for the Param to the function, $2. When that tranformed SQL function body is inlined into the outer WHEN clause, "CaseTestExpr = 29", and Params are substituted, it becomes: CASE CaseTestExpr = length(CaseTestExpr::text) THEN true ELSE false END. (you can see the expression tree for that if you print out 'newexpr' in inline_function(), just after the call to substitute_actual_parameters()) At this point it's easy to see that we have screwed up. The first CaseTestExpr stands for the inner CASE-value, which is $2, which stands for 29, and the second CaseTestExpr stands for the *outer* CASE-value, which is supposed to be now(). The planner cannot distinguish between the two anymore. Both CaseTestExprs are then incorrectly replaced with constant 29, and the whole expression is constant-folded into 'bar'. >> So I'm going to put the BETWEEN/IN fix aside for now, and refactor the >> placeholder infrastructure to handle several simultaneous placeholders, > > That sounds like a mess, and I'm not even convinced it'll solve the > problem ... Hmm, it would solve the above by if we can keep the CaseTestExprs separate. It's not quite as straightforward as I originally thought, as the parse analysis of the inlined SQL function needs to use placeholder numbers that are different from those used in the outer context. But it seems doable. BTW, i just stumbled into this: postgres=# explain verbose SELECT CASE now() WHEN (29+random()::int4) THEN 'foo' ELSE 'bar' END; ERROR: unexpected CASE WHEN clause: 326 Looks like ruleutils.c is also not prepared for the case that the implicit equality operation gets inlined into something else than an OpExpr. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 25.05.2011 17:47, Tom Lane wrote: >> [ scratches head ... ] Why does the save/restore in ExecEvalCase not >> take care of this? > The mistake happens during planning, when the SQL function is inlined > and pre-evaluated. Hm. I'm inclined to think this may be more of a bug in the inlining process than anything else. I have to run off for a doctor's appointment, but will look at this closer later today. regards, tom lane
On 25.05.2011 20:11, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> On 25.05.2011 17:47, Tom Lane wrote: >>> [ scratches head ... ] Why does the save/restore in ExecEvalCase not >>> take care of this? > >> The mistake happens during planning, when the SQL function is inlined >> and pre-evaluated. > > Hm. I'm inclined to think this may be more of a bug in the inlining > process than anything else. Well, if you want to get away without the capability to reference multiple CaseTestExprs at a time, you'll have to detect the danger and abort the inlining process. That's a bit pessimal, although this is a pretty artificial case in the first place so maybe we don't care much. (I'm still going to need more placeholder slots to handle IN and BETWEEN. Of course, I can just copy-paste CaseTestExpr into something like InTestExpr and BetweenTestExpr, but it seems like it would be good to unite all that infrastructure) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 25.05.2011 17:47, Tom Lane wrote: >> [ scratches head ... ] Why does the save/restore in ExecEvalCase not >> take care of this? > The mistake happens during planning, when the SQL function is inlined > and pre-evaluated. OK, I see the problem now: we have a CaseTestExpr in the arguments of the function, which we are unable to reduce to a constant, so it gets substituted as-is into the body of the function during inlining. And then it's physically inside the CASE expression that's in the function body, so it looks like it syntactically belongs to that expression, which it doesn't. You're probably right that this is impractical to fix without redesigning the expression representation, and that CoerceToDomainValue has got a similar issue. My advice is to not change the parser output representation, though. What I think you ought to do about this is to have the planner replace CaseTestExpr and CoerceToDomainValue with PARAM_EXEC Params during expression preprocessing, and assign suitable Param numbers which it sticks into the CaseExpr (resp CoerceToDomainExpr) so that those expressions know which Param slot to fill at runtime. The const-simplification logic can avoid getting dumber by treating the cases like known-Param-value cases. I don't think you need to invent something separate from the PARAM_EXEC infrastructure to handle these. The main annoyance here is that standalone expressions may now need Param slots to execute, which will complicate places that need to evaluate them, but there's probably no way around that (a bespoke mechanism would complicate those callers just as much, if the number of value slots it needs is variable, which it will be). > BTW, i just stumbled into this: > postgres=# explain verbose SELECT CASE now() WHEN (29+random()::int4) > THEN 'foo' ELSE 'bar' END; > ERROR: unexpected CASE WHEN clause: 326 > Looks like ruleutils.c is also not prepared for the case that the > implicit equality operation gets inlined into something else than an OpExpr. Grumble ... I thought we'd fixed that ... regards, tom lane
I wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> BTW, i just stumbled into this: >> postgres=# explain verbose SELECT CASE now() WHEN (29+random()::int4) >> THEN 'foo' ELSE 'bar' END; >> ERROR: unexpected CASE WHEN clause: 326 >> Looks like ruleutils.c is also not prepared for the case that the >> implicit equality operation gets inlined into something else than an OpExpr. > Grumble ... I thought we'd fixed that ... Yeah, you're right. We've hacked that code so it can handle some transformations that the optimizer might apply, but inlining some random expression to replace the equality operator is far beyond what we can hope to deal with. For those following along at home, the point is that if the user writes CASE test_expr WHEN cmp_expr THEN ... the parser identifies the equality operator to use and produces something that looks like this: CASE test_expr WHEN CaseTestExpr = cmp_expr THEN ... We really need ruleutils.c to generate the original form when it is looking at a stored rule (eg a view), so it goes to some lengths to recognize "CaseTestExpr = something" in a WHEN clause and only print the "something". However, this example shows that there's no chance of always being able to do that when looking at an expression that's been through the planner. I think what we'd better do, if we don't see something that looks like "CaseTestExpr = something", is just print whatever we do have in the WHEN clause. That will require inventing a print representation for CaseTestExpr, since in most cases that's going to appear in there somewhere. I suggest we just print CASE_TEST_EXPR, but if anyone wants to bikeshed, feel free ... Note that if Heikki does what I suggested upthread, the display will eventually probably look like "$nn" instead (since it'll be a Param not a CaseTestExpr). But that's 9.2 or later material. regards, tom lane
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > I think we can work around both of those by just saving and restoring > the value of each Param that we set while evaluating an expression, Huh? That's a waste of time and effort. Just make sure that each such spot has its own Param number. That's why I'm telling you to do it in the planner, not the parser --- it is easy to assign globally-unique- across-the-plan numbers at plan time, in fact we do it already. > For debugging purposes, it seems like a good idea to invent a new kind > of Param for these, and keep them separate from PARAM_EXEC params. I'd vote against that too. PARAM_EXEC Params already have more than one purpose. regards, tom lane
On 26.05.2011 00:31, Tom Lane wrote: > The main annoyance here is that standalone expressions may now need > Param slots to execute, which will complicate places that need to > evaluate them, but there's probably no way around that Yeah, I can see two complications: 1. Domain constraints Domain constraint check expressions are fetched and constant-folded separately from the enclosing expression, in ExecInitExpr(). In order to allocate distinct paramids for any CASE values within the domain check expression, we'd need to know how many paramids are in use in the parent expression. We could dig into the parent PlanState and its EState to get that, but we don't have that for stand-alone expressions. 2. Index expressions Index expressions are stored in relcache after constant evaluation, so any CaseTestExprs will already be replaced with Params. When the recheck expression is evaluated, the paramids used in the recheck expression will clash with real PARAM_EXEC params used to pass values up and down subqueries, as well as any params used for CASEs. I think we can work around both of those by just saving and restoring the value of each Param that we set while evaluating an expression, as the values should not be used simultaneously, but it makes me a bit uncomfortable. If we ever try to inline those expressions into other expressions, we'll be right back to the issue we have with nested CASE now. I'm not sure if we might already do that for index recheck expressions. Alternatively, we could adjust the paramids when an expression is inlined into another, similar to what OffsetVarNodes does for Vars. For debugging purposes, it seems like a good idea to invent a new kind of Param for these, and keep them separate from PARAM_EXEC params. The scope would be different, PARAM_EXECs are used to pass values between planner nodes, while these new kind of Params would be used to pass values within a single expression. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 30.05.2011 17:21, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> I think we can work around both of those by just saving and restoring >> the value of each Param that we set while evaluating an expression, > > Huh? That's a waste of time and effort. Just make sure that each such > spot has its own Param number. That's why I'm telling you to do it in > the planner, not the parser --- it is easy to assign globally-unique- > across-the-plan numbers at plan time, in fact we do it already. Yeah, I got that part. What I'm saying is that it's not that easy, because of the two issues, domain constraints and index expressions, that I mentioned. Here's a WIP patch, but those two issues have not been addressed yet. I'm sure those are not insurmountable problems, I'm just trying to figure out the best way to surmount them: For domain constraints, ExecInitExpr could assign globally-unique param numbers if it knew how many params are in use. That's trivial for expressions in plans, as you have access to the EState via the PlanState, and EState can include the number of params assigned. For a stand-alone expression, we don't have that. There is no global information whatsoever for stand-alone expressions, ExecInitExpr only sees the current node it's dealing with. Perhaps we need to add the concept of a global plan For index expressions, we could use a function similar to ChangeVarNodes(), that shifts all the paramids in the already-planned expression, preparing it for inclusion within the enclosing plan. I'm a bit worried that that might screw up the logic used to compare if an expression matches the index expression, though; the param ids in the two expressions won't match. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Вложения
On 31.05.2011 19:10, Heikki Linnakangas wrote: > For index expressions, we could use a function similar to > ChangeVarNodes(), that shifts all the paramids in the already-planned > expression, preparing it for inclusion within the enclosing plan. I'm a > bit worried that that might screw up the logic used to compare if an > expression matches the index expression, though; the param ids in the > two expressions won't match. Yeah, the expression comparison logic gets all confused by this :-(. I couldn't figure out a way to make it work without making the comparison a whole lot more complicated than it is today. I'm going back to my original thoughts of a new kind of node to replace CaseTestExpr, which allows referencing values from upper levels in the expression tree. So, here's a WIP patch using that approach. I've replaced CaseTestExpr with ExpressionParam. ExpressionParam has a levelsup field, which is similar to varlevelsup in Var. With levelsup == 0, ExpressionParam works just like CaseTestExpr did. With levelsup == 1, it refers to the value from above the enclosing CaseExpr (or any other node that uses these ExpressionParams/CaseTestExprs). The complicated part is to ensure that levelsup is always set correctly. At parse time, levelsup is always set to 0, as the syntax doesn't allow referencing upper levels directly. When an SQL function is inlined, any ExpressionParams in the expressions that are substituted for Params need to have their levelsup adjusted, so that it still refers to the right value if there's CASE expressions in the inlined function. Also, when an ExpressionParam is replaced with a Const, the levelsup fields of any other ExpressionParams within the CaseExpr referring to higher levels need to have their levelsup decremented to account for the fact that the CaseExpr doesn't push the expression parameter anymore. At execution time, the expression parameters form a stack. We've always done the save-restore logic, but the stack is now represented explicitly as a List in ExprContext. When an ExpressionParam is evaluated, the nth element is fetched from the stack, corresponding to levelsup. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Вложения
Hello 2011/6/3 Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>: > On 31.05.2011 19:10, Heikki Linnakangas wrote: >> >> For index expressions, we could use a function similar to >> ChangeVarNodes(), that shifts all the paramids in the already-planned >> expression, preparing it for inclusion within the enclosing plan. I'm a >> bit worried that that might screw up the logic used to compare if an >> expression matches the index expression, though; the param ids in the >> two expressions won't match. > > Yeah, the expression comparison logic gets all confused by this :-(. I > couldn't figure out a way to make it work without making the comparison a > whole lot more complicated than it is today. I'm going back to my original > thoughts of a new kind of node to replace CaseTestExpr, which allows > referencing values from upper levels in the expression tree. > > So, here's a WIP patch using that approach. I've replaced CaseTestExpr with > ExpressionParam. ExpressionParam has a levelsup field, which is similar to > varlevelsup in Var. With levelsup == 0, ExpressionParam works just like > CaseTestExpr did. With levelsup == 1, it refers to the value from above the > enclosing CaseExpr (or any other node that uses these > ExpressionParams/CaseTestExprs). > I have a query - should be ExpressionParam used for removing a multiple function call when composite result is expanded? With some similar optimization a SELECT (fce(x)).* FROM tab will be optimal Regards Pavel Stěhule > The complicated part is to ensure that levelsup is always set correctly. At > parse time, levelsup is always set to 0, as the syntax doesn't allow > referencing upper levels directly. When an SQL function is inlined, any > ExpressionParams in the expressions that are substituted for Params need to > have their levelsup adjusted, so that it still refers to the right value if > there's CASE expressions in the inlined function. Also, when an > ExpressionParam is replaced with a Const, the levelsup fields of any other > ExpressionParams within the CaseExpr referring to higher levels need to have > their levelsup decremented to account for the fact that the CaseExpr doesn't > push the expression parameter anymore. > > At execution time, the expression parameters form a stack. We've always done > the save-restore logic, but the stack is now represented explicitly as a > List in ExprContext. When an ExpressionParam is evaluated, the nth element > is fetched from the stack, corresponding to levelsup. > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >
On 16.06.2011 22:46, Pavel Stehule wrote: > I have a query - should be ExpressionParam used for removing a > multiple function call when composite result is expanded? > > With some similar optimization a SELECT (fce(x)).* FROM tab will be optimal Hmm, I don't think it will work as is for that purpose, as each targetlist item is a separate expression, and ExpressionParams only work within an expression. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > The complicated part is to ensure that levelsup is always set correctly. > At parse time, levelsup is always set to 0, as the syntax doesn't allow > referencing upper levels directly. When an SQL function is inlined, any > ExpressionParams in the expressions that are substituted for Params need > to have their levelsup adjusted, so that it still refers to the right > value if there's CASE expressions in the inlined function. Also, when an > ExpressionParam is replaced with a Const, the levelsup fields of any > other ExpressionParams within the CaseExpr referring to higher levels > need to have their levelsup decremented to account for the fact that the > CaseExpr doesn't push the expression parameter anymore. I believe this is an unworkably complex, and almost certainly buggy Rube Goldberg device. Even if it manages to work today, it's going to be impossible to maintain those levelsup values correctly during any sort of expression rearrangement or optimization. Please take another look at just assigning a PARAM_EXEC parameter per Case expression. regards, tom lane
On 16.06.2011 23:56, Tom Lane wrote: > Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >> The complicated part is to ensure that levelsup is always set correctly. >> At parse time, levelsup is always set to 0, as the syntax doesn't allow >> referencing upper levels directly. When an SQL function is inlined, any >> ExpressionParams in the expressions that are substituted for Params need >> to have their levelsup adjusted, so that it still refers to the right >> value if there's CASE expressions in the inlined function. Also, when an >> ExpressionParam is replaced with a Const, the levelsup fields of any >> other ExpressionParams within the CaseExpr referring to higher levels >> need to have their levelsup decremented to account for the fact that the >> CaseExpr doesn't push the expression parameter anymore. > > I believe this is an unworkably complex, and almost certainly buggy > Rube Goldberg device. Even if it manages to work today, it's going to > be impossible to maintain those levelsup values correctly during > any sort of expression rearrangement or optimization. > > Please take another look at just assigning a PARAM_EXEC parameter per > Case expression. I've added this to the TODO list, hopefully someone more skilled with the planner than me will pick this up... -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com