Обсуждение: Constant propagation and similar issues
I'm sure your aware of these limitations, but I'd thought I'd mention them just in case, and to see if you have plans to sort them out: I have a query of the form: SELECT * FROM .... WHERE (now()-date1) > 'interval 1 day'; ..i.e. all rows 'older' than 1 day. This could be efficiently processed using the index on date1, but sadly pg doesn't know this ;-( This transforms an operation which should be O(1) to O(rows).... More worryingly, when I investigated the above, I find it doesn't even use the index for SELECT * FROM .... WHERE date1 > '2000-09-11 00:00:00'::datetime - '1 hour'::interval; ...so it doesn't realise that constant-constant is constant, notwithstanding the more complex issues that now() is pseudo-constant. This could be fixed by constant folding, I guess; any plans for that? Jules
Jules Bean <jules@jellybean.co.uk> writes: > I have a query of the form: > SELECT * FROM .... WHERE (now()-date1) > 'interval 1 day'; > ..i.e. all rows 'older' than 1 day. This could be efficiently > processed using the index on date1, but sadly pg doesn't know this ;-( No, and I don't think it should. Should we implement a general algebraic equation solver, and fire it up for every single query, in order to see if the user has written an indexable condition in a peculiar form? I don't think we want to expend either the development effort or the runtime on that. If you are concerned about performance of this sort of query, you'll need to transform it to SELECT * FROM .... WHERE date1 < now() - interval '1 day'; Of course that still leaves you with problem (b), > SELECT * FROM .... > WHERE date1 > '2000-09-11 00:00:00'::datetime - '1 hour'::interval; > ...so it doesn't realise that constant-constant is constant, > notwithstanding the more complex issues that now() is pseudo-constant. Most of the datetime operations are not considered constant-foldable. The reason is that type timestamp has a special value CURRENT that is a symbolic representation of current time (this is NOT what now() produces, but might be thought of as a data-driven way of invoking now()). This value will get reduced to a simple constant when it is fed into an arithmetic operation. Hence, premature evaluation changes the results and would not be a correct optimization. AFAIK hardly anyone actually uses CURRENT, and I've been thinking of proposing that we eliminate it to make the world safe for constant- folding timestamp operations. (Thomas, any comments here?) In the meantime, there is a workaround that's been discussed on the mailing lists before --- create a function that hides the "unsafe-to-fold" operations and mark it iscachable: create function ago(interval) returns timestamp as'select now() - $1' language 'sql' with (iscachable); Then something like SELECT * FROM .... WHERE date1 < ago('1 day'); will be considered indexable. You can shoot yourself in the foot with this --- don't try to write ago(constant) in a rule or function definition --- but in interactive queries it'll get the job done. regards, tom lane
On Mon, Sep 11, 2000 at 11:15:58AM -0400, Tom Lane wrote: > > Most of the datetime operations are not considered constant-foldable. > The reason is that type timestamp has a special value CURRENT that > is a symbolic representation of current time (this is NOT what now() > produces, but might be thought of as a data-driven way of invoking > now()). This value will get reduced to a simple constant when it is > fed into an arithmetic operation. Hence, premature evaluation changes > the results and would not be a correct optimization. > > AFAIK hardly anyone actually uses CURRENT, and I've been thinking of > proposing that we eliminate it to make the world safe for constant- > folding timestamp operations. (Thomas, any comments here?) > I checked the ansi SQL'99 docs, and CURRENT as a date special constant is not a part of the standard (although CURRENT is a keyword: it is used in the context of cursors) Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
On Mon, Sep 11, 2000 at 11:15:58AM -0400, Tom Lane wrote: > Jules Bean <jules@jellybean.co.uk> writes: > > I have a query of the form: > > SELECT * FROM .... WHERE (now()-date1) > 'interval 1 day'; > > ..i.e. all rows 'older' than 1 day. This could be efficiently > > processed using the index on date1, but sadly pg doesn't know this ;-( > > No, and I don't think it should. Should we implement a general > algebraic equation solver, and fire it up for every single query, > in order to see if the user has written an indexable condition in > a peculiar form? I don't think we want to expend either the development > effort or the runtime on that. If you are concerned about performance > of this sort of query, you'll need to transform it to > > SELECT * FROM .... WHERE date1 < now() - interval '1 day'; Well, I shall speak quietly and timidly, for I'm not offering to do the work, and I respect that other tasks are both more interesting and more important. However, it does seem to me that PostgreSQL /should/ be able to make these transformations (at least, it should IMO recognise that given an expression of the form a + b - c + d < e - f + g where exactly one of a..g is a column name, and the rest are constant, that is a candidate for using the index). > > Of course that still leaves you with problem (b), > > > SELECT * FROM .... > > WHERE date1 > '2000-09-11 00:00:00'::datetime - '1 hour'::interval; > > > ...so it doesn't realise that constant-constant is constant, > > notwithstanding the more complex issues that now() is pseudo-constant. > > Most of the datetime operations are not considered constant-foldable. > The reason is that type timestamp has a special value CURRENT that > is a symbolic representation of current time (this is NOT what now() > produces, but might be thought of as a data-driven way of invoking > now()). This value will get reduced to a simple constant when it is > fed into an arithmetic operation. Hence, premature evaluation changes > the results and would not be a correct optimization. > > AFAIK hardly anyone actually uses CURRENT, and I've been thinking of > proposing that we eliminate it to make the world safe for constant- > folding timestamp operations. (Thomas, any comments here?) Yes. I came across CURRENT in some examples somewhere, got very confused, decided I didn't like it, and used now() instead ;-) I now understand the problem. Personally, I'm thinking drop CURRENT, but only because I've never used it myself... > > In the meantime, there is a workaround that's been discussed on the > mailing lists before --- create a function that hides the > "unsafe-to-fold" operations and mark it iscachable: > > create function ago(interval) returns timestamp as > 'select now() - $1' language 'sql' with (iscachable); > > Then something like > > SELECT * FROM .... WHERE date1 < ago('1 day'); > > will be considered indexable. You can shoot yourself in the foot with > this --- don't try to write ago(constant) in a rule or function > definition --- but in interactive queries it'll get the job done. Thanks very much. I shall try that. Jules
On Mon, Sep 11, 2000 at 12:22:39PM -0400, Tom Lane wrote: > Jules Bean <jules@jellybean.co.uk> writes: > > However, it does seem to me that PostgreSQL /should/ > > be able to make these transformations (at least, it should IMO > > recognise that given an expression of the form a + b - c + d < e - f > > + g where exactly one of a..g is a column name, and the rest are > > constant, that is a candidate for using the index). > > Mumble. I think that'd be a very difficult thing to do without losing > the datatype extensibility of the system. Right now, the only reason > that "a < b" is considered indexable is that the optimizer has a table > that tells it "<" is an indexable operator for btree indexes with > certain datatypes (opclasses). Neither the optimizer nor the btree code > has any real understanding of the relationships between "<" and "-", say. > There is no part of the system anywhere with understanding of algebraic > identities like "a - b < c can be transformed to a < b + c", and no way > I can see to add such knowledge without making it *substantially* harder > to add new datatypes and operators. Yes, actually something like this occurred to me after I sent the above email. I had forgotten about the (rather pretty) extensible type system; I can see that makes spotting optimisations such as the above much more difficult. Seems like it might make a nice subject for a paper, actually. > > Between that and the runtime that would be wasted during typical queries > (IMHO searching for rearrangeable clauses would usually be fruitless), > I really doubt that this is a good goal to pursue in Postgres. I'm afraid I can't buy that second argument ;-) The time it takes to optimise a query is asymptotically irrelevant, after all... Jules
On Mon, Sep 11, 2000 at 10:47:04AM -0500, Ross J. Reedstrom wrote: > On Mon, Sep 11, 2000 at 11:15:58AM -0400, Tom Lane wrote: > > > > Most of the datetime operations are not considered constant-foldable. > > The reason is that type timestamp has a special value CURRENT that > > is a symbolic representation of current time (this is NOT what now() > > produces, but might be thought of as a data-driven way of invoking > > now()). This value will get reduced to a simple constant when it is > > fed into an arithmetic operation. Hence, premature evaluation changes > > the results and would not be a correct optimization. > > > > AFAIK hardly anyone actually uses CURRENT, and I've been thinking of > > proposing that we eliminate it to make the world safe for constant- > > folding timestamp operations. (Thomas, any comments here?) > > > > I checked the ansi SQL'99 docs, and CURRENT as a date special constant > is not a part of the standard (although CURRENT is a keyword: it is > used in the context of cursors) > Following up to myself: Ah, I had forgotten that CURRENT is a magic value, like 'infinity'. The standard does specify in section 6.19: CURRENT_DATE, CURRENT_TIME, LOCALTIME, CURRENT_TIMESTAMP, and LOCALTIMESTAMP as <datetime value function> Which are currently implemented as generating the special value 'CURRENT', which then get's stored in the column. This strikes me as _not_ standards compliant. What do other DB's do with these? I think that they should be equivalent to now(), returning a static date that is stored. I do find the timestamp special values 'infinity' and '-infinity' very useful, but have never found a use for 'current'. Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
Jules Bean <jules@jellybean.co.uk> writes: > However, it does seem to me that PostgreSQL /should/ > be able to make these transformations (at least, it should IMO > recognise that given an expression of the form a + b - c + d < e - f > + g where exactly one of a..g is a column name, and the rest are > constant, that is a candidate for using the index). Mumble. I think that'd be a very difficult thing to do without losing the datatype extensibility of the system. Right now, the only reason that "a < b" is considered indexable is that the optimizer has a table that tells it "<" is an indexable operator for btree indexes with certain datatypes (opclasses). Neither the optimizer nor the btree code has any real understanding of the relationships between "<" and "-", say. There is no part of the system anywhere with understanding of algebraic identities like "a - b < c can be transformed to a < b + c", and no way I can see to add such knowledge without making it *substantially* harder to add new datatypes and operators. Between that and the runtime that would be wasted during typical queries (IMHO searching for rearrangeable clauses would usually be fruitless), I really doubt that this is a good goal to pursue in Postgres. regards, tom lane
Jules Bean <jules@jellybean.co.uk> writes: > Presumably then, the optimizer doesn't even know that + is > commutative, so it can't fold the constant in (5 + a + 4). Actually, it does know that + is commutative, courtesy of the oprcom column in pg_operator. But it doesn't know that + is associative, which is something you must also assume to transform 5 + a + 4 (really (5 + a) + 4 in the parser output) into a + (5 + 4). Since, in general, the associative law does NOT hold for computer arithmetic (think about intermediate-result overflow, not to mention roundoff error in floating-point cases), I'm hesitant to want to put such assumptions in. But the more serious problem is the search space involved in discovering that there is a path of manipulations that leads to a more completely reducible expression. With a large expression that could be an exponential time cost --- even if the resultant savings is zero. I do not buy your previous comment that the cost of optimization is negligible. We've already had to put in heuristics to prevent the system from blowing up in cnfify() for moderately large WHERE clauses --- it used to be that a few dozen OR (a=1 AND b=2) OR (a=4 AND b=5) kinds of clauses would bring the optimizer to its knees. And that's for a well-understood deterministic simplification that affects only AND/OR/NOT operators, with no search involved to decide what to do with them. What you're proposing would affect many more operators and require a combinatorial search to see what could be done to the expression. So I stand by my opinion that this isn't likely to be a profitable path to pursue. regards, tom lane
> AFAIK hardly anyone actually uses CURRENT, and I've been thinking of > proposing that we eliminate it to make the world safe for constant- > folding timestamp operations. (Thomas, any comments here?) Well, it is a feature from "the old days". Pretty neat one at that, and is an example of a useful feature not found in other DBs or in standards, but which might show up someday because they are useful. Throwing those things away one at a time will end us up at the lowest common denominator, eventually :( Another way of looking at the problem is to ask how we could retain this feature in the face of the other optimization "desirements". istm that types which have multiple behaviors could be queried for the behavior of a particular example by the optimizer. For most types, a "query" would not be necessary (so there is minimal overhead), but for this case a function could return the property of an example as either cachable or not. Perhaps a true "serial type" would need similar behaviors, as might other future types. - Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> AFAIK hardly anyone actually uses CURRENT, and I've been thinking of >> proposing that we eliminate it to make the world safe for constant- >> folding timestamp operations. (Thomas, any comments here?) > Well, it is a feature from "the old days". Pretty neat one at that, and > is an example of a useful feature not found in other DBs or in > standards, but which might show up someday because they are useful. I'm not convinced that it is useful. What I think it is is a good way of shooting yourself in the foot, because it's so hard to control when 'CURRENT' will be reduced to a specific time value. I have no problem with the datetime input converters accepting the input string 'CURRENT' and immediately replacing it with the current time. That behavior is clearly useful and creates no semantic issues. But I don't think that a special data value that symbolically represents current time is either useful or well-defined. Just to give one example of why the concept is broken: consider an index on a timestamp column that contains some CURRENT values. Today the index might look like2000-01-01 11:33:05-042000-09-17 14:39:44-04CURRENT2000-09-18 14:11:07-04 which is fine. But twenty-four hours from now, this index will be out of order and hence broken. (The btree routines do not cope at all gracefully with logically-inconsistent indexes.) So I still recommend that we remove the special value CURRENT. Then we can mark the datetime-related operators constant-foldable, which will eliminate a complaint that we can otherwise expect to hear constantly (saw another instance today in pgsql-sql). regards, tom lane
At 16:37 11/09/00 +0000, Thomas Lockhart wrote: >> AFAIK hardly anyone actually uses CURRENT, and I've been thinking of >> proposing that we eliminate it to make the world safe for constant- >> folding timestamp operations. (Thomas, any comments here?) > >Well, it is a feature from "the old days". Pretty neat one at that, and >is an example of a useful feature not found in other DBs or in >standards, but which might show up someday because they are useful. >Throwing those things away one at a time will end us up at the lowest >common denominator, eventually :( Well, Dec RDB has it: the concept of a 'computed by' field. You can define view-like elements in a table, eg: Create Table t1 ( f1 integer, f2 real, f12_avg computed by (f1 + f2)/2, f4 computed by current_timestamp). It's actually quite a useful feature, but unlike PGSQL, Dec RDB does not allow indexes to be created on the fields. Clearly one can just define a view to get similar results, but it is not as clean. >Another way of looking at the problem is to ask how we could retain this >feature in the face of the other optimization "desirements". istm that >types which have multiple behaviors could be queried for the behavior of >a particular example by the optimizer. For most types, a "query" would >not be necessary (so there is minimal overhead), but for this case a >function could return the property of an example as either cachable or >not. > >Perhaps a true "serial type" would need similar behaviors, as might >other future types. This seems like a nice and extensible idea - allowing types to define a function for testing constancy. At least to allow: field < (current_timestamp - interval '1 hour') to use an index properly. I guess an alternative would be to define a special 'iscacheable' function: eg. function constant(arg <all-types>) returns <same-type-as-arg> so we could have: field < constant(current_timestamp - interval '1 hour') although this looks suspiciously like an optimizer hint which I think people have been opposed to in the past... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
At 19:05 18/09/00 +1000, Philip Warner wrote: > >I guess an alternative would be to define a special 'iscacheable' function: >eg. > > function constant(arg <all-types>) returns <same-type-as-arg> > >so we could have: > > field < constant(current_timestamp - interval '1 hour') > >although this looks suspiciously like an optimizer hint which I think >people have been opposed to in the past... > I just saw the flaw with this: iscacheable is not enough when the args are deemed non-constant. I guess we'd need some kind of 'evaluate_once' flag, which is getting a little obtuse. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
> So I still recommend that we remove the special value CURRENT. Then we > can mark the datetime-related operators constant-foldable, which will > eliminate a complaint that we can otherwise expect to hear constantly > (saw another instance today in pgsql-sql). OK. - Thomas