Обсуждение: indexes and floats
Is this a bug that the index doesn't work on floats or is it a datatype mismatch thing? ----- template1=> create table foo (x float4, y float4); CREATE template1=> create index foox on foo(x); CREATE template1=> create index fooy on foo(y); CREATE template1=> insert into foo values(123.45,678.90); INSERT 239376 1 template1=> insert into foo values(123.25,678.95); INSERT 239377 1 template1=> explain select * from foo where x = 123.4; NOTICE: QUERY PLAN: Seq Scan on foo (cost=0.00 size=0 width=8) EXPLAIN template1=> ----- Vince. -- ========================================================================== Vince Vielhaber -- KA8CSH email: vev@michvhf.com flame-mail: /dev/null # include <std/disclaimers.h> TEAM-OS2 Online Searchable Campground Listings http://www.camping-usa.com "There is no outfit less entitled to lecture me about bloat than the federal government" -- Tony Snow ==========================================================================
> Is this a bug that the index doesn't work on floats or is it a > datatype mismatch thing? > Seq Scan on foo (cost=0.00 size=0 width=8) Likely neither. Notice the cost of a sequential scan; it is zero. Why bother doing an index scan? You need at least two more things to happen to provoke Postgres into using your index: 1) Have more entries in the table; rule of thumb is O(100) before an index will be used. 2) Run vacuum after getting that many entries to make sure the optimizer will know about the configuration of the tables and indices. Good luck. - Tom
On 03-Aug-98 Thomas G. Lockhart wrote: >> Is this a bug that the index doesn't work on floats or is it a >> datatype mismatch thing? >> Seq Scan on foo (cost=0.00 size=0 width=8) > > Likely neither. Notice the cost of a sequential scan; it is zero. Why > bother doing an index scan? > > You need at least two more things to happen to provoke Postgres into > using your index: > > 1) Have more entries in the table; rule of thumb is O(100) before an > index will be used. > > 2) Run vacuum after getting that many entries to make sure the optimizer > will know about the configuration of the tables and indices. Here's why I'm bringing it up. I ran vacuum on the database yesterday. The table consists of city, state (both text), lon and lat (both float4). The text index is working but not the floats. I have another table with 12,000 - 14,000 rows and I'm getting the same thing. ----- campsites=> explain select lon from locations where lon = 83.5; NOTICE: QUERY PLAN: Seq Scan on locations (cost=7263.30 size=2 width=4) EXPLAIN campsites=> select count(*) from locations; count ------ 169797 (1 row) campsites=> ----- Vince. -- ========================================================================== Vince Vielhaber -- KA8CSH email: vev@michvhf.com flame-mail: /dev/null # include <std/disclaimers.h> TEAM-OS2 Online Searchable Campground Listings http://www.camping-usa.com "There is no outfit less entitled to lecture me about bloat than the federal government" -- Tony Snow ==========================================================================
> >> Is this a bug that the index doesn't work on floats or is it a > >> datatype mismatch thing? > >> Seq Scan on foo (cost=0.00 size=0 width=8) > The table consists of city, state (both text), lon and lat (both > float4). The text index is working but not the floats. I have another > table with 12,000 - 14,000 rows and I'm getting the same thing. Hmm. You are right. I'm glad you insisted on pursuing this, but I might have been able to see the problem right away if you had described it in detail (e.g. "floats" is not clearly float4) and if your initial example had actually been with float4 data (it showed a width of 8 which is likely to be a float8) :/ Anyway, the problem seems to be specific to float4. The workaround is to use float8 for now, and we'll look into fixing the problem for the next release... - Tom Test cases from the regression database follow: regression=> create table test (f4 float4); CREATE regression=> insert into test select seqno from bt_f8_heap; INSERT 0 10000 regression=> create index ti on test using btree (f4); CREATE regression=> vacuum; VACUUM regression=> explain select f4 from test where f4 = 500.0; NOTICE: QUERY PLAN: Seq Scan on test (cost=394.00 size=1 width=4) EXPLAIN regression=> create table test8 as select seqno from bt_f8_heap; SELECT regression=> create index t8 on test8 using btree (seqno); CREATE regression=> vacuum; VACUUM regression=> explain select seqno from test8 where seqno = 500.0; NOTICE: QUERY PLAN: Index Scan using t8 on test8 (cost=2.05 size=1 width=8) EXPLAIN
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes: >> The table consists of city, state (both text), lon and lat (both >> float4). The text index is working but not the floats. I have another >> table with 12,000 - 14,000 rows and I'm getting the same thing. > Anyway, the problem seems to be specific to float4. The workaround is to > use float8 for now, and we'll look into fixing the problem for the next > release... Ah-hah, all of a sudden this looks *real* familiar. I bet it's because pgsql is not noticing that "500.0" can be interpreted as a float4. Let's try it. play=> create table f8 (x float8); CREATE play=> \copy f8 from countup; // 0..999 in a text file Successfully copied. play=> create index f8_i on f8 using btree (x); CREATE play=> create table f4 (x float4); CREATE play=> \copy f4 from countup; Successfully copied. play=> create index f4_i on f4 using btree (x); CREATE play=> vacuum verbose analyze; .... blah blah play=> explain select x from f8 where x = 500; NOTICE: QUERY PLAN: Seq Scan on f8 (cost=40.00 size=100 width=8) EXPLAIN play=> explain select x from f8 where x = 500.0; NOTICE: QUERY PLAN: Index Scan using f8_i on f8 (cost=2.05 size=1 width=8) EXPLAIN play=> explain select x from f4 where x = 500; NOTICE: QUERY PLAN: Seq Scan on f4 (cost=40.00 size=100 width=4) EXPLAIN play=> explain select x from f4 where x = 500.0; NOTICE: QUERY PLAN: Seq Scan on f4 (cost=40.00 size=1 width=4) EXPLAIN play=> explain select x from f4 where x = 500 :: float4; NOTICE: QUERY PLAN: Index Scan using f4_i on f4 (cost=2.05 size=1 width=4) EXPLAIN play=> explain select x from f4 where x = 500.0 :: float4; ERROR: parser_typecast: cannot cast this expression to type 'float4' play=> (This is with cvs sources of a few days ago.) I see two different bugs here: 1. select ... where x = constant; is optimized to an index scan only if the constant is of the exact type of the field x. Apparently, the coercion of the constant to match the type of x happens only after the optimizer decides it doesn't know what to do. The coercion ought to happen *before* the optimizer runs. (I have no idea whether this is a new bug caused by the recent type-system changes, or whether it existed in 6.3.2 and earlier.) 2. Type coercion fails for "500.0 :: float4" (ditto for "500.0 :: float8" btw). Presumably this is a simple localized bug in the parser or the type coercion logic for floats. I had previously complained of bug #1 in connection with OIDs; with the present sources, "where oid = 123456" will not use an index on OID, while "where oid = 123456::oid" will. I will bet lunch (at the nearest McD's, I'm not rich ;-)) that Vince Vielhaber's recent gripe about select city from locations where lower(city) = lower('st. ignace'); failing to use an index create index locations_city on locations(lower(city) text_ops); is an artifact of the same sort of type-mismatch problem. regards, tom lane
> Ah-hah, all of a sudden this looks *real* familiar. I bet it's > because pgsql is not noticing that "500.0" can be interpreted as a > float4. Let's try it. Oh, you have nailed it! This is interesting because (probably) a query like select f4 from t4 where f4 = 500.0; is being automatically "upgraded" in the parser backend to select f4 from t4 where float8(f4) = 500.0; So, since there is no functional index float8(f4) on the table we cannot use an existing index on f4 to advantage. This may be a result of my recent enhancements to the automatic type coersion code. But I'm a little suprised that v6.3.x doesn't just complain about a type mismatch but instead actually works. It may be that the old code which converted constants using intermediate strings worked (sort of) for this case. In general, the pre-enhancement code _only_ tried to convert constants, and complained about type mismatches when non-constants were involved. So far in the examples, imho this (new?) feature isn't a _fatal_ problem, because you want to handle the following query correctly (I'll switch to an int column to make it clearer): select i4 from t4 where i4 < 500.1; Now, if we do the "optimizable thing" blindly, then we would transform this to select i4 from t4 where i4 < int4(500.1); But of course this would produce the wrong result if the table contains a value of 500. Perhaps something a bit different could be implemented, but it probably wouldn't generalize very well with the extensible type system. So, is there a problem to fix, or just documentation to write? I've already written a new "Type Conversion" chapter for the User's Guide, but haven't mentioned this specific issue. - Tom
> 1. select ... where x = constant; is optimized to an index scan > only if the constant is of the exact type of the field x. > Apparently, the coercion of the constant to match the type of x > happens only after the optimizer decides it doesn't know what to do. > The coercion ought to happen *before* the optimizer runs. > (I have no idea whether this is a new bug caused by the recent > type-system changes, or whether it existed in 6.3.2 and earlier.) I think the current sources Thomas has modified work better, perhaps? > > 2. Type coercion fails for "500.0 :: float4" (ditto for "500.0 :: float8" > btw). Presumably this is a simple localized bug in the parser or the type > coercion logic for floats. > > I had previously complained of bug #1 in connection with OIDs; > with the present sources, "where oid = 123456" will not use > an index on OID, while "where oid = 123456::oid" will. > > I will bet lunch (at the nearest McD's, I'm not rich ;-)) that > Vince Vielhaber's recent gripe about > select city from locations where lower(city) = lower('st. ignace'); > failing to use an index > create index locations_city on locations(lower(city) text_ops); > is an artifact of the same sort of type-mismatch problem. > > regards, tom lane > > -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
Tom Lane wrote: > > I will bet lunch (at the nearest McD's, I'm not rich ;-)) that > Vince Vielhaber's recent gripe about > select city from locations where lower(city) = lower('st. ignace'); > failing to use an index > create index locations_city on locations(lower(city) text_ops); > is an artifact of the same sort of type-mismatch problem. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ No. This is the result of using lower('st. ignace') - function: optimizer considers clause as usable for index only for constants and parameters! We discussed this ~ month ago. lower('st. ignace') could be replaced by parameter of PARAM_EXEC type (implemented for subqueries) to be 1. considered by optimizer as index key value, 2. evaluated _ONCE_ by executor. As I mentioned before, lower('st. ignace') will be evaluated FOR EACH tuple in SeqScan!!! PARAM_EXEC was implemented to handle queries like this: select * from A where A.x = (select max(B.y) from B) - subquery will be executed once and index on A (x) will be used (if exists). Optimizer just rewrites this query as select * from A where A.x = _parameter_ and stores information about _parameter_ in InitPlan of execution plan. Look: vac=> explain select * from test where x = lower('a'); NOTICE: QUERY PLAN: Seq Scan on test (cost=40.00 size=100 width=12) ^^^^^^^^ EXPLAIN vac=> explain select * from test where x = (select lower('a')); NOTICE: QUERY PLAN: Index Scan using itest2 on test (cost=2.05 size=1 width=12) ^^^^^^^^^^ InitPlan -> Result (cost=0.00 size=0 width=0) Nevertheless, vac=> explain select * from test where lower(x) = (select lower('a')); NOTICE: IndexSelectivity: no key -1 in index 20305 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ NOTICE: QUERY PLAN: Seq Scan on test (cost=40.00 size=100 width=12) ^^^^^^^^ InitPlan -> Result (cost=0.00 size=0 width=0) - something is broken for functional indices... Vadim
> > Ah-hah, all of a sudden this looks *real* familiar. I bet it's > > because pgsql is not noticing that "500.0" can be interpreted as a > > float4. Let's try it. > > Oh, you have nailed it! This is interesting because (probably) a query > like > > select f4 from t4 where f4 = 500.0; > > is being automatically "upgraded" in the parser backend to > > select f4 from t4 where float8(f4) = 500.0; > > So, since there is no functional index float8(f4) on the table we cannot > use an existing index on f4 to advantage. > > This may be a result of my recent enhancements to the automatic type > coersion code. But I'm a little suprised that v6.3.x doesn't just > complain about a type mismatch but instead actually works. It may be > that the old code which converted constants using intermediate strings > worked (sort of) for this case. In general, the pre-enhancement code > _only_ tried to convert constants, and complained about type mismatches > when non-constants were involved. Yes, it did numeric constants, I think. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes: > Oh, you have nailed it! This is interesting because (probably) a query > like > select f4 from t4 where f4 = 500.0; > is being automatically "upgraded" in the parser backend to > select f4 from t4 where float8(f4) = 500.0; > So, since there is no functional index float8(f4) on the table we cannot > use an existing index on f4 to advantage. OK, that sounds plausible. But in my examples, play=> explain select x from f8 where x = 500; NOTICE: QUERY PLAN: Seq Scan on f8 (cost=40.00 size=100 width=8) Your explanation implies that here, the parser is converting to select x from f8 where int4(x) = 500; which is wrong for the same accuracy-loss reasons you cite later. (And if that isn't what it's doing, what then?) I think it would be a good idea if someone actually dug into this and verified what's going on. I have found some other cases that lead me to think there's more to this than we understand just yet. With an index on an int4 field, I get tree=> explain select * from marketorderhistory where sequenceNo = 140000; NOTICE: QUERY PLAN: Index Scan using marketorderhistory_sequenceno_i on marketorderhistory (cost=2.05 size=1 width=100) tree=> explain select * from marketorderhistory where sequenceNo > 140000; NOTICE: QUERY PLAN: Seq Scan on marketorderhistory (cost=63.38 size=449 width=100) which doesn't look like it could be explained by parser type coercions. Perhaps this one just indicates an omission from the list of type-specific routines that can be used for index comparisons? If so, maybe there are other omissions affecting the results for other types. > ... you want to handle the following query correctly (I'll > switch to an int column to make it clearer): > select i4 from t4 where i4 < 500.1; > Now, if we do the "optimizable thing" blindly, then we would transform > this to > select i4 from t4 where i4 < int4(500.1); > But of course this would produce the wrong result if the table contains > a value of 500. Perhaps something a bit different could be implemented, > but it probably wouldn't generalize very well with the extensible type > system. That's a good point. Still, it would be nice if the system had some reasonable amount of smarts about the "primitive" types that the parser has constant syntax for. In particular I think an automatic coercion of an int constant to float where needed would be a reasonable thing to expect. That's not happening now, see my example above. > So, is there a problem to fix, or just documentation to write? This one is most certainly a bug: play=> select x from f4 where x = 500.0 :: float4; ERROR: parser_typecast: cannot cast this expression to type 'float4' Beyond that, if I can force the right thing to happen by casting the constant to the type of the field, then I can live with it. I have seen a number of cases where the system wouldn't use an index even with a cast, however, so I'm not a happy camper yet. regards, tom lane
On 04-Aug-98 Tom Lane wrote: > OK, that sounds plausible. But in my examples, > > play=> explain select x from f8 where x = 500; > NOTICE: QUERY PLAN: > Seq Scan on f8 (cost=40.00 size=100 width=8) > > Your explanation implies that here, the parser is converting to > select x from f8 where int4(x) = 500; > which is wrong for the same accuracy-loss reasons you cite later. > (And if that isn't what it's doing, what then?) > > I think it would be a good idea if someone actually dug into this > and verified what's going on. I have found some other cases that > lead me to think there's more to this than we understand just yet. > With an index on an int4 field, I get What file(s) are these decisions actually made in? Vince. -- ========================================================================== Vince Vielhaber -- KA8CSH email: vev@michvhf.com flame-mail: /dev/null # include <std/disclaimers.h> TEAM-OS2 Online Searchable Campground Listings http://www.camping-usa.com "There is no outfit less entitled to lecture me about bloat than the federal government" -- Tony Snow ==========================================================================
> Your explanation implies that here, the parser is converting to > select x from f8 where int4(x) = 500; > which is wrong for the same accuracy-loss reasons you cite later. > (And if that isn't what it's doing, what then?) No, that isn't what it would do. The parser does know about a heirarchy of built-in data types, and would not downgrade a float8 to an int4. I've sent some e-mail about the new parser features, and have written them up in doc/src/sgml/typeconv.sgml (but haven't generated a new html version yet). > I think it would be a good idea if someone actually dug into this > and verified what's going on. I have found some other cases that > lead me to think there's more to this than we understand just yet. I'm pretty sure that there is another thing happening, which is getting in the way of using indices with your example using where lowercase(col) = lowercase('const') > That's a good point. Still, it would be nice if the system had some > reasonable amount of smarts about the "primitive" types that the > parser has constant syntax for. In particular I think an automatic > coercion of an int constant to float where needed would be a > reasonable thing to expect. That's not happening now, see my example > above. I agree with your points, but it already does exactly what you would want. I don't see an example above illustrating this problem. > > So, is there a problem to fix, or just documentation to write? > > This one is most certainly a bug: > > play=> select x from f4 where x = 500.0 :: float4; > ERROR: parser_typecast: cannot cast this expression to type 'float4' Might be easy. I'll look at it... Oh, the current workaround is to specify it as where x = '500.0'::float4; or where x = float4(500.0); > Beyond that, if I can force the right thing to happen by casting > the constant to the type of the field, then I can live with it. > I have seen a number of cases where the system wouldn't use an index > even with a cast, however, so I'm not a happy camper yet. Yeah, that was Vadim's example and he sez we need to add a bit of code to get things working the way you'd expect. - Tom
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes: > No, that isn't what it would do. The parser does know about a heirarchy > of built-in data types, and would not downgrade a float8 to an int4. OK, that's good. But if it knows that much, I'd expect it to be able to *upgrade* an int4 to a float8 where appropriate. That's not happening, as in my prior example: >> play=> explain select x from f8 where x = 500.0; >> NOTICE: QUERY PLAN: >> Index Scan using f8_i on f8 (cost=2.05 size=1 width=8) >> >> play=> explain select x from f8 where x = 500; >> NOTICE: QUERY PLAN: >> Seq Scan on f8 (cost=40.00 size=100 width=8) The thing that I find peculiar about all this is that the system doesn't reject the query completely, as you'd expect it to if there were really a type mismatch problem. It takes it and just does it a lot slower than it should. It seems the type coercion will happen, but too late for the optimizer to notice. regards, tom lane
> > The parser does know about a heirarchy > > of built-in data types, and would not downgrade a float8 to an int4. > OK, that's good. But if it knows that much, I'd expect it to be able > to *upgrade* an int4 to a float8 where appropriate. That's not > happening Ah, but it _is_ happening. The problem lies later. Read on... > , as in my prior example: > >> play=> explain select x from f8 where x = 500; > >> NOTICE: QUERY PLAN: > >> Seq Scan on f8 (cost=40.00 size=100 width=8) > It seems the type coercion will happen, but too late for the > optimizer to notice. This is the specific case discussed by Vadim, and is, I believe, directly related to the select city from locations where lower(city) = lower('st. ignace'); example given by Vince. The parser is converting this query to become select x from f8 where x = float8(500); The problem appears to be that the optimizer/executor does not know how to evaluate the constant string once-only, and insists on doing a sequential scan for some reason. Not really in the parser's hands. Remember, the types are all set and the functions are all defined by the time the actual optimizer/executor sees anything. There may be a slight bit of "optimization" which happens in the parser, but only to shoehorn SQL92 into the Postgres backend. Oh, there are some heuristics in the parser to fudge the distinctions between char/varchar/text types to allow reuse of some of the support code, but that is a Bad Thing in principle. The downside to putting more heuristics into the parser (as opposed to upgrading the optimizer/executor to handle the case) is that it embeds more assumptions about the _behavior_ of, for example, int4->float8 conversions and would reduce the flexibility of the extensible type system. istm that we should be focusing on Vadim's hints on what it would take to use indices with function calls on constants... - Tom
> istm that we should be focusing on Vadim's hints on what it would take > to use indices with function calls on constants... Looking at Vadim's note again, maybe it will be the parser's duty to insert the PARAM_EXEC node; will need more details or some time to look at it... - Tom
"Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes: > The parser is converting this query to become > select x from f8 where x = float8(500); > The problem appears to be that the optimizer/executor does not know how > to evaluate the constant string once-only, and insists on doing a > sequential scan for some reason. Ah, it's finally starting to make some sense to me. What you're saying is that this is a failure to do constant-folding. Doing a sequential scan would be appropriate if the righthand side of the comparison needed to be evaluated afresh at each row. If the optimizer thinks that, then that explains a lot. The question then is why the righthand side doesn't look like a constant. I'd have expected that any expression not involving a table attribute would be evaluated once (folded into a constant) before any decisions are made on how to perform the scan. Is that reasonable, or is there some aspect of SQL semantics that makes it wrong? If it is supposed to be happening, could it be that float8() is for some reason not marked as a safely foldable function? While I'm asking dumb questions: are "float8(500)" and "500::float8" treated differently? Actually, I can see that they are: play=> explain select x from f8 where x = 500::float8; NOTICE: QUERY PLAN: Index Scan using f8_i on f8 (cost=2.05 size=1 width=8) play=> explain select x from f8 where x = float8(500); NOTICE: QUERY PLAN: Seq Scan on f8 (cost=40.00 size=100 width=8) But why? Is there a difference in semantics? regards, tom lane
> The problem appears to be that the optimizer/executor does not know how > to evaluate the constant string once-only, and insists on doing a > sequential scan for some reason. Not really in the parser's hands. > Remember, the types are all set and the functions are all defined by the > time the actual optimizer/executor sees anything. There may be a slight > bit of "optimization" which happens in the parser, but only to shoehorn > SQL92 into the Postgres backend. Oh, there are some heuristics in the > parser to fudge the distinctions between char/varchar/text types to > allow reuse of some of the support code, but that is a Bad Thing in > principle. > > The downside to putting more heuristics into the parser (as opposed to > upgrading the optimizer/executor to handle the case) is that it embeds > more assumptions about the _behavior_ of, for example, int4->float8 > conversions and would reduce the flexibility of the extensible type > system. > > istm that we should be focusing on Vadim's hints on what it would take > to use indices with function calls on constants... I understand the problem now. There is code in the optimizer to handle functions on constants, but it does not appear to be in a place where index matching can use it. I will be looking over the code in the next few weeks, but not sure if I can figure it all out by then. I have added it to my personal TODO list. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
> "Thomas G. Lockhart" <lockhart@alumni.caltech.edu> writes: > > The parser is converting this query to become > > select x from f8 where x = float8(500); > > > The problem appears to be that the optimizer/executor does not know how > > to evaluate the constant string once-only, and insists on doing a > > sequential scan for some reason. > > Ah, it's finally starting to make some sense to me. What you're saying > is that this is a failure to do constant-folding. Yep. I believe it happens in the executor, but doesn't appear to happen in the optimizer at a time when it would be useful. You can create functional indexes, and I think they are matched by the function, just not constants. > > Doing a sequential scan would be appropriate if the righthand side of > the comparison needed to be evaluated afresh at each row. If the > optimizer thinks that, then that explains a lot. Yep. I think that is the issue, and index matching does not pre-evaluate a function on a constant. > > The question then is why the righthand side doesn't look like a > constant. I'd have expected that any expression not involving a table > attribute would be evaluated once (folded into a constant) before any > decisions are made on how to perform the scan. Is that reasonable, or > is there some aspect of SQL semantics that makes it wrong? > > If it is supposed to be happening, could it be that float8() is for > some reason not marked as a safely foldable function? > > While I'm asking dumb questions: are "float8(500)" and "500::float8" > treated differently? Actually, I can see that they are: > > play=> explain select x from f8 where x = 500::float8; > NOTICE: QUERY PLAN: > Index Scan using f8_i on f8 (cost=2.05 size=1 width=8) > > play=> explain select x from f8 where x = float8(500); > NOTICE: QUERY PLAN: > Seq Scan on f8 (cost=40.00 size=100 width=8) > > But why? Is there a difference in semantics? Sure. In the :: case (or CAST (const AS type)), the parser actually converts the type INSIDE the parser to the proper type. In the float8() case, the value conversion is delayed until the executor. I may be wrong in some of this, but that is what I think is happening. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
Thomas G. Lockhart wrote: > > > istm that we should be focusing on Vadim's hints on what it would take > > to use indices with function calls on constants... > > Looking at Vadim's note again, maybe it will be the parser's duty to > insert the PARAM_EXEC node; will need more details or some time to look > at it... Sorry, but maybe it would be better to add new attribute to pg_proc (and change CREATE FUNCTION syntax) to let parser know does result of function call on constants depend on execution time or not. This would be much better just execute function in parser and replace function with constant. Currently, only random(), now() and SPI-functions should be replaced by PARAM_EXEC, all others could be evaluated by parser. Vadim
> Thomas G. Lockhart wrote: > > > > > istm that we should be focusing on Vadim's hints on what it would take > > > to use indices with function calls on constants... > > > > Looking at Vadim's note again, maybe it will be the parser's duty to > > insert the PARAM_EXEC node; will need more details or some time to look > > at it... > > Sorry, but maybe it would be better to add new attribute > to pg_proc (and change CREATE FUNCTION syntax) to let > parser know does result of function call on constants depend > on execution time or not. This would be much better just > execute function in parser and replace function with > constant. > Currently, only random(), now() and SPI-functions should be > replaced by PARAM_EXEC, all others could be evaluated by parser. I see. We have pg_proc.proiscachable. This looks like a good candidate, and is not used for anything currently. Maybe rename it to something clearer. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
> Sorry, but maybe it would be better to add new attribute > to pg_proc (and change CREATE FUNCTION syntax) to let > parser know does result of function call on constants depend > on execution time or not. This would be much better just > execute function in parser and replace function with > constant. > Currently, only random(), now() and SPI-functions should be > replaced by PARAM_EXEC, all others could be evaluated by parser. I see. We have pg_proc.proiscachable. This looks like a good candidate, and is not used for anything currently. Maybe rename it to something clearer. Wouldn't the concept of VOLATILE be relevant here (ala the same keyword in C and C++)? In those languages the keyword influences the nature of optimizations that can be done, which seems to be the problems involved here. We could have a function bool pg_proc.is_volatile() to determine whether or not the result of the function call can be determined from its arguments. If so and the arguments are constants, the function could be evaluated once in the parser; otherwise, it would need to be evaluated for each tuple. The current situation, I presume, corresponds with all functions being VOLATILE and hence evaluated many times. We could 1) add new syntax for declaring user-defined functions to optionally include VOLATILE (and/or NONVOLATILE; default being VOLATILE in the absense of a keyword to maintain compatibility and to be safe), 2) have the is_volatile() function return appropriate values for all "standard" functions defined in the system, and 3) include handling of the distinction in the parser. Cheers, Brook
> > > Sorry, but maybe it would be better to add new attribute > > to pg_proc (and change CREATE FUNCTION syntax) to let > > parser know does result of function call on constants depend > > on execution time or not. This would be much better just > > execute function in parser and replace function with > > constant. > > Currently, only random(), now() and SPI-functions should be > > replaced by PARAM_EXEC, all others could be evaluated by parser. > > I see. We have pg_proc.proiscachable. This looks like a good > candidate, and is not used for anything currently. Maybe rename it to > something clearer. > > Wouldn't the concept of VOLATILE be relevant here (ala the same > keyword in C and C++)? In those languages the keyword influences the > nature of optimizations that can be done, which seems to be the > problems involved here. We could have a function bool > pg_proc.is_volatile() to determine whether or not the result of the > function call can be determined from its arguments. If so and the > arguments are constants, the function could be evaluated once in the > parser; otherwise, it would need to be evaluated for each tuple. In Illustra this is called "variant". A variant function (eg random()) must be evaluated each time. A nonvariant function can have it's result cached not just in the parser, and not just for constants, but also for any run of identical argument values such as might occur in a table scan. For example assuming projected_price_calc() is an expensive but nonvariant function and the rows returned from the join are ordered by stock symbol the following query wins big time if the result of projected_price_calc() can be cached from one execution to the next. select symbol, price from stocks, trades where trades.symbol = stocks.symbol and projected_price_calc(stocks.symbol) < price; That is, don't think of this as a special case in the parser for doing constant folding, think of it as a general way of "memoizing" or cacheing function results. Hence the current name "iscacheble" is actually pretty good. Also, perhaps instead of doing constant folding in the parser, consider makeing it part of rewrite. This pass would just traverse the tree looking for functions with constant arguements and would pre-evaluate them and and save the result in the wherever the cacheable results would be stored. No special case required except that the optimizer would have to notice that a pre-computed result was available to use as a key value. -dg David Gould dg@illustra.com 510.628.3783 or 510.305.9468 Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612 - If simplicity worked, the world would be overrun with insects. -
David Gould wrote: > > > > > Wouldn't the concept of VOLATILE be relevant here (ala the same > > keyword in C and C++)? In those languages the keyword influences the > > nature of optimizations that can be done, which seems to be the > > problems involved here. We could have a function bool > > pg_proc.is_volatile() to determine whether or not the result of the > > function call can be determined from its arguments. If so and the > > arguments are constants, the function could be evaluated once in the > > parser; otherwise, it would need to be evaluated for each tuple. ^^^^^^^^^^^^^^ > In Illustra this is called "variant". A variant function (eg random()) must > be evaluated each time. A nonvariant function can have it's result cached ^^^^^^^^^ I don't like the idea to evaluate random() in WHERE x = random() for _each_ tuple. The same for WHERE insert_date = now() - 5 /* 5 days */. Imho, these functions should be evaluated ONCE by executor, but EACH time when executor starts. There is big difference between evaluation in parser/planner and in executor due to ability (currently, very limitted) to have prepared plans (parsed/planned). And nothing more. Imho, queries must return consistent set of data: if I want to get rows inserted 5 days ago and run query with WHERE insert_date = now() - 5 near 12PM then I risk to get inconsistent set of rows if now() will be evaluated for EACH tuple. > Also, perhaps instead of doing constant folding in the parser, consider makeing > it part of rewrite. This pass would just traverse the tree looking for > functions with constant arguements and would pre-evaluate them and and > save the result in the wherever the cacheable results would be stored. No > special case required except that the optimizer would have to notice that > a pre-computed result was available to use as a key value. This is bad for performance. Vadim
Vadim writes: > David Gould wrote: ... ^^^^^^^^^^^^^^ > > In Illustra this is called "variant". A variant function (eg random()) must > > be evaluated each time. A nonvariant function can have it's result cached > ^^^^^^^^^ > I don't like the idea to evaluate random() in WHERE x = random() > for _each_ tuple. The same for WHERE insert_date = now() - 5 > /* 5 days */. Imho, these functions should be evaluated ONCE > by executor, but EACH time when executor starts. There is big > difference between evaluation in parser/planner and in executor > due to ability (currently, very limitted) to have prepared plans > (parsed/planned). And nothing more. Imho, queries must return > consistent set of data: if I want to get rows inserted 5 days > ago and run query with WHERE insert_date = now() - 5 near 12PM > then I risk to get inconsistent set of rows if now() will be > evaluated for EACH tuple. Perhaps random was a bad example, the point I was trying to make was that it is a function that is not wholely determined by its arguments. However, I disagree with your contention about random() and now() also. Think about the statement insert into samples select random(), xval, yval from points; The intent being to associate a random value with each of the x,y pairs... > > Also, perhaps instead of doing constant folding in the parser, consider makeing > > it part of rewrite. This pass would just traverse the tree looking for > > functions with constant arguements and would pre-evaluate them and and > > save the result in the wherever the cacheable results would be stored. No > > special case required except that the optimizer would have to notice that > > a pre-computed result was available to use as a key value. > > This is bad for performance. What_ is the "This" that is bad for performance? It is not clear what you mean here. Do you mean memoizing? If so, I strongly disagree, and there is a ton of work on this that supports me. Or do you mean adding the cacheable function optimization to rewrite? If so, why is this a problem? I am assuming that this can be piggybacked on one of the existing expression tree traversals, so I just don't see how this creates more work than doing it in the parser. > Vadim > -dg David Gould dg@illustra.com 510.628.3783 or 510.305.9468 Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612 - If simplicity worked, the world would be overrun with insects. -
David Gould wrote: > > > ^^^^^^^^^ > > I don't like the idea to evaluate random() in WHERE x = random() > > for _each_ tuple. The same for WHERE insert_date = now() - 5 > > /* 5 days */. Imho, these functions should be evaluated ONCE > > by executor, but EACH time when executor starts. There is big > > difference between evaluation in parser/planner and in executor > > due to ability (currently, very limitted) to have prepared plans > > (parsed/planned). And nothing more. Imho, queries must return > > consistent set of data: if I want to get rows inserted 5 days > > ago and run query with WHERE insert_date = now() - 5 near 12PM > > then I risk to get inconsistent set of rows if now() will be > > evaluated for EACH tuple. > > Perhaps random was a bad example, the point I was trying to make was that > it is a function that is not wholely determined by its arguments. > > However, I disagree with your contention about random() and now() also. > Think about the statement > > insert into samples > select random(), xval, yval from points; > > The intent being to associate a random value with each of the x,y pairs... Imho, we have to treat random() etc differently in target list and qual! WHERE defines set of rows to be returned by query, functions in target list define representation of this set. > > > > Also, perhaps instead of doing constant folding in the parser, consider makeing > > > it part of rewrite. This pass would just traverse the tree looking for > > > functions with constant arguements and would pre-evaluate them and and > > > save the result in the wherever the cacheable results would be stored. No > > > special case required except that the optimizer would have to notice that > > > a pre-computed result was available to use as a key value. > > > > This is bad for performance. > > What_ is the "This" that is bad for performance? It is not clear what you > mean here. > > Do you mean memoizing? If so, I strongly disagree, and there is a ton of > work on this that supports me. > > Or do you mean adding the cacheable function optimization to rewrite? If so, ^^^^^^^^^^ This. Why should we traverse the tree once more time? What the problem to let parser handle these functions? > why is this a problem? I am assuming that this can be piggybacked on one > of the existing expression tree traversals, so I just don't see how > this creates more work than doing it in the parser. Vadim
> David Gould wrote: > > > > > ^^^^^^^^^ > > > I don't like the idea to evaluate random() in WHERE x = random() > > > for _each_ tuple. The same for WHERE insert_date = now() - 5 > > > /* 5 days */. Imho, these functions should be evaluated ONCE > > > by executor, but EACH time when executor starts. There is big > > > difference between evaluation in parser/planner and in executor > > > due to ability (currently, very limitted) to have prepared plans > > > (parsed/planned). And nothing more. Imho, queries must return > > > consistent set of data: if I want to get rows inserted 5 days > > > ago and run query with WHERE insert_date = now() - 5 near 12PM > > > then I risk to get inconsistent set of rows if now() will be > > > evaluated for EACH tuple. > > > > Perhaps random was a bad example, the point I was trying to make was that > > it is a function that is not wholely determined by its arguments. > > > > However, I disagree with your contention about random() and now() also. > > Think about the statement > > > > insert into samples > > select random(), xval, yval from points; > > > > The intent being to associate a random value with each of the x,y pairs... > > Imho, we have to treat random() etc differently in target list and > qual! WHERE defines set of rows to be returned by query, functions > in target list define representation of this set. > > > > > > > Also, perhaps instead of doing constant folding in the parser, consider makeing > > > > it part of rewrite. This pass would just traverse the tree looking for > > > > functions with constant arguements and would pre-evaluate them and and > > > > save the result in the wherever the cacheable results would be stored. No > > > > special case required except that the optimizer would have to notice that > > > > a pre-computed result was available to use as a key value. > > > > > > This is bad for performance. > > > > What_ is the "This" that is bad for performance? It is not clear what you > > mean here. > > > > Do you mean memoizing? If so, I strongly disagree, and there is a ton of > > work on this that supports me. > > > > Or do you mean adding the cacheable function optimization to rewrite? If so, > ^^^^^^^^^^ > This. Why should we traverse the tree once more time? > What the problem to let parser handle these functions? > > > why is this a problem? I am assuming that this can be piggybacked on one > > of the existing expression tree traversals, so I just don't see how > > this creates more work than doing it in the parser. Yep. Rewrite already is confusing. Parser has this type of handling in there already. -- Bruce Momjian | 830 Blythe Avenue maillist@candle.pha.pa.us | Drexel Hill, Pennsylvania 19026 + If your life is a hard drive, | (610) 353-9879(w) + Christ can be your backup. | (610) 853-3000(h)
> Imho, queries must return > consistent set of data: if I want to get rows inserted 5 days > ago and run query with WHERE insert_date = now() - 5 near 12PM > then I risk to get inconsistent set of rows if now() will be > evaluated for EACH tuple. This is perhaps a bad example, because now() (and 'now') return the transaction time, guaranteed to be the same for each row evaluation of a query, and the same for every query within a transaction. What should be behavior of update x set i = random(); be? I would have assumed that random() is evaluated once; are there any truely volatile functions in SQL92? > > Also, perhaps instead of doing constant folding in the parser, > > consider making it part of rewrite. This pass would just traverse > > the tree looking for functions with constant arguements and would > > pre-evaluate them and and save the result in the wherever the > > cacheable results would be stored. No special case required except > > that the optimizer would have to notice that a pre-computed result > > was available to use as a key value. > This is bad for performance. What makes this bad for performance? An extra traversal of the parse tree? Or something else?? - Tom
Thomas G. Lockhart wrote: > > > Imho, queries must return > > consistent set of data: if I want to get rows inserted 5 days > > ago and run query with WHERE insert_date = now() - 5 near 12PM > > then I risk to get inconsistent set of rows if now() will be > > evaluated for EACH tuple. > > This is perhaps a bad example, because now() (and 'now') return the > transaction time, guaranteed to be the same for each row evaluation of a > query, and the same for every query within a transaction. yes, this is bad example -:) > > What should be behavior of > > update x set i = random(); > > be? I would have assumed that random() is evaluated once; are there any I don't know. I mostly concern about WHERE just because of this defines data set to be returned by query. > truely volatile functions in SQL92? This is what I also would like to know. > > > > Also, perhaps instead of doing constant folding in the parser, > > > consider making it part of rewrite. This pass would just traverse > > > the tree looking for functions with constant arguements and would > > > pre-evaluate them and and save the result in the wherever the > > > cacheable results would be stored. No special case required except > > > that the optimizer would have to notice that a pre-computed result > > > was available to use as a key value. > > This is bad for performance. > > What makes this bad for performance? An extra traversal of the parse ^^^^^^^^^^^^^^^ This. I don't see any reasons for this. > tree? Or something else?? Vadim
> This is perhaps a bad example, because now() (and 'now') return the > transaction time, guaranteed to be the same for each row evaluation of a > query, and the same for every query within a transaction. > > What should be behavior of > > update x set i = random(); > > be? I would have assumed that random() is evaluated once; are there any > truely volatile functions in SQL92? Hmmm, I would have thought that was a clever way to get a nice random value into each row ;-) But seriously, we have the iscacheable attribute for functions. So, update pg_func set iscacheable = 'false' where pg_func.name = 'random'; and you get different random values in all the rows. Or do update pg_func set iscacheable = 'true' where pg_func.name = 'random'; and you get the same random value in all the rows. Or at least this is how postgres was intended to work. It was meant to be an extensible table driven system. Hardcoding stuff "because it is the right way" is really assuming that we know better than the users what they want to do. Almost never true. -dg David Gould dg@illustra.com 510.628.3783 or 510.305.9468 Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612 - If simplicity worked, the world would be overrun with insects. -
Vadim writes: > Imho, we have to treat random() etc differently in target list and > qual! WHERE defines set of rows to be returned by query, functions > in target list define representation of this set. Maybe, but it sure makes the system harder to explain. For example, what happens to a function in the target list of a subquery select? And what should happen. But, perhaps we should just do whatever is easiest since anyone who writes a query: select * from x where x.key = random(); probably deserves whatever they get ;-) Seriously, I have never seen random used in a where clause. And I don't think that queries like select * from events where event.time = timenow(); make a lot of sense either. How did event.time get to be _equal_ to the current time. Given that it had to be inserted earlier and the time of the select would not have been known then. My point is anyone who uses a variant function in a where clause already is pretty confused. To tell them that variant functions aren't variant when used in where clauses probably will just confuse them more. -dg David Gould dg@illustra.com 510.628.3783 or 510.305.9468 Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612 - If simplicity worked, the world would be overrun with insects. -