Discussion: Casts
It seems odd to me that implicit casts are checked for when you call a function but not when you're implicitly calling a function via a cast. As a result there are a *lot* of redundant casts in our catalog, essentially n! casts for a domain with n types in it. So for example there are 138 casts between the various numeric data types, including every possible pairing of char, int2, int4, int8, float4, float8, and numeric. Now I don't think it actually costs us anything to have so many casts, but it sure makes adding new fully functional user-defined types a pain. Adding a single new numeric data type requires creating 12 new casts, a second one would require 14, and so on.

What's strange is that you do not have to go to such lengths to get a fully functional data type in other respects. One implicit cast to numeric and you can use +, -, log(), exp(), floor(), ceil(), etc. You may want to implement some of those for performance reasons, but for most, relying on the implicit cast is perfectly reasonable.

It seems like what ought to happen is that every data domain should have a single blessed data type that is the root data type for that domain. Every data type in that domain should have a single implicit cast to that root data type. That is effectively how all the data types are set up already, in fact: they have all these dozens of assignment casts and a single implicit cast to a type chosen in some sort of unspoken consensus.

There has been some fear expressed in the past that too many implicit casts create surprising side effects. I think that's a valid fear, but only relevant if we have two or more such casts for a single data type, or have a cast to an inappropriate type. As long as we have precisely one implicit cast for every type, and it's a cast to a datatype with basically the same semantics, it seems like we should be safe.
So for example, if all the numeric data types had an implicit cast to numeric and an assignment cast from numeric, it ought to be possible for the planner to find a way to handle an explicit cast between any two arbitrary numeric types using just those. It could use the same logic it uses to find functions: look first for an exact match, then for an assignment cast from any type to which the source can first be implicitly cast.

That would let people add a new fully functional numeric type by creating only two casts instead of 12, a second one by creating two more instead of 14, and so on.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
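Under Gregory's proposal, registering a new numeric type would need only two pg_cast entries. A sketch of what that could look like, assuming a hypothetical type `myfixed` whose conversion functions to and from numeric already exist (CREATE CAST itself is real PostgreSQL syntax; the type and function names are illustrative):

```sql
-- One implicit cast "up" to the root type of the numeric domain:
CREATE CAST (myfixed AS numeric)
    WITH FUNCTION myfixed_to_numeric(myfixed)
    AS IMPLICIT;

-- One assignment cast "down" from the root type:
CREATE CAST (numeric AS myfixed)
    WITH FUNCTION numeric_to_myfixed(numeric)
    AS ASSIGNMENT;
```

With the two-step resolution Gregory describes, an explicit `myfixed::int4` could then be routed as `myfixed -> numeric -> int4` without any further pg_cast entries.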
stark <stark@enterprisedb.com> writes:
> It seems odd to me that implicit casts are checked for when you call a function but not when you're implicitly calling a function via a cast. As a result there are a *lot* of redundant casts in our catalog, essentially n! casts for a domain with n types in it. So for example there are 138 casts between the various numeric data types including every possible pairing of char, int2, int4, int8, float4, float8, and numeric.

This is intentional. If you explicitly cast type foo to type bar there should not be any question about what function will be invoked. The cost is a few more rows in pg_cast ... so what? Adding rows to pg_cast is not the most painful part of making a new datatype.

As for "the parser ought to be able to find two-step cast pathways", no thanks. The increase in search time and the decrease in predictability are both undesirable.

> There has been some fear expressed in the past that too many implicit casts create surprising side effects.

Not "some fear" ... we have seen people badly burned, time and time again, by the ill-considered implicit casts that are already in there. IMHO we need fewer implicit casts, not more.

			regards, tom lane
stark <stark@enterprisedb.com> writes:
> I think the ideal combination is having every type have precisely one implicit cast "up" the type "tree" and assignment casts down the "tree".

No, because for example in the case of the numeric datatypes, that would result in *every* cross-type operation being done in NUMERIC. Do you really want, for example, "bigint var = integer constant" to be interpreted as "var::numeric = const::numeric"? (Hint: you'll lose the benefit of any index on the var.)

Actually it's worse than that, because the top of the numeric datatype hierarchy is float8 not numeric; this is more or less forced by SQL's rules about exact vs inexact arithmetic. So your proposal would really reduce to doing all cross-type arithmetic in float8 ... which would at least be a lot faster than numeric, but it's not gonna fly from an accuracy point of view.

What we want, and what the existing pg_cast rules give us, is a rule that an operation between two different numeric types is done in the "wider" of those two types. You could probably propose some explicit representation of this concept that would be more compact than the present pg_cast table --- but it would also be less flexible. I don't really see much to be gained that way.

			regards, tom lane
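Tom's index objection can be checked with EXPLAIN. A sketch against a made-up table (the table and column names are illustrative, not from the thread):

```sql
CREATE TABLE events (id bigint PRIMARY KEY);

-- Today: the integer constant is promoted toward bigint, and the
-- comparison can use the primary-key index on "id".
EXPLAIN SELECT * FROM events WHERE id = 42;

-- Under a "cast everything to the root type" rule, the comparison
-- would effectively become:
EXPLAIN SELECT * FROM events WHERE id::numeric = 42::numeric;
-- Wrapping the indexed column in a cast defeats a plain btree
-- index on "id", which is exactly the regression Tom warns about.
```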
Tom Lane <tgl@sss.pgh.pa.us> writes:
> stark <stark@enterprisedb.com> writes:
>> It seems odd to me that implicit casts are checked for when you call a function but not when you're implicitly calling a function via a cast. As a result there are a *lot* of redundant casts in our catalog, essentially n! casts for a domain with n types in it. So for example there are 138 casts between the various numeric data types including every possible pairing of char, int2, int4, int8, float4, float8, and numeric.
>
> This is intentional. If you explicitly cast type foo to type bar there should not be any question about what function will be invoked. The cost is a few more rows in pg_cast ... so what?

And hundreds of lines of copy-pasted code with potential bugs.

> As for "the parser ought to be able to find two-step cast pathways", no thanks. The increase in search time and the decrease in predictability are both undesirable.
>
>> There has been some fear expressed in the past that too many implicit casts create surprising side effects.
>
> Not "some fear" ... we have seen people badly burned, time and time again, by the ill-considered implicit casts that are already in there. IMHO we need fewer implicit casts, not more.

I agree we need fewer implicit casts. But I think you take that too far when you dogmatically imply that every implicit cast is suspect.

I think the ideal combination is having every type have precisely one implicit cast "up" the type "tree" and assignment casts down the "tree". I don't see us ever needing anything more complex than a flat "tree" with a single base type for each domain and every type directly a child of that one type. So every numeric data type would have an implicit cast to numeric, every text type would have an implicit cast to text, etc.

The implicit casts from numeric to other types seem suspect to me.
And there are tons of cases where we have implicit casts in both directions, such as bit to varbit and varbit to bit. That makes no sense to me. That would be where the surprising effects come from.

As long as every type has precisely one implicit cast up, I think it would become safe to rely on it for many of the operations and not have to define redundant operations for every data type.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
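The bidirectional implicit casts Gregory mentions can be listed straight from the catalog. This query (a sketch, runnable on any PostgreSQL installation) self-joins pg_cast to find type pairs that are implicit in both directions:

```sql
SELECT c1.castsource::regtype AS "from",
       c1.casttarget::regtype AS "to"
FROM pg_cast c1
JOIN pg_cast c2 ON c1.castsource = c2.casttarget
               AND c1.casttarget = c2.castsource
WHERE c1.castcontext = 'i'   -- implicit in one direction ...
  AND c2.castcontext = 'i';  -- ... and implicit in the other
```

On a stock installation this turns up pairs such as bit/varbit and varchar/text, the "essentially the same type" cases Martijn describes below.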
On Wed, Aug 09, 2006 at 12:21:40PM +0100, stark wrote:
> I think the ideal combination is having every type have precisely one implicit cast "up" the type "tree" and assignment casts down the "tree". I don't see us ever needing anything more complex than a flat "tree" of a single base type for each domain and everyone directly a child of that one type. So every numeric data type would have an implicit cast to numeric, every text type would have an implicit cast to text, etc.

That's basically what we have now, except any numeric type can be "upcast" implicitly but "downcast" requires assignment. The tree basically goes like: smallint -> integer -> bigint -> numeric -> real -> double precision. You have to be able to cast all the in-between versions implicitly also. If someone writes an expression with "int2 = int4" you don't really want to cast them both up to numeric, or worse, double precision. Currently we can upcast the int2 to int4 implicitly and that allows us to use an index on the int4 column.

> The implicit casts from numeric to other types seem suspect to me. And there are tons of cases where we have implicit casts in both directions such as bit to varbit and varbit to bit. That makes no sense to me. That would be where the surprising effects come from.

The reason for those is basically that there is no real difference between bit and varbit, or character varying and text. Those types are essentially the same, so why would you want to make a function that treated them differently? There are no such round-trips amongst the numeric types.

> As long as every type has precisely one implicit cast up I think it would become safe to rely on it for many of the operations and not have to define redundant operations for every data type.

You're not required to provide all the casts, but it's user friendly to do so. Requiring double casts to go between two essentially compatible types seems silly...
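The upcast/downcast asymmetry Martijn describes is recorded in pg_cast's castcontext column ('i' = implicit, 'a' = assignment, 'e' = explicit only). A sketch of how to inspect it for one type:

```sql
SELECT castsource::regtype AS "from",
       casttarget::regtype AS "to",
       castcontext
FROM pg_cast
WHERE castsource = 'int2'::regtype
ORDER BY casttarget;
-- Casts from int2 "up" to the wider numeric types are marked 'i'
-- (implicit), while the reverse "down" casts into int2 are only
-- 'a' (assignment) or 'e' (explicit).
```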
Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
> You're not required to provide all the casts, but it's user friendly to do so. Requiring double casts to go between two essentially compatible types seems silly...

I believe what Greg had in mind included the idea that the parser would automatically find two-step cast pathways if there wasn't a direct path. This scares me though, as it seems like a recipe for surprising behavior.

			regards, tom lane