Обсуждение: Domains versus arrays versus typmods
In bug #5717, Richard Huxton complains that a domain declared likeCREATE DOMAIN mynums numeric(4,2)[1]; doesn't work properly, ie, the typmod isn't enforced in places where it reasonably ought to be. I dug into this a bit, and found that there are more worms in this can than I first expected. In the first place, plpgsql seems a few bricks shy of a load, in that its code to handle assignment to an array element doesn't appear to be considering arrays with typmod at all: it just passes typmod -1 to exec_simple_cast_value() in the PLPGSQL_DTYPE_ARRAYELEM case in exec_assign_value(). Nonetheless, if you try a case like declare x numeric(4,2)[1];begin x[1] := $1; you'll find the typmod *is* enforced. It turns out that the reason that it works is that after constructing the modified array value, exec_assign_value() calls itself recursively to do the actual assignment to the array variable --- and that call sees that the array variable has a typmod, so it applies the cast *at the array level*, ie it disassembles the array, re-coerces each element, and builds a new array. So it's horridly inefficient but it works. That could and should be improved, but it seems to be just a matter of local changes in pl_exec.c, and it's only an efficiency hack anyway. The big problem is that none of this happens when the variable is declared as a domain, and I believe that that is a significantly more wide-ranging issue. The real issue as I see it is that it's possible to subscript an array without realizing that the array's type is really a domain that incorporates a typmod specification. This happens because of a hack we did to simplify implementation of domains over arrays: we expose the array element type as typelem of the domain as well. For example, given Richard's declaration we have regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type where typname = 'mynums';typname | oid | typelem| typbasetype | typtypmod ---------+-------+---------+-------------+-----------mynums | 92960 | 1700 | 1231 | 262150 (1 row) If we were to wrap another domain around that, we'd have regression=# create domain mynums2 as mynums; CREATE DOMAIN regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type where typname = 'mynums2';typname | oid | typelem| typbasetype | typtypmod ---------+-------+---------+-------------+-----------mynums2 | 92976 | 1700 | 92960 | -1 (1 row) and at this point it's completely unobvious from inspection of the pg_type entry that any typmod needs to be applied when subscripting. So I'm suspicious that there are a boatload of bugs related to this kind of domain declaration, not just the one case in plpgsql. I think that what we ought to do about it is to stop exposing typelem in domains' pg_type rows. If you want to subscript a domain value, you should have to drill down to its base type with getBaseTypeAndTypmod, which would also give you the correct typmod to apply. If we set typelem to zero in domain pg_type rows, it shouldn't take too long to find any places that are missing this consideration --- the breakage will be obvious rather than subtle. This catalog definitional change obviously shouldn't be back-patched, but possibly the ensuing code changes could be, since the typelem change is just to expose where things are wrong and wouldn't be a prerequisite for corrected code to behave correctly. Or we could decide that this is a corner case we're not going to try to fix in back branches. It's been wrong since day 0, so there's certainly an argument that it's not important enough to risk back-patching. Comments? regards, tom lane
On Tue, Oct 19, 2010 at 6:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > In bug #5717, Richard Huxton complains that a domain declared like > CREATE DOMAIN mynums numeric(4,2)[1]; > doesn't work properly, ie, the typmod isn't enforced in places where > it reasonably ought to be. I dug into this a bit, and found that there > are more worms in this can than I first expected. > > In the first place, plpgsql seems a few bricks shy of a load, in that > its code to handle assignment to an array element doesn't appear to > be considering arrays with typmod at all: it just passes typmod -1 to > exec_simple_cast_value() in the PLPGSQL_DTYPE_ARRAYELEM case in > exec_assign_value(). Nonetheless, if you try a case like > > declare > x numeric(4,2)[1]; > begin > x[1] := $1; > > you'll find the typmod *is* enforced. It turns out that the reason that > it works is that after constructing the modified array value, > exec_assign_value() calls itself recursively to do the actual assignment > to the array variable --- and that call sees that the array variable has > a typmod, so it applies the cast *at the array level*, ie it > disassembles the array, re-coerces each element, and builds a new array. > So it's horridly inefficient but it works. > > That could and should be improved, but it seems to be just a matter > of local changes in pl_exec.c, and it's only an efficiency hack anyway. > The big problem is that none of this happens when the variable is > declared as a domain, and I believe that that is a significantly more > wide-ranging issue. > > The real issue as I see it is that it's possible to subscript an array > without realizing that the array's type is really a domain that > incorporates a typmod specification. This happens because of a hack > we did to simplify implementation of domains over arrays: we expose the > array element type as typelem of the domain as well. For example, given > Richard's declaration we have > > regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type where typname = 'mynums'; > typname | oid | typelem | typbasetype | typtypmod > ---------+-------+---------+-------------+----------- > mynums | 92960 | 1700 | 1231 | 262150 > (1 row) > > If we were to wrap another domain around that, we'd have > > regression=# create domain mynums2 as mynums; > CREATE DOMAIN > regression=# select typname,oid,typelem,typbasetype,typtypmod from pg_type where typname = 'mynums2'; > typname | oid | typelem | typbasetype | typtypmod > ---------+-------+---------+-------------+----------- > mynums2 | 92976 | 1700 | 92960 | -1 > (1 row) > > and at this point it's completely unobvious from inspection of the > pg_type entry that any typmod needs to be applied when subscripting. > > So I'm suspicious that there are a boatload of bugs related to this kind > of domain declaration, not just the one case in plpgsql. > > I think that what we ought to do about it is to stop exposing typelem > in domains' pg_type rows. If you want to subscript a domain value, you > should have to drill down to its base type with getBaseTypeAndTypmod, > which would also give you the correct typmod to apply. If we set > typelem to zero in domain pg_type rows, it shouldn't take too long to > find any places that are missing this consideration --- the breakage > will be obvious rather than subtle. I fear that this is going to degrade the performance of common cases as a way of debugging rare cases. > This catalog definitional change obviously shouldn't be back-patched, > but possibly the ensuing code changes could be, since the typelem change > is just to expose where things are wrong and wouldn't be a prerequisite > for corrected code to behave correctly. Or we could decide that this is > a corner case we're not going to try to fix in back branches. It's been > wrong since day 0, so there's certainly an argument that it's not > important enough to risk back-patching. > > Comments? It might be reasonable to back-patch whatever we decide on into 9.0, because it is so new, but I would be reluctant to go back further unless we have some evidence that it's bothering people. It seems to me that this can could have a lot of worms in it, and I fear that there could be several rounds of fixes, which I would rather not inflict on users of supposedly-stable branches. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Oct 19, 2010 at 6:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think that what we ought to do about it is to stop exposing typelem >> in domains' pg_type rows. �If you want to subscript a domain value, you >> should have to drill down to its base type with getBaseTypeAndTypmod, >> which would also give you the correct typmod to apply. �If we set >> typelem to zero in domain pg_type rows, it shouldn't take too long to >> find any places that are missing this consideration --- the breakage >> will be obvious rather than subtle. > I fear that this is going to degrade the performance of common cases > as a way of debugging rare cases. We've already accepted the cost of doing getBaseTypeAndTypmod() in a whole lot of performance-critical parsing paths, on the off chance that the target datatype might be a domain. It's not apparent to me that array subscripting is so important as to deserve an exemption from that. Especially when not doing so doesn't work. >> Comments? > It might be reasonable to back-patch whatever we decide on into 9.0, > because it is so new, but I would be reluctant to go back further > unless we have some evidence that it's bothering people. It seems to > me that this can could have a lot of worms in it, and I fear that > there could be several rounds of fixes, which I would rather not > inflict on users of supposedly-stable branches. Well, we have bug #5717 as evidence that it's bothering people ;-). But I agree that the case for back-patching is a bit thin, especially if it might result in any user-visible behavioral changes. One case that I've realized is a problem is domain constraints at the array level: regression=# create domain orderedpair as int[2] check (value[1] < value[2]); CREATE DOMAIN regression=# select array[2,1]::orderedpair; -- expect failure ERROR: value for domain orderedpair violates check constraint "orderedpair_check" regression=# create table op (f1 orderedpair); CREATE TABLE regression=# insert into op values (array[1,2]); INSERT 0 1 regression=# insert into op values (array[2,1]); -- expect failure ERROR: value for domain orderedpair violates check constraint "orderedpair_check" regression=# update op set f1[2] = 0; -- expect failure ... oops UPDATE 1 regression=# select * from op; f1 -------{1,0} (1 row) The reason this fails is that the result of the ArrayRef "f1[2] := 0" is considered to be of the domain type, so we don't recheck the constraint. As this example demonstrates, that assumption is simply broken. The correct implementation, I believe, is that the result ought to be considered to be of the base type (ie, just int[]), which would cause an implicit re-coercion to the domain type to occur during the assignment, offering a chance to re-verify the constraint. Right offhand I don't see a way that that sort of change can safely be back-patched. The incorrect assumption about the ArrayRef's result type is already baked into stored rules in existing databases. regards, tom lane
On Tue, Oct 19, 2010 at 9:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Oct 19, 2010 at 6:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I think that what we ought to do about it is to stop exposing typelem >>> in domains' pg_type rows. If you want to subscript a domain value, you >>> should have to drill down to its base type with getBaseTypeAndTypmod, >>> which would also give you the correct typmod to apply. If we set >>> typelem to zero in domain pg_type rows, it shouldn't take too long to >>> find any places that are missing this consideration --- the breakage >>> will be obvious rather than subtle. > >> I fear that this is going to degrade the performance of common cases >> as a way of debugging rare cases. > > We've already accepted the cost of doing getBaseTypeAndTypmod() in a > whole lot of performance-critical parsing paths, on the off chance that > the target datatype might be a domain. It's not apparent to me that > array subscripting is so important as to deserve an exemption from that. > Especially when not doing so doesn't work. Hmm... so are there no cases where zeroing out the typelem will cost us an otherwise-unnecessary syscache lookup? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Oct 19, 2010 at 9:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We've already accepted the cost of doing getBaseTypeAndTypmod() in a >> whole lot of performance-critical parsing paths, on the off chance that >> the target datatype might be a domain. �It's not apparent to me that >> array subscripting is so important as to deserve an exemption from that. >> Especially when not doing so doesn't work. > Hmm... so are there no cases where zeroing out the typelem will cost > us an otherwise-unnecessary syscache lookup? My point is that anyplace that is relying on the surface typelem, without drilling down to see what the base type is, is wrong. So yeah, those lookups are (will be) necessary. regards, tom lane
On Wed, Oct 20, 2010 at 10:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Oct 19, 2010 at 9:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> We've already accepted the cost of doing getBaseTypeAndTypmod() in a >>> whole lot of performance-critical parsing paths, on the off chance that >>> the target datatype might be a domain. It's not apparent to me that >>> array subscripting is so important as to deserve an exemption from that. >>> Especially when not doing so doesn't work. > >> Hmm... so are there no cases where zeroing out the typelem will cost >> us an otherwise-unnecessary syscache lookup? > > My point is that anyplace that is relying on the surface typelem, > without drilling down to see what the base type is, is wrong. > So yeah, those lookups are (will be) necessary. OK. In that case, +1 from me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 20/10/10 01:47, Robert Haas wrote: > On Tue, Oct 19, 2010 at 6:14 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Comments? > > It might be reasonable to back-patch whatever we decide on into 9.0, > because it is so new, but I would be reluctant to go back further > unless we have some evidence that it's bothering people. It seems to > me that this can could have a lot of worms in it, and I fear that > there could be several rounds of fixes, which I would rather not > inflict on users of supposedly-stable branches. The work-around I applied when I stumbled across this was just to apply an explicit cast before my function's RETURN. That neatly solves my particular problem (which I at first thought was a formatting issue somewhere in my app). The real danger with this is the opportunity to end up with occasional bad data in tables, quite possibly unnoticed. If I'd come across this in an existing system rather than a new app I'm pretty sure it would have confused me for a lot longer than it did. -- Richard Huxton Archonet Ltd
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Oct 20, 2010 at 10:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> My point is that anyplace that is relying on the surface typelem, >> without drilling down to see what the base type is, is wrong. >> So yeah, those lookups are (will be) necessary. > OK. In that case, +1 from me. I've come across another interesting definitional issue, which is what properties should domains have with respect to matching to polymorphic arguments. Currently, the polymorphic matching functions take domains at face value (ie, without noticing their relationships to their base types), with one exception: because they use get_element_type() to decide if a type matches ANYARRAY, domains over arrays will be considered to match ANYARRAY. This leads to some weird inconsistencies and at least one genuine bug. Observe (this is with 9.0.x HEAD): regression=# create domain myi as int; CREATE DOMAIN regression=# select array[1,2] || 3;?column? ----------{1,2,3} (1 row) regression=# select array[1,2] || 3::myi; ERROR: operator does not exist: integer[] || myi In this case, one might expect myi to be automatically downcast to int so that it could be matched up with the int array, but that's not happening. However: regression=# create domain myia as int[]; CREATE DOMAIN regression=# select array[1,2]::myia || 3;?column? ----------{1,2,3} (1 row) So we will downcast myia to int[], or at least one might assume that's what's happening. But actually it's worse than that: the result of this operation is thought to be myia not int[], because myia itself is taken as matching ANYARRAY, and the operator result is the same ANYARRAY type. Thus, this case goes off the rails completely: regression=# create domain myia2 as int[] check(array_length(value,1) = 2); CREATE DOMAIN regression=# select array[1,2]::myia2;array -------{1,2} (1 row) regression=# select array[1,2,3]::myia2; ERROR: value for domain myia2 violates check constraint "myia2_check" regression=# select array[1,2]::myia2 || 3;?column? ----------{1,2,3} (1 row) The result of the || is considered to be myia2, as can be seen for example this way: regression=# create view vvv as select array[1,2]::myia2 || 3 as x; CREATE VIEW regression=# \d vvv View "public.vvv"Column | Type | Modifiers --------+-------+-----------x | myia2 | So we have a value that's claimed to belong to the domain, but it doesn't meet the domain's constraints. What I am intending to do about this in the short run is to leave the anyarray-ness tests in the polymorphic-compatibility-checking functions as-is. That will mean (with the change in typelem for domains) that a domain over array doesn't match ANYARRAY unless you explicitly downcast it. I argue that this is consistent with the current behavior of not auto-downcasting domains to match the element type of an array. We could go back and change it later, but if we do, we should try to make both cases provide auto-downcast-when-needed behavior. I have not dug into just what code changes would be needed for that. Auto-downcast wouldn't be exactly compatible with the current behavior anyway, since it would result in a different claimed type for the operator result. Comments? regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > regression=# select array[1,2] || 3::myi; > ERROR: operator does not exist: integer[] || myi > > In this case, one might expect myi to be automatically downcast to < int so that it could be matched up with the int array, but that's > not happening. I guess it should allow that, although for my uses of domains it's hard to see a reasonable use case for it, so that part doesn't bother me too much. > regression=# create domain myia as int[]; > CREATE DOMAIN > regression=# select array[1,2]::myia || 3; > ?column? > ---------- > {1,2,3} > (1 row) > > So we will downcast myia to int[], or at least one might assume > that's what's happening. But actually it's worse than that: the > result of this operation is thought to be myia not int[], because > myia itself is taken as matching ANYARRAY, and the operator result > is the same ANYARRAY type. That is actually what I would want and expect. Let's say I have an array of attorney bar numbers, and I add one more as a literal. While an argument could be made that the integer should be cast to a bar number before being added to the array, we don't require that for an assignment to a simple variable in the domain, so it would be surprising to require a cast here, and even more surprising for the concatenation to result in an array of primitive integers rather than a array of attorney bar numbers. > regression=# create domain myia2 as int[] > check(array_length(value,1) = 2); > CREATE DOMAIN > regression=# select array[1,2]::myia2 || 3; > ?column? > ---------- > {1,2,3} > (1 row) > So we have a value that's claimed to belong to the domain, but it > doesn't meet the domain's constraints. Yeah, that's obviously wrong. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So we will downcast myia to int[], or at least one might assume >> that's what's happening. But actually it's worse than that: the >> result of this operation is thought to be myia not int[], because >> myia itself is taken as matching ANYARRAY, and the operator result >> is the same ANYARRAY type. > That is actually what I would want and expect. Let's say I have an > array of attorney bar numbers, and I add one more as a literal. > While an argument could be made that the integer should be cast to > a bar number before being added to the array, we don't require that > for an assignment to a simple variable in the domain, so it would be > surprising to require a cast here, and even more surprising for the > concatenation to result in an array of primitive integers rather > than a array of attorney bar numbers. I disagree with that argument: you are confusing an array over a domain type with a domain over an array type. In the latter case, the domain could have additional constraints (such as the length constraint in my other example), and there's no reason to assume that || or other array operators would preserve those constraints. A perhaps comparable example is create domain verysmallint as int check (value < 10); select 9::verysmallint + 1; The result of the addition is int, not verysmallint, which is why you don't get an error. From an abstract-data-type point of view, the fact that any of these operations are even allowed without an explicit downcast is a bit uncool: it exposes the implementation of the domain type, which one could argue shouldn't be allowed, at least not without some notational marker showing you know what you're doing. But the SQL committee seems to have decided to ignore that tradition and allow auto-downcasts. Nonetheless, when a domain type is fed to an operator that works on its base type, it has to be clearly understood that there *is* an implied downcast, and whatever special properties the domain may have had will be lost. As far as the operator and its result are concerned, the domain is just its base type. I'm not against fixing these cases so that auto downcasts happen, I'm just saying that it's outside the scope of what I'm going to do in response to the current bug. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote: > you are confusing an array over a domain type with a domain over > an array type. Yes I was. Sorry. Argument withdrawn. -Kevin