Discussion: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
Hi,

while testing a custom data type FIXEDDECIMAL [1], implementing a numeric-like data type with limited range, I ran into several issues that I suspect may not be entirely intentional / expected behavior.

[1] https://github.com/2ndQuadrant/fixeddecimal

Attached is a minimal subset of the extension SQL definition, which may be more convenient when looking into the issue.

The most important issue is that when planning a simple query, the estimation of range queries on a column with the custom data type fails like this:

    test=# create table t (a fixeddecimal);
    CREATE TABLE
    test=# insert into t select random() from generate_series(1,100000);
    INSERT 0 100000
    test=# analyze t;
    ANALYZE
    test=# select * from t where a > 0.9;
    ERROR:  unsupported type: 16385

The error message here comes from convert_numeric_to_scalar, which gets called during histogram processing (ineq_histogram_selectivity) when approximating the histogram. convert_to_scalar does this:

    switch (valuetypid)
    {
        ...
        case NUMERICOID:
            ...
            *scaledvalue = convert_numeric_to_scalar(value, valuetypid);
            *scaledlobound = convert_numeric_to_scalar(lobound, boundstypid);
            *scaledhibound = convert_numeric_to_scalar(hibound, boundstypid);
            return true;
        ...
    }

The first call works fine, as the constant really is numeric (valuetypid=1700). But the histogram boundaries are using the custom data type, causing the error, as convert_numeric_to_scalar expects only a bunch of hard-coded data types. So it's pretty much guaranteed to fail with any user-defined data type. This seems a bit unfortunate :-(

One solution would be to implement a custom estimation function, replacing scalarltsel/scalargtsel. But that seems rather unnecessary, especially considering there is an implicit cast from fixeddecimal to numeric.

Another thing is that when there's just an MCV, the estimation works just fine.
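To make the failure mode concrete, here is a minimal, self-contained C sketch of the pattern at fault — a conversion routine that switches on a hard-coded list of type OIDs and has no graceful exit for anything else. This is an illustration, not the real selfuncs.c code; the OID constants and the plain-double payload are simplified stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Oid;

#define INT4OID    23
#define FLOAT8OID  701
#define NUMERICOID 1700

/* Mimics how convert_numeric_to_scalar only accepts hard-coded OIDs:
 * any user-defined type OID (e.g. 16385 for fixeddecimal) falls through
 * to the error path.  Here we return false; the real code does
 * elog(ERROR, "unsupported type: %u", typid) and aborts the query. */
static bool
convert_numeric_to_scalar_sketch(double value, Oid typid, double *result)
{
    switch (typid)
    {
        case INT4OID:
        case FLOAT8OID:
        case NUMERICOID:
            *result = value;
            return true;
        default:
            fprintf(stderr, "unsupported type: %u\n", typid);
            return false;
    }
}
```

The probe constant (NUMERICOID) takes the supported branch, but a histogram boundary of a user-defined type hits the default branch, which is exactly the split seen in the ERROR above.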
So I see two basic ways to fix this:

* Make convert_numeric_to_scalar smarter, so that it checks if there is an implicit cast to numeric, and fails only if it does not find one.

* Make convert_to_scalar smarter, so that it returns false for unexpected data types, letting ineq_histogram_selectivity use the default estimate of 0.5 (for that one bucket).

Both options seem more favorable than what's happening currently. Is there anything I missed, making those fixes unacceptable?

If anything, the fact that MCV estimates work while the histogram does not makes this somewhat unpredictable - a change in the data distribution (or perhaps even just a different sample in ANALYZE) may result in sudden failures.

I ran into one additional strange thing while investigating this. The attached SQL script defines two operator classes - fixeddecimal_ops and fixeddecimal_numeric_ops, defining (fixeddecimal,fixeddecimal) and (fixeddecimal,numeric) operators. Dropping one of those operator classes changes the estimates in somewhat suspicious ways.

When only fixeddecimal_ops is defined, we get this:

    test=# explain select * from t where a > 0.1;
                           QUERY PLAN
    --------------------------------------------------------
     Seq Scan on t  (cost=0.00..1943.00 rows=33333 width=8)
       Filter: ((a)::numeric > 0.1)
    (2 rows)

That is, we get the default estimate for inequality clauses, 33%.

But when only fixeddecimal_numeric_ops is defined, we get this:

    test=# explain select * from t where a > 0.1;
                           QUERY PLAN
    --------------------------------------------------------
     Seq Scan on t  (cost=0.00..1693.00 rows=50000 width=8)
       Filter: (a > 0.1)
    (2 rows)

That is, we get a 50% estimate, because that's what scalarineqsel uses when ineq_histogram_selectivity can't compute selectivity from a histogram for some reason.

Wouldn't it make more sense to use the default 33% estimate here?
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachments
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> [ scalarineqsel may fall over when used by extension operators ]

I concur with your thought that we could have ineq_histogram_selectivity fall back to a "mid bucket" default if it's working with a datatype it is unable to convert_to_scalar. But I think if we're going to touch this at all, we ought to have higher ambition than that, and try to provide a mechanism whereby an extension that's willing to work a bit harder could get that additional increment of estimation accuracy. I don't care for this way to do that:

> * Make convert_numeric_to_scalar smarter, so that it checks if there is
>   an implicit cast to numeric, and fail only if it does not find one.

because it's expensive, and it only works for numeric-category cases, and it will fail outright for numbers outside the range of "double". (Notice that convert_numeric_to_scalar is *not* calling the regular cast function for numeric-to-double.) Moreover, any operator ought to know what types it can accept; we shouldn't have to do more catalog lookups to figure out what to do.

Now that scalarltsel and friends are just trivial wrappers for a common function, we could imagine exposing scalarineqsel_wrapper as a non-static function, with more arguments (and perhaps a better-chosen name ;-)). The idea would be for extensions that want to go this extra mile to provide their own selectivity estimation functions, which again would just be trivial wrappers for the core function, but would provide additional knowledge through additional arguments.

The additional arguments I'm envisioning are a couple of C function pointers, one function that knows how to convert the operator's left-hand input type to scalar, and one function that knows how to convert the right-hand type to scalar. (Identical APIs of course.) Passing a NULL would imply that the core code must fall back on its own devices for that input.
Now the thing about convert_to_scalar is that there are several different conversion conventions already (numeric, string, timestamp, ...), and there probably could be more once extension types are coming to the party. So I'm imagining that the API for these conversion functions could be something like

    bool convert(Datum value, Oid valuetypeid,
                 int *conversion_convention, double *converted_value);

The conversion_convention output would make use of some agreed-on constants, say 1 for numeric, 2 for string, etc etc. In the core code, if either converter fails (returns false) or if they don't return the same conversion_convention code, we give up and use the mid-bucket default. If they both produce results with the same conversion_convention, then we can treat the converted_values as commensurable.

			regards, tom lane
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
From: Tomas Vondra
Date:
On 09/21/2017 04:24 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> [ scalarineqsel may fall over when used by extension operators ]
>
> I concur with your thought that we could have
> ineq_histogram_selectivity fall back to a "mid bucket" default if
> it's working with a datatype it is unable to convert_to_scalar. But I
> think if we're going to touch this at all, we ought to have higher
> ambition than that, and try to provide a mechanism whereby an
> extension that's willing to work a bit harder could get that
> additional increment of estimation accuracy. I don't care for this
> way to do that:

The question is - do we need a solution that is back-patchable? This means we can't really use FIXEDDECIMAL without effectively copying a lot of the selfuncs.c stuff, only to make some minor fixes.

What about using a two-pronged approach:

1) fall back to mid bucket in back branches (9.3 - 10)
2) do something smarter (along the lines you outlined) in PG11

I'm willing to spend some time on both, but (2) alone is not particularly attractive for us, as it only helps PG11 and we'll have to do the copy-paste evil anyway to get the data type working on back branches.

>> * Make convert_numeric_to_scalar smarter, so that it checks if
>> there is an implicit cast to numeric, and fail only if it does not
>> find one.
>
> because it's expensive, and it only works for numeric-category cases,
> and it will fail outright for numbers outside the range of "double".
> (Notice that convert_numeric_to_scalar is *not* calling the regular
> cast function for numeric-to-double.) Moreover, any operator ought to
> know what types it can accept; we shouldn't have to do more catalog
> lookups to figure out what to do.

Ah, I see. I hadn't realized it's not using the regular cast functions, but now that you point that out it's clear relying on casts would fail.
> Now that scalarltsel and friends are just trivial wrappers for a
> common function, we could imagine exposing scalarineqsel_wrapper as a
> non-static function, with more arguments (and perhaps a better-chosen
> name ;-)). The idea would be for extensions that want to go this
> extra mile to provide their own selectivity estimation functions,
> which again would just be trivial wrappers for the core function, but
> would provide additional knowledge through additional arguments.
>
> The additional arguments I'm envisioning are a couple of C function
> pointers, one function that knows how to convert the operator's
> left-hand input type to scalar, and one function that knows how to
> convert the right-hand type to scalar. (Identical APIs of course.)
> Passing a NULL would imply that the core code must fall back on its
> own devices for that input.
>
> Now the thing about convert_to_scalar is that there are several
> different conversion conventions already (numeric, string, timestamp,
> ...), and there probably could be more once extension types are
> coming to the party. So I'm imagining that the API for these
> conversion functions could be something like
>
>     bool convert(Datum value, Oid valuetypeid,
>                  int *conversion_convention, double *converted_value);
>
> The conversion_convention output would make use of some agreed-on
> constants, say 1 for numeric, 2 for string, etc etc. In the core
> code, if either converter fails (returns false) or if they don't
> return the same conversion_convention code, we give up and use the
> mid-bucket default. If they both produce results with the same
> conversion_convention, then we can treat the converted_values as
> commensurable.
OK, so instead of re-implementing the whole function, we'd essentially do just something like this:

    typedef bool (*convert_callback) (Datum value, Oid valuetypeid,
                                      int *conversion_convention,
                                      double *converted_value);

    double
    scalarineqsel(PlannerInfo *root, Oid operator,
                  bool isgt, bool iseq,
                  VariableStatData *vardata,
                  Datum constval, Oid consttype,
                  convert_callback convert);

and then, from the extension:

    double
    scalarineqsel_wrapper(PlannerInfo *root, Oid operator,
                          bool isgt, bool iseq,
                          VariableStatData *vardata,
                          Datum constval, Oid consttype)
    {
        return scalarineqsel(root, operator, isgt, iseq, vardata,
                             constval, consttype, my_convert_func);
    }

Sounds reasonable to me, I guess - I can't really think of anything simpler giving us the same flexibility.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>> [ scalarineqsel may fall over when used by extension operators ]

> What about using a two-pronged approach:
> 1) fall back to mid bucket in back branches (9.3 - 10)
> 2) do something smarter (along the lines you outlined) in PG11

Sure. We need to test the fallback case anyway.

>> [ sketch of a more extensible design ]

> Sounds reasonable to me, I guess - I can't really think of anything
> simpler giving us the same flexibility.

Actually, on further thought, that's still too simple. If you look at convert_string_to_scalar() you'll see it's examining all three values concurrently (the probe value, of one datatype, and two bin boundary values of possibly a different type). The reason is that it's stripping off whatever common prefix those strings have before trying to form a numeric equivalent. While certainly convert_string_to_scalar is pretty stupid in the face of non-ASCII sort orders, the prefix-stripping is something I really don't want to give up. So the design I sketched of considering each value totally independently isn't good enough.

We could, perhaps, provide an API that lets an operator estimation function replace convert_to_scalar in toto, but that seems like you'd still end up duplicating code in many cases. Not sure about how to find a happy medium.

			regards, tom lane
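The prefix-stripping objection is easier to see in code. The sketch below is a simplified, ASCII-only illustration of the idea in convert_string_to_scalar (not the real implementation): the common prefix of all three strings is found first, and only the distinguishing suffixes are mapped to doubles. A converter that sees one value at a time cannot compute this prefix, which is why the per-value callback design falls short.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the prefix shared by all three strings: the probe value
 * and the two histogram bin boundaries. */
static size_t
common_prefix_len(const char *a, const char *b, const char *c)
{
    size_t i = 0;
    while (a[i] != '\0' && a[i] == b[i] && a[i] == c[i])
        i++;
    return i;
}

/* Crude base-256 positional value of the first few bytes after the
 * shared prefix; enough to order the suffixes as doubles. */
static double
suffix_to_double(const char *s, size_t prefix)
{
    double v = 0.0, denom = 1.0;
    s += prefix;
    for (int i = 0; i < 8 && s[i] != '\0'; i++)
    {
        denom *= 256.0;
        v += (double) (unsigned char) s[i] / denom;
    }
    return v;
}
```

For "abcm" < "abcx" < "abcz" the shared prefix "abc" is dropped, so the doubles are built from 'm', 'x', 'z' alone and keep maximal resolution; without stripping, the three values would differ only far down in the mantissa.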
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
From: Tomas Vondra
Date:
OK, time to revive this old thread ...

On 09/23/2017 05:27 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>> [ scalarineqsel may fall over when used by extension operators ]
>
>> What about using two-pronged approach:
>
>> 1) fall back to mid bucket in back branches (9.3 - 10)
>> 2) do something smarter (along the lines you outlined) in PG11
>
> Sure. We need to test the fallback case anyway.

Attached is a minimal fix adding a flag to convert_numeric_to_scalar, tracking when it fails because of an unsupported data type. If any of the 3 calls (value + lo/hi boundaries) sets it to 'true' we simply fall back to the default estimate (0.5) within the bucket.

>>> [ sketch of a more extensible design ]
>
>> Sounds reasonable to me, I guess - I can't really think about anything
>> simpler giving us the same flexibility.
>
> Actually, on further thought, that's still too simple. If you look
> at convert_string_to_scalar() you'll see it's examining all three
> values concurrently (the probe value, of one datatype, and two bin
> boundary values of possibly a different type). The reason is that
> it's stripping off whatever common prefix those strings have before
> trying to form a numeric equivalent. While certainly
> convert_string_to_scalar is pretty stupid in the face of non-ASCII
> sort orders, the prefix-stripping is something I really don't want
> to give up. So the design I sketched of considering each value
> totally independently isn't good enough.
>
> We could, perhaps, provide an API that lets an operator estimation
> function replace convert_to_scalar in toto, but that seems like
> you'd still end up duplicating code in many cases. Not sure about
> how to find a happy medium.

I plan to work on this improvement next, once I polish a couple of other patches for the upcoming commit fest.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
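The shape of the minimal fix can be sketched as follows. This is a standalone illustration of the failure-flag pattern, not the patch itself; the OID constants and plain-double payload are simplified stand-ins for the real convert_numeric_to_scalar machinery.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the back-branch fix: the numeric conversion sets a failure
 * flag for unknown type OIDs instead of raising an error, and the
 * histogram code falls back to 0.5 within the bucket. */
static double
convert_numeric_or_flag(unsigned int typid, double raw, bool *failure)
{
    switch (typid)
    {
        case 20:    /* INT8OID */
        case 23:    /* INT4OID */
        case 701:   /* FLOAT8OID */
        case 1700:  /* NUMERICOID */
            return raw;
        default:
            *failure = true;    /* e.g. a user-defined fixeddecimal */
            return 0.0;
    }
}

/* If any of the 3 conversions (value + lo/hi boundaries) flagged a
 * failure, use the default 0.5; otherwise interpolate within bucket. */
static double
bucket_fraction(unsigned int valtype, double val,
                unsigned int boundtype, double lo, double hi)
{
    bool failure = false;
    double v = convert_numeric_or_flag(valtype, val, &failure);
    double l = convert_numeric_or_flag(boundtype, lo, &failure);
    double h = convert_numeric_or_flag(boundtype, hi, &failure);

    if (failure || h <= l)
        return 0.5;             /* default estimate within the bucket */

    double frac = (v - l) / (h - l);
    return frac < 0.0 ? 0.0 : frac > 1.0 ? 1.0 : frac;
}
```

A numeric probe against boundaries of an unknown type (say OID 16385) now yields the 0.5 default instead of an error, which is the fail-soft behavior the patch is after.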
Attachments
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> OK, time to revive this old thread ...

>>> [ scalarineqsel may fall over when used by extension operators ]

> Attached is a minimal fix adding a flag to convert_numeric_to_scalar,
> tracking when it fails because of unsupported data type. If any of the 3
> calls (value + lo/hi boundaries) sets it to 'true' we simply fall back
> to the default estimate (0.5) within the bucket.

I think this is a little *too* minimal, because it only covers convert_numeric_to_scalar and not the other sub-cases in which convert_to_scalar will throw an error instead of returning "false". I realize that that would be enough for your use-case, but I think we need to think more globally. So please fix the other ones too.

I notice BTW that whoever shoehorned in the bytea case failed to pay attention to the possibility that not all three inputs are of the same type; so that code is flat out broken, and capable of crashing if fed dissimilar types. We ought to deal with that while we're at it, since (IMO) the goal is to make convert_to_scalar fail-soft for any combination of supplied data types.

			regards, tom lane
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
From: Tomas Vondra
Date:
On 03/03/2018 01:56 AM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> OK, time to revive this old thread ...
>>>> [ scalarineqsel may fall over when used by extension operators ]
>
>> Attached is a minimal fix adding a flag to convert_numeric_to_scalar,
>> tracking when it fails because of unsupported data type. If any of the 3
>> calls (value + lo/hi boundaries) sets it to 'true' we simply fall back
>> to the default estimate (0.5) within the bucket.
>
> I think this is a little *too* minimal, because it only covers
> convert_numeric_to_scalar and not the other sub-cases in which
> convert_to_scalar will throw an error instead of returning "false".
> I realize that that would be enough for your use-case, but I think
> we need to think more globally. So please fix the other ones too.

Will do.

> I notice BTW that whoever shoehorned in the bytea case failed to
> pay attention to the possibility that not all three inputs are
> of the same type; so that code is flat out broken, and capable
> of crashing if fed dissimilar types. We ought to deal with that
> while we're at it, since (IMO) the goal is to make convert_to_scalar
> fail-soft for any combination of supplied data types.

OK, I'll look into that too.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
From: Tomas Vondra
Date:
Hi,

Here is v2 of the fix. It does handle all the convert_to_scalar calls for various data types, just like the numeric one did in v1, with the exception of bytea.

The bytea case is fixed by checking that the boundary values are varlenas. This seems better than checking for BYTEAOID explicitly, which would fail for custom varlena-based types. At first I'd been thinking there might be issues when the data types have mismatching ordering, but I don't think the patch makes it any worse.

I've also added a bunch of regression tests, checking each case. The bytea test should cause a segfault on master, of course.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> Here is v2 of the fix. It does handle all the convert_to_scalar calls
> for various data types, just like the numeric one did in v1 with the
> exception of bytea.

Pushed with some adjustments.

> The bytea case is fixed by checking that the boundary values are
> varlenas. This seems better than checking for BYTEAOID explicitly, which
> would fail for custom varlena-based types. At first I've been thinking
> there might be issues when the data types has mismatching ordering, but
> I don't think the patch makes it any worse.

I didn't like this one bit. It's unlike all the other cases, which accept only specific type OIDs, and there's no good reason to assume that a bytea-vs-something-else comparison operator would have bytea-like semantics. So I think it's better to punt, pending the invention of an API to let the operator supply its own convert_to_scalar logic.

> I've also added a bunch of regression tests, checking each case. The
> bytea test it should cause segfault on master, of course.

I was kind of underwhelmed with these test cases, too, so I didn't commit them. But they were good for proving that the bytea bug wasn't hypothetical :-)

			regards, tom lane
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
From: Tomas Vondra
Date:
On 03/04/2018 02:37 AM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> Here is v2 of the fix. It does handle all the convert_to_scalar calls
>> for various data types, just like the numeric one did in v1 with the
>> exception of bytea.
>
> Pushed with some adjustments.

Thanks. I see I forgot to tweak a call in btree_gist - sorry about that.

>> The bytea case is fixed by checking that the boundary values are
>> varlenas. This seems better than checking for BYTEAOID explicitly, which
>> would fail for custom varlena-based types. At first I've been thinking
>> there might be issues when the data types has mismatching ordering, but
>> I don't think the patch makes it any worse.
>
> I didn't like this one bit. It's unlike all the other cases, which accept
> only specific type OIDs, and there's no good reason to assume that a
> bytea-vs-something-else comparison operator would have bytea-like
> semantics. So I think it's better to punt, pending the invention of an
> API to let the operator supply its own convert_to_scalar logic.

OK, understood. That's the "mismatching ordering" I was wondering about.

>> I've also added a bunch of regression tests, checking each case. The
>> bytea test it should cause segfault on master, of course.
>
> I was kind of underwhelmed with these test cases, too, so I didn't
> commit them. But they were good for proving that the bytea bug
> wasn't hypothetical :-)

Underwhelmed in what sense? Should the tests be constructed in some other way, or do you think it's something that doesn't need the tests?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 03/04/2018 02:37 AM, Tom Lane wrote:
>> I was kind of underwhelmed with these test cases, too, so I didn't
>> commit them. But they were good for proving that the bytea bug
>> wasn't hypothetical :-)

> Underwhelmed in what sense? Should the tests be constructed in some
> other way, or do you think it's something that doesn't need the tests?

The tests seemed pretty ugly, and I don't think they were doing much to improve test coverage by adding all those bogus operators. Now, a look at

    https://coverage.postgresql.org/src/backend/utils/adt/selfuncs.c.gcov.html

says that our test coverage for convert_to_scalar stinks, but we could (and probably should) improve that just by testing extant operators.

A concrete argument for not creating those operators is that they pose a risk of breaking concurrently-running tests by capturing inexact argument matches (cf CVE-2018-1058). There are ways to get around that, eg run the whole test inside a transaction we never commit; but I don't really think we need the complication.

			regards, tom lane
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
From: Tomas Vondra
Date:
On 03/04/2018 05:59 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 03/04/2018 02:37 AM, Tom Lane wrote:
>>> I was kind of underwhelmed with these test cases, too, so I didn't
>>> commit them. But they were good for proving that the bytea bug
>>> wasn't hypothetical :-)
>
>> Underwhelmed in what sense? Should the tests be constructed in some
>> other way, or do you think it's something that doesn't need the tests?
>
> The tests seemed pretty ugly, and I don't think they were doing much to
> improve test coverage by adding all those bogus operators. Now, a look at
> https://coverage.postgresql.org/src/backend/utils/adt/selfuncs.c.gcov.html
> says that our test coverage for convert_to_scalar stinks, but we could
> (and probably should) improve that just by testing extant operators.

Hmmm, OK. I admit the tests weren't a work of art, but I didn't consider them particularly ugly either. But that's a matter of taste, I guess.

Using existing operators was the first thing I considered, of course, but to exercise this part of the code you need an operator that:

1) uses ineq_histogram_selectivity (so either scalarineqsel or prefix_selectivity)
2) has oprright != oprleft
3) has either oprright or oprleft unsupported by convert_to_scalar

I don't think we have such an operator (built-in or in regression tests). At least I haven't found one, and this was the simplest case that I've been able to come up with. But maybe you had another idea.

> A concrete argument for not creating those operators is that they pose a
> risk of breaking concurrently-running tests by capturing inexact argument
> matches (cf CVE-2018-1058). There are ways to get around that, eg run
> the whole test inside a transaction we never commit; but I don't really
> think we need the complication.

I don't think the risk of breaking other tests is very high - the operators were sufficiently "bogus" to make it unlikely, I think.
But it's simple to mitigate that by either running the test in a transaction, dropping the operators at the end of the test, or just using a sufficiently unique operator name (and not '>').

FWIW I originally constructed the tests merely to verify that the fix actually fixes the issue, but I think we should add some tests to make sure it does not get broken in the future. It took quite a bit of time until someone even hit the issue, so a breakage may easily go unnoticed for a long time.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> FWIW I originally constructed the tests merely to verify that the fix
> actually fixes the issue, but I think we should add some tests to make
> sure it does not get broken in the future. It took quite a bit of time
> until someone even hit the issue, so a breakage may easily get unnoticed
> for a long time.

Oh, well, that was another problem I had with it: those tests do basically nothing to ensure that we won't add another such problem in the future. If somebody added support for a new type FOO, and forgot to ensure that FOO-vs-not-FOO cases behaved sanely, these tests wouldn't catch it. (At least, not unless the somebody added a FOO-vs-not-FOO case to these tests, but it's hardly likely they'd do that if they hadn't considered the possibility.)

I think actually having put in the coding patterns for what to do with unsupported cases is our best defense against similar oversights in future: probably people will copy those coding patterns. Maybe the bytea precedent proves that some people will fail to think about it no matter what, but I don't know what more we can do.

			regards, tom lane
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
From: Tomas Vondra
Date:
On 03/04/2018 08:37 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> FWIW I originally constructed the tests merely to verify that the fix
>> actually fixes the issue, but I think we should add some tests to make
>> sure it does not get broken in the future. It took quite a bit of time
>> until someone even hit the issue, so a breakage may easily get unnoticed
>> for a long time.
>
> Oh, well, that was another problem I had with it: those tests do basically
> nothing to ensure that we won't add another such problem in the future.
> If somebody added support for a new type FOO, and forgot to ensure that
> FOO-vs-not-FOO cases behaved sanely, these tests wouldn't catch it.
> (At least, not unless the somebody added a FOO-vs-not-FOO case to these
> tests, but it's hardly likely they'd do that if they hadn't considered
> the possibility.)

I don't follow. How would adding new custom types break the checks? If someone adds a new type along with operators for comparing it with the built-in types (supported by convert_to_scalar), then surely it would hit a code path tested by those tests.

> I think actually having put in the coding patterns for what to do with
> unsupported cases is our best defense against similar oversights in
> future: probably people will copy those coding patterns. Maybe the bytea
> precedent proves that some people will fail to think about it no matter
> what, but I don't know what more we can do.

Maybe. It's true the tests served more as a safety against someone messing with convert_to_scalar (e.g. by adding support for new types), and undoing the fix for the current data types in some way without realizing it. It wouldn't catch issues in handling of additional data types, of course (like the bytea case).

So perhaps the best thing we can do is documenting this in the comment before convert_to_scalar?
kind regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 03/04/2018 08:37 PM, Tom Lane wrote:
>> Oh, well, that was another problem I had with it: those tests do basically
>> nothing to ensure that we won't add another such problem in the future.

> I don't follow. How would adding new custom types break the checks? If
> someone adds a new type along with operators for comparing it with the
> built-in types (supported by convert_to_scalar), then surely it would
> hit a code path tested by those tests.

Well, I think the existing bytea bug is a counterexample to that. If someone were to repeat that mistake with, say, UUID, these tests would not catch it, because none of them would exercise UUID-vs-something-else. For that matter, your statement is false on its face, because even if somebody tried to add say uuid-versus-int8, these tests would not catch lack of support for that in convert_to_scalar unless the specific case of uuid-versus-int8 were added to the tests.

> So perhaps the best thing we can do is documenting this in the comment
> before convert_to_scalar?

I already updated the comment inside it ...

			regards, tom lane
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
From: Tomas Vondra
Date:
On 03/04/2018 09:46 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 03/04/2018 08:37 PM, Tom Lane wrote:
>>> Oh, well, that was another problem I had with it: those tests do basically
>>> nothing to ensure that we won't add another such problem in the future.
>
>> I don't follow. How would adding new custom types break the checks? If
>> someone adds a new type along with operators for comparing it with the
>> built-in types (supported by convert_to_scalar), then surely it would
>> hit a code path tested by those tests.
>
> Well, I think the existing bytea bug is a counterexample to that. If
> someone were to repeat that mistake with, say, UUID, these tests would not
> catch it, because none of them would exercise UUID-vs-something-else.
> For that matter, your statement is false on its face, because even if
> somebody tried to add say uuid-versus-int8, these tests would not catch
> lack of support for that in convert_to_scalar unless the specific case of
> uuid-versus-int8 were added to the tests.

I suspect we're simply having different expectations of what the tests could/should protect against - my intention was to make sure someone does not break convert_to_scalar for the currently handled types.

So for example if someone adds the uuid-versus-int8 operator you mention, then

    int8_var > '55e65ca2-4136-4a4b-ba78-cd3fe4678203'::uuid

simply returns false because convert_to_scalar does not handle UUIDs yet (and you're right none of the tests enforces it), while

    uuid_var > 123456::int8

is handled by the NUMERICOID case, and convert_numeric_to_scalar returns failure=true. And this is already checked by one of the tests. If someone adds UUID handling into convert_to_scalar, that would need to include a bunch of new tests, of course.

One reason why I wanted to include the tests was that we've been talking about reworking this code to allow custom conversion routines etc.
In which case being able to verify the behavior stays the same is quite valuable, I think.

>> So perhaps the best thing we can do is documenting this in the comment
>> before convert_to_scalar?
>
> I already updated the comment inside it ...

Oh, right. Sorry, I missed that somehow.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 03/04/2018 09:46 PM, Tom Lane wrote:
>> Well, I think the existing bytea bug is a counterexample to that. If
>> someone were to repeat that mistake with, say, UUID, these tests would not
>> catch it, because none of them would exercise UUID-vs-something-else.
>> For that matter, your statement is false on its face, because even if
>> somebody tried to add say uuid-versus-int8, these tests would not catch
>> lack of support for that in convert_to_scalar unless the specific case of
>> uuid-versus-int8 were added to the tests.

> I suspect we're simply having different expectations what the tests
> could/should protect against - my intention was to make sure someone
> does not break convert_to_scalar for the currently handled types.

I concur that we could use better test coverage for the existing code --- the coverage report is pretty bleak there. But I think we could improve that just by testing with the existing operators. I do not see much use in checking that unsupported cross-type cases fail cleanly, because there isn't a practical way to have full coverage for such a concern.

			regards, tom lane
Re: [HACKERS] user-defined numeric data types triggering ERROR: unsupported type
From: Tomas Vondra
Date:
On 03/05/2018 08:34 PM, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On 03/04/2018 09:46 PM, Tom Lane wrote:
>>> Well, I think the existing bytea bug is a counterexample to that. If
>>> someone were to repeat that mistake with, say, UUID, these tests would not
>>> catch it, because none of them would exercise UUID-vs-something-else.
>>> For that matter, your statement is false on its face, because even if
>>> somebody tried to add say uuid-versus-int8, these tests would not catch
>>> lack of support for that in convert_to_scalar unless the specific case of
>>> uuid-versus-int8 were added to the tests.
>
>> I suspect we're simply having different expectations what the tests
>> could/should protect against - my intention was to make sure someone
>> does not break convert_to_scalar for the currently handled types.
>
> I concur that we could use better test coverage for the existing
> code --- the coverage report is pretty bleak there. But I think we
> could improve that just by testing with the existing operators. I do
> not see much use in checking that unsupported cross-type cases fail
> cleanly, because there isn't a practical way to have full coverage
> for such a concern.

Hmmm, OK. I'm sure we could improve the coverage of the file in general by using existing operators, no argument here. Obviously, that does not work for the failure cases in convert_to_scalar(), because no existing operator can exercise those. I wouldn't call those cases 'unsupported' - they are pretty clearly supported, just meant to handle unknown user-defined data types. Admittedly, the operators in the tests were rather silly, but I found them practical.

In any case, thanks for the discussion! I now understand the reasoning why you did not commit the tests, at least.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services