Обсуждение: Is MinMaxExpr really leakproof?
contain_leaked_vars_walker asserts the following about MinMaxExpr: ... case T_MinMaxExpr: ... /* * We know these node types don't contain function calls; but * something further down in the node tree might. */ break; Now, the idea that it "doesn't contain a function call" is nonsense, because the node will invoke the btree comparison function for the datatype of its arguments. So this coding amounts to an undocumented assumption that every non-cross-type btree comparison function is leakproof. A quick catalog query finds 15 counterexamples just among the built-in datatypes: select p.oid::regprocedure from pg_proc p, pg_amproc a, pg_opfamily f where p.oid=a.amproc and f.oid=a.amprocfamily and opfmethod=403 and not proleakproof and amproclefttype=amprocrighttype andamprocnum=1; bpcharcmp(character,character) btarraycmp(anyarray,anyarray) btbpchar_pattern_cmp(character,character) btoidvectorcmp(oidvector,oidvector) btrecordcmp(record,record) btrecordimagecmp(record,record) bttext_pattern_cmp(text,text) bttextcmp(text,text) enum_cmp(anyenum,anyenum) jsonb_cmp(jsonb,jsonb) numeric_cmp(numeric,numeric) pg_lsn_cmp(pg_lsn,pg_lsn) range_cmp(anyrange,anyrange) tsquery_cmp(tsquery,tsquery) tsvector_cmp(tsvector,tsvector) so this assumption is, on its face, wrong. In practice it might be all right, because it's hard to see a reason why a btree comparison function would ever throw an error except for internal failures, which are probably outside the scope of leakproofness guarantees anyway. Nonetheless, if we didn't mark these functions as leakproof, why not? I think that we should either change contain_leaked_vars_walker to explicitly verify leakproofness of the comparison function, or decide that it's project policy that btree comparison functions are leakproof, and change the markings on those (and their associated operators). regards, tom lane
This thread duplicates https://postgr.es/m/flat/16539.1431472961%40sss.pgh.pa.us On Sun, Dec 30, 2018 at 01:24:02PM -0500, Tom Lane wrote: > So this coding amounts to an undocumented > assumption that every non-cross-type btree comparison function is > leakproof. > select p.oid::regprocedure from pg_proc p, pg_amproc a, pg_opfamily f > where p.oid=a.amproc and f.oid=a.amprocfamily and opfmethod=403 and not proleakproof and amproclefttype=amprocrighttypeand amprocnum=1; > > bpcharcmp(character,character) > btarraycmp(anyarray,anyarray) > btbpchar_pattern_cmp(character,character) > btoidvectorcmp(oidvector,oidvector) > btrecordcmp(record,record) > btrecordimagecmp(record,record) > bttext_pattern_cmp(text,text) > bttextcmp(text,text) > enum_cmp(anyenum,anyenum) > jsonb_cmp(jsonb,jsonb) > numeric_cmp(numeric,numeric) > pg_lsn_cmp(pg_lsn,pg_lsn) > range_cmp(anyrange,anyrange) > tsquery_cmp(tsquery,tsquery) > tsvector_cmp(tsvector,tsvector) > > so this assumption is, on its face, wrong. > > In practice it might be all right, because it's hard to see a reason why > a btree comparison function would ever throw an error except for internal > failures, which are probably outside the scope of leakproofness guarantees > anyway. Nonetheless, if we didn't mark these functions as leakproof, > why not? pg_lsn_cmp() and btoidvectorcmp() surely could advertise leakproofness. I'm not sure about enum_cmp(), numeric_cmp(), tsquery_cmp() or tsvector_cmp(). I can't think of a reason those would leak, though. btrecordcmp() and other polymorphic cmp functions can fail: create type boxrec as (a box); select '("(1,1),(0,0)")'::boxrec = '("(1,1),(0,0)")'::boxrec; => ERROR: could not identify an equality operator for type box The documentation says, "a function which throws an error message for some argument values but not others ... is not leakproof." I would be comfortable amending that to allow the "could not identify an equality operator" error, because that error follows from type specifics, not value specifics. bttextcmp() and other varstr_cmp() callers fall afoul of the same restriction with their "could not convert string to UTF-16" errors (https://postgr.es/m/CADyhKSXPwrUv%2B9LtqPAQ_gyZTv4hYbr2KwqBxcs6a3Vee1jBLQ%40mail.gmail.com). Leaking the binary fact that an unspecified string contains an unspecified rare Unicode character is not a serious leak, however. Also, those errors would be a substantial usability impediment if they happened much in practice; you couldn't index affected values. > I think that we should either change contain_leaked_vars_walker to > explicitly verify leakproofness of the comparison function, or decide > that it's project policy that btree comparison functions are leakproof, > and change the markings on those (and their associated operators). Either of those solutions sounds fine. Like last time, I'll vote for explicitly verifying leakproofness.
On Mon, 31 Dec 2018 at 12:26, Noah Misch <noah@leadboat.com> wrote:
bttextcmp() and other varstr_cmp() callers fall afoul of the same restriction
with their "could not convert string to UTF-16" errors
(https://postgr.es/m/CADyhKSXPwrUv%2B9LtqPAQ_gyZTv4hYbr2KwqBxcs6a3Vee1jBLQ%40mail.gmail.com).
Leaking the binary fact that an unspecified string contains an unspecified rare
Unicode character is not a serious leak, however. Also, those errors would be a
substantial usability impediment if they happened much in practice; you couldn't
index affected values.
I'm confused. What characters cannot be represented in UTF-16?
Noah Misch <noah@leadboat.com> writes: > This thread duplicates https://postgr.es/m/flat/16539.1431472961%40sss.pgh.pa.us Ah, so it does. Not sure why that fell off the radar without getting fixed; possibly because it was right before PGCon. > pg_lsn_cmp() and btoidvectorcmp() surely could advertise leakproofness. Agreed; I'll go fix those. > I'm not sure about enum_cmp(), numeric_cmp(), tsquery_cmp() or > tsvector_cmp(). I can't think of a reason those would leak, though. I've not looked at the last three, but enum_cmp can potentially report the value of an input, if it fails to find a matching pg_enum record: enum_tup = SearchSysCache1(ENUMOID, ObjectIdGetDatum(arg1)); if (!HeapTupleIsValid(enum_tup)) ereport(ERROR, (errcode(ERRCODE_INVALID_BINARY_REPRESENTATION), errmsg("invalid internal value for enum: %u", arg1))); and there are similar error reports inside compare_values_of_enum(). Whether that amounts to an interesting security leak is debatable. It's hard to see how an attacker could arrange for those to fail, much less do so in a way that would reveal a value he didn't know already. > btrecordcmp() and other polymorphic > cmp functions can fail: > create type boxrec as (a box); select '("(1,1),(0,0)")'::boxrec = '("(1,1),(0,0)")'::boxrec; > => ERROR: could not identify an equality operator for type box > The documentation says, "a function which throws an error message for some > argument values but not others ... is not leakproof." I would be comfortable > amending that to allow the "could not identify an equality operator" error, > because that error follows from type specifics, not value specifics. I think the real issue with btrecordcmp, btarraycmp, etc, is that they invoke other type-specific comparison functions. Therefore, we cannot mark them leakproof unless every type-specific comparison function is leakproof. So we're right back at the policy question. > bttextcmp() and other varstr_cmp() callers fall afoul of the same restriction > with their "could not convert string to UTF-16" errors > (https://postgr.es/m/CADyhKSXPwrUv%2B9LtqPAQ_gyZTv4hYbr2KwqBxcs6a3Vee1jBLQ%40mail.gmail.com). > Leaking the binary fact that an unspecified string contains an unspecified rare > Unicode character is not a serious leak, however. Also, those errors would be a > substantial usability impediment if they happened much in practice; you couldn't > index affected values. Yeah. I think that there might be a usability argument for marking textcmp and related operators as leakproof despite this theoretical leak condition, because not marking them puts a large practical constraint on what conditions we can optimize. However, that discussion just applies narrowly to the string data types; it is independent of what we want to say the general policy is. >> I think that we should either change contain_leaked_vars_walker to >> explicitly verify leakproofness of the comparison function, or decide >> that it's project policy that btree comparison functions are leakproof, >> and change the markings on those (and their associated operators). > Either of those solutions sounds fine. Like last time, I'll vote for explicitly > verifying leakproofness. Yeah, I'm leaning in that direction as well. Other than comparisons involving strings, it's not clear that we'd gain much from insisting on leakproofness in general, and it seems like it might be rather a large burden to put on add-on data types. regards, tom lane
Isaac Morland <isaac.morland@gmail.com> writes: > On Mon, 31 Dec 2018 at 12:26, Noah Misch <noah@leadboat.com> wrote: >> bttextcmp() and other varstr_cmp() callers fall afoul of the same >> restriction with their "could not convert string to UTF-16" errors > I'm confused. What characters cannot be represented in UTF-16? What's actually being reported there is failure of Windows' MultiByteToWideChar function. Probable causes could include invalid data (not valid UTF8), or conditions such as out-of-memory which might have nothing at all to do with the input. There are similar, equally nonspecific, error messages in the non-Windows code path. In principle, an attacker might be able to find out the existence of extremely long strings in a column by noting out-of-memory failures in this code, but that doesn't seem like a particularly interesting information leak ... regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> bttextcmp() and other varstr_cmp() callers fall afoul of the same >> restriction with their "could not convert string to UTF-16" errors >> (https://postgr.es/m/CADyhKSXPwrUv%2B9LtqPAQ_gyZTv4hYbr2KwqBxcs6a3Vee1jBLQ%40mail.gmail.com). >> Leaking the binary fact that an unspecified string contains an >> unspecified rare Unicode character is not a serious leak, however. >> Also, those errors would be a substantial usability impediment if >> they happened much in practice; you couldn't index affected values. Tom> Yeah. I think that there might be a usability argument for marking Tom> textcmp and related operators as leakproof despite this Tom> theoretical leak condition, because not marking them puts a large Tom> practical constraint on what conditions we can optimize. However, Tom> that discussion just applies narrowly to the string data types; it Tom> is independent of what we want to say the general policy is. I think that's not even a theoretical leak; the documentation for MultiByteToWideChar does not indicate any way in which it can return an error for the specific parameters we pass to it. In particular we do not tell it to return errors for invalid input characters. -- Andrew (irc:RhodiumToad)
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Noah Misch <noah@leadboat.com> writes: > > Either of those solutions sounds fine. Like last time, I'll vote for explicitly > > verifying leakproofness. > > Yeah, I'm leaning in that direction as well. Other than comparisons > involving strings, it's not clear that we'd gain much from insisting > on leakproofness in general, and it seems like it might be rather a > large burden to put on add-on data types. While I'd actually like it if we required leakproofness for what we ship, I agree that we shouldn't blindly assume that add-on data types are always leakproof and that then requires that we explicitly verify it. Perhaps an argument can be made that there are some cases where what we ship can't or shouldn't be leakproof for usability but, ideally, those would be relatively rare exceptions that don't impact common use-cases. Thanks! Stephen
Вложения
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> Noah Misch <noah@leadboat.com> writes: >>> Either of those solutions sounds fine. Like last time, I'll vote for explicitly >>> verifying leakproofness. >> Yeah, I'm leaning in that direction as well. Other than comparisons >> involving strings, it's not clear that we'd gain much from insisting >> on leakproofness in general, and it seems like it might be rather a >> large burden to put on add-on data types. > While I'd actually like it if we required leakproofness for what we > ship, I agree that we shouldn't blindly assume that add-on data types > are always leakproof and that then requires that we explicitly verify > it. Perhaps an argument can be made that there are some cases where > what we ship can't or shouldn't be leakproof for usability but, ideally, > those would be relatively rare exceptions that don't impact common > use-cases. Seems like we have consensus that MinMaxExpr should verify leakproofness rather than just assume it, so I'll go fix that. What's your opinion on the question of whether to try to make text_cmp et al leakproof? I notice that texteq/textne are (correctly) marked leakproof, so perhaps the usability issue isn't as pressing as I first thought; but it remains true that there are fairly common cases where the current marking is going to impede optimization. I also realized that in the wake of 586b98fdf, we have to remove the leakproofness marking of name_cmp and name inequality comparisons --- which I did at d01e75d68, but that's potentially a regression in optimizability of catalog queries, so it's not very nice. Also, I believe that Peter's work towards making text equality potentially collation-sensitive will destroy the excuse for marking texteq/textne leakproof if we're going to be 100% rigid about that, and that would be a very big regression. So I'd like to get to a point where we're comfortable marking these functions leakproof despite the possibility of corner-case failures. We could just decide that the existing failure cases in varstr_cmp are not usefully exploitable for information leakage, or perhaps we could dumb down the error messages some more to make them even less so. It'd also be nice to have some articulatable policy that encompasses a choice like this. Thoughts? regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> Noah Misch <noah@leadboat.com> writes: > >>> Either of those solutions sounds fine. Like last time, I'll vote for explicitly > >>> verifying leakproofness. > > >> Yeah, I'm leaning in that direction as well. Other than comparisons > >> involving strings, it's not clear that we'd gain much from insisting > >> on leakproofness in general, and it seems like it might be rather a > >> large burden to put on add-on data types. > > > While I'd actually like it if we required leakproofness for what we > > ship, I agree that we shouldn't blindly assume that add-on data types > > are always leakproof and that then requires that we explicitly verify > > it. Perhaps an argument can be made that there are some cases where > > what we ship can't or shouldn't be leakproof for usability but, ideally, > > those would be relatively rare exceptions that don't impact common > > use-cases. > > Seems like we have consensus that MinMaxExpr should verify leakproofness > rather than just assume it, so I'll go fix that. > > What's your opinion on the question of whether to try to make text_cmp > et al leakproof? I notice that texteq/textne are (correctly) marked > leakproof, so perhaps the usability issue isn't as pressing as I first > thought; but it remains true that there are fairly common cases where > the current marking is going to impede optimization. I also realized > that in the wake of 586b98fdf, we have to remove the leakproofness > marking of name_cmp and name inequality comparisons --- which I did > at d01e75d68, but that's potentially a regression in optimizability > of catalog queries, so it's not very nice. Well, as mentioned, I'd really be happier to have more things be leakproof, when they really are leakproof. What we've done in some places, and I'm not sure how practical this is elsewhere, is to show data when we know the user is allowed to see it anyway, to aide in debugging and such (I'm thinking here specifically of BuildIndexValueDescription(), which will just return a NULL in the case where the user doesn't have permission to view all of the columns involved). As these are error cases, we're generally happy to consider spending a bit of extra time to figure that out, is there something similar we could do for these cases where we'd really like to report useful information to the user, but only if we think they're probably allowed to see it? > Also, I believe that Peter's work towards making text equality potentially > collation-sensitive will destroy the excuse for marking texteq/textne > leakproof if we're going to be 100% rigid about that, and that would be > a very big regression. That could be a serious problem, I agree. > So I'd like to get to a point where we're comfortable marking these > functions leakproof despite the possibility of corner-case failures. > We could just decide that the existing failure cases in varstr_cmp are > not usefully exploitable for information leakage, or perhaps we could > dumb down the error messages some more to make them even less so. > It'd also be nice to have some articulatable policy that encompasses > a choice like this. I'd rather not say "well, these are mostly leakproof and therefore it's good enough" unless those corner-case failures you're referring to are really "this system call isn't documented to ever fail in a way we can't handle, but somehow it did and we're blowing up because of it." At least, in the cases where we're actually leaking knowledge that we shouldn't be. If what we're leaking is some error being returned where all we're returning is an error code and not the actual data then that doesn't seem like it's really much of a leak to me..? I'm just glancing through varstr_cmp and perhaps I'm missing something but it seems like everywhere we're returning an error, at least from there, it's an error code of some kind being returned and not the data that was passed in to the function. I didn't spend a lot of time hunting through it though. Thanks! Stephen
Вложения
Stephen Frost <sfrost@snowman.net> writes: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> So I'd like to get to a point where we're comfortable marking these >> functions leakproof despite the possibility of corner-case failures. >> We could just decide that the existing failure cases in varstr_cmp are >> not usefully exploitable for information leakage, or perhaps we could >> dumb down the error messages some more to make them even less so. >> It'd also be nice to have some articulatable policy that encompasses >> a choice like this. > I'd rather not say "well, these are mostly leakproof and therefore it's > good enough" unless those corner-case failures you're referring to are > really "this system call isn't documented to ever fail in a way we can't > handle, but somehow it did and we're blowing up because of it." Well, that's pretty much what we've got here. 1. As Noah noted, every failure case in varstr_cmp is ideally a can't happen case, since if it could happen on valid data then that data couldn't be put into a btree index. 2. AFAICS, all the error messages in question just report that a system operation failed, with some errno string or equivalent; none of the original data is reported. (Obviously, we'd want to add comments discouraging people from changing that ...) Conceivably, an attacker could learn the length of some long string by noting a palloc failure report --- but we've already accepted an equivalent hazard in texteq or byteaeq, I believe, and anyway it's pretty hard to believe that an attacker could control such failures well enough to weaponize it. So the question boils down to whether you think that somebody could infer something else useful about the contents of a string from the strerror (or ICU u_errorName()) summary of a system function failure. This seems like a pretty thin argument to begin with, and then the presumed difficulty of making such a failure happen repeatably makes it even harder to credit as a useful information leak. So I'm personally satisfied that we could mark text_cmp et al as leakproof, but I'm not sure how we define a project policy that supports such a determination. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Tom Lane (tgl@sss.pgh.pa.us) wrote: > >> So I'd like to get to a point where we're comfortable marking these > >> functions leakproof despite the possibility of corner-case failures. > >> We could just decide that the existing failure cases in varstr_cmp are > >> not usefully exploitable for information leakage, or perhaps we could > >> dumb down the error messages some more to make them even less so. > >> It'd also be nice to have some articulatable policy that encompasses > >> a choice like this. > > > I'd rather not say "well, these are mostly leakproof and therefore it's > > good enough" unless those corner-case failures you're referring to are > > really "this system call isn't documented to ever fail in a way we can't > > handle, but somehow it did and we're blowing up because of it." > > Well, that's pretty much what we've got here. Good. Those all almost certainly fall under the category of 'covert channels' and provided they're low bandwidth and hard to control, as seems to be the case here, then I believe we can accept them. I'm afraid there isn't really any hard-and-fast definition that could be used as a basis for a project policy around this, unfortunately. We certainly shouldn't be returning direct data from the heap or indexes as part of error messages in leakproof functions, and we should do our best to ensure that anything from system calls we make also don't, but strerror-like results or the error codes themselves should be fine. > 1. As Noah noted, every failure case in varstr_cmp is ideally a can't > happen case, since if it could happen on valid data then that data > couldn't be put into a btree index. That's certainly a good point. > 2. AFAICS, all the error messages in question just report that a system > operation failed, with some errno string or equivalent; none of the > original data is reported. (Obviously, we'd want to add comments > discouraging people from changing that ...) Agreed, we should definitely add comments here (and, really, in any other cases where we need to be thinking about similar issues..). > Conceivably, an attacker could learn the length of some long string > by noting a palloc failure report --- but we've already accepted an > equivalent hazard in texteq or byteaeq, I believe, and anyway it's > pretty hard to believe that an attacker could control such failures > well enough to weaponize it. Right, that's a low bandwidth covert channel and as such should be acceptable. > So the question boils down to whether you think that somebody could > infer something else useful about the contents of a string from > the strerror (or ICU u_errorName()) summary of a system function > failure. This seems like a pretty thin argument to begin with, > and then the presumed difficulty of making such a failure happen > repeatably makes it even harder to credit as a useful information > leak. > > So I'm personally satisfied that we could mark text_cmp et al as > leakproof, but I'm not sure how we define a project policy that > supports such a determination. I'm not sure how to formalize such a policy either though perhaps we could discuss specific "don't do this" things and have a light dicussion about what covert channel are. Thanks! Stephen