Обсуждение: Review of patch Bugfix for XPATH() if expression returns a scalar value
This is review of patch https://commitfest.postgresql.org/action/patch_view?id=565 "Bugfix for XPATH() if expression returns a scalar value" Patch applies cleanly, and compiles cleanly too, I didn't checked tests. Form discussion about patch, and referenced thread in this patch http://archives.postgresql.org/pgsql-general/2010-07/msg00355.php, if I understand good such functionality is desired. This patch, I think, gives good approach for dealing with scalar values, and, as well converting value to it's string representation is good too (according to current support for xml), with one exception detailed below. In this patch submitter, similarly to https://commitfest.postgresql.org/action/patch_view?id=580, added functionality for XML-escaping of some kind of values (I think only string scalars are escaped), which is not-natural and may lead to double escaping in some situation, example query may be: SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM (SELECT (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as foo; xmlelement -------------------------<root sth="<n"/>It's clearly visible that value from attribute is "<n", not "<". Every parser will read this as "<n" which is not-natural and will require form consumer/developer to de-escape this on his side - roughly speaking this will be reported as serious bug.I didn't found good reasons why XML-escaping should be included, submitter wrote about inserting this to xml column and possibility of data damage, but didn't give an example. Such example is desired. Conclusion I vote +1 for this patch if escaping will be removed. Regards, Radoslaw Smogura
On Jun29, 2011, at 19:57 , Radosław Smogura wrote: > This is review of patch > https://commitfest.postgresql.org/action/patch_view?id=565 > "Bugfix for XPATH() if expression returns a scalar value" > > SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM (SELECT > (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name > "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as foo; > > xmlelement > ------------------------- > <root sth="<n"/> > > It's clearly visible that value from attribute is "<n", not "<". Every > parser will read this as "<n" which is not-natural and will require form > consumer/developer to de-escape this on his side - roughly speaking this will > be reported as serious bug. There's a problem there, no doubt, but your proposed solution just moves the problem around. Here's your query, reformatted to be more legible SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth) ) FROM ( SELECT (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name "root", XMLATTRIBUTES('<n'AS xmlns, '<v' AS value),'<t') ) ) v(x)) as foo; What happens is that the XPATH expression selects the xmlns attribute with the value '<n'. To be well-formed xml, that value must be escaped, so what is actually returned by the XPATH call is '<n'. The XMLATTRIBUTES() function, however, doesn't distinguish between input of type XML and input of type TEXT, so it figures it has to represent the *textual* value '<n' in xml, and produces '<n'. Not escaping textual values returned by XPATH isn't a solution, though. For example, assume someone inserts the value returned by the XPATH() call in your query into a column of type XML. If XPATH() returned '<n' literally instead of escaping it, the column would contain non-well-formed XML in a column of type XML. Not pretty, and pretty fatal during your next dump/reload cycle. Or, if you want another example, simply remove the XMLATTRIBUTES call and use the value returned by XPATH as a child node directly. SELECT XMLELEMENT(name root, foo.namespace ) FROM ( SELECT (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name "root", XMLATTRIBUTES('<n'AS xmlns, '<v' AS value),'<t') ) ) v(x)) as foo; xmlelement --------------------<root><n</root> Note that this correct! The <root> node contains a text node representing the textual value '<n'. If XPATH() hadn't return the value escaped, the query above would have produces <root><n</root> which is obviously wrong. It works in this case because XMLELEMENT is smart enough to distinguish between child nodes gives as TEXT values (those are escaped) and child nodes provided as XML values (those are inserted unchanged). XMLATTRIBUTES, however, doesn't make that distinction, and simply escaped all attributes values independent of their type. Now, the reason it does that is probably that *not* all XML values are well-formed attribute values without additional escaping. For example, you cannot have literal '<' and '>' in attribute values. So if XMLATTRIBUTES, like XMLELEMENT never escaped values which are already of type XML, the following piece of code XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v)) would produce <a v="<node/>"/> which, alas, is not well-formed :-( The correct solution, I believe, is to teach XMLATTRIBUTES to only escape those characters which are invalid in attribute values but valid in XML (Probably only '<' and '>' but I'll check). If there are no objects, I'll prepare a separate patch for that. I don't want to include it in this patch because it's really a distinct issue (even modifies a different function). best regards, Florian Pflug
Re: Review of patch Bugfix for XPATH() if expression returns a scalar value
От
Radosław Smogura
Дата:
On Wed, 29 Jun 2011 22:26:39 +0200, Florian Pflug wrote: > On Jun29, 2011, at 19:57 , Radosław Smogura wrote: >> This is review of patch >> https://commitfest.postgresql.org/action/patch_view?id=565 >> "Bugfix for XPATH() if expression returns a scalar value" >> >> SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) >> FROM (SELECT >> (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES >> (XMLELEMENT(name >> "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as >> foo; >> >> xmlelement >> ------------------------- >> <root sth="<n"/> >> >> It's clearly visible that value from attribute is "<n", not "<". >> Every >> parser will read this as "<n" which is not-natural and will >> require form >> consumer/developer to de-escape this on his side - roughly speaking >> this will >> be reported as serious bug. > > There's a problem there, no doubt, but your proposed solution just > moves the > problem around. > > Here's your query, reformatted to be more legible > > SELECT > XMLELEMENT(name root, > XMLATTRIBUTES(foo.namespace AS sth) > ) > FROM ( > SELECT > (XPATH('namespace-uri(/*)', x))[1] AS namespace > FROM (VALUES (XMLELEMENT(name "root", > XMLATTRIBUTES('<n' AS xmlns, > '<v' AS value),'<t') > ) > ) v(x)) as foo; > > What happens is that the XPATH expression selects the xmlns > attribute with the value '<n'. To be well-formed xml, that value > must be escaped, so what is actually returned by the XPATH > call is '<n'. The XMLATTRIBUTES() function, however, doesn't > distinguish between input of type XML and input of type TEXT, > so it figures it has to represent the *textual* value '<n' > in xml, and produces '<n'. > > Not escaping textual values returned by XPATH isn't a solution, > though. For example, assume someone inserts the value returned > by the XPATH() call in your query into a column of type XML. > If XPATH() returned '<n' literally instead of escaping it, > the column would contain non-well-formed XML in a column of type > XML. Not pretty, and pretty fatal during your next dump/reload > cycle. > > Or, if you want another example, simply remove the XMLATTRIBUTES > call and use the value returned by XPATH as a child node directly. > > SELECT > XMLELEMENT(name root, > foo.namespace > ) > FROM ( > SELECT > (XPATH('namespace-uri(/*)', x))[1] AS namespace > FROM (VALUES (XMLELEMENT(name "root", > XMLATTRIBUTES('<n' AS xmlns, > '<v' AS value),'<t') > ) > ) v(x)) as foo; > > xmlelement > -------------------- > <root><n</root> > > Note that this correct! The <root> node contains a text node > representing the textual value '<n'. If XPATH() hadn't return > the value escaped, the query above would have produces > > <root><n</root> > > which is obviously wrong. It works in this case because XMLELEMENT > is smart enough to distinguish between child nodes gives as TEXT > values (those are escaped) and child nodes provided as XML values > (those are inserted unchanged). > > XMLATTRIBUTES, however, doesn't make that distinction, and simply > escaped all attributes values independent of their type. Now, the > reason it does that is probably that *not* all XML values are > well-formed attribute values without additional escaping. For > example, > you cannot have literal '<' and '>' in attribute values. So if > XMLATTRIBUTES, like XMLELEMENT never escaped values which are > already of type XML, the following piece of code > > XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v)) > > would produce > > <a v="<node/>"/> > > which, alas, is not well-formed :-( > > The correct solution, I believe, is to teach XMLATTRIBUTES to > only escape those characters which are invalid in attribute > values but valid in XML (Probably only '<' and '>' but I'll > check). If there are no objects, I'll prepare a separate patch > for that. I don't want to include it in this patch because it's > really a distinct issue (even modifies a different function). > > best regards, > Florian Pflug You may manually enter invalid xml too, You don't need xpath for this. Much moreIn PostgreSQL 9.0.1, compiled by Visual C++build 1500, 64-bitI executedSELECT XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v))and it's looks like I got correctoutput "<a v="<node/>"/>"when I looked with text editor into table files I saw same value. I will check on last master if I can reproduce this. But indeed, now PostgreSQL is not type safe against XML type. See SELECT XMLELEMENT(name "root", '<', (xpath('//text()','<root><</root>'))[1])). You found some problem with escaping, but solution is bad, I think problem lies with XML type, which may be used to holdstrings and proper xml. For example above query can't distinguish if (xpath('//text()', '<root><</root>'))[1] is xmltext, or xml element. For me adding support for scalar is really good and needed, but escaping is not. Regards,Radosław Smogura
On Jun30, 2011, at 10:33 , Radosław Smogura wrote: > You may manually enter invalid xml too, You don't need xpath for this. Please give an example, instead of merely asserting I get postgres=# select '<'::xml; ERROR: invalid XML content at character 8 > Much more > In PostgreSQL 9.0.1, compiled by Visual C++ build 1500, 64-bit > I executed > SELECT XMLELEMENT(name "a", XMLATTRIBUTES('<node/>'::xml as v)) > and it's looks like I got correct output "<a v="<node/>"/>" > when I looked with text editor into table files I saw same value. I fail to see your point. XMLATTRIBUTES currently escapes the value unconditionally, which happens to work as expected in this case. But try SELECT XMLELEMENT(name "a", XMLATTRIBUTES('<'::xml as v)) which returns <a v="<"/> which I guess isn't what one would expect, otherwise one wouldn't have added the cast to xml. I believe the latter example should probably return <a v="<"/> instead, just as XMLELEMENT(name "a", '<'::xml) returns <a><</a> But again, this has little to do with the patch at hand, and should therefore be discussed separately. The fact that XPATH's failure to escape it's return value happens to fit XMLATTRIBUTE's failure to not escape it's input value doesn't prove that both functions are correct. Especially not if other functions like XMLELEMENT *don't* escape their input values of they're already of type XML. > But indeed, now PostgreSQL is not type safe against XML type. See > SELECT XMLELEMENT(name "root", '<', (xpath('//text()', '<root><</root>'))[1])). That is *exactly* what my other patch fixes, the one you deem undesirable. With that other patch applied, I get xmlelement -----------------------<root><<</root> wich is correct. So here too I fail to see your point. > You found some problem with escaping, but solution is bad, > I think problem lies with XML type, which may be used to hold strings > and proper xml. But it's not *supposed* to hold arbitrary strings. If it was, the input function wouldn't check whether or not the value was valid or not. Do you agree that, for any type, "SELECT value_of_type_X::text::X" should never throw an error? Because that, IMHO quite basic, guarantee is broken without proper escaping of XML values. Try SELECT (XPATH('/text()', '<t><</t>'::xml))[1]::text::xml on unpatched 9.0. It fails, which IMNSHO is very clearly a bug. > For example above query can't distinguish if > (xpath('//text()', '<root><</root>'))[1] is xml text, or xml element. Text *is* a kind of element in XML. There are different types of nodes, one of which are is "text node". The fact that it's not surrounded by <..>, </..> doesn't make a different, and doesn't change the quoting rules one bit. > For me adding support for scalar is really good and needed, but escaping is not. So far, I remain unconvinced by your arguments. What you argue for might be sensible if we *didn't* have an XML type and instead simply had stuff like XPATH() for type TEXT. But we *do* have a type XML, which *does* validate it's input, so I fail to see how allowing values of type XML which *don't* pass that input validation can be anything but a bug. Compare this to the textual types. We've gone through some pain in the past to guarantee that textual types never hold values which aren't valid text in the database's encoding (e.g, we prevent textual values from containing bytes sequences which aren't valid UTF-8 if the database encoding is UTF-8). It follows that we should do the same for XML types. best regards, Florian Pflug
Hi Radosław, Do you still have objections, or did I manage to convince you? Also, aside from the question of whether to escape or not, did you find any issues with either the patch (like spurious whitespace, ...) or the code, or is the patch OK implementation-wise? best regards, Florian Pflug
Hi Radosław, Do you plan to comment on this patch (the one that adds support for XPATH expressions returning scalar values, not the one that escapes text and attributes nodes which are selected) further, or should it be marked "Ready for Committer"? I'm asking because I just noticed that you marked the other patch as "Ready for Commiter", but not this one. best regards, Florian Pflug