Обсуждение: XMLATTRIBUTES vs. values of type XML

Поиск
Список
Период
Сортировка

XMLATTRIBUTES vs. values of type XML

От
Florian Pflug
Дата:
Hi

While reviewing the (now applied) XPATH escaping patches, Radoslaw found one
case where the previous failure of XPATH to escape its return value was offset
by XMLATTRIBUTES insistence to escape all input values, even if they're
already of type XML.

To wit, if you do
 SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES('&'::XML AS "a"))

you get
    xmlelement
--------------------<t a="&amp;"/>

which is somewhat surprising. Especially since
 SELECT XMLELEMENT(NAME "t", '&'::XML);

gives
 xmlelement
--------------<t>&</t>

as one would except. Now, it seems rather unlikely that a real application
would actually contain a query similar to the former one - you usually don 't
store XML in the attributes of XML nodes. But since XPATH() returns XML, it's
not unlikely that an application would do
 SELECT XMLELEMENT(NAME ..., XMLATTRIBUTES((XPATH(...))[1] AS ...))

As it stands, on 9.2 the values returned by the XPath expression will be
escaped twice - once by XPATH and a second time by XMLATTRIBUTES, while on 9.1
the value will be escaped only once.

Since this seems to be the most likely situation in which XMLATTRIBUTES would
receive an input argument of type XML, and since it's not entirely logical for
XMLATTRIBUTES to unconditionally escapes values already of type XML, I propose
to change the behaviour of XMLATTRIBUTES as follows.

Values not of type XML are be treated as they always have been, i.e. all
special characters (<,>,&,",') are replaced by an entity reference (<,
>, &, ", ').

Values of type XML are assumed to be already escaped, so '&' isn't treated as
a special character to avoid double escaping. The other special characters
(<,>,",') are escaped as usual.

The safety of this relies on the fact that '&' may never appear in a
well-formed XML document, except as part of an entity reference. This seems to
be the case, except if CDATA sections are involved, which may contain plain
ampersands. So the actual escaping rule would have to be a bit more complex
than I made it sound above - we'd need to detect CDATA sections, and
re-enabled the escaping of '&' until the end of such a section.

To actually implement this, I'd remove the use of libxml from the
implementation of XMLELEMENT, and instead use our already existing
escape_xml() function, enhanced with the ability to handle the partial
escaping algorithm outlined above. We already only use libxml to escape
attribute values, so doing that isn't a radical departure from xml.c's ways.
As an added benefit, all the encoding-related problems of XMLELEMENT and
XMLATTRIBUTES would go away once libxml is removed from this code path. So
XPATH() would be the only remaining function which breaks in non-UTF-8
databases.

Comments? Thoughts? Suggestions?

best regards,
Florian Pflug





Re: XMLATTRIBUTES vs. values of type XML

От
Peter Eisentraut
Дата:
On tis, 2011-07-26 at 22:44 +0200, Florian Pflug wrote:
> While reviewing the (now applied) XPATH escaping patches, Radoslaw
> found one
> case where the previous failure of XPATH to escape its return value
> was offset
> by XMLATTRIBUTES insistence to escape all input values, even if
> they're
> already of type XML.
> 
> To wit, if you do
> 
>   SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES('&'::XML AS "a"))
> 
> you get
> 
>      xmlelement     
> --------------------
>  <t a="&amp;"/> 

Per SQL standard, the attribute values may not be of type XML, so maybe
we should just prohibit it.



Re: XMLATTRIBUTES vs. values of type XML

От
Florian Pflug
Дата:
On Jul27, 2011, at 16:18 , Peter Eisentraut wrote:
> On tis, 2011-07-26 at 22:44 +0200, Florian Pflug wrote:
>> While reviewing the (now applied) XPATH escaping patches, Radoslaw
>> found one
>> case where the previous failure of XPATH to escape its return value
>> was offset
>> by XMLATTRIBUTES insistence to escape all input values, even if
>> they're
>> already of type XML.
>> 
>> To wit, if you do
>> 
>>  SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES('&'::XML AS "a"))
>> 
>> you get
>> 
>>     xmlelement     
>> --------------------
>> <t a="&amp;"/> 
> 
> Per SQL standard, the attribute values may not be of type XML, so maybe
> we should just prohibit it.

We probably should have, but I think it's too late for that. I don't
believe I'm the only one who uses XPATH results as attribute values,
and we'd severely break that use-case.

You might say the same thing about my proposal, of course, but I believe
the risk is much smaller there. Applications would only break if they (a) Pass XML from a source other than a XPath
expressionselecting     a text or attribute and (b) actually want double-escaping to occur.
 

As a data point, I've written an application with makes heavy use of
our XML infrastructure over the last few months (as you might have guessed
from the stream of patches ;-)). That application would be pretty much
untroubled by the changes to XMLATTRIBUTES I proposed, but would be
severely broken if we rejected values of type XML all together.

best regards,
Florian Pflug




Re: XMLATTRIBUTES vs. values of type XML

От
Peter Eisentraut
Дата:
On ons, 2011-07-27 at 19:37 +0200, Florian Pflug wrote:
> > Per SQL standard, the attribute values may not be of type XML, so
> maybe
> > we should just prohibit it.
> 
> We probably should have, but I think it's too late for that. I don't
> believe I'm the only one who uses XPATH results as attribute values,
> and we'd severely break that use-case.
> 
> You might say the same thing about my proposal, of course, but I
> believe
> the risk is much smaller there. Applications would only break if they
>   (a) Pass XML from a source other than a XPath expression selecting
>       a text or attribute and
>   (b) actually want double-escaping to occur.

Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
would behave as in

SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES(XMLSERIALIZE(content '&'::XML AS text) AS "a"))

which is what it indeed does in 9.1.

So if we don't want to restrict this, for backward compatibility, then I
would suggest that we fix it to work like it used to.

I would be very hesitant about adding another escape mechanism that
escapes some things but not others.  We already have two or three of
those for XML, and it doesn't seem worth adding another one just for
this, which is outside the standard and for which a valid workaround
exists.




Re: XMLATTRIBUTES vs. values of type XML

От
Florian Pflug
Дата:
On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
> Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
> would behave as in
>
> SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES(XMLSERIALIZE(content '&'::XML AS text) AS "a"))

With both 9.1 and 9.2 this query returns
    xmlelement
--------------------<t a="&amp;"/>

i.e. makes the value of "a" represent the *literal* string '&', *not*
the literal string '&'. Just to be sure there's no miss-understanding here
- is this what you expect?

> which is what it indeed does in 9.1.
>
> So if we don't want to restrict this, for backward compatibility, then I
> would suggest that we fix it to work like it used to.

There's currently no difference in behaviour between 9.1 and 9.2 there.
We've only modified XPATH to always correctly escape it's result in 9.2,
so there's only a difference if you pass the result of XPATH() to
XMLATTRIBUTES. Which I figured to be the most likely reason for to pass
values of type XML to XMLATTRIBUTES, but maybe you disagree there.

> I would be very hesitant about adding another escape mechanism that
> escapes some things but not others.  We already have two or three of
> those for XML, and it doesn't seem worth adding another one just for
> this, which is outside the standard and for which a valid workaround
> exists.

What's the workaround if you have a value of type XML, e.g. '&',
and want to set an attribute to the value represented by that XML fragment
(i.e. '&')? Since we have no XMLUNESCAPE function, I don't see an easy
way to do that. Maybe I'm missing something, though.

best regards,
Florian Pflug



Re: XMLATTRIBUTES vs. values of type XML

От
Peter Eisentraut
Дата:
On ons, 2011-07-27 at 23:21 +0200, Florian Pflug wrote:
> On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
> > Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
> > would behave as in
> > 
> > SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES(XMLSERIALIZE(content '&'::XML AS text) AS "a"))
> 
> With both 9.1 and 9.2 this query returns
> 
>      xmlelement     
> --------------------
>  <t a="&amp;"/>
> 
> i.e. makes the value of "a" represent the *literal* string '&', *not*
> the literal string '&'. Just to be sure there's no miss-understanding here
> - is this what you expect?

Well, I expect it to fail.

> What's the workaround if you have a value of type XML, e.g. '&',
> and want to set an attribute to the value represented by that XML fragment
> (i.e. '&')? Since we have no XMLUNESCAPE function, I don't see an easy
> way to do that. Maybe I'm missing something, though.

It may be worth researching whether the XMLSERIALIZE function is
actually doing what it should, or whether it could/should do some
unescaping.  Unfortunately, in the latest SQL/XML standard the final
answer it nested deep in the three other standards, so I don't have an
answer right now.  But there are plenty of standards in this area, so
I'd hope that one of them can give us the right behavior, instead of us
making something up.




Re: XMLATTRIBUTES vs. values of type XML

От
Florian Pflug
Дата:
On Jul28, 2011, at 22:51 , Peter Eisentraut wrote:
> On ons, 2011-07-27 at 23:21 +0200, Florian Pflug wrote:
>> On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
>>> Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
>>> would behave as in
>>>
>>> SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES(XMLSERIALIZE(content '&'::XML AS text) AS "a"))
>>
>> With both 9.1 and 9.2 this query returns
>>
>>     xmlelement
>> --------------------
>> <t a="&amp;"/>
>>
>> i.e. makes the value of "a" represent the *literal* string '&', *not*
>> the literal string '&'. Just to be sure there's no miss-understanding here
>> - is this what you expect?
>
> Well, I expect it to fail.

Now you've lost me. What exactly should fail under what circumstances?

>> What's the workaround if you have a value of type XML, e.g. '&',
>> and want to set an attribute to the value represented by that XML fragment
>> (i.e. '&')? Since we have no XMLUNESCAPE function, I don't see an easy
>> way to do that. Maybe I'm missing something, though.
>
> It may be worth researching whether the XMLSERIALIZE function is
> actually doing what it should, or whether it could/should do some
> unescaping.

I don't see how that could work. It can't unescape anything in e.g.
'<t>&</t>', because the result would be a quite useless not-really-XML
kind of thing. It could unescape '&' but that kind of content-dependent
behaviour seem even worse than my proposed escaping rules for XMLATTRIBUTES.

> Unfortunately, in the latest SQL/XML standard the final
> answer it nested deep in the three other standards, so I don't have an
> answer right now.  But there are plenty of standards in this area, so
> I'd hope that one of them can give us the right behavior, instead of us
> making something up.

Which standards to you have in mind there? If you can point me to a place
where I can obtain them, I could check if there's something in them
which helps.

best regards,
Florian Pflug



Re: XMLATTRIBUTES vs. values of type XML

От
Peter Eisentraut
Дата:
On fre, 2011-07-29 at 11:37 +0200, Florian Pflug wrote:
> On Jul28, 2011, at 22:51 , Peter Eisentraut wrote:
> > On ons, 2011-07-27 at 23:21 +0200, Florian Pflug wrote:
> >> On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
> >>> Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
> >>> would behave as in
> >>> 
> >>> SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES(XMLSERIALIZE(content '&'::XML AS text) AS "a"))
> >> 
> >> With both 9.1 and 9.2 this query returns
> >> 
> >>     xmlelement     
> >> --------------------
> >> <t a="&amp;"/>
> >> 
> >> i.e. makes the value of "a" represent the *literal* string '&', *not*
> >> the literal string '&'. Just to be sure there's no miss-understanding here
> >> - is this what you expect?
> > 
> > Well, I expect it to fail.
> 
> Now you've lost me. What exactly should fail under what circumstances?

To me, the best solution still appears to be forbidding passing values
of type xml to XMLATTRIBUTES, unless we find an obviously better
solution that is not, "I came up with this custom escape function that I
tweaked so that it appears to make sense".

> > Unfortunately, in the latest SQL/XML standard the final
> > answer it nested deep in the three other standards, so I don't have an
> > answer right now.  But there are plenty of standards in this area, so
> > I'd hope that one of them can give us the right behavior, instead of us
> > making something up.
> 
> Which standards to you have in mind there? If you can point me to a place
> where I can obtain them, I could check if there's something in them
> which helps.

In SQL/XML 2008, the actual behavior of XMLSERIALIZE is delegated to
"XSLT 2.0 and XQuery 1.0 Serialization".  I'm not familiar with this
latter standard, but it appears to have lots of options and parameters,
one of which might help us.




Re: XMLATTRIBUTES vs. values of type XML

От
Florian Pflug
Дата:
On Aug11, 2011, at 09:16 , Peter Eisentraut wrote:
> On fre, 2011-07-29 at 11:37 +0200, Florian Pflug wrote:
>> On Jul28, 2011, at 22:51 , Peter Eisentraut wrote:
>>> On ons, 2011-07-27 at 23:21 +0200, Florian Pflug wrote:
>>>> On Jul27, 2011, at 23:08 , Peter Eisentraut wrote:
>>>>> Well, offhand I would expect that passing an XML value to XMLATTRIBUTES
>>>>> would behave as in
>>>>>
>>>>> SELECT XMLELEMENT(NAME "t", XMLATTRIBUTES(XMLSERIALIZE(content '&'::XML AS text) AS "a"))
>>>>
>>>> With both 9.1 and 9.2 this query returns
>>>>
>>>>    xmlelement
>>>> --------------------
>>>> <t a="&amp;"/>
>>>>
>>>> i.e. makes the value of "a" represent the *literal* string '&', *not*
>>>> the literal string '&'. Just to be sure there's no miss-understanding here
>>>> - is this what you expect?
>>>
>>> Well, I expect it to fail.
>>
>> Now you've lost me. What exactly should fail under what circumstances?
>
> To me, the best solution still appears to be forbidding passing values
> of type xml to XMLATTRIBUTES, unless we find an obviously better
> solution that is not, "I came up with this custom escape function that I
> tweaked so that it appears to make sense".

Hm, OK, I see your point. However, if we simply raise an error in 9.2,
and do nothing else, that we make it impossible to use the result of
an XPath expression as an XML attribute value. Not just inconvenient,
but impossible, so I don't think we can do that. We'd thus need to add a
function
 XMLUNESCAPE(XML) RETURNS TEXT

to restore that functionality. Defining a sane behaviour for such a function,
however, seems no easier than defining sane behaviour for an XML attribute
of already of type XML. The core of the problems remains to define the result
of XMLUNESCAPE('<tag>content</tag>'), just as the core of the XMLATTRIBUTES
problems is to define XMLELEMENT(... XMLATTRIBUTES('<tag>content</tag>' as a)).

Thinking about this further, it seems that we essentially have two distinct
classes of XML values. Some are essentially plain text, but might contains
entity references, while others are "real" XML fragments which contain at
least one tag. That suggests that a sensible behaviour for XMLUNESCAPE might
be to return a string with the entity references resolved in the former case,
and simply return an error in the latter.

To summarize, we'd have
 XMLUNESCAPE(''::XML) -> 'a' XMLUNESCAPE('a'::XML) -> 'a' XMLUNESCAPE('<'::XML) -> '<' XMLUNESCAPE('<t/>'::XML) ->
error

To not break applications needlessly, I'd then be inclined to make
 XMLATTRIBUTES(xml_value as "a")

mean
 XMLATTRIBUTES(XMLUNESCAPE(xml_value) as "a")

i.e. throw an error if xml_value contains anything but plain text and
entity references. But I could probably also live with not doing that.

>>> Unfortunately, in the latest SQL/XML standard the final
>>> answer it nested deep in the three other standards, so I don't have an
>>> answer right now.  But there are plenty of standards in this area, so
>>> I'd hope that one of them can give us the right behavior, instead of us
>>> making something up.
>>
>> Which standards to you have in mind there? If you can point me to a place
>> where I can obtain them, I could check if there's something in them
>> which helps.
>
> In SQL/XML 2008, the actual behavior of XMLSERIALIZE is delegated to
> "XSLT 2.0 and XQuery 1.0 Serialization".  I'm not familiar with this
> latter standard, but it appears to have lots of options and parameters,
> one of which might help us.

I'll try to obtain a copy of that. Thanks.

best regards,
Florian Pflug