Обсуждение: BUG #14151: Xml special symbols are not unescaped when gettting value of Xml via xpath

Поиск
Список
Период
Сортировка

BUG #14151: Xml special symbols are not unescaped when gettting value of Xml via xpath

От
onic@live.fr
Дата:
VGhlIGZvbGxvd2luZyBidWcgaGFzIGJlZW4gbG9nZ2VkIG9uIHRoZSB3ZWJz
aXRlOgoKQnVnIHJlZmVyZW5jZTogICAgICAxNDE1MQpMb2dnZWQgYnk6ICAg
ICAgICAgIG9saSBuaWMKRW1haWwgYWRkcmVzczogICAgICBvbmljQGxpdmUu
ZnIKUG9zdGdyZVNRTCB2ZXJzaW9uOiA5LjUuMwpPcGVyYXRpbmcgc3lzdGVt
OiAgIFdpbmRvd3MKRGVzY3JpcHRpb246ICAgICAgICAKClRoZSBmb2xsb3dp
bmcgYnVnIGhhcyBiZWVuIGxvZ2dlZCBvbiB0aGUgd2Vic2l0ZSBhbmQgdGhl
IG1haWxpbmcgbGlzdC4NCkJ1ZyByZWZlcmVuY2U6IDc5NzEuDQoNCkl0IHdv
cmtkcyBwZXJmZWN0bHkgaW4gOC54IGJ1dCBub3QgaW4gOS41Lg0KDQpJJ3Zl
IGZvdW5kIHRoaXMgYnVnIGhhcyBhbHJlYWR5IGJlZW4gZGlzY3Vzc2VkIGlu
IDIgdGhyZWFkcyB3aXRoIG5vIHJlc3VsdAo6DQpodHRwOi8vd3d3LnBvc3Rn
cmVzcWwub3JnL21lc3NhZ2UtaWQvQkFZMTUyLVc1MUM4NDUxMjQxNTUwNjRE
NTNGM0NBOEE4QzBAcGh4LmdibA0KaHR0cDovL3d3dy5wb3N0Z3Jlc3FsLm9y
Zy9tZXNzYWdlLWlkL0UxVUh5VXctMDAwMW9qLUhFQHdyaWdsZXlzLnBvc3Rn
cmVzcWwub3JnDQoNClNFTEVDVCB1bm5lc3QoeHBhdGgoJy9uYW1lL3RleHQo
KScsIHhtbGVsZW1lbnQobmFtZSBuYW1lLCAnQVQmVCcsIG51bGwgKSkpDQpU
aGlzIGdpdmVzIG1lICdBVCZhbXA7VCcgYW5kDQpJIGhhdmUgTk8gV0FZIGlu
c2lkZSBwZ3NxbCB0byBnZXQgJ0F0JnQnIHZhbHVlIA0KDQoNCklzIHRoZXJl
IGFueSBzdGF0dXMvcHJvZ3Jlc3MgYWJvdXQgdGhpcyBidWc/Cgo=
onic@live.fr writes:
> SELECT unnest(xpath('/name/text()', xmlelement(name name, 'AT&T', null )))
> This gives me 'AT&T' and

AFAICS, that behavior is correct.  xpath returns a value of type xml
(well, really xml[]) and 'AT&T' is not a legal value of that type;
only 'AT&T' is.

> I have NO WAY inside pgsql to get 'At&t' value

I agree that there ought to be an "unescape" function that would convert
this back to 'AT&T', but the lack of one seems like a missing feature not
a bug.  I'd wonder for example what an unescape function ought to do with
other markup such as '<name>'.

The last concrete discussion we had on this seems to have been
this thread:

http://www.postgresql.org/message-id/flat/C71079E6-12D8-4048-B8C5-18368936FD5D@phlo.org

which petered out for lack of anyone finding the time to investigate
the relevant standards.

            regards, tom lane

Re: BUG #14151: Xml special symbols are not unescaped when gettting value of Xml via xpath

От
"David G. Johnston"
Дата:
On Thu, May 19, 2016 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> onic@live.fr writes:
> > SELECT unnest(xpath('/name/text()', xmlelement(name name, 'AT&T', null
> )))
> > This gives me 'AT&T' and
>
> AFAICS, that behavior is correct.  xpath returns a value of type xml
> (well, really xml[]) and 'AT&T' is not a legal value of that type;
> only 'AT&T' is.
>
> > I have NO WAY inside pgsql to get 'At&t' value
>
> I agree that there ought to be an "unescape" function that would convert
> this back to 'AT&T', but the lack of one seems like a missing feature not
> a bug.  I'd wonder for example what an unescape function ought to do with
> other markup such as '<name>'.
>
> The last concrete discussion we had on this seems to have been
> this thread:
>

>
> http://www.postgresql.org/message-id/flat/C71079E6-12D8-4048-B8C5-1836893=
6FD5D@phlo.org
>
> which petered out for lack of anyone finding the time to investigate
> the relevant standards.
>

This
2013
=E2=80=8B thread further discusses our=E2=80=8B existing problematic behavi=
or.

http://www.postgresql.org/message-id/flat/25508.1383590668@sss.pgh.pa.us#25=
508.1383590668@sss.pgh.pa.us

I don't recall anything discussed in depth since then - just a bug report
or two from users.

http://www.postgresql.org/message-id/CALr6pkhSe20gh5Hci1H=3DuT_7QE4av0m9h2e=
QMjqUX6D6AD9H1Q@mail.gmail.com

A basic entity decoder is fairly simply to write and doesn't require C code
- a single SQL function using replace will likely suffice.

The 2013 post I linked shows a potentially deeper consideration that we
need to take into account.

I'd challenge any community members who really want to see this get fixed
at least start things off by generating a set of queries and expected
outputs that can be commented upon and documented as being the desired
behavior with respect to both internal and external entities.  At least
then we'd have clearly defined what the current and expected behaviors
are.  With that in hand there is at least hope that someone less familiar
with xpath/xml generally, but familiar with PostgreSQL internals, would be
willing to take on the task of writing a patch that causes the tests to
pass.  The existing test queries and resultant .out files should be
reasonably accessible even for someone not familiar with the code.

I'm inclined to think, however, that we will end up wanting an
"xpath_text(...)" function that returns text instead of xml.  The
underlying problem here is that the xpath standard allows for different
kinds of outputs - which depend upon the xpath expression - while
PostgreSQL only allows for xml regardless of the underlying expression.
This dichotomy seems likely to be impossible to overcome without
introducing a new function.

Please, help us scratch your itch.

David J.