Обсуждение: BUG #14151: Xml special symbols are not unescaped when gettting value of Xml via xpath
BUG #14151: Xml special symbols are not unescaped when gettting value of Xml via xpath
От
onic@live.fr
Дата:
VGhlIGZvbGxvd2luZyBidWcgaGFzIGJlZW4gbG9nZ2VkIG9uIHRoZSB3ZWJz aXRlOgoKQnVnIHJlZmVyZW5jZTogICAgICAxNDE1MQpMb2dnZWQgYnk6ICAg ICAgICAgIG9saSBuaWMKRW1haWwgYWRkcmVzczogICAgICBvbmljQGxpdmUu ZnIKUG9zdGdyZVNRTCB2ZXJzaW9uOiA5LjUuMwpPcGVyYXRpbmcgc3lzdGVt OiAgIFdpbmRvd3MKRGVzY3JpcHRpb246ICAgICAgICAKClRoZSBmb2xsb3dp bmcgYnVnIGhhcyBiZWVuIGxvZ2dlZCBvbiB0aGUgd2Vic2l0ZSBhbmQgdGhl IG1haWxpbmcgbGlzdC4NCkJ1ZyByZWZlcmVuY2U6IDc5NzEuDQoNCkl0IHdv cmtkcyBwZXJmZWN0bHkgaW4gOC54IGJ1dCBub3QgaW4gOS41Lg0KDQpJJ3Zl IGZvdW5kIHRoaXMgYnVnIGhhcyBhbHJlYWR5IGJlZW4gZGlzY3Vzc2VkIGlu IDIgdGhyZWFkcyB3aXRoIG5vIHJlc3VsdAo6DQpodHRwOi8vd3d3LnBvc3Rn cmVzcWwub3JnL21lc3NhZ2UtaWQvQkFZMTUyLVc1MUM4NDUxMjQxNTUwNjRE NTNGM0NBOEE4QzBAcGh4LmdibA0KaHR0cDovL3d3dy5wb3N0Z3Jlc3FsLm9y Zy9tZXNzYWdlLWlkL0UxVUh5VXctMDAwMW9qLUhFQHdyaWdsZXlzLnBvc3Rn cmVzcWwub3JnDQoNClNFTEVDVCB1bm5lc3QoeHBhdGgoJy9uYW1lL3RleHQo KScsIHhtbGVsZW1lbnQobmFtZSBuYW1lLCAnQVQmVCcsIG51bGwgKSkpDQpU aGlzIGdpdmVzIG1lICdBVCZhbXA7VCcgYW5kDQpJIGhhdmUgTk8gV0FZIGlu c2lkZSBwZ3NxbCB0byBnZXQgJ0F0JnQnIHZhbHVlIA0KDQoNCklzIHRoZXJl IGFueSBzdGF0dXMvcHJvZ3Jlc3MgYWJvdXQgdGhpcyBidWc/Cgo=
onic@live.fr writes: > SELECT unnest(xpath('/name/text()', xmlelement(name name, 'AT&T', null ))) > This gives me 'AT&T' and AFAICS, that behavior is correct. xpath returns a value of type xml (well, really xml[]) and 'AT&T' is not a legal value of that type; only 'AT&T' is. > I have NO WAY inside pgsql to get 'At&t' value I agree that there ought to be an "unescape" function that would convert this back to 'AT&T', but the lack of one seems like a missing feature not a bug. I'd wonder for example what an unescape function ought to do with other markup such as '<name>'. The last concrete discussion we had on this seems to have been this thread: http://www.postgresql.org/message-id/flat/C71079E6-12D8-4048-B8C5-18368936FD5D@phlo.org which petered out for lack of anyone finding the time to investigate the relevant standards. regards, tom lane
Re: BUG #14151: Xml special symbols are not unescaped when gettting value of Xml via xpath
От
"David G. Johnston"
Дата:
On Thu, May 19, 2016 at 1:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > onic@live.fr writes: > > SELECT unnest(xpath('/name/text()', xmlelement(name name, 'AT&T', null > ))) > > This gives me 'AT&T' and > > AFAICS, that behavior is correct. xpath returns a value of type xml > (well, really xml[]) and 'AT&T' is not a legal value of that type; > only 'AT&T' is. > > > I have NO WAY inside pgsql to get 'At&t' value > > I agree that there ought to be an "unescape" function that would convert > this back to 'AT&T', but the lack of one seems like a missing feature not > a bug. I'd wonder for example what an unescape function ought to do with > other markup such as '<name>'. > > The last concrete discussion we had on this seems to have been > this thread: > > > http://www.postgresql.org/message-id/flat/C71079E6-12D8-4048-B8C5-1836893= 6FD5D@phlo.org > > which petered out for lack of anyone finding the time to investigate > the relevant standards. > This 2013 =E2=80=8B thread further discusses our=E2=80=8B existing problematic behavi= or. http://www.postgresql.org/message-id/flat/25508.1383590668@sss.pgh.pa.us#25= 508.1383590668@sss.pgh.pa.us I don't recall anything discussed in depth since then - just a bug report or two from users. http://www.postgresql.org/message-id/CALr6pkhSe20gh5Hci1H=3DuT_7QE4av0m9h2e= QMjqUX6D6AD9H1Q@mail.gmail.com A basic entity decoder is fairly simply to write and doesn't require C code - a single SQL function using replace will likely suffice. The 2013 post I linked shows a potentially deeper consideration that we need to take into account. I'd challenge any community members who really want to see this get fixed at least start things off by generating a set of queries and expected outputs that can be commented upon and documented as being the desired behavior with respect to both internal and external entities. At least then we'd have clearly defined what the current and expected behaviors are. With that in hand there is at least hope that someone less familiar with xpath/xml generally, but familiar with PostgreSQL internals, would be willing to take on the task of writing a patch that causes the tests to pass. The existing test queries and resultant .out files should be reasonably accessible even for someone not familiar with the code. I'm inclined to think, however, that we will end up wanting an "xpath_text(...)" function that returns text instead of xml. The underlying problem here is that the xpath standard allows for different kinds of outputs - which depend upon the xpath expression - while PostgreSQL only allows for xml regardless of the underlying expression. This dichotomy seems likely to be impossible to overcome without introducing a new function. Please, help us scratch your itch. David J.