Re: XML with invalid chars
From | Andrew Dunstan |
---|---|
Subject | Re: XML with invalid chars |
Date | |
Msg-id | 4DB8DD7D.3070905@dunslane.net |
In reply to | Re: XML with invalid chars (Noah Misch <noah@leadboat.com>) |
Replies | Re: XML with invalid chars (Noah Misch <noah@leadboat.com>) |
List | pgsql-hackers |
On 04/27/2011 05:30 PM, Noah Misch wrote:
>
>> I'm not sure what to do about the back branches and cases where data is
>> already in databases. This is fairly ugly. Suggestions welcome.
> We could provide a script in (or linked from) the release notes for testing the
> data in all your xml columns.

Yeah, we'll have to do something like that. What a blasted mess.

> To make things worse, the dump/reload problems seems to depend on your version
> of libxml2, or something. With git master, a CentOS 5 system with
> 2.6.26-2.1.2.8.el5_5.1 accepts the ^A byte, but an Ubuntu 8.04 LTS system with
> 2.6.31.dfsg-2ubuntu rejects it. Even with a patch like this, systems with a
> lenient libxml2 will be liable to store XML data that won't restore on a system
> with a strict libxml2. Perhaps we should emit a build-time warning if the local
> libxml2 is lenient?

No, I think we need to be strict ourselves.

>> + if (*p< '\x20')
> This needs to be an unsigned comparison. On my system, "char" is signed, so
> "SELECT xmlelement(name foo, null, E'\u0550')" fails incorrectly.

Good point. Perhaps we'd be better off using iscntrl(*p).

> The XML character set forbids more than just control characters; see
> http://www.w3.org/TR/xml/#charsets. We also ought to reject, for example,
> "SELECT xmlelement(name foo, null, E'\ufffe')".
>
> Injecting the check here aids "xmlelement" and "xmlforest" , but "xmlcomment"
> and "xmlpi" still let the invalid byte through. You can also still inject the
> byte into an attribute value via "xmlelement". I wonder if it wouldn't make
> more sense to just pass any XML that we generate from scratch through libxml2.
> There are a lot of holes to plug, otherwise.
>

Maybe there are, but I'd want lots of convincing that we should do that at this stage. Maybe for 9.2. I think we can plug the holes fairly simply for xmlpi and xmlcomment, and catch the attributes by moving this check up into map_sql_value_to_xml_value().
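
For illustration only, here is a minimal sketch of the two checks discussed above: a byte-level control-character test that compares as unsigned, and a code-point test against the XML 1.0 "Char" production from the W3C link. The function names are hypothetical and not part of the PostgreSQL source, and decoding the string into code points is assumed to happen elsewhere. Two things to keep in mind if iscntrl() is used instead: its argument must be representable as unsigned char, and in the C locale it also flags tab, newline and carriage return, which XML permits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a PostgreSQL function): true if the code point
 * matches the XML 1.0 "Char" production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
static bool
is_valid_xml_char(uint32_t c)
{
    return c == 0x9 || c == 0xA || c == 0xD ||
        (c >= 0x20 && c <= 0xD7FF) ||
        (c >= 0xE000 && c <= 0xFFFD) ||
        (c >= 0x10000 && c <= 0x10FFFF);
}

/*
 * Byte-level variant of the quoted "*p < '\x20'" test.  Converting to
 * unsigned char first keeps bytes >= 0x80 (for example the UTF-8 bytes
 * 0xD5 0x90 of U+0550) from comparing as negative where char is signed.
 */
static bool
is_forbidden_control_byte(char b)
{
    unsigned char u = (unsigned char) b;

    return u < 0x20 && u != '\t' && u != '\n' && u != '\r';
}

/* A few spot checks taken from the cases mentioned in the thread. */
static void
xml_char_self_check(void)
{
    assert(!is_valid_xml_char(0x01));      /* the ^A byte */
    assert(is_valid_xml_char(0x0550));     /* E'\u0550' must be accepted */
    assert(!is_valid_xml_char(0xFFFE));    /* E'\ufffe' must be rejected */
    assert(is_forbidden_control_byte('\x01'));
    assert(!is_forbidden_control_byte((char) 0xD5));
}
```

The byte-level test only catches C0 control characters; rejecting values such as U+FFFE needs the code-point test, which is why the two are kept separate in this sketch.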

This is a significant data integrity bug, much along the same lines as the invalidly encoded data holes we plugged a release or two back. I'm amazed we haven't hit it till now, but we're sure to see more of it - XML use with Postgres is growing substantially, I believe.

cheers

andrew