Re: XML with invalid chars

From Andrew Dunstan
Subject Re: XML with invalid chars
Date
Msg-id 4DB8DD7D.3070905@dunslane.net
In reply to Re: XML with invalid chars  (Noah Misch <noah@leadboat.com>)
Responses Re: XML with invalid chars  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers

On 04/27/2011 05:30 PM, Noah Misch wrote:
>
>> I'm not sure what to do about the back branches and cases where data is
>> already in databases. This is fairly ugly. Suggestions welcome.
> We could provide a script in (or linked from) the release notes for testing the
> data in all your xml columns.

Yeah, we'll have to do something like that. What a blasted mess.

> To make things worse, the dump/reload problem seems to depend on your version
> of libxml2, or something.  With git master, a CentOS 5 system with
> 2.6.26-2.1.2.8.el5_5.1 accepts the ^A byte, but an Ubuntu 8.04 LTS system with
> 2.6.31.dfsg-2ubuntu rejects it.  Even with a patch like this, systems with a
> lenient libxml2 will be liable to store XML data that won't restore on a system
> with a strict libxml2.  Perhaps we should emit a build-time warning if the local
> libxml2 is lenient?

No, I think we need to be strict ourselves.

>> +                 if (*p < '\x20')
> This needs to be an unsigned comparison.  On my system, "char" is signed, so
> "SELECT xmlelement(name foo, null, E'\u0550')" fails incorrectly.

Good point. Perhaps we'd be better off using iscntrl(*p).
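
To illustrate the signedness problem discussed above, here is a minimal sketch of a byte-level check that converts to unsigned char before comparing. The function name is_forbidden_control_byte is hypothetical and does not come from the patch. The exemption for tab, LF and CR is taken from the XML 1.0 character set rather than from the quoted code. The same conversion would be needed before calling iscntrl(), because the <ctype.h> functions expect a value representable as unsigned char (or EOF).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): true if the byte is a control
 * character that XML 1.0 forbids, i.e. below 0x20 but not tab, LF, or CR.
 */
bool
is_forbidden_control_byte(char c)
{
    /* Convert first, so bytes >= 0x80 are never seen as negative. */
    unsigned char uc = (unsigned char) c;

    return uc < 0x20 && uc != 0x09 && uc != 0x0A && uc != 0x0D;
}

/* Illustrative checks only. */
void
check_control_byte_examples(void)
{
    /* U+0550 is 0xD5 0x90 in UTF-8; neither byte is a control character. */
    assert(!is_forbidden_control_byte((char) 0xD5));
    assert(!is_forbidden_control_byte((char) 0x90));

    /* The ^A byte is forbidden; tab and newline are allowed. */
    assert(is_forbidden_control_byte('\x01'));
    assert(!is_forbidden_control_byte('\t'));
    assert(!is_forbidden_control_byte('\n'));
}
```

With a signed char, the byte 0xD5 compares as a negative number, which is why the plain `*p < '\x20'` test rejects E'\u0550' on such platforms.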


> The XML character set forbids more than just control characters; see
> http://www.w3.org/TR/xml/#charsets.  We also ought to reject, for example,
> "SELECT xmlelement(name foo, null, E'\ufffe')".
>
> Injecting the check here aids "xmlelement" and "xmlforest", but "xmlcomment"
> and "xmlpi" still let the invalid byte through.  You can also still inject the
> byte into an attribute value via "xmlelement".  I wonder if it wouldn't make
> more sense to just pass any XML that we generate from scratch through libxml2.
> There are a lot of holes to plug, otherwise.
>


Maybe there are, but I'd want lots of convincing that we should do that 
at this stage. Maybe for 9.2. I think we can plug the holes fairly 
simply for xmlpi and xmlcomment, and catch the attributes by moving this 
check up into map_sql_value_to_xml_value().
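
For reference, the character ranges behind the charsets link quoted above can be written as a single predicate over Unicode code points. This is only a sketch of the XML 1.0 Char production. The name is_valid_xml_char is hypothetical. The sketch assumes the caller has already decoded the text from the database encoding into code points, which the patch under discussion does not necessarily do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the code point matches the XML 1.0
 * "Char" production:
 *   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
 */
bool
is_valid_xml_char(uint32_t cp)
{
    return cp == 0x09 || cp == 0x0A || cp == 0x0D ||
        (cp >= 0x20 && cp <= 0xD7FF) ||
        (cp >= 0xE000 && cp <= 0xFFFD) ||
        (cp >= 0x10000 && cp <= 0x10FFFF);
}

/* Illustrative checks only, using the values mentioned in the thread. */
void
check_xml_char_examples(void)
{
    assert(!is_valid_xml_char(0x01));    /* the ^A byte */
    assert(is_valid_xml_char(0x0550));   /* E'\u0550' must be accepted */
    assert(!is_valid_xml_char(0xFFFE));  /* E'\ufffe' must be rejected */
}
```

A check of this kind covers both cases raised in the review: it rejects U+FFFE as well as the low control characters, and it does not depend on whether char is signed.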

This is a significant data integrity bug, much along the same lines as 
the invalidly encoded data holes we plugged a release or two back. I'm 
amazed we haven't hit it till now, but we're sure to see more of it - 
XML use with Postgres is growing substantially, I believe.

cheers

andrew

