XML Issue with DTDs

From Florian Pflug
Subject XML Issue with DTDs
Msg-id 8E3B4E77-5539-431A-9E14-CAC3AD9938A3@phlo.org
List pgsql-hackers
Hi,

While looking into ways to implement an XMLSTRIP function which extracts the textual contents of an XML value and
de-escapes them (i.e. replaces entity references by their text equivalent), I've run into another issue with the XML
type.

XML values can either contain a DOCUMENT or CONTENT. In the first case, the value is well-formed XML according to the
XML specification. In the latter case, the value is a collection of nodes, each of which may contain children. Without
DTDs in the mix, CONTENT is thus a generalization of DOCUMENT, i.e. a DOCUMENT may contain only a single root node while
a CONTENT may contain multiple. That guarantees that a concatenation of two XML values is always at least valid CONTENT.
That, however, is no longer true once DTDs enter the picture. A DOCUMENT may contain a DTD as long as it precedes the
root node (processing instructions and comments may precede the DTD, though). Yet CONTENT may not include a DTD at all.
A concatenation of a DOCUMENT with a DTD and CONTENT thus yields something that is neither a DOCUMENT nor a CONTENT, yet
XMLCONCAT fails to complain. The following example fails for XMLOPTION set to DOCUMENT as well as for XMLOPTION set to
CONTENT.
 select xmlconcat(
   xmlparse(document '<!DOCTYPE test [<!ELEMENT test EMPTY>]><test/>'),
   xmlparse(content '<test/>')
 )::text::xml;
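
For reference, the DTD-free case described above can be checked with the IS DOCUMENT predicate. This is only a minimal sketch with made-up element names (a, b); it deliberately leaves DTDs out, since that is where the behaviour is in question.

```sql
-- Two sibling root elements: valid CONTENT, but not a DOCUMENT.
SELECT xmlparse(content '<a/><b/>') IS DOCUMENT;   -- should return false

-- A single root element is valid CONTENT and also a DOCUMENT.
SELECT xmlparse(content '<a/>') IS DOCUMENT;       -- should return true

-- Concatenating two DTD-free documents yields CONTENT with two root nodes,
-- so the result still round-trips through text when XMLOPTION is CONTENT.
SET xmloption TO content;
SELECT xmlconcat(
  xmlparse(document '<a/>'),
  xmlparse(document '<b/>')
)::text::xml;
```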

Solving this seems a bit messy, unfortunately. First, I think we need to have some XMLOPTION value which is a superset
of all the others - otherwise, dump & restore won't work reliably. That means either allowing DTDs if XMLOPTION is
CONTENT, or inventing a third XMLOPTION, say ANY.
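
To illustrate why the setting matters for dump & restore: a plain cast from text to xml is parsed according to the current XMLOPTION, so the same string can be accepted under one setting and rejected under another. A minimal sketch, using only the two existing settings and a made-up two-root value:

```sql
-- Under CONTENT, a value with two root elements is accepted.
SET xmloption TO content;
SELECT '<a/><b/>'::xml;

-- Under DOCUMENT, the same text is rejected because it is not a
-- well-formed document (more than one root element).
SET xmloption TO document;
SELECT '<a/><b/>'::xml;   -- expected to raise an error
```

A restore that reloads xml values from their text form therefore needs a setting under which every stored value parses.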

We then need to ensure that combining XML values yields something that is valid according to the most general XMLOPTION
setting. That means either

(1) Removing the DTD from all but the first argument to XMLCONCAT, and similarly all but the first value passed to
XMLAGG

or

(2) Complaining if these values contain a DTD.

or

(3) Allowing multiple DTDs in a document if XMLOPTION is, say, ANY.

I'm not in favour of (3), since clients are unlikely to be able to process such a value. (1) matches how we currently
handle XML declarations (<?xml …?>), so I'm slightly in favour of that.
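
For comparison, the existing treatment of XML declarations can be seen with the query below, which follows the example given in the PostgreSQL documentation for xmlconcat. The declarations of the arguments are combined into at most one declaration at the start of the result; option (1) would, presumably, treat a DTD in a similar spirit by keeping only the first one.

```sql
-- Both arguments carry an XML declaration; the result carries only one.
SELECT xmlconcat(
  '<?xml version="1.1"?><foo/>',
  '<?xml version="1.1" standalone="no"?><bar/>'
);
-- documented result: <?xml version="1.1"?><foo/><bar/>
```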

Thoughts?

best regards,
Florian Pflug
