Re: Question about xmloption and pg_restore

From: Chapman Flack
Subject: Re: Question about xmloption and pg_restore
Msg-id: 5BD1C44B.6040300@anastigmatix.net
In response to: Re: Question about xmloption and pg_restore  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

On 10/25/18 05:02, Tom Lane wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> a difference between the 2003 SQL/XML standard (which PG implements) and
>> the later versions, which changed the data model so there really is a
>> containment relationship between 'content' and 'document'.
>> https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#XML_OPTION
> 
> See also
> https://www.postgresql.org/message-id/flat/153478795159.1302.9617586466368699403%40wrigleys.postgresql.org
> 
> It's odd that people are just reporting this now when it's been like that
> for quite a few years, but anyway we've got a problem.  Sounds like maybe
> adopting the later standards' definitions would fix it?  Although I have
> no idea how complicated that'd be.

Supporting the later standards entirely would be a commendable thing,
but honest work:

https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards#Possible_ways_forward

OTOH, making the current XML parsing not fail in this particular case
(which could be viewed as adopting the later standards' relationship
of CONTENT to DOCUMENT) might just be as simple as having the current
parsing code for CONTENT detect whether the string "starts with" a
<!DOCTYPE and fall back to the existing parsing code for DOCUMENT
if it does.

... where "starts with" actually means "possibly following some
whitespace, comments, or PIs, but you can stop looking if you see
a start-element", so essentially a port to C of:

https://github.com/tada/pljava/blob/V1_5_1/pljava/src/main/java/org/postgresql/pljava/jdbc/SQLXMLImpl.java#L409

which decides whether the input should be passed straight to the DOCUMENT-
style parser or somehow treated specially to parse as CONTENT. In Java
the special treatment involves a wrapping element, in xml.c it involves
calling a different libxml2 function, xmlParseBalancedChunkMemory, but
the choice of which method to apply is the same choice.
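
As a rough illustration of that choice, here is a minimal C sketch of the
"starts with" test. The names xml_starts_with_doctype and is_xml_space are
hypothetical (they are not taken from xml.c or from PL/Java), and the sketch
makes simplifying assumptions: the input is a NUL-terminated string, an XML
declaration is treated like any other PI, and byte-order marks and encoding
questions are ignored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the four XML whitespace characters. */
static bool
is_xml_space(char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/*
 * Hypothetical sketch: return true if the input, after any leading
 * whitespace, comments, or processing instructions, begins with
 * "<!DOCTYPE". Anything else (a start-element, character data, end of
 * input, or an unterminated comment or PI) stops the scan and returns false.
 */
bool
xml_starts_with_doctype(const char *s)
{
    for (;;)
    {
        while (is_xml_space(*s))
            s++;

        if (strncmp(s, "<!--", 4) == 0)
        {
            /* Comments do not nest, so the first "-->" ends this one. */
            const char *end = strstr(s + 4, "-->");

            if (end == NULL)
                return false;
            s = end + 3;
        }
        else if (strncmp(s, "<?", 2) == 0)
        {
            /* A PI (or the XML declaration) ends at the first "?>". */
            const char *end = strstr(s + 2, "?>");

            if (end == NULL)
                return false;
            s = end + 2;
        }
        else
            return strncmp(s, "<!DOCTYPE", 9) == 0;
    }
}
```

A caller could hand the string to the DOCUMENT-style parser when this returns
true and keep the existing CONTENT path otherwise.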

IIRC, XML comments don't nest, so it may be that "possibly following
some whitespace, comments, or PIs" could be shown to be a regular language,
and checked with a regex. I did it the more explicit way in Java for
clarity, and because the API was there, and so I wouldn't have to think
about it.
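
For what the regex variant might look like, here is a hedged sketch using the
POSIX <regex.h> API. The macro DOCTYPE_PREFIX_RE and the function
xml_starts_with_doctype_re are hypothetical names, the pattern is my own
untested-in-production guess at the regular language described above, and it
is stricter than a plain scan in one respect: it rejects a comment whose body
contains "--".

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical pattern (POSIX extended syntax): any run of whitespace,
 * comments, and PIs, followed by "<!DOCTYPE".
 *   comment: "<!--", then units that are either a non-hyphen or a hyphen
 *            followed by a non-hyphen, then "-->"
 *   PI:      "<?", then a body that never contains "?>", then "?>"
 */
#define DOCTYPE_PREFIX_RE \
    "^([ \t\r\n]" \
    "|<!--([^-]|-[^-])*-->" \
    "|<\\?([^?]|\\?+[^?>])*\\?+>" \
    ")*<!DOCTYPE"

/* Hypothetical sketch: true if the pattern matches at the start of s. */
bool
xml_starts_with_doctype_re(const char *s)
{
    regex_t re;
    bool    matched;

    /* Compiled on every call for simplicity; real code would cache it. */
    if (regcomp(&re, DOCTYPE_PREFIX_RE, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    matched = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

REG_NEWLINE is deliberately left off so that comments and PIs spanning
several lines are still matched by the bracket expressions.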

-Chap

