Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context

Поиск
Список
Период
Сортировка
От Markus Winand
Тема Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated with wrong context
Дата
Msg-id 8BDB0627-2105-4564-AA76-7849F028B96E@winand.at
обсуждение исходный текст
Ответ на Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated withwrong context  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Ответы Re: XML/XPath issues: text/CDATA in XMLTABLE, XPath evaluated withwrong context  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Список pgsql-hackers
Hi!

> I have pushed the patch now (in your original form rather than my later
> formulation) -- let's see how the buildfarm likes it.  There are (at
> least) three issues remaining, as per below; Pavel, do you have any
> insight on these?


Attached patch 0001 fixes all three issues and one more that is, again, 
an incompatible change (see below).



> On 2018-06-21, at 22:27 , Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> First one is about array indexes not working sanely (I couldn't get this
> to work in Oracle)
>
>>> Also, array indexes behave funny.  First let's add more XML comments
>>> inside that number, and query for the subscripts:
>>>
>>> update xmldata set data = regexp_replace(data::text, '7<!--small country-->91',
'<!--ah-->7<!--oh-->9<!--uh-->1')::xml;
>>>
>>> SELECT  xmltable.*
>>>  FROM (SELECT data FROM xmldata) x,
>>>       LATERAL XMLTABLE('/ROWS/ROW'
>>>                        PASSING data
>>>                        COLUMNS
>>>                                 country_name text PATH 'COUNTRY_NAME/text()' NOT NULL,
>>>                                 size_text float PATH 'SIZE/text()',
>>>                                 size_text_1 float PATH 'SIZE/text()[1]',
>>>                                 size_text_2 float PATH 'SIZE/text()[2]',
>>>                                 "SIZE" float, size_xml xml PATH 'SIZE')
>>> where size_text is not null;
>>>
>>> country_name │ size_text │ size_text_1 │ size_text_2 │ size_text_3 │ SIZE │                       size_xml
              
>>>
──────────────┼───────────┼─────────────┼─────────────┼─────────────┼──────┼───────────────────────────────────────────────────────
>>> Singapore    │       791 │         791 │          91 │           1 │  791 │ <SIZE
unit="km"><!--ah-->7<!--oh-->9<!--uh-->1</SIZE>
>>> (1 fila)

The concatenation was caused by the use of xmlNodeListGetString, which I have
completely eliminated because it was even wrong for element nodes.

As per my understanding (and how Oracle and DB2 do it), text nodes contained
in nested elements must also be included in the string representation. This is an
incompatible change that needs adoption of an regression test:

SELECT * FROM xmltable('/root' passing '<root><element>a1a<!-- aaaa -->a2a<?aaaaa?> <!--z-->
bbbb<x>xxx</x>cccc</element></root>'COLUMNS element text); 
      element
--------------------
a1aa2a   bbbbcccc

This should actually include the ‘xxx’ from the nested <x> element as well.

As I could not find a libxml function that does this, I’ve introduced a new one
(xml_xmlnodetostring) in the vein of libxml’s xmlNodeListGetString.

> The second one is about (lack of!) processing instructions and comments:
>
>> Also, node() matching comments or processing instructions
>> seems to be broken too:
>>
>> SELECT *
>> FROM (VALUES ('<xml><!--comment--></xml>'::xml)
>>            , ('<xml><?pi content?></xml>'::xml)
>>      ) d(x)
>> CROSS JOIN LATERAL
>>       XMLTABLE('/xml'
>>                PASSING x
>>                COLUMNS "node()" TEXT PATH 'node()'
>>               ) t
>>
>>             x             | node()
>> ---------------------------+--------
>> <xml><!--comment--></xml> |
>> <xml><?pi content?></xml> |
>> (2 rows)
>>

This was easy to fix: PI’s and comments contain their data in their own content,
not in a child.


> The third issue is the way we output comments when they're in a column
> of type XML:
>
>>> Note what happens if I change the type from text to xml in that
>>> column:
>>>
>>> SELECT *
>>> FROM (VALUES ('<xml>te<!-- ahoy -->xt</xml>'::xml)
>>>            , ('<xml><![CDATA[some <!-- really --> weird <stuff>]]></xml>'::xml)
>>>      ) d(x)
>>> CROSS JOIN LATERAL
>>>       XMLTABLE('/xml'
>>>                PASSING x
>>>                COLUMNS "node()" xml PATH 'node()'
>>>               ) t;
>>>
>>>                            x                             │                     node()
>>> ───────────────────────────────────────────────────────────┼────────────────────────────────────────────────
>>> <xml>te<!-- ahoy -->xt</xml>                              │ te ahoy xt
>>> <xml><![CDATA[some <!-- really --> weird <stuff>]]></xml> │ some <!-- really --> weird <stuff>
>>> (2 filas)

I included comments, CDATA sections and PIs into the list of nodes that need to use libxml's xmlNodeDump.

The 0001 patch also removes the distinction between the case of
"count == 1 && typid == XMLOID” and "count > 1" because I don’t see any reason
for two different code paths there (except that one string copy is saved in
the count == 1 case).

The 0001 patch also adds regression tests for all three issues.

There is a second patch, for which I don’t have strong opinions, that eliminates
the use of xmlStrdup in two cases. As far as I can see, the xmlStrdup is only there
so the xmlFree at the end can be done unconditionally. I found it disturbing to do
a strdup for the sole purpose it can be free-ed again. However, the patch is not a beauty,
just don’t apply if there is any doubt.

-markus


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Noah Misch
Дата:
Сообщение: Re: hot_standby_feedback vs excludeVacuum and snapshots
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: cost_sort() improvements