Обсуждение: XMLEXISTS on legacy XML with malformed xmlns

Поиск
Список
Период
Сортировка

XMLEXISTS on legacy XML with malformed xmlns

От
Edson Richter
Дата:
Hi!

I've some (about 1M records) containing legacy XML I would like to parse
and apply XMLEXISTS.

This is the query:

select * from xmllog
  where xpath_exists(('//MyDocument[@DocNum = ''000411828'']'::text),
xmlparse(document cdataout));

This is the error:

ERRO: could not parse XML document
SQL state: 2200M
Detail: line 2: xmlns:leg: 'LEGACYAPP - SEND MSG EVENTS ABOUT' is not a
valid URI
<leg:sendmsgeventsabout xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
^

This is the sample XML with malformed xmlns (I've shortenet the data,
but the important thing here is the malformed xmlns):

"<?xml version="1.0" encoding="utf-8" ?>
<leg:sendmsgeventsabout  xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
   <carrier xmlns="" controlnum="04503660000146">
     <MyDocument DocNum="000511852">
       <other_info>0</other_info>
       <complement info (...)"


I can easly read this XML in Notepad++, and also in Java - but
PostgreSQL always throw error.

Can you plase tell me how can make PostgreSQL ignore this malformed
xmlns and proceed processing the XML?

Thanks,

--
Atenciosamente,

Edson Carlos Ericksson Richter



Re: XMLEXISTS on legacy XML with malformed xmlns

От
Pavel Stehule
Дата:


2016-02-12 1:53 GMT+01:00 Edson Richter <edsonrichter@hotmail.com>:
Hi!

I've some (about 1M records) containing legacy XML I would like to parse and apply XMLEXISTS.

This is the query:

select * from xmllog
 where xpath_exists(('//MyDocument[@DocNum = ''000411828'']'::text), xmlparse(document cdataout));

This is the error:

ERRO: could not parse XML document
SQL state: 2200M
Detail: line 2: xmlns:leg: 'LEGACYAPP - SEND MSG EVENTS ABOUT' is not a valid URI
<leg:sendmsgeventsabout xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
^

This is the sample XML with malformed xmlns (I've shortenet the data, but the important thing here is the malformed xmlns):

"<?xml version="1.0" encoding="utf-8" ?>
<leg:sendmsgeventsabout  xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
  <carrier xmlns="" controlnum="04503660000146">
    <MyDocument DocNum="000511852">
      <other_info>0</other_info>
      <complement info (...)"


I can easly read this XML in Notepad++, and also in Java - but PostgreSQL always throw error.

Can you plase tell me how can make PostgreSQL ignore this malformed xmlns and proceed processing the XML?

PostgreSQL uses libxml2, but the usage isn't too configurable. So my advice is using defensive strategy and clean/fix wrong namespace with string tools - replace function.

Regards

Pavel
 

Thanks,

--
Atenciosamente,

Edson Carlos Ericksson Richter



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: XMLEXISTS on legacy XML with malformed xmlns

От
Edson Richter
Дата:

2016-02-12 1:53 GMT+01:00 Edson Richter <edsonrichter@hotmail.com>:
Hi!

I've some (about 1M records) containing legacy XML I would like to parse and apply XMLEXISTS.

This is the query:

select * from xmllog
 where xpath_exists(('//MyDocument[@DocNum = ''000411828'']'::text), xmlparse(document cdataout));

This is the error:

ERRO: could not parse XML document
SQL state: 2200M
Detail: line 2: xmlns:leg: 'LEGACYAPP - SEND MSG EVENTS ABOUT' is not a valid URI
<leg:sendmsgeventsabout xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
^

This is the sample XML with malformed xmlns (I've shortenet the data, but the important thing here is the malformed xmlns):

"<?xml version="1.0" encoding="utf-8" ?>
<leg:sendmsgeventsabout  xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
  <carrier xmlns="" controlnum="04503660000146">
    <MyDocument DocNum="000511852">
      <other_info>0</other_info>
      <complement info (...)"


I can easly read this XML in Notepad++, and also in Java - but PostgreSQL always throw error.

Can you plase tell me how can make PostgreSQL ignore this malformed xmlns and proceed processing the XML?

PostgreSQL uses libxml2, but the usage isn't too configurable. So my advice is using defensive strategy and clean/fix wrong namespace with string tools - replace function.

Regards

Pavel

Thanks, Pavel.

I did suspect that. But then I have about 10.000 new records each week, and I've no control over the system that generates it.
It is a shame, but sometimes we have to live with such problems.

Regards,

Edson Richter


Re: XMLEXISTS on legacy XML with malformed xmlns

От
Pavel Stehule
Дата:


2016-02-12 17:53 GMT+01:00 Edson Richter <edsonrichter@hotmail.com>:

2016-02-12 1:53 GMT+01:00 Edson Richter <edsonrichter@hotmail.com>:
Hi!

I've some (about 1M records) containing legacy XML I would like to parse and apply XMLEXISTS.

This is the query:

select * from xmllog
 where xpath_exists(('//MyDocument[@DocNum = ''000411828'']'::text), xmlparse(document cdataout));

This is the error:

ERRO: could not parse XML document
SQL state: 2200M
Detail: line 2: xmlns:leg: 'LEGACYAPP - SEND MSG EVENTS ABOUT' is not a valid URI
<leg:sendmsgeventsabout xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
^

This is the sample XML with malformed xmlns (I've shortenet the data, but the important thing here is the malformed xmlns):

"<?xml version="1.0" encoding="utf-8" ?>
<leg:sendmsgeventsabout  xmlns:leg="LEGACYAPP - SEND MSG EVENTS ABOUT">
  <carrier xmlns="" controlnum="04503660000146">
    <MyDocument DocNum="000511852">
      <other_info>0</other_info>
      <complement info (...)"


I can easly read this XML in Notepad++, and also in Java - but PostgreSQL always throw error.

Can you plase tell me how can make PostgreSQL ignore this malformed xmlns and proceed processing the XML?

PostgreSQL uses libxml2, but the usage isn't too configurable. So my advice is using defensive strategy and clean/fix wrong namespace with string tools - replace function.

Regards

Pavel

Thanks, Pavel.

I did suspect that. But then I have about 10.000 new records each week, and I've no control over the system that generates it.
It is a shame, but sometimes we have to live with such problems.

I understand - you can handle this error and broken xml you can ignore. Or you can use a parser from Python, Perl - PLPerlu or PLPythonu is great for it.

Regards

Pavel
 

Regards,

Edson Richter