Re: [HACKERS] possible encoding issues with libxml2 functions

From Pavel Stehule
Subject Re: [HACKERS] possible encoding issues with libxml2 functions
Date
Msg-id CAFj8pRA9N9kd2ZqsH3oieonPB5JeVPgHRR6XDTfhro9x69ftVg@mail.gmail.com
In reply to Re: [HACKERS] possible encoding issues with libxml2 functions  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: [HACKERS] possible encoding issues with libxml2 functions  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers


> Isn't the most correct solution to call the xml_parse function?

Replying to myself: probably not.

I think I have now found the cause of this issue. The document processed in xpath_internal is passed to libxml2 by:

    doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);

We don't pass an encoding parameter, so libxml2 expects either UTF8 or a correct encoding declaration in the XML document. When we pass an inconsistent document, where the XML is in the database encoding but the encoding declaration is still the original one, parsing fails.

The regression test can look like yours (except that all characters are valid there):

postgres=# do $$ 
declare str text;
begin
  if current_setting('server_encoding') <> 'UTF8' then return; end if;
  str = '<?xml version="1.0" encoding="windows-1250"?><enprimeur><vino><id>909</id><remark>'
          || convert_from('\xc588', 'UTF8')
          || '</remark></vino></enprimeur>';
  raise notice '%', xpath('/enprimeur/vino/id', str::xml);
end; $$;
ERROR:  could not parse XML document
DETAIL:  input conversion failed due to input error, bytes 0x88 0x3C 0x2F 0x72
line 1: switching encoding: encoder error
�</remark></vino></enprimeur>
                             ^
CONTEXT:  PL/pgSQL function inline_code_block line 8 at RAISE

With the correct fix:

        doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL,
                                pg_encoding_to_char(GetDatabaseEncoding()), 0);

it works:

postgres=# do $$ 
declare str text;
begin
  if current_setting('server_encoding') <> 'UTF8' then return; end if;
  str = '<?xml version="1.0" encoding="windows-1250"?><enprimeur><vino><id>909</id><remark>'
          || convert_from('\xc588', 'UTF8')
          || '</remark></vino></enprimeur>';
  raise notice '%', xpath('/enprimeur/vino/id', str::xml);
end; $$;
NOTICE:  {<id>909</id>}
DO

This fix should be applied to the xmltable function too.

Patch attached.

It doesn't fix the xpath and xmltable issues when the server encoding is not UTF8. It looks like the XPath functions in libxml2 require UTF8-encoded strings and return their results in UTF8 too, so the result would have to be recoded to the server encoding.

I didn't find any information on how to make the libxml2 XPath functions work with encodings other than UTF8 :( ??

Regards

Pavel



