Re: [HACKERS] possible encoding issues with libxml2 functions

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: [HACKERS] possible encoding issues with libxml2 functions
Дата
Msg-id CAFj8pRAECZC9v==YOFi2bpgcxDvMmpe80oC90Z9Ches41xOiNw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] possible encoding issues with libxml2 functions  (Noah Misch <noah@leadboat.com>)
Ответы Re: [HACKERS] possible encoding issues with libxml2 functions  (Noah Misch <noah@leadboat.com>)
Список pgsql-hackers
Hi


xpath-bugfix.patch affected only xml values containing an xml declaration with
"encoding" attribute.  In UTF8 databases, this latest proposal
(xpath-parsing-error-fix.patch) is equivalent to xpath-bugfix.patch.  In
non-UTF8 databases, xpath-parsing-error-fix.patch affects all xml values
containing non-ASCII data.  In a LATIN1 database, the following works today
but breaks under your latest proposal:

  SELECT xpath('text()', ('<x>' || convert_from('\xc2b0', 'LATIN1') || '</x>')::xml);

I don't understand, why it should not work?
 

It's acceptable to break that, since the documentation explicitly disclaims
support for it.  xpath-bugfix.patch breaks different use cases, which are
likewise acceptable to break.  See my 2017-08-08 review for details.

We have xpath-bugfix.patch and xpath-parsing-error-fix.patch.  Both are
equivalent under supported use cases (xpath in UTF8 databases).  Among
non-supported use cases, they each make different things better and different
things worse.  We should prefer to back-patch the version harming fewer
applications.  I expect non-ASCII data is more common than xml declarations
with "encoding" attribute, so xpath-bugfix.patch will harm fewer applications.

Having said that, I now see a third option.  Condition this thread's patch's
effects on GetDatabaseEncoding()==PG_UTF8.  That way, we fix supported cases,
and we remain bug-compatible in unsupported cases.  I think that's better than
the other options discussed so far.  If you agree, please send a patch based
on xpath-bugfix.patch with the GetDatabaseEncoding()==PG_UTF8 change and the
two edits I described earlier.

I am sorry -  too long day today. Do you think some like

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 24229c2dff..9fd6f3509f 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3914,7 +3914,14 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
        if (ctxt == NULL || xmlerrcxt->err_occurred)
            xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
                        "could not allocate parser context");
-       doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
+
+       /*
+        * Passed XML is always in server encoding. When server encoding
+        * is UTF8, we can pass this information to libxml2 to ignore
+        * possible invalid encoding declaration in XML document.
+        */
+       doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL,
+               GetDatabaseEncoding() == PG_UTF8 ? "UTF-8" : NULL, 0);
        if (doc == NULL || xmlerrcxt->err_occurred)
            xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
                        "could not parse XML document");

This fix only UTF8 servers and has no any effect on other cases

?

Regards

Pavel



Thanks,
nm

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: [HACKERS] pgbench tap tests & minor fixes
Следующее
От: Daniel Gustafsson
Дата:
Сообщение: Re: [HACKERS] Support for Secure Transport SSL library on macOS as OpenSSL alternative