Strange output of XML attribute values

From Andrew Marynchuk (Андрей Маринчук)
Subject Strange output of XML attribute values
Date
Msg-id CAJt8d+D3xe6bPJz7W7Acrwtrturpm+VygrnBQe4r0VQGTeYCpQ@mail.gmail.com
Replies Re: Strange output of XML attribute values  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-bugs
This problem is quite old, but it makes the XML generation functions in PostgreSQL unusable in some cases, or at least requires the generated XML to be parsed and regenerated by an external utility. It reproduces in PostgreSQL 12.4, compiled by Visual C++ build 1914, 64-bit (Windows 10), and I have seen the same problem in a 9.6 build from the CentOS yum package.

How to reproduce:
Just execute the following query (the xmlelement call alone is actually enough to reproduce the problem):
select xmlserialize(document xmlroot(xmlelement(name "ЭлементВКириллице", xmlattributes('ЗначениеВКириллице' as "АтрибутВКириллице"), 'ТекстВКириллице'), version '1.0', standalone yes) as text);

Expected result:
<?xml version="1.0" standalone="yes"?><ЭлементВКириллице АтрибутВКириллице="ЗначениеВКириллице">ТекстВКириллице</ЭлементВКириллице>

Actual result:
<?xml version="1.0" standalone="yes"?><ЭлементВКириллице АтрибутВКириллице="&#x417;&#x43D;&#x430;&#x447;&#x435;&#x43D;&#x438;&#x435;&#x412;&#x41A;&#x438;&#x440;&#x438;&#x43B;&#x43B;&#x438;&#x446;&#x435;">ТекстВКириллице</ЭлементВКириллице>

This example uses Cyrillic letters, but the same happens with any non-ASCII character.
According to the discussion, the problem arises because PostgreSQL does not tell libxml2 the document encoding: there is no xmlTextWriterStartDocument call, so libxml2 has no idea that the encoding is UTF-8 and that non-ASCII characters could be written as-is, without converting them to &#x...; sequences.

Nowadays UTF-8 is used almost everywhere, so this unnecessary character conversion looks strange. The current workaround is to pass the generated content to a PL/Python function that parses the XML and writes it back (xml.dom.minidom.parseString(...).toxml()).
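
For illustration, the parse-and-rewrite workaround can be sketched in plain Python roughly as follows. This is only a sketch: the helper names to_char_refs and normalize_xml are hypothetical, the input is built by hand to mimic the "Actual result" above, and wrapping the body in a PL/Python function is left out.

```python
from xml.dom import minidom


def to_char_refs(value: str) -> str:
    """Hypothetical helper: write every non-ASCII character as &#x...;,
    mimicking the attribute value in the reported output."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in value)


def normalize_xml(xml_text: str) -> str:
    """Hypothetical helper: parse the XML and serialize it again,
    as the workaround describes."""
    return minidom.parseString(xml_text).toxml()


# Build an input document shaped like the reported "Actual result".
escaped = (
    '<?xml version="1.0" standalone="yes"?>'
    f'<ЭлементВКириллице АтрибутВКириллице="{to_char_refs("ЗначениеВКириллице")}">'
    'ТекстВКириллице</ЭлементВКириллице>'
)

result = normalize_xml(escaped)
print(result)
```

After the round trip the attribute value contains the literal Cyrillic characters again. Note that minidom writes its own XML declaration, so details such as standalone="yes" may not be preserved unless they are passed to toxml() explicitly.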
