Re: xPath in a database with LATIN1 encoding

Поиск
Список
Период
Сортировка
От Holger Jakobs
Тема Re: xPath in a database with LATIN1 encoding
Дата
Msg-id caa976c4-6070-7eb8-1570-dee06b876a58@jakobs.com
обсуждение исходный текст
Ответ на Re: xPath in a database with LATIN1 encoding  (Jorge Silva <jorge.silva93@gmail.com>)
Ответы Re: xPath in a database with LATIN1 encoding
Список pgsql-admin


Am 24.04.21 um 20:20 schrieb Jorge Silva:
Thanks for the quick reply, Tom. I am trying something simpler. I am trying to find a way to run the xPath function with a xml file type, which has latin characters, such as:
SELECT xpath(‘//xml_test/text()’, convert_from(convert_to(‘<xml_test>çã</xml_test>','utf-8’),'utf-8')::xml)

This line returns the same error, as follows:
[Code: 0, SQL State: 2200M]  ERROR: could not parse XML document
  Detail: line 1: Input is not proper UTF-8, indicate encoding !
Bytes: 0xE7 0xE3 0x3C 0x2F
<xml_test>çã</xml_test>
          ^
This happens because the “text”, which is the output from convert_from() function is encoded with LATIN1, the database encoding set, and not UTF-8.  Is there a way that a text variable is not encoded as the database encoding set, but some other encoding set, such as UTF-8? From what I’ve searched for, it seems that it would be something similar to nvarchar that only exists in Microsoft SQL Server.



On 24 Apr 2021, at 13:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Jorge Silva <jorge.silva93@gmail.com> writes:
The characters that it is not recognizing are both “ç” and “ã” because I think they are encoded differently in latin1 and utf-8. Is it possible to somehow use the xPath function with special characters in the XML and in a database which is not encoded with utf-8?

I don't have a lot of expertise in this area, but I think you need
an explicit encoding indicator in the xml header, a la

   <?xml encoding="latin1"?> ...

On the whole, the xml type is definitely easier to use with database
encoding set to utf8.  I think you'll be paying for encoding conversion
every time we interact with libxml, for instance.

regards, tom lane

The point is that the XML standard dictates that every document without an explicit encoding, must be encodied in UTF-8. So, whenever you want to use a different encoding, this has to be stated in the XML header, as indicated in Tom Lane's answer.

So the encoding of XML columns isn't actually the one of the database (although UTF-8 is best in any case), but simply is the XML standard. Your client encoding may vary, though, but this would limit the characters which can be transferred.

Regards,

Holger


-- 
Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012
Вложения

В списке pgsql-admin по дате отправления:

Предыдущее
От: Jorge Silva
Дата:
Сообщение: Re: xPath in a database with LATIN1 encoding
Следующее
От: adnan salam
Дата:
Сообщение: Facing issue while upgrading from 11 ro 13.2