Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS

From: Jeevan Chalke
Subject: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS
Date:
Msg-id BANLkTimJWsSxko3HU-qsGnNR4Hk8u5eHvA@mail.gmail.com
Replies: Re: Invalid byte sequence for encoding "UTF8", caused due to non wide-char-aware downcase_truncate_identifier() function on WINDOWS  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hi Tom,

The issue is on Windows:

As you can see in the attached failure.out file (produced by running failure.sql), we get the error "ERROR:  invalid
byte sequence for encoding "UTF8": 0xe59aff". Note that the byte
sequence we got back from the database is e5 9a ff, whereas the actual byte sequence for
the wide character '功' is e5 8a 9f.


'功'      ==> Unicode character (U+529F)
e5 8a 9f  ==> original UTF-8 byte sequence for the character
e5 9a ff  ==> result of downcase_truncate_identifier(), an invalid UTF-8 sequence, stored in the pg_catalog table

When the identifier is displayed on the client, we receive this invalid byte sequence, which raises the error. UTF-8 requires each byte of a character to fall within a predefined range, and this is checked in the pg_utf8_islegal() function. Here is the relevant code snippet:

==
    /* the third byte (of a 3- or 4-byte sequence) must be a continuation byte, 0x80..0xBF */
    a = source[2];
    if (a < 0x80 || a > 0xBF)
        return false;
==
Here source[2] is ff, which falls outside the valid range, so the sequence is rejected as illegal UTF-8. The original third byte, 9f, is within the range.

Since we fold the identifier to lower case with the downcase_truncate_identifier() function, the solution is to make this function wide-char aware, like the LOWER() function.

I see some earlier discussion about downcase_truncate_identifier() and making it wide-char aware, but it seems to have been lost along the way:
(http://archives.postgresql.org/pgsql-hackers/2010-11/msg01385.php)
This invalid byte sequence problem seems more serious, because it might lead to, e.g., pg_dump failures.

I have tested this on PG 9.0 beta4 (one-click installer); we have
observed the same behavior with earlier versions as well.

Attached are the .sql file and its output (run on PG 9.0 beta4).

Any thoughts?

Thanks

--
Jeevan B Chalke
Senior Software Engineer, R&D
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

