Thread: I cannot insert bengali character in UTF8

I cannot insert bengali character in UTF8

From
AI Rumman
Date:
I am using a UTF8 database with LC_CTYPE set to its default value in PostgreSQL 9.1,
but I cannot insert Bengali characters into a column.

Query Failed:INSERT into tracker (user_id, module_name, item_id, item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e

item_summary is a text column, and we can insert Japanese characters into it.

Could anybody tell me what the problem is here?

Re: I cannot insert bengali character in UTF8

From
Peter Geoghegan
Date:
On 20 July 2012 11:30, AI Rumman <rummandba@gmail.com> wrote:
> I am using database with UTF8 and LC_CTYPE set as default value in
> Postgresql 9.1.
> But I cannot insert bengali character in a column.
>
> Query Failed:INSERT into tracker (user_id, module_name, item_id,
> item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
> error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e
>
> Item_summary is a text type column and we can insert japanese character in
> this field.
>
> Could anybody let me know what is the problem here?

Maybe they're not valid Bengali characters? Did you do an
encoding-naive truncation at some point?
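
As an illustration of what an encoding-naive truncation can do, here is a minimal Python sketch. It is not the poster's actual code: the sample string (the first word of the failing value), the 8-byte limit, and all function names are assumptions chosen for the example. Cutting the UTF-8 bytes in the middle of a character and appending "..." yields the same E0 A6 2E sequence that the server rejected.

```python
# Hypothetical reproduction; the sample, the limit, and the names are assumptions.
SAMPLE = "\u09ac\u09be\u0982\u09b2\u09be"  # the Bengali word "বাংলা", 5 characters, 15 bytes


def naive_truncate(text: str, max_bytes: int) -> bytes:
    """Cut the UTF-8 encoding at a byte offset, ignoring character boundaries."""
    return text.encode("utf-8")[:max_bytes] + b"..."


def safe_truncate(text: str, max_bytes: int) -> bytes:
    """Cut at a byte offset, then drop any incomplete trailing character."""
    cut = text.encode("utf-8")[:max_bytes]
    # errors="ignore" discards the partial sequence left at the end of the cut.
    whole = cut.decode("utf-8", errors="ignore")
    return whole.encode("utf-8") + b"..."


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


broken = naive_truncate(SAMPLE, 8)
print(broken.hex(" "))          # ends with e0 a6 2e 2e 2e
print(decodes_as_utf8(broken))  # False
print(decodes_as_utf8(safe_truncate(SAMPLE, 8)))  # True
```

Truncating by characters instead of bytes (slicing the str before encoding) avoids the problem as well; the byte-based variant is shown only because byte limits are a common source of this error.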

My mail client cannot display the latter few characters before the
ellipsis, but can display the first few fine.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Re: I cannot insert bengali character in UTF8

From
Christian Ullrich
Date:
* AI Rumman wrote:

> I am using database with UTF8 and LC_CTYPE set as default value in
> Postgresql 9.1.
> But I cannot insert bengali character in a column.
>
> Query Failed:INSERT into tracker (user_id, module_name, item_id,
> item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
> error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e

E0 A6 2E is not valid UTF-8: 11100000 10100110 00101110

The lead byte indicates that the code point is encoded in three bytes,
but only the very next byte is a trail byte (10......). The third
byte is a complete single-byte character, a period ("."), to be exact.
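
The bit-pattern reasoning above can be sketched in Python. This is only an illustration: the helper names are made up for the example, and the check looks at lead and trail bit patterns only, not at overlong encodings or other rules a full UTF-8 validator enforces.

```python
def expected_length(lead: int) -> int:
    """Sequence length implied by a UTF-8 lead byte (0 if it is not a lead byte)."""
    if lead & 0b1000_0000 == 0b0000_0000:
        return 1  # 0xxxxxxx: single-byte (ASCII) character
    if lead & 0b1110_0000 == 0b1100_0000:
        return 2  # 110xxxxx
    if lead & 0b1111_0000 == 0b1110_0000:
        return 3  # 1110xxxx
    if lead & 0b1111_1000 == 0b1111_0000:
        return 4  # 11110xxx
    return 0      # 10xxxxxx is a trail byte, anything else is invalid


def is_trail(byte: int) -> bool:
    """True if the byte has the 10xxxxxx trail-byte pattern."""
    return byte & 0b1100_0000 == 0b1000_0000


seq = bytes([0xE0, 0xA6, 0x2E])         # the sequence from the error message
print(expected_length(seq[0]))          # 3
print([is_trail(b) for b in seq[1:]])   # [True, False]
print(expected_length(seq[2]), chr(seq[2]))  # 1 .
```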

Setting the MSB on the third byte gives us

11100000 10100110 10101110 = E0 A6 AE

which is a valid UTF-8 encoding of U+09AE BENGALI LETTER MA.

Check your input data.
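
The analysis above can be double-checked with a short Python sketch that uses only the standard library. The variable and function names are illustrative, and setting the high bit is just the hypothetical repair discussed in this message, not a general way to fix damaged input.

```python
import unicodedata

bad = bytes([0xE0, 0xA6, 0x2E])           # sequence reported in the error
fixed = bytes([0xE0, 0xA6, 0x2E | 0x80])  # same bytes with the MSB of the third byte set


def bits(data: bytes) -> str:
    """Render each byte as an 8-digit binary string."""
    return " ".join(f"{b:08b}" for b in data)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(bits(bad), is_valid_utf8(bad))      # 11100000 10100110 00101110 False
print(bits(fixed), is_valid_utf8(fixed))  # 11100000 10100110 10101110 True

char = fixed.decode("utf-8")
print(f"U+{ord(char):04X}", unicodedata.name(char))  # U+09AE BENGALI LETTER MA
```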

--
Christian



Re: I cannot insert bengali character in UTF8

From
AI Rumman
Date:
WOW. Great informative answer. Thanks.

On Fri, Jul 20, 2012 at 7:11 PM, Christian Ullrich <chris@chrullrich.net> wrote:
> * AI Rumman wrote:
>
> > I am using database with UTF8 and LC_CTYPE set as default value in
> > Postgresql 9.1.
> > But I cannot insert bengali character in a column.
> >
> > Query Failed:INSERT into tracker (user_id, module_name, item_id,
> > item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
> > error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e
>
> E0 A6 2E is not valid UTF-8: 11100000 10100110 00101110
>
> The lead byte indicates that the codepoint consists of three bytes,
> but only the very next byte is a trail byte (10......). The third
> byte is a single character, a period ("."), to be exact.
>
> Setting the MSB on the third byte gives us
>
> 11100000 10100110 10101110 = E0 A6 AE
>
> , which is a valid UTF-8 encoding of U+09AE BENGALI LETTER MA.
>
> Check your input data.
>
> --
> Christian