Discussion: I cannot insert Bengali characters in UTF8
I am using a database with UTF8 encoding and LC_CTYPE set to the default value in PostgreSQL 9.1.
But I cannot insert Bengali characters into a column.
Query Failed:INSERT into tracker (user_id, module_name, item_id, item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e
item_summary is a text type column, and we can insert Japanese characters into this field.
Could anybody let me know what the problem is here?
On 20 July 2012 11:30, AI Rumman <rummandba@gmail.com> wrote:
> I am using database with UTF8 and LC_CTYPE set as default value in
> Postgresql 9.1.
> But I cannot insert bengali character in a column.
>
> Query Failed:INSERT into tracker (user_id, module_name, item_id,
> item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
> error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e
>
> Item_summary is a text type column and we can insert japanese character in
> this field.
>
> Could anybody let me know what is the problem here?

Maybe they're not valid Bengali characters? Did you do an encoding-naive truncation at some point? My mail client cannot display the last few characters before the ellipsis, but it displays the first few fine.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
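To illustrate what an encoding-naive truncation looks like, here is a minimal Python sketch. The sample string, the 30-byte limit, and the function names `truncate_naive` and `truncate_safe` are all assumptions made for illustration; the application's actual truncation code is not shown in this thread. Cutting the encoded bytes at a fixed length can split a three-byte character, and appending "..." afterwards produces exactly the kind of sequence reported in the error.

```python
# Hypothetical sample text (renders as Bengali "bangla test"); not taken from the real data.
SAMPLE = "\u09ac\u09be\u0982\u09b2\u09be \u099f\u09c7\u09b8\u09cd\u099f"


def truncate_naive(text, max_bytes):
    """Cut the UTF-8 bytes at a fixed length, ignoring character boundaries."""
    return text.encode("utf-8")[:max_bytes] + b"..."


def truncate_safe(text, max_bytes):
    """Cut at a fixed byte length, then drop any incomplete trailing character."""
    clipped = text.encode("utf-8")[:max_bytes]
    # errors="ignore" discards the partial multi-byte sequence left at the end.
    whole_chars = clipped.decode("utf-8", errors="ignore")
    return whole_chars.encode("utf-8") + b"..."


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    naive = truncate_naive(SAMPLE, 30)
    safe = truncate_safe(SAMPLE, 30)
    print(naive[-5:].hex(), is_valid_utf8(naive))
    print(safe[-5:].hex(), is_valid_utf8(safe))
```

If something like the naive variant runs before the INSERT, the fix belongs in the application: truncate by characters, or drop the incomplete trailing bytes as the second function does.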
* AI Rumman wrote:
> I am using database with UTF8 and LC_CTYPE set as default value in
> Postgresql 9.1.
> But I cannot insert bengali character in a column.
>
> Query Failed:INSERT into tracker (user_id, module_name, item_id,
> item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
> error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e

E0 A6 2E is not valid UTF-8:

    11100000 10100110 00101110

The lead byte indicates that the codepoint consists of three bytes, but only the very next byte is a trail byte (10......). The third byte is a single character, a period ("."), to be exact.

Setting the MSB on the third byte gives us

    11100000 10100110 10101110 = E0 A6 AE

which is a valid UTF-8 encoding of U+09AE BENGALI LETTER MA.

Check your input data.

--
Christian
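The byte-level reasoning above can be checked with a short Python sketch. It uses only the standard library; the helper names `as_bits` and `is_valid_utf8` are made up for this example. It prints the bit patterns, confirms that E0 A6 2E does not decode, and shows that setting the most significant bit of the third byte yields U+09AE.

```python
import unicodedata

# The three bytes from the PostgreSQL error message.
BAD = bytes([0xE0, 0xA6, 0x2E])
# Same bytes with the MSB set on the third byte (0x2E | 0x80 == 0xAE).
FIXED = bytes([0xE0, 0xA6, 0x2E | 0x80])


def as_bits(data):
    """Render each byte as eight binary digits, separated by spaces."""
    return " ".join(f"{b:08b}" for b in data)


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(as_bits(BAD), is_valid_utf8(BAD))
    print(as_bits(FIXED), is_valid_utf8(FIXED))
    char = FIXED.decode("utf-8")
    print(f"U+{ord(char):04X}", unicodedata.name(char))
```

The same `is_valid_utf8` check could be run on input data before it reaches the database, so that damaged strings are caught at the source instead of at INSERT time.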
WOW. Great informative answer. Thanks.
On Fri, Jul 20, 2012 at 7:11 PM, Christian Ullrich <chris@chrullrich.net> wrote:
> * AI Rumman wrote:
>> I am using database with UTF8 and LC_CTYPE set as default value in
>> Postgresql 9.1.
>> But I cannot insert bengali character in a column.
>>
>> Query Failed:INSERT into tracker (user_id, module_name, item_id,
>> item_summary) values ('1','Leads','353','বাংলা টেস্��...')::ADODB
>> error::->ERROR: invalid byte sequence for encoding "UTF8": 0xe0a62e
>
> E0 A6 2E is not valid UTF-8: 11100000 10100110 00101110
>
> The lead byte indicates that the codepoint consists of three bytes,
> but only the very next byte is a trail byte (10......). The third
> byte is a single character, a period ("."), to be exact.
>
> Setting the MSB on the third byte gives us
> 11100000 10100110 10101110 = E0 A6 AE
> which is a valid UTF-8 encoding of U+09AE BENGALI LETTER MA.
>
> Check your input data.
>
> --
> Christian