Re: Problems with writing EUC-JP/Unicode to console or file
From | Thomas O'Dowd |
---|---|
Subject | Re: Problems with writing EUC-JP/Unicode to console or file |
Date | |
Msg-id | 1056365612.2116.308.camel@beast.uwillsee.com |
In reply to | Re: Problems with writing EUC-JP/Unicode to console or file (Jean-Christian Imbeault <jc@mega-bucks.co.jp>) |
List | pgsql-jdbc |
What encoding did you use to put the character into the database? There are some mapping problems still in postgres for some Japanese characters. It depends on which version of Java you are using and where the data is coming from etc. I'm attaching an email I wrote to hackers about this before. Looks like the same problem. Anyway, nothing to do with the driver itself.

Cheers,

Tom.

On Mon, 2003-06-23 at 18:55, Jean-Christian Imbeault wrote:
> Csaba Nagy wrote:
> > I suspect that your machine's default encoding and the encoding used by
> > your Java program doesn't match.
>
> [snip]
>
> >i.e. explicitly tell to your Java
> > writer code what encoding to use, and explicitly tell to the editor what
> > encoding to use when opening the file. Otherwise they'll use their
> > default encodings, which might not match.
>
> Very true. I'll look up how to specify the encoding when writing to
> file. I don't know that it is possible when writing to the console though.
>
> *But* I must point out that I am writing quite a bit of data, in
> Japanese, to file and the console and *all* of it comes out correctly
> *except* for that one character ...
>
> I *will* check into how to specify the encoding but I don't think that
> is the problem as everything but the one character comes out right.
> And as I had said, if I hard-code the string to be printed it comes out
> right ... only when the string is retrieved from the database does it
> come out wrong ...
>
> Thanks,
>
> Jean-Christian Imbeault
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match

-- 
Thomas O'Dowd - Got a keitai? Get Nooped!
tom@nooper.com - http://nooper.com

Hi all,

One Japanese character has been causing my head to swim lately. I've finally tracked down the problem to both Java 1.3 and PostgreSQL.

The problem character is:

utf-16: 0x301C
utf-8:  0xE3809C
SJIS:   0x8160
EUC_JP: 0xA1C1

Otherwise known as the WAVE DASH character.

The confusion stems from a very similar character, 0xFF5E (utf-16) or 0xEFBD9E (utf-8), the FULLWIDTH TILDE.

Java has just lately (1.4.1) finally fixed their mappings so that 0x301C maps to the correct character in both SJIS and EUC-JP. Previously (at least in 1.3.1) they mapped SJIS to 0xFF5E and EUC to 0x301C, causing all sorts of trouble.

PostgreSQL at least picked one of the two characters, namely 0xFF5E, so conversions in and out of the database to/from sjis/euc seemed to be working. The problem shows up when you try to view utf-8 from the database, or if you read the data into Java (utf-16) and try converting to euc or sjis from there.

Anyway, I think PostgreSQL needs to be fixed for this character. In my opinion what needs to be done is to change the mappings...

euc-jp -> utf-8    -> euc-jp
======    ========    ======
0xA1C1 -> 0xE3809C    0xA1C1

sjis   -> utf-8    -> sjis
======    ========    ======
0x8160 -> 0xE3809C    0x8160

As to what to do with the current mapping of 0xEFBD9E (utf-8)? It probably should be removed. Maybe you could keep the mapping back to the sjis/euc characters to help backward compatibility though. I'm not sure what is the correct approach there.

If anyone can tell me how to edit the mappings under:

src/backend/utils/mb/Unicode/

and rebuild postgres to use them, then I can test this out locally.

Looking forward to your replies.

Tom.
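
For anyone wanting to check which of the two look-alike characters a string really holds, here is a minimal sketch. The class name WaveDashCheck and the helpers hex and classify are made up for illustration; they are not part of the JDBC driver or PostgreSQL. The characters are written as Unicode escapes so the result does not depend on the source file's encoding. EUC-JP and Shift_JIS are only printed if the runtime happens to ship those charsets, and what they print depends on the Java version's mapping tables, as described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WaveDashCheck {
    static final char WAVE_DASH = '\u301C';        // WAVE DASH
    static final char FULLWIDTH_TILDE = '\uFF5E';  // FULLWIDTH TILDE

    /** Encodes s in the given charset and returns the bytes as upper-case hex. */
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Reports which of the two look-alike characters occur in s. */
    static String classify(String s) {
        boolean wave = s.indexOf(WAVE_DASH) >= 0;
        boolean tilde = s.indexOf(FULLWIDTH_TILDE) >= 0;
        if (wave && tilde) return "both";
        if (wave) return "WAVE DASH (U+301C)";
        if (tilde) return "FULLWIDTH TILDE (U+FF5E)";
        return "neither";
    }

    public static void main(String[] args) {
        String[] samples = {
            String.valueOf(WAVE_DASH),
            String.valueOf(FULLWIDTH_TILDE)
        };
        for (String s : samples) {
            System.out.println(classify(s));
            System.out.println("  utf-16: " + hex(s, StandardCharsets.UTF_16BE));
            System.out.println("  utf-8:  " + hex(s, StandardCharsets.UTF_8));
            // These charsets are optional in a Java runtime, so test first.
            for (String name : new String[] { "EUC-JP", "Shift_JIS" }) {
                if (Charset.isSupported(name)) {
                    System.out.println("  " + name + ": " + hex(s, Charset.forName(name)));
                } else {
                    System.out.println("  " + name + ": not available in this runtime");
                }
            }
        }
    }
}
```

Passing the string read from the ResultSet to classify and hex, instead of the two literals, should show whether the database handed back U+301C or U+FF5E. The same explicit-charset idea applies on the writing side: constructing the writer as new OutputStreamWriter(out, "EUC-JP") picks the encoding rather than relying on the platform default.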