Discussion: UTF8

UTF8

From
Bakos Sandor
Date:
Hi!

I get the following exception when I read a simple TXT file on Linux and
try to INSERT it into psql (8.1.4).

org.postgresql.util.PSQLException: ERROR: character 0xefbfbd of encoding "UTF8" has no equivalent in "LATIN2"
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)

Can someone help me?

Saca

Re: UTF8

From
Markus Schaber
Date:
Hi, Bakos,

Bakos Sandor wrote:

> I get the following exception when I read a simple TXT file in Linux and
> try to INSERT to the psql. (8.1.4)
>
> org.postgresql.util.PSQLException: ERROR: character 0xefbfbd of encoding
> "UTF8" has no equivalent in "LATIN2"

This means that your database uses the LATIN2 (ISO Latin-2) encoding, and
psql is telling the server that the data it sends is UTF-8. The server tries
to convert the UTF-8 data into LATIN2, but there is a character (whose
UTF-8 byte sequence is 0xefbfbd) that does not exist in LATIN2.

Either your file is actually Latin-2 (or even some other charset), in which
case you should tell psql to use the Latin-2 encoding.

Or your file really is UTF-8 and really contains characters that do not
exist in Latin-2. Then you have two options: edit the file and replace
those characters with some transcription, or convert your database to
the UTF-8 encoding (this needs a dump and restore).

HTH,
Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: UTF8

From
Oliver Jowett
Date:
Markus Schaber wrote:
> Hi, Bakos,
>
> Bakos Sandor wrote:
>
>
>>I get the following exception when I read a simple TXT file in Linux and
>>try to INSERT to the psql. (8.1.4)
>>
>>org.postgresql.util.PSQLException: ERROR: character 0xefbfbd of encoding
>>"UTF8" has no equivalent in "LATIN2"
>
>
> This means that your database is encoded in ISO-LATIN2 charset, and psql
> is telling the server the data it sends is UTF-8. The server tries to
> convert the UTF-8 Data into LATIN2, but there is a character (whose
> UTF8-Sequence is 0xefbfbd) that is not contained in LATIN-2.
>
> Either your file is latin-2 in reality (or even another charset), then
> you should tell psql to use the latin-2 encoding.
>
> Or your file really is utf-8, and really contains characters not
> contained in latin-2. Then you have two possibilities: Edit the file and
> replace those characters with some transcription, or convert your
> database to utf-8 encoding (needs a dump&restore).

Actually, given that that's a Java JDBC exception, there's no 'psql'
client involved at all.

The JDBC driver always uses UTF8 as the client encoding, since that maps
easily from the native Java string representation (UTF-16) and every
valid Java String can be represented in UTF8. Of course, not every
possible Java String can be represented in LATIN2, which is the cause of
the error.

I would guess that the problem is probably that when *reading* the text
file originally, the wrong encoding is being used to convert the bytes
to Java Strings. If you don't use the right encoding here, then the Java
String you end up with will be garbage.
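
A minimal sketch of that failure mode, assuming (the thread does not say) that the file is ISO-8859-2 and is being decoded as UTF-8. The class name `WrongDecodingDemo` and the sample word are made up for illustration, and ISO-8859-2 is assumed to be available in the running JVM:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongDecodingDemo {
    // Decode raw bytes with the given charset. Bytes that are malformed
    // in that charset are silently replaced with U+FFFD.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Assumption: the text file on disk is ISO-8859-2 (Latin-2).
        Charset latin2 = Charset.forName("ISO-8859-2");
        // Sample Hungarian word containing U+0151 twice.
        byte[] fileBytes = "sz\u0151l\u0151".getBytes(latin2);

        String wrong = decode(fileBytes, StandardCharsets.UTF_8);
        String right = decode(fileBytes, latin2);

        // The wrongly decoded String contains U+FFFD, which LATIN2 cannot hold.
        System.out.println(wrong.indexOf('\uFFFD') >= 0); // prints true
        System.out.println(right.indexOf('\uFFFD') >= 0); // prints false
    }
}
```

The String built with the wrong charset is what the driver would then send to the server as UTF8.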

-O

Re: UTF8

From
Markus Schaber
Date:
Hi, Oliver,


Oliver Jowett wrote:

> Actually, given that that's a Java JDBC exception, there's no 'psql'
> client involved at all.

Yes, you're right.

So I see two possibilities:

- The input encoding when reading the file into Java is wrong.

- The file really contains characters that are not contained in LATIN-2.
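
One way to tell these two cases apart is to check the decoded String before inserting it. This is only a sketch: `Latin2Check` and `firstUnencodable` are hypothetical names, and it assumes Java's ISO-8859-2 charset matches what the server calls LATIN2:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class Latin2Check {
    // Returns the index of the first character that cannot be encoded
    // in ISO-8859-2, or -1 if the whole text fits.
    static int firstUnencodable(String text) {
        CharsetEncoder encoder = Charset.forName("ISO-8859-2").newEncoder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String single = new String(Character.toChars(codePoint));
            if (!encoder.canEncode(single)) {
                return i;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstUnencodable("abc"));       // prints -1
        System.out.println(firstUnencodable("ab\uFFFD"));  // prints 2
    }
}
```

If the offending character turns out to be U+FFFD, the first possibility (wrong input encoding) is the more likely one.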

HTH,
Markus

Re: UTF8

From
Markus Schaber
Date:
Hi, Bakos,

Please keep the discussion on the list, so others can help or, by
reading the archives, learn.

Bakos Sandor wrote:

> I don't understand, because we have a Java application which has worked
> for about a year.
> Yesterday we changed the psql version from 7.4 to 8.1.4 and we got this
> exception.

Ah, that is useful information; you did not tell us before that you had
upgraded your system.

I can see two possible reasons for this:

- The database encoding changed during the upgrade. (Was your old
database encoded in ASCII or UTF-8?)

- You updated the driver as well (newer pgjdbc drivers are stricter
with respect to client encodings).

HTH,
Markus

Re: UTF8

From
Marc Herbert
Date:
Oliver Jowett <oliver@opencloud.com> writes:

> Markus Schaber wrote:
>> Hi, Bakos,
>> Bakos Sandor wrote:
>>
>>>I get the following exception when I read a simple TXT file in Linux and
>>>try to INSERT to the psql. (8.1.4)
>>>
>>>org.postgresql.util.PSQLException: ERROR: character 0xefbfbd of encoding
>>>"UTF8" has no equivalent in "LATIN2"


> I would guess that the problem is probably that when *reading* the
> text file originally, the wrong encoding is being used to convert the
> bytes to Java Strings. If you don't use the right encoding here, then
> the Java String you end up with will be garbage.

Very likely, since 0xefbfbd is the UTF-8 encoding of the Unicode
"replacement character" (U+FFFD):

 http://www.fileformat.info/info/unicode/char/fffd/index.htm

Try printing this file from Java for debugging.
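
For illustration, a small sketch that confirms the byte sequence and makes such characters visible when printing. The class and method names are invented for this example:

```java
import java.nio.charset.StandardCharsets;

class ReplacementCharDemo {
    // Hex dump of the UTF-8 encoding of a String.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Render every non-ASCII char as a backslash-u escape so that
    // replacement characters stand out regardless of the terminal encoding.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\uFFFD")); // prints efbfbd
    }
}
```

Passing each line read from the file through `escapeNonAscii` before printing avoids a second round of encoding confusion on the console.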



Re: UTF8

From
Bakos Sandor
Date:
Hi!

I set the character encoding explicitly in the InputStreamReader in my
program, and that seems to have resolved my problem.
Thanks for all the help.
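
A sketch of what such a fix might look like. The actual program is not shown in the thread, so `ExplicitCharsetReader` and `readLines` are hypothetical, and the caller has to know which charset the file really uses:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class ExplicitCharsetReader {
    // Read all lines of a text file, decoding with an explicitly named
    // charset instead of relying on the platform default.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (InputStream in = Files.newInputStream(file);
             BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

For a Latin-2 file this could be called as `readLines(path, Charset.forName("ISO-8859-2"))`. Without the charset argument, InputStreamReader falls back to the platform default, which can change when the system or locale changes.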

Saca

Marc Herbert wrote:
>Oliver Jowett <oliver@opencloud.com> writes:
>
>
>>Markus Schaber wrote:
>>
>>>Hi, Bakos,
>>>Bakos Sandor wrote:
>>>
>>>
>>>>I get the following exception when I read a simple TXT file in Linux and
>>>>try to INSERT to the psql. (8.1.4)
>>>>
>>>>org.postgresql.util.PSQLException: ERROR: character 0xefbfbd of encoding
>>>>"UTF8" has no equivalent in "LATIN2"
>>>>
>
>
>
>>I would guess that the problem is probably that when *reading* the
>>text file originally, the wrong encoding is being used to convert the
>>bytes to Java Strings. If you don't use the right encoding here, then
>>the Java String you end up with will be garbage.
>>
>
>Very likely since 0xefbfbd is the... unicode "replacement character"
>
> http://www.fileformat.info/info/unicode/char/fffd/index.htm
>
>Try printing this file from Java for debugging.