Thread: not valid character for Unicode


not valid character for Unicode

From
Adam Witney
Date:
Hi,

I'm trying to upgrade from 7.4 -> 8.1 but it is failing with Unicode
errors. The offending character is the Greek character mu (often used
for micro). Here is an offending string "BµG@S" (in case it doesn't
appear in the email, the mu is between the B and the G)

Any ideas why this character is not valid in Unicode?

thanks for any help

adam


--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


Re: not valid character for Unicode

From
Martijn van Oosterhout
Date:
On Fri, Jun 09, 2006 at 03:59:52PM +0100, Adam Witney wrote:
>
> Hi,
>
> I'm trying to upgrade from 7.4 -> 8.1 but it is failing with Unicode
> errors. The offending character is the Greek character mu (often used
> for micro). Here is an offending string "BµG@S" (in case it doesn't
> appear in the email, the mu is between the B and the G)
>
> Any ideas why this character is not valid in Unicode?

It's a valid unicode character, it's just you haven't encoded it in
unicode. It's probably in Latin-1. In that case, you need to specify it
in the client encoding...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
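
A minimal Python sketch of the point above, assuming the string really was stored as Latin-1 (which the thread only suspects). The micro sign is a valid Unicode code point (U+00B5), but its single-byte Latin-1 form is not a valid UTF-8 byte sequence. The helper name `is_valid_utf8` is made up for this illustration.

```python
# The sample string from the thread, with the micro sign (U+00B5).
text = "B\u00b5G@S"

latin1_bytes = text.encode("latin-1")  # micro sign becomes one byte: 0xB5
utf8_bytes = text.encode("utf-8")      # micro sign becomes two bytes: 0xC2 0xB5


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes, is_valid_utf8(latin1_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

A lone 0xB5 byte is a UTF-8 continuation byte with no lead byte before it, which is why a strict UTF-8 check rejects the Latin-1 form.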


Re: not valid character for Unicode

From
Adam Witney
Date:

Martijn van Oosterhout wrote:
> On Fri, Jun 09, 2006 at 03:59:52PM +0100, Adam Witney wrote:
>> Hi,
>>
>> I'm trying to upgrade from 7.4 -> 8.1 but it is failing with Unicode
>> errors. The offending character is the Greek character mu (often used
>> for micro). Here is an offending string "BµG@S" (in case it doesn't
>> appear in the email, the mu is between the B and the G)
>>
>> Any ideas why this character is not valid in Unicode?
>
> It's a valid unicode character, it's just you haven't encoded it in
> unicode. It's probably in Latin-1. In that case, you need to specify it
> in the client encoding...

Hi Martijn,

thanks for your quick response.

Ok i am a bit confused by all this encoding stuff... i don't really know
how to encode it in unicode? this is a text string that is extracted
from a text file, i just put it in an INSERT statement.

I have to replace fields with this in it with a valid string that will
load into 8.1, do you know how i would do the conversion?

thanks

adam




Re: not valid character for Unicode

From
Martijn van Oosterhout
Date:
On Fri, Jun 09, 2006 at 04:17:50PM +0100, Adam Witney wrote:
> > It's a valid unicode character, it's just you haven't encoded it in
> > unicode. It's probably in Latin-1. In that case, you need to specify it
> > in the client encoding...
>
> Hi Martijn,
>
> thanks for your quick response.
>
> Ok i am a bit confused by all this encoding stuff... i don't really know
> how to encode it in unicode? this is a text string that is extracted
> from a text file, i just put it in an INSERT statement.

The database will do the encoding for you, you just have to tell it
what encoding it is. By default it assumes you're using the same
encoding as the backend. So:

# set client_encoding='latin1';

-- Now all my strings are considered to be in latin1

# set client_encoding='sjis';

-- Now my strings are SJIS

# set client_encoding='unicode';

-- Now my strings need to be utf-8

> I have to replace fields with this in it with a valid string that will
> load into 8.1, do you know how i would do the conversion?

The database will do it for you. Note that the client encoding affects
input *and* output. So if you set it to latin1, the database will
convert all strings to latin1 before sending them to you...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
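
The conversion described above can be sketched in Python. This is only a conceptual model under stated assumptions, not PostgreSQL's implementation: the function names and the `PG_TO_PYTHON` mapping are hypothetical, and the server encoding is assumed to be UNICODE (UTF-8) as in the thread.

```python
# Hypothetical mapping from the encoding names used in the thread
# to Python codec names.
PG_TO_PYTHON = {"latin1": "latin-1", "sjis": "shift_jis", "unicode": "utf-8"}


def client_to_server(raw: bytes, client_encoding: str,
                     server_encoding: str = "unicode") -> bytes:
    """Input path: interpret client bytes, then store them in the server encoding."""
    text = raw.decode(PG_TO_PYTHON[client_encoding])
    return text.encode(PG_TO_PYTHON[server_encoding])


def server_to_client(stored: bytes, client_encoding: str,
                     server_encoding: str = "unicode") -> bytes:
    """Output path: convert stored bytes back into the client encoding."""
    text = stored.decode(PG_TO_PYTHON[server_encoding])
    return text.encode(PG_TO_PYTHON[client_encoding])


stored = client_to_server(b"B\xb5G@S", "latin1")
print(stored)                              # bytes as held in a UTF-8 database
print(server_to_client(stored, "latin1"))  # bytes sent back to a latin1 client
```

In this model, passing the same Latin-1 bytes with `client_encoding` set to `"unicode"` raises `UnicodeDecodeError`, which mirrors the failure reported at the start of the thread.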


Re: not valid character for Unicode

From
brian ally
Date:

Adam Witney wrote:
>
> Martijn van Oosterhout wrote:
>
>>On Fri, Jun 09, 2006 at 03:59:52PM +0100, Adam Witney wrote:
>>
>>>Hi,
>>>
>>>I'm trying to upgrade from 7.4 -> 8.1 but it is failing with Unicode
>>>errors. The offending character is the Greek character mu (often used
>>>for micro). Here is an offending string "BµG@S" (in case it doesn't
>>>appear in the email, the mu is between the B and the G)
>>>
>>>Any ideas why this character is not valid in Unicode?
>>
>>It's a valid unicode character, it's just you haven't encoded it in
>>unicode. It's probably in Latin-1. In that case, you need to specify it
>>in the client encoding...
>
>
> Hi Martijn,
>
> thanks for your quick response.
>
> Ok i am a bit confused by all this encoding stuff... i don't really know
> how to encode it in unicode? this is a text string that is extracted
> from a text file, i just put it in an INSERT statement.
>
> I have to replace fields with this in it with a valid string that will
> load into 8.1, do you know how i would do the conversion?
>

What did you use to extract it from the text file? If you're using some
text editor, ensure that it is set to UTF-8.

brian


Re: not valid character for Unicode

From
"A.M."
Date:
On Fri, June 9, 2006 11:17 am, Adam Witney wrote:
>

>
> Martijn van Oosterhout wrote:
>
>> On Fri, Jun 09, 2006 at 03:59:52PM +0100, Adam Witney wrote:
>>
>>> Hi,
>>>
>>>
>>> I'm trying to upgrade from 7.4 -> 8.1 but it is failing with Unicode
>>> errors. The offending character is the Greek character mu (often used
>>> for micro). Here is an offending string "BµG@S" (in case it doesn't
>>> appear in the email, the mu is between the B and the G)
>>>
>>> Any ideas why this character is not valid in Unicode?
>>>
>>
>> It's a valid unicode character, it's just you haven't encoded it in
>> unicode. It's probably in Latin-1. In that case, you need to specify it
>> in the client encoding...
>
> Hi Martijn,
>
>
> thanks for your quick response.
>
> Ok i am a bit confused by all this encoding stuff... i don't really know
> how to encode it in unicode? this is a text string that is extracted from a
> text file, i just put it in an INSERT statement.
>
> I have to replace fields with this in it with a valid string that will
> load into 8.1, do you know how i would do the conversion?

For migration, you should use pg_dump; it's not clear from your email whether
you are doing that. If you typed up some SQL in Windows which you want to
load into postgres, you might try:
set client_encoding to 'LATIN1';
at the top of your script.

-M


Re: not valid character for Unicode

From
Adam Witney
Date:

>> I have to replace fields with this in it with a valid string that will
>> load into 8.1, do you know how i would do the conversion?
>
> The database will do it for you. Note that the client encoding affects
> input *and* output. So if you set it to latin1, the database will
> convert all strings to latin1 before sending them to you...

ok, so my current database (7.4.12) is UNICODE, but from psql when i run
this

show client_encoding;
 client_encoding
-----------------
 UNICODE

SELECT identifier from dba_data_base where bioassay_id = 1291 and
identifier ilike '%G@S%';
  identifier
--------------
 BG@S (0A11)

so the mu character is not showing up. So I'm not sure if the database is
converting the output?

(sorry, i am probably sounding very dim here!)

thanks again for your help

adam









Re: not valid character for Unicode

From
Adam Witney
Date:

> For migration, you should use pg_dump; it's not clear from your email whether
> you are doing that. If you typed up some SQL in Windows which you want to
> load into postgres, you might try:
> set client_encoding to 'LATIN1';
> at the top of your script.

yes this was how i spotted the problem. If i pg_dump from 7.4 and then
try to load into 8.1 these characters cause errors.

This data was generated on Windows though, as you say.



Re: not valid character for Unicode

From
Martijn van Oosterhout
Date:
On Fri, Jun 09, 2006 at 04:32:35PM +0100, Adam Witney wrote:
> > The database will do it for you. Note that the client encoding affects
> > input *and* output. So if you set it to latin1, the database will
> > convert all strings to latin1 before sending them to you...
>
> ok, so my current database (7.4.12) is UNICODE, but from psql when i run
> this

<snip>

> SELECT identifier from dba_data_base where bioassay_id = 1291 and
> identifier ilike '%G@S%';
>   identifier
> --------------
>  BG@S (0A11)
>
> so the mu character is not showing up. So I'm not sure if the database is
> converting the output?

Is the character actually there? Do a length(identifier) on it to see
how many characters there are. When doing an interactive session it's
important that the client_encoding matches your display, otherwise you
might find it dropping characters or messing up in other ways.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
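
To illustrate why a length check is informative, here is a small Python sketch. It assumes the stored value is the displayed string plus a micro sign after the "B", which the thread suggests but does not confirm. A character count one higher than what psql displays would indicate the character is stored and only the display is dropping it.

```python
# What psql displayed in the thread, and the value assumed to be stored.
shown_in_psql = "BG@S (0A11)"
assumed_stored = "B\u00b5G@S (0A11)"  # assumption: micro sign after the "B"

shown_chars = len(shown_in_psql)                    # characters visible on screen
stored_chars = len(assumed_stored)                  # characters if the micro sign is there
stored_bytes = len(assumed_stored.encode("utf-8"))  # bytes when encoded as UTF-8

print(shown_chars, stored_chars, stored_bytes)
```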


Re: not valid character for Unicode

From
Adam Witney
Date:

Martijn van Oosterhout wrote:
> On Fri, Jun 09, 2006 at 04:32:35PM +0100, Adam Witney wrote:
>>> The database will do it for you. Note that the client encoding affects
>>> input *and* output. So if you set it to latin1, the database will
>>> convert all strings to latin1 before sending them to you...
>> ok, so my current database (7.4.12) is UNICODE, but from psql when i run
>> this
>
> <snip>
>
>> SELECT identifier from dba_data_base where bioassay_id = 1291 and
>> identifier ilike '%G@S%';
>>   identifier
>> --------------
>>  BG@S (0A11)
>>
>> so the mu character is not showing up. So I'm not sure if the database is
>> converting the output?
>
> Is the character actually there? Do a length(identifier) on it to see
> how many characters there are. When doing an interactive session it's
> important that the client_encoding matches your display, otherwise you
> might find it dropping characters or messing up in other ways.

yep it is there, when i display the data from the application (PHP) it
shows the character on the web page. Also this causes errors when i dump
from 7.4 and try to load into 8.1 (i've read that the UNICODE checking
became more stringent in 8)

so basically 8.1 won't accept this character... I'm just not entirely
sure what to do about that?

thanks again for your help

adam
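
One way to see exactly which rows a stricter UTF-8 check would reject is to scan the dump for lines that do not decode as UTF-8. The following Python sketch assumes a plain-text SQL dump; the function name and the sample data (including the table name `t`) are made up for illustration.

```python
from typing import Iterator, Tuple


def find_invalid_utf8_lines(raw: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (line_number, line_bytes) for each line that is not valid UTF-8."""
    for number, line in enumerate(raw.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            yield number, line


# Hypothetical two-line dump; the second line holds a Latin-1 micro sign (0xB5).
sample_dump = (
    b"INSERT INTO t VALUES ('ok');\n"
    b"INSERT INTO t VALUES ('B\xb5G@S');\n"
)

bad_lines = list(find_invalid_utf8_lines(sample_dump))
for number, line in bad_lines:
    print(number, line)
```

For a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function.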



Re: not valid character for Unicode

From
Jorge Godoy
Date:
On Saturday 10 June 2006 05:31, Adam Witney wrote:
> yep it is there, when i display the data from the application (PHP) it
> shows the character on the web page. Also this causes errors when i dump
> from 7.4 and try to load into 8.1 (i've read that the UNICODE checking
> became more stringent in 8)
>
> so basically 8.1 won't accept this character... I'm just not entirely
> sure what to do about that?

Are you on a Unix/Linux machine?  You can dump the file there and run "file
dump.sql" to see what type of file it reports.  If it reports something
other than a string containing "text" and "utf-8", you have two options:
edit the dump manually, set the client encoding to whatever was reported,
and try restoring it; or run "iconv" on the file and see if the conversion
to utf-8 works.

--
Jorge Godoy      <jgodoy@gmail.com>
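
For readers without iconv at hand, the conversion step can be sketched in Python. This is a rough stand-in, assuming the whole dump is in a single source encoding (Latin-1 here, as suspected earlier in the thread); the function name is hypothetical. A dump that mixes valid UTF-8 with Latin-1 bytes would need per-line handling instead, since decoding everything as Latin-1 would mangle the parts that are already UTF-8.

```python
def convert_to_utf8(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Return raw re-encoded as UTF-8; leave it untouched if it is already valid UTF-8."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        # Assumption: every byte in the input belongs to source_encoding.
        return raw.decode(source_encoding).encode("utf-8")


converted = convert_to_utf8(b"B\xb5G@S")
print(converted)
```

If the dump itself contains a line that sets the client encoding, that setting would have to agree with the encoding of the converted file.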