Thread: multibyte support

multibyte support

From
Ma Siva Kumar
Date:
Running postgresql-7.3.2-3 which came with Red Hat 9.0.

Created a database with unicode encoding (in psql) as below:

create database leatherlink with encoding='unicode'  template=leatherlinkdb;

leatherlinkdb is an existing database with the default encoding SQL_ASCII.

When I insert Chinese strings into the database, they are accepted and
displayed back properly. But there is an issue:

In a varchar(100) field, about 15 characters fill up the whole space. Looking
at the database entry using psql shows the characters as hexadecimal values.

The documentation mentions that versions 7.3 and later have multibyte support
by default. How do I configure the database to accept and store the multibyte
characters?



--
Integrated Management Tools for leather industry
----------------------------------
http://www.leatherlink.net

Ma Siva Kumar,
BSG LeatherLink (P) Ltd,



Re: multibyte support

From
Dennis Gearon
Date:
Ma Siva Kumar wrote:

>Running postgresql-7.3.2-3 which came with Red Hat 9.0.
>
>Created a database with unicode encoding (in psql) as below:
>
>create database leatherlink with encoding='unicode'  template=leatherlinkdb;
>
>leatherlinkdb is an existing database with the default encoding SQL_ASCII.
>
>When I insert Chinese strings into the database, they are accepted and
>displayed back properly. But there is an issue:
>
>In a varchar(100) field, about 15 characters fill up the whole space. Looking
>at the database entry using psql shows the characters as hexadecimal values.
>
>The documentation mentions that versions 7.3 and later have multibyte support
>by default. How do I configure the database to accept and store the multibyte
>characters?
>
This is something I've been wondering about for quite a while: does
pgsql measure bytes or chars when using UTF for varchars? It looks like
bytes, which is counterintuitive. What are the byte codes for those 15
chars? I think the maximum byte length of a UTF char is either 5 or 6
bytes. Since there are SO many Chinese people in the world and Chinese
should be either popular or getting popular in the computer world, I
would have thought that the UTF consortium would have put Chinese at a
point in the tables where it required only 2, 3, or 4 bytes max, and put
obscure languages up in the 5 to 6 byte part of the table.

--
"You are behaving like a man",
is compliment from an good woman.



Re: multibyte support

From
Ma Siva Kumar
Date:
On Tuesday 11 Nov 2003 9:02 pm, Dennis Gearon wrote:
> This is something I've been wondering about for quite a while: does
> pgsql measure bytes or chars when using UTF for varchars? It looks like
> bytes, which is counterintuitive. What are the byte codes for those 15
> chars? I think the maximum byte length of a UTF char is either 5 or 6
> bytes. Since there are SO many Chinese people in the world and Chinese
> should be either popular or getting popular in the computer world, I
> would have thought that the UTF consortium would have put Chinese at a
> point in the tables where it required only 2, 3, or 4 bytes max, and put
> obscure languages up in the 5 to 6 byte part of the table.

在您的系统中直接获 (entered through an HTML form processed by a PHP script)
shows as 在您的系统 when seen with psql. Anything more than this is rejected
for lack of space (the size is varchar(100)).

If someone can throw more light on this, I will be grateful.

Best regards



Re: multibyte support

From
Tom Lane
Date:
Ma Siva Kumar <siva@leatherlink.net> writes:
> On Tuesday 11 Nov 2003 9:02 pm, Dennis Gearon wrote:
>> This is something I've been wondering about for quite a while: does
>> pgsql measure bytes or chars when using UTF for varchars? It looks like
>> bytes, which is counterintuitive.

The measurement is certainly in characters, in 7.3 and later.  In 7.2 it
was in characters if you'd enabled multibyte.  Once upon a time it was
in bytes, but I don't believe that applies to Ma Siva Kumar's problem.

> 在您的系统中直接获 (entered through an HTML form processed by a PHP script)
> shows as 在您的系统 when seen with psql. Anything more than this is rejected
> for lack of space (the size is varchar(100)).

I think there is some confusion between you and the database about
character set encoding.  Double check what the database encoding is
(psql \l will tell you).  And double check what the system thinks the
client-side encoding is ("show client_encoding" and/or \encoding).

            regards, tom lane
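
To make the character-versus-byte distinction in this exchange concrete, here is a minimal Python sketch (Python is not used anywhere in the thread; it is chosen here only for illustration, and the helper name utf8_report is hypothetical). It takes the sample string quoted above and reports, for each character, its code point and the number of bytes it occupies in UTF-8. Each of these characters happens to need 3 bytes, so a limit counted in characters and a limit counted in bytes would behave quite differently.

```python
# Sample string quoted in the thread: nine Chinese characters.
sample = "在您的系统中直接获"


def utf8_report(text):
    """Return (character, code point, UTF-8 byte count) for each character."""
    return [(ch, f"U+{ord(ch):04X}", len(ch.encode("utf-8"))) for ch in text]


# Length in characters: the unit a varchar(n) limit uses in 7.3 and later,
# according to the reply above.
char_count = len(sample)

# Length in bytes once the string is encoded as UTF-8.
byte_count = len(sample.encode("utf-8"))

for ch, code_point, nbytes in utf8_report(sample):
    print(ch, code_point, nbytes)

print("characters:", char_count)
print("UTF-8 bytes:", byte_count)
```

If the limit is counted in characters, as stated above, this string uses only 9 of the 100 positions in a varchar(100) column even though it occupies 27 bytes.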

Re: multibyte support [Resolved]

From
Ma Siva Kumar
Date:
On Wednesday 12 Nov 2003 8:40 pm, Tom Lane wrote:

> I think there is some confusion between you and the database about
> character set encoding.  Double check what the database encoding is
> (psql \l will tell you).  And double check what the system thinks the
> client-side encoding is ("show client_encoding" and/or \encoding).

Thanks for the suggestions. In a psql session, \l shows the encoding of the
database as unicode (in Name, Owner, Encoding form) and both \encoding and
show client_encoding; return unicode.

But it turned out that the problem is not with the database but with the
client application (PHP). When I enter Chinese characters into the database
through the psql client, they ARE stored as Chinese characters and work as
expected.

I found this out when Mark Rappoport suggested configuring PHP to handle
multibyte strings. The version of PHP I run does not handle multibyte
strings entered in forms properly. I need to recompile PHP with
--enable-mbstring (http://www.php.net/manual/en/ref.mbstring.php) to solve
the problem.

Thanks everyone for the help.
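
The thread does not say exactly what the PHP layer did to the form input, so the following Python sketch is only an illustration under stated assumptions, not a diagnosis. It shows two common ways a client can mangle multibyte text before it reaches the database, and how each inflates the character count the server sees: (a) each character is replaced by an HTML numeric character reference such as &#22312;, or (b) the UTF-8 bytes are treated as single-byte Latin-1 characters. Both manglings and the helper names are hypothetical.

```python
# Sample string from the thread and the column limit it mentions.
sample = "在您的系统中直接获"
LIMIT = 100  # varchar(100)


def as_numeric_entities(text):
    """Assumption (a): non-ASCII characters become &#NNNNN; references."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def as_misdecoded_bytes(text):
    """Assumption (b): UTF-8 bytes are read back as Latin-1, one char per byte."""
    return text.encode("utf-8").decode("latin-1")


def chars_that_fit(mangle, limit=LIMIT, probe="在"):
    """How many original characters fit in `limit` after mangling.

    Uses a single probe character to measure the per-character cost, which
    is only representative for characters of similar code point range.
    """
    return limit // len(mangle(probe))


for label, mangle in [
    ("numeric entities", as_numeric_entities),
    ("bytes read as Latin-1", as_misdecoded_bytes),
]:
    stored = mangle(sample)
    print(label, "-> stored length:", len(stored),
          "| original characters that fit:", chars_that_fit(mangle))
```

Under either assumption far fewer than 100 Chinese characters fit in the column, which is the kind of symptom reported at the start of the thread; which of these (if either) actually happened is not confirmed by the messages.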




Re: multibyte support [Resolved]

From
Ma Siva Kumar
Date:
[copying to the list, in case someone else faces a similar situation and is
looking for an answer]

On Thursday 13 Nov 2003 10:46 am, you wrote:

> Thank you for this tip. Will you show us the phpinfo() output,
> appropriately edited for security, even the single line in the php
> section, that shows it is using mb strings when you get done, please?

Hello Dennis,

Here is what I did.

1. Recompiled PHP from the source RPM provided with Red Hat (rpmbuild), using
the following:

./configure --with-pgsql --without-mysql --enable-mbstring --with-apxs2
make
make install

2. Snippets of phpinfo():

Configure Command  './configure' '--with-pgsql' '--without-mysql'
'--with-apxs2' '--enable-mbstring'

mbstring
Multibyte (Japanese) Support enabled

Directive                       Local Value  Master Value
mbstring.detect_order           no value     no value
mbstring.func_overload          0            0
mbstring.http_input             UTF-8        UTF-8
mbstring.http_output            UTF-8        UTF-8
mbstring.internal_encoding      UTF-8        UTF-8
mbstring.substitute_character   no value     no value

3. Since we will be using many languages, I set the encoding to UTF-8 here as
well as in the header files of all my scripts. In addition, I set the
default_charset directive in php.ini to UTF-8.

I guess not all of the above settings are necessary. But it works for me :-)

Best regards,


Ma SivaKumar

