Обсуждение: questions on multibyte

Поиск
Список
Период
Сортировка

questions on multibyte

От
"Karthikeyan Sundaram"
Дата:
Hi all,

    I am new to postgres.  I have converted my current db from sql_ascii to
unicode.  I followed the instructions give by some site called moodle and it
worked.

    I have some basic questions.

1) How will I insert multibyte from insert statement?  I figured out a way
to insert multibyte, but want general opinion.
2) I want the data to be transferred to warehouse.
3) I login to psql and do a select the multibyte. (I did a plain select).
and push to a flat file so that the warehouse will upload. The data appread
to be scatterd to multiple lines which I am not able to insert into my
warehouse. How will display the multibyte chars from psql command line?
4) Do I need to install some modules or functions or any methods?

   Please advise.

Regards
skarthi
Podbridge Inc

_________________________________________________________________
Turn searches into helpful donations. Make your search count.
http://click4thecause.live.com/search/charity/default.aspx?source=hmemtagline_donation&FORM=WLMTAG


Re: questions on multibyte

От
Michael Fuhr
Дата:
On Thu, Feb 01, 2007 at 07:01:31AM -0800, Karthikeyan Sundaram wrote:
> 1) How will I insert multibyte from insert statement?  I figured out a way
> to insert multibyte, but want general opinion.

If you have UTF-8 data then treat it just like any other text string
and make sure client_encoding is set to 'UTF-8' or one of its aliases.
If the data is in some other encoding then set client_encoding
appropriately and the database will automatically convert the data
to the server's encoding.

> 2) I want the data to be transferred to warehouse.

Do you mean a data warehouse?  What DBMS is the warehouse using?
What character encoding does it use?  UTF-8 or something else?

> 3) I login to psql and do a select the multibyte. (I did a plain select).
> and push to a flat file so that the warehouse will upload. The data appread
> to be scatterd to multiple lines which I am not able to insert into my
> warehouse. How will display the multibyte chars from psql command line?

How did you "push to a flat file"?  Did you use COPY or psql's
\copy?  Does the data contain linefeeds and/or carriage returns?
Does the warehouse or your display understand UTF-8?  If not then
you might need to use client_encoding or a program like iconv or
uconv to convert the data to another encoding.

--
Michael Fuhr

questions on multibyte

От
"Karthikeyan Sundaram"
Дата:
Hi Michael Fuhr,

    Thanks for replying. Here is the information given below.

1) I am using 8.2 version
2) My database ise encoded as UTF-8 and client encoding is as "SET
client_encoding = 'UNICODE';"
3) We have a portal that are proposed to accept multibyte.  We are using PHP
modules.  Do we need to convert to UTF-8 before inserting in PHP?
4) We are using IBM redbrick data warehouse which is also set to UTF-8
5) I am using a select statement with an \o output to a flat file with pipe
as delimeter.  Then I am using a rb_tmu utility supplied by IBM to load into
the redbrick warehouse database.  When I use a select statement where the
column data contains Korean, Japanese and chinese character, they were
display fine in the PGAdmin GUI.  But when I use a select statement to
convert to flat file, then it's give junk chars spread across many lines. My
requirement is one line one record.

   for example
    select address1 from address where create_date > $date_time

   In this the address1 can contain multiple language char set.
   should I use any specific command for UTF-8 so that it can convert and
put into oneline in the output file.

   As I am new to postgres, I don't know much command.  Eventhough I have
good experience in oracle, In oracle, we say NLS_LANG, UTF-8 will put them
in one line.  Is there a similar command
in postgres. I am learning postgres.  Please help

Regards
skarthi



>From: Michael Fuhr <mike@fuhr.org>
>To: Karthikeyan Sundaram <skarthi98@hotmail.com>
>CC: pgsql-admin@postgresql.org
>Subject: Re: [ADMIN] questions on multibyte
>Date: Thu, 1 Feb 2007 19:35:59 -0700
>
>On Thu, Feb 01, 2007 at 07:01:31AM -0800, Karthikeyan Sundaram wrote:
> > 1) How will I insert multibyte from insert statement?  I figured out a
>way
> > to insert multibyte, but want general opinion.
>
>If you have UTF-8 data then treat it just like any other text string
>and make sure client_encoding is set to 'UTF-8' or one of its aliases.
>If the data is in some other encoding then set client_encoding
>appropriately and the database will automatically convert the data
>to the server's encoding.
>
> > 2) I want the data to be transferred to warehouse.
>
>Do you mean a data warehouse?  What DBMS is the warehouse using?
>What character encoding does it use?  UTF-8 or something else?
>
> > 3) I login to psql and do a select the multibyte. (I did a plain
>select).
> > and push to a flat file so that the warehouse will upload. The data
>appread
> > to be scatterd to multiple lines which I am not able to insert into my
> > warehouse. How will display the multibyte chars from psql command line?
>
>How did you "push to a flat file"?  Did you use COPY or psql's
>\copy?  Does the data contain linefeeds and/or carriage returns?
>Does the warehouse or your display understand UTF-8?  If not then
>you might need to use client_encoding or a program like iconv or
>uconv to convert the data to another encoding.
>
>--
>Michael Fuhr
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

_________________________________________________________________
Turn searches into helpful donations. Make your search count.
http://click4thecause.live.com/search/charity/default.aspx?source=hmemtagline_donation&FORM=WLMTAG


Re: questions on multibyte

От
Michael Fuhr
Дата:
On Thu, Feb 01, 2007 at 09:32:22PM -0800, Karthikeyan Sundaram wrote:
> 1) I am using 8.2 version
> 2) My database ise encoded as UTF-8 and client encoding is as "SET
> client_encoding = 'UNICODE';"
> 3) We have a portal that are proposed to accept multibyte.  We are using
> PHP modules.  Do we need to convert to UTF-8 before inserting in PHP?

If you set client_encoding to whatever encoding the data is in then
you shouldn't need to convert it in the application.  See "Automatic
Character Set Conversion Between Server and Client" in the documentation:

http://www.postgresql.org/docs/8.2/interactive/multibyte.html#AEN24099

If the data is in an encoding that PostgreSQL doesn't support then
you'll need to convert it before inserting it.

> 4) We are using IBM redbrick data warehouse which is also set to UTF-8
> 5) I am using a select statement with an \o output to a flat file with pipe
> as delimeter.  Then I am using a rb_tmu utility supplied by IBM to load
> into the redbrick warehouse database.  When I use a select statement where
> the column data contains Korean, Japanese and chinese character, they were
> display fine in the PGAdmin GUI.  But when I use a select statement to
> convert to flat file, then it's give junk chars spread across many lines.
> My requirement is one line one record.

As for multiple lines, are you sure UTF-8 is the problem?  I don't
use pgAdmin much but I notice that it displays only the first line
of multi-line data.  For example:

CREATE TABLE test (id serial, address text);
INSERT INTO test (address) VALUES (E'Line 1\nLine 2\nLine 3');
SELECT * FROM test;

In pgAdmin the SELECT shows only 'Line 1' in the address column.
psql shows multiple lines:

 id | address
----+---------
  1 | Line 1
    : Line 2
    : Line 3
(1 row)

You can get a single line by converting CR and LF to spaces:

SELECT id, regexp_replace(address, E'[\r\n]+', ' ', 'g') AS address FROM test;
 id |       address
----+----------------------
  1 | Line 1 Line 2 Line 3
(1 row)

psql's \o option probably isn't the best choice for exporting data
that will be imported somewhere else.  Try COPY or psql's \copy:

\copy test to test.txt delimiter '|'

This should create a file named test.txt with the following contents:

1|Line 1\nLine 2\nLine 3

However, this output might be problematic if the import process
doesn't understand escapes like \r and \n so you might need to
convert CR and LF to spaces as above.

As for "junk chars," make sure that whatever terminal or application
you're using interprets the data as whatever encoding it's in.  For
example, if you exported the data with client_encoding set to UTF-8
(or one of its aliases like UNICODE) then make sure your display
interprets the data as UTF-8 and knows how to render it.

--
Michael Fuhr