Re: Storing double-byte strings in text fields.

From: Tatsuo Ishii
Subject: Re: Storing double-byte strings in text fields.
Msg-id: 20010217120005S.t-ishii@sra.co.jp
In reply to: Storing double-byte strings in text fields.  (Edmund von der Burg <edmund@ecclestoad.co.uk>)
List: pgsql-general

> I am putting together a web site to display a collection of Chinese
> woodblock prints. I want to be able to store double byte values (that is
> to say Big5, Unicode etc encoded) in a text field for things such as the
> artist's name and the title of the print. I have the following questions:
>
> Is this possible using a plain vanilla version of Postgres, ie without the
> multi-lingual support enabled? As I understand it multi-lingual support
> allows me to store table and field names etc in non-ASCII, but doesn't
> really affect what goes into the fields.

As Tom already mentioned, your RPM-based Linux boxes already have
PostgreSQL's multi-byte capability enabled.

> Are programs such as pgdump and the COPY method 8bit clean or will they
> mess up the text? I have done some quick trials and it all seems OK but I
> want to be sure before commiting.

I don't see any reason why COPY or pg_dump would not be 8-bit clean.

> If the above is not the case will the multi-lingual support fix my
> problems? I tried it out but had problems with the backend crashing on
> certain queries. I'd also rather not use it as it will be easier to port
> my system to other servers if it just needs a plain vanilla install.

You said you use Big5. That might be the problem. PostgreSQL does not
accept any encoding that conflicts with ASCII, and certain Big5
characters have a second byte in the ASCII range. In this case you
need to create a database with the EUC_TW encoding and set the
environment variable "PGCLIENTENCODING" to BIG5 in your frontend. This
will force the backend to convert Big5 <--> EUC_TW automatically. Oh,
you use PHP4? Then, if you run PHP4 as an Apache module, you need to
set the environment variable before starting up Apache. I also suspect
you might have trouble with PHP4 itself. It has a feature called
"magic quotes" that adds an escape character (\) before the second
byte of a Big5 character if that byte is a metacharacter. You need to
disable it, otherwise PostgreSQL will be confused. In summary, you
must be very careful when using Big5, especially with PHP.
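
To make the Big5 issue concrete, here is a minimal Python sketch (not
PostgreSQL or PHP code). It uses three characters that are commonly
cited as troublesome because their second Big5 byte is 0x5C, the
backslash. The helper names are hypothetical, and naive_addslashes is
only a rough stand-in for an encoding-unaware quoting step such as
magic quotes, not PHP's actual implementation.

```python
def ascii_trail_bytes(text, encoding="big5"):
    """Return (char, bytes) pairs whose second byte is in the ASCII range."""
    hits = []
    for ch in text:
        raw = ch.encode(encoding)
        if len(raw) == 2 and raw[1] < 0x80:
            hits.append((ch, raw))
    return hits


def naive_addslashes(raw):
    """Byte-wise escaping of ' " and \\ with no knowledge of the encoding.

    Assumption: this only approximates what an encoding-unaware
    quoting step does to the byte stream.
    """
    out = bytearray()
    for b in raw:
        if b in (0x27, 0x22, 0x5C):  # single quote, double quote, backslash
            out.append(0x5C)
        out.append(b)
    return bytes(out)


name = "許功蓋"                      # each character ends in byte 0x5C in Big5
raw = name.encode("big5")
escaped = naive_addslashes(raw)

print(ascii_trail_bytes(name))       # all three characters are reported
print(raw.hex(), "->", escaped.hex())
```

Every second byte gains an extra backslash, so the escaped byte string
no longer matches the Big5 text that was originally submitted.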

As for Unicode, it is safe as long as you use the UTF-8 encoding.
UCS-2/4 cannot be used with PostgreSQL. PostgreSQL 7.1 will be able to
do automatic code conversion between UTF-8 and other encodings,
including Big5. This might be good news for you.
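
One way to see the difference between these encodings is to check
whether any byte of a multi-byte character falls in the ASCII range.
The Python sketch below does that; the function name is hypothetical,
and Python's "utf-16-be" codec is used only as a stand-in for UCS-2,
which is an assumption that holds for characters in the Basic
Multilingual Plane.

```python
def has_ascii_inside_multibyte(text, encoding):
    """True if any multi-byte character contains a byte below 0x80."""
    for ch in text:
        raw = ch.encode(encoding)
        if len(raw) > 1 and any(b < 0x80 for b in raw):
            return True
    return False


title = "許功蓋"

# UTF-8: every byte of a multi-byte character has the high bit set.
print("utf-8    ", has_ascii_inside_multibyte(title, "utf-8"))

# Big5: the second byte may collide with ASCII (here, the backslash).
print("big5     ", has_ascii_inside_multibyte(title, "big5"))

# UCS-2 (approximated by utf-16-be): either byte may be in the ASCII range.
print("utf-16-be", has_ascii_inside_multibyte(title, "utf-16-be"))
```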

Another problem I have seen with Chinese character sets is that data
produced by Chinese applications is sometimes badly broken, and
PostgreSQL is actually not very robust against such broken multi-byte
strings. I suspect this may be the cause of the backend crash you had
if none of the above applies, but I don't know.
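
If broken input is a concern, one option is to check the bytes on the
application side before they reach the database. This is a minimal
Python sketch with a hypothetical helper name; it relies on Python's
own codecs, whose notion of a valid sequence is not guaranteed to match
PostgreSQL's exactly.

```python
def is_well_formed(raw, encoding):
    """Return True if `raw` decodes cleanly in the given encoding."""
    try:
        raw.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


good = "許功蓋".encode("big5")
truncated = good[:-1]                # last character is missing its second byte

print(is_well_formed(good, "big5"))
print(is_well_formed(truncated, "big5"))
```

Rejecting or repairing such strings before the INSERT keeps malformed
multi-byte data out of the backend.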

> I am currently using Postgresql 7.0.3 on RedHat 6.2 (x86) and also on
> YellowDog 1.2 (PPC). The web server is Apache 1.3.12 with PHP 4.0.x.
--
Tatsuo Ishii
