Thread: Re: [GENERAL] postgres & server encodings

Re: [GENERAL] postgres & server encodings

From
Tom Lane
Date:
"Salem Berhanu" <salemb4@hotmail.com> writes:
> What exactly is the SQL_ASCII encoding in postgres?

SQL_ASCII isn't so much an encoding as the declaration that you don't
care about encodings.  That setting simply disables encoding validity
checks and encoding conversions.  The server will take any byte string
clients send it (barring only embedded zero bytes), and store and return
it unchanged.

Since it disables conversions, the notion of converting to another
encoding is pretty much meaningless :-(.

            regards, tom lane
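
A minimal Python sketch of the behaviour described above, as an analogy only and not PostgreSQL code. The function names `sql_ascii_accepts` and `store_and_return` are made up for illustration; the only rule modelled is "any byte string except one with an embedded zero byte, returned unchanged".

```python
def sql_ascii_accepts(data: bytes) -> bool:
    """Model of the only check described above: no embedded zero bytes."""
    return b"\x00" not in data


def store_and_return(data: bytes) -> bytes:
    """Model of a SQL_ASCII round trip: bytes come back exactly as sent."""
    if not sql_ascii_accepts(data):
        raise ValueError("embedded zero byte")
    return data


latin1_bytes = "café".encode("latin-1")   # b'caf\xe9'
utf8_bytes = "café".encode("utf-8")       # b'caf\xc3\xa9'

# Both byte strings are accepted and returned unchanged, even though they
# spell the same word in two different encodings.
assert store_and_return(latin1_bytes) == latin1_bytes
assert store_and_return(utf8_bytes) == utf8_bytes
assert latin1_bytes != utf8_bytes
```

Nothing in this model records which encoding a given value was written in, which is why a later conversion has nothing to go on.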

Re: [GENERAL] postgres & server encodings

From
"Joel Fradkin"
Date:
Not that I am an expert or anything, but my initial database was SQL_ASCII
and I did have to convert it to Unicode.
My reason was that we store French characters in our database, and the newer
ODBC driver was not displaying them correctly coming from SQL_ASCII, but was
from UNICODE.
I also think that it can affect functions like length and upper, but Tom
knows a ton more than me about this stuff.
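
To illustrate why length and upper might be affected, here is a small Python analogy (not PostgreSQL output; the actual server behaviour also depends on the locale). It compares treating a value as text with treating it as raw bytes, which is roughly the difference between a database that knows the encoding and one that does not. The sample word is an assumption chosen for the example.

```python
text = "été"                      # French for "summer"
raw = text.encode("utf-8")        # what a UTF-8 client would send

# Counted as characters versus counted as bytes.
chars = len(text)                 # 3
octets = len(raw)                 # 5, each "é" takes two bytes in UTF-8

# Upper-casing text handles the accented letters; upper-casing bytes only
# touches the ASCII letter "t".
upper_text = text.upper()                     # "ÉTÉ"
upper_raw = raw.upper().decode("utf-8")       # "éTé"
```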

I did my initial conversion on 7.4, and the ODBC driver at that time had no
issues with SQL_ASCII displaying the French, but I think in 8.0.1 I started
seeing an issue. The latest version of the driver, 8.0.4, seems to be working
well (only up a little over 24 hours thus far).

I wish I had used a Unicode database from the start (the 7.4 driver was what I
used, and it did not like moving from MSSQL to Unicode). I later switched to
.NET (Npgsql objects) for my conversion and used an encoding object to write
the data correctly.

Joel Fradkin

-----Original Message-----
From: pgsql-admin-owner@postgresql.org
[mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Tuesday, August 09, 2005 11:59 AM
To: Salem Berhanu
Cc: pgsql-admin@postgresql.org; pgsql-general@postgresql.org
Subject: Re: [ADMIN] [GENERAL] postgres & server encodings

Re: [GENERAL] postgres & server encodings

From
Alvaro Herrera
Date:
On Tue, Aug 09, 2005 at 12:56:37PM -0400, Joel Fradkin wrote:
> Not that I am an expert or anything, but my initial database was SQL_ASCII
> and I did have to convert it to Unicode.
> My reason was that we store French characters in our database, and the newer
> ODBC driver was not displaying them correctly coming from SQL_ASCII, but was
> from UNICODE.
> I also think that it can affect functions like length and upper, but Tom
> knows a ton more than me about this stuff.
>
> I did my initial conversion on 7.4, and the ODBC driver at that time had no
> issues with SQL_ASCII displaying the French, but I think in 8.0.1 I started
> seeing an issue. The latest version of the driver, 8.0.4, seems to be working
> well (only up a little over 24 hours thus far).

A conversion will work fine, assuming the data is all encoded using the
same encoding.  So if it's all UTF-8 ("Unicode") already, you can import
it verbatim into a UTF-8 database and it will work fine.  If it's all
Latin-1, you can import it into a UTF-8 database using client_encoding=latin1
during the import, or verbatim into a Latin-1 database, and it will also work
fine.  (You are of course expected to be able to figure out which
encoding the data is really in.)
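
As a rough sketch of what such a conversion amounts to at the byte level, here is a Python analogy. In PostgreSQL the server does the conversion for you when client_encoding is set; the helper names `latin1_to_utf8` and `import_utf8_verbatim` are hypothetical and exist only for this illustration.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret Latin-1 bytes as text and re-encode them as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def import_utf8_verbatim(data: bytes) -> bytes:
    """Accept data that is already UTF-8; raises UnicodeDecodeError otherwise."""
    data.decode("utf-8")  # validation only, the bytes are kept as they are
    return data


# Latin-1 input converted on the way in.
converted = latin1_to_utf8(b"fran\xe7ais")

# Guessing the source encoding wrong silently produces double-encoded text.
double_encoded = latin1_to_utf8("ç".encode("utf-8")).decode("utf-8")
```

Here `converted` equals `"français".encode("utf-8")`, while `double_encoded` is the two-character string "Ã§". That second case raises no error at all, which is why knowing the real source encoding matters.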

The problem only shows up when you have mixed data -- say, you have two
applications, one website in PHP which inserts data in Latin-1, and a
Windows app which inserts in UTF-8.  In this case your data will be a
mess to fix, and there's no way a single conversion will get it right.
You will have to manually separate the parts that are UTF8 from the
Latin1, and import them separately.  Not a position I'd like to be in.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Coge la flor que hoy nace alegre, ufana. ¿Quién sabe si nacera otra mañana?"

Re: [GENERAL] postgres & server encodings

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> The problem only shows up when you have mixed data -- say, you have two
> applications, one website in PHP which inserts data in Latin-1, and a
> Windows app which inserts in UTF-8.  In this case your data will be a
> mess to fix, and there's no way a single conversion will get it right.
> You will have to manually separate the parts that are UTF8 from the
> Latin1, and import them separately.  Not a position I'd like to be in.

The only helpful tip I can think of is that you can try to import data
into a UTF8 database and see if it gets rejected as badly encoded; this
will at least give you a weak tool to separate what's what.

I'm afraid the reverse direction won't help much --- in single-byte
encodings such as Latin1 there are no encoding errors, and so you can't
do any simple filtering to check in that direction.  In the end you're
going to have to eyeball a lot of data for plausibility :-(

            regards, tom lane
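
The filtering idea above can be sketched outside the database as well. This is a Python analogy under the assumption that each row is available as raw bytes; `split_by_utf8_validity` is a hypothetical helper, not a PostgreSQL facility.

```python
def split_by_utf8_validity(rows):
    """Return (valid, rejected): rows that decode as strict UTF-8, and the rest."""
    valid, rejected = [], []
    for row in rows:
        try:
            row.decode("utf-8")
        except UnicodeDecodeError:
            rejected.append(row)
        else:
            valid.append(row)
    return valid, rejected


rows = [
    "déjà vu".encode("utf-8"),
    "déjà vu".encode("latin-1"),
    b"plain ascii",
]
valid, rejected = split_by_utf8_validity(rows)

# The reverse direction tells us nothing: every byte string decodes as Latin-1.
all_decode_as_latin1 = all(isinstance(r.decode("latin-1"), str) for r in rows)
```

Only the Latin-1 row lands in `rejected`. The tool is weak in exactly the sense described: pure ASCII rows pass either way, and some Latin-1 byte sequences happen to be valid UTF-8, so rows in `valid` still need eyeballing.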

Re: [GENERAL] postgres & server encodings

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> "Salem Berhanu" <salemb4@hotmail.com> writes:
> > What exactly is the SQL_ASCII encoding in postgres?
>
> SQL_ASCII isn't so much an encoding as the declaration that you don't
> care about encodings.

It's too late to consider renaming this SQL_RAW or something like that, is it?
It is a huge source of confusion.

Perhaps have a separate "ascii" encoding that checks and complains if any
non-ascii characters are present.

--
greg
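
For illustration only, the kind of check such a strict "ascii" encoding would perform could look like the following Python sketch. The encoding is a proposal in this thread, not an existing PostgreSQL feature, and `check_ascii` is a hypothetical name.

```python
def check_ascii(data: bytes) -> bytes:
    """Hypothetical strict check: reject any byte outside 0x00-0x7F."""
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            raise ValueError(f"non-ASCII byte 0x{byte:02x} at offset {offset}")
    return data
```

With this check, `check_ascii(b"hello")` returns the input unchanged, while `check_ascii(b"caf\xe9")` raises a `ValueError` naming the offending byte and its offset.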