Discussion: Re: [GENERAL] postgres & server encodings
"Salem Berhanu" <salemb4@hotmail.com> writes: > What exactly is the SQL_ASCII encoding in postgres? SQL_ASCII isn't so much an encoding as the declaration that you don't care about encodings. That setting simply disables encoding validity checks and encoding conversions. The server will take any byte string clients send it (barring only embedded zero bytes), and store and return it unchanged. Since it disables conversions, the notion of converting to another encoding is pretty much meaningless :-(. regards, tom lane
Not that I am an expert or anything, but my initial database was SQL_ASCII and I did have to convert it to Unicode. My reason was that we store French characters in our database, and the newer ODBC driver was not displaying them correctly coming from SQL_ASCII, but was from UNICODE. I also think that it can affect functions like length and upper, but Tom knows a ton more than I do about this stuff.

I did my initial conversion on 7.4, and the ODBC driver at that time had no issues with SQL_ASCII displaying the French, but I think in 8.0.1 I started seeing an issue. The latest version of the driver, 8.0.4, seems to be working well (it has only been up a little over 24 hours thus far).

I wish I had used a Unicode database from the start (the 7.4 driver was what I used, and it did not like moving from MSSQL to Unicode). I later switched to .NET (Npgsql objects) for my conversion and used an encoding object to write the data correctly.

Joel Fradkin
Wazagua, Inc.

-----Original Message-----
From: pgsql-admin-owner@postgresql.org [mailto:pgsql-admin-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Tuesday, August 09, 2005 11:59 AM
To: Salem Berhanu
Cc: pgsql-admin@postgresql.org; pgsql-general@postgresql.org
Subject: Re: [ADMIN] [GENERAL] postgres & server encodings

[snip]
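On the remark that the encoding can affect functions like length and upper: the Python lines below are an analogy only (Python's `bytes` and `str` stand in for "bytes with no known encoding" and "text with a known encoding"). They do not show what PostgreSQL itself does, which also depends on locale settings.

```python
text = "élan"                     # 4 characters
utf8_bytes = text.encode("utf-8")  # 'é' takes two bytes in UTF-8

char_count = len(text)        # counts characters
byte_count = len(utf8_bytes)  # counts bytes, so it comes out larger

upper_text = text.upper()         # encoding-aware: 'é' becomes 'É'
upper_bytes = utf8_bytes.upper()  # byte-wise: only ASCII letters change

print(char_count, byte_count)
print(upper_text, upper_bytes.decode("utf-8"))
```

If the system only sees bytes, a "length" is a byte count and "upper" can only be trusted for plain ASCII letters.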
On Tue, Aug 09, 2005 at 12:56:37PM -0400, Joel Fradkin wrote:
> Not that I am an expert or anything, but my initial database was SQL_ASCII
> and I did have to convert it to Unicode.
> My reason was that we store French characters in our database, and the newer
> ODBC driver was not displaying them correctly coming from SQL_ASCII, but was
> from UNICODE.
> I also think that it can affect functions like length and upper, but Tom
> knows a ton more than I do about this stuff.
>
> I did my initial conversion on 7.4, and the ODBC driver at that time had no
> issues with SQL_ASCII displaying the French, but I think in 8.0.1 I started
> seeing an issue. The latest version of the driver, 8.0.4, seems to be working
> well (only up a little over 24 hours thus far).

A conversion will work fine, assuming the data is all encoded using the same encoding. So if it's all UTF-8 ("Unicode") already, you can import it verbatim into a UTF-8 database and it will work fine. If it's all Latin-1, you can import it into a UTF-8 database using client_encoding=latin1 during the import, or verbatim into a Latin-1 database, and it will also work fine. (You are, of course, expected to be able to figure out what encoding the data is really in.)

The problem only shows up when you have mixed data -- say, you have two applications, one website in PHP which inserts data in Latin-1, and a Windows app which inserts in UTF-8. In this case your data will be a mess to fix, and there's no way a single conversion will get it right. You will have to manually separate the parts that are UTF-8 from the parts that are Latin-1, and import them separately. Not a position I'd like to be in.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Coge la flor que hoy nace alegre, ufana. ¿Quién sabe si nacera otra mañana?"
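A small Python sketch of why a single conversion works for uniformly encoded data and fails for mixed data. This is only an illustration of the idea, not of the server's conversion code; `latin1_to_utf8` is a hypothetical helper name, and it plays roughly the role that client_encoding=latin1 plays during an import into a UTF-8 database.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Reinterpret Latin-1 bytes as text, then re-encode that text as UTF-8."""
    return raw.decode("latin-1").encode("utf-8")


# Case 1: the data really is Latin-1, so the conversion is correct.
latin1_row = b"fran\xe7ais"            # 'français' in Latin-1
converted = latin1_to_utf8(latin1_row)
print(converted.decode("utf-8"))

# Case 2: a row that was already UTF-8 goes through the same conversion
# and gets double-encoded.
utf8_row = "français".encode("utf-8")
mangled = latin1_to_utf8(utf8_row)
print(mangled.decode("utf-8"))
```

The second case is the mixed-data situation: the conversion that repairs the Latin-1 rows damages the rows that were already UTF-8, so the two groups have to be separated first.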
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> The problem only shows up when you have mixed data -- say, you have two
> applications, one website in PHP which inserts data in Latin-1, and a
> Windows app which inserts in UTF-8. In this case your data will be a
> mess to fix, and there's no way a single conversion will get it right.
> You will have to manually separate the parts that are UTF8 from the
> Latin1, and import them separately. Not a position I'd like to be in.

The only helpful tip I can think of is that you can try to import data into a UTF8 database and see if it gets rejected as badly encoded; this will at least give you a weak tool to separate what's what. I'm afraid the reverse direction won't help much --- in single-byte encodings such as Latin1 there are no encoding errors, and so you can't do any simple filtering to check in that direction. In the end you're going to have to eyeball a lot of data for plausibility :-(

regards, tom lane
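The same "see if it gets rejected" test can be tried outside the database. The sketch below is an assumption-laden stand-in: Python's strict UTF-8 decoder substitutes for the server's validity check, and `looks_like_utf8` and the sample rows are invented for illustration.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes pass a strict UTF-8 validity check."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


rows = [
    b"caf\xc3\xa9",   # 'café' encoded as UTF-8
    b"caf\xe9",       # 'café' encoded as Latin-1
    b"plain ascii",   # pure ASCII, valid under both encodings
]

accepted = [r for r in rows if looks_like_utf8(r)]
rejected = [r for r in rows if not looks_like_utf8(r)]
print(accepted)
print(rejected)

# The reverse check is useless: every possible byte value decodes as Latin-1,
# so a Latin-1 "validity check" never rejects anything.
every_byte = bytes(range(256))
latin1_text = every_byte.decode("latin-1")
print(len(latin1_text))
```

This is the weak tool described above: rejected rows are definitely not UTF-8, but accepted rows are only probably UTF-8, and they still need a plausibility check by eye.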
Tom Lane <tgl@sss.pgh.pa.us> writes:
> "Salem Berhanu" <salemb4@hotmail.com> writes:
> > What exactly is the SQL_ASCII encoding in postgres?
>
> SQL_ASCII isn't so much an encoding as the declaration that you don't
> care about encodings.

It's too late to consider renaming this to SQL_RAW or something like that, is it? It is a huge source of confusion. Perhaps there could be a separate "ascii" encoding that checks and complains if any non-ASCII characters are present.

--
greg
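For what it's worth, the check such a strict "ascii" encoding would perform is simple. The Python sketch below is purely hypothetical (no such encoding is described in this thread as existing, and `check_strict_ascii` is an invented name); it just shows the "checks and complains" behaviour being proposed.

```python
def check_strict_ascii(raw: bytes) -> bytes:
    """Accept the input only if every byte is in the 7-bit ASCII range."""
    for offset, byte in enumerate(raw):
        if byte > 0x7F:
            raise ValueError(
                f"non-ASCII byte 0x{byte:02x} at offset {offset}"
            )
    return raw


print(check_strict_ascii(b"plain text"))

try:
    check_strict_ascii(b"caf\xe9")
except ValueError as err:
    print("rejected:", err)
```

Unlike the pass-through behaviour of SQL_ASCII, this would refuse anything outside 7-bit ASCII instead of storing it silently.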