Discussion: convert(USING utf8_to_iso_8859_15) on Windows

convert(USING utf8_to_iso_8859_15) on Windows

From
"Pierre Thibaudeau"
Date:
Is this a documented phenomenon with the "convert" function?  The first result is what's expected:

SELECT convert('Gregoire' USING utf8_to_iso_8859_15);
"Gregoire"

But I don't understand the next result, when I put an acute accent over the first "e":

SELECT convert('Grégoire' USING utf8_to_iso_8859_15);
""

(The output is an empty string.)

Likewise, whenever I enter a string containing non-ASCII characters, the convert function outputs an empty string.  Same results when I change the conversion type from UTF8 to any other encoding which accepts those non-ASCII characters...  (When I try a conversion to an encoding that doesn't accept the characters, I get an error message, and that's normal.)
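
One possible reading of the empty output, which nothing in this thread confirms, is that the converted bytes are no longer valid in the database's own UTF8 encoding. The sketch below only illustrates the byte-level effect; Python's codecs stand in for the server-side conversion, and the helper name `is_valid_utf8` is made up for the example.

```python
# Illustration only: Python codecs stand in for the server-side conversion.
name = "Grégoire"

utf8_bytes = name.encode("utf-8")          # how a UTF8 database would store it
latin9_bytes = name.encode("iso-8859-15")  # result of a UTF8 -> ISO 8859-15 conversion


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_bytes.hex())             # the accented letter takes two bytes here
print(latin9_bytes.hex())           # and a single byte here
print(is_valid_utf8(latin9_bytes))  # converted bytes read back as UTF-8
print(is_valid_utf8("Gregoire".encode("iso-8859-15")))  # pure ASCII case
```

The ASCII-only string is byte-identical in both encodings, which would be consistent with 'Gregoire' coming back intact while 'Grégoire' does not.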

My setup is as follows:
PostgreSQL 8.2.1 on WindowsXP
The database has UTF8 encoding.
SHOW lc_ctype; gives: "French_Canada.1252"

Is my problem related to Windows' lack of UTF8 support?  I thought those problems were solved with version 8.2...

Re: convert(USING utf8_to_iso_8859_15) on Windows

From
Tom Lane
Date:
"Pierre Thibaudeau" <pierdeux@gmail.com> writes:
> My setup is as follows:
> PostgreSQL 8.2.1 on WindowsXP
> The database has UTF8 encoding.
> SHOW lc_ctype; gives: "French_Canada.1252"

I'm not sure about any Windows-specific issues, but in general it's a
really bad idea to be using lc_collate or lc_ctype that is incompatible
with the database encoding.

            regards, tom lane
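
As a rough illustration of why a code page 1252 locale and UTF8 data do not fit together, the sketch below reads UTF-8 bytes as if they were 1252 text. It shows what can go wrong in principle when single-byte locale rules are applied to multibyte data; it does not claim to reproduce what PostgreSQL 8.2 actually does internally.

```python
# Illustration only: what UTF-8 bytes look like through a code page 1252 lens.
text = "é"
stored = text.encode("utf-8")        # two bytes in a UTF8 database
misread = stored.decode("cp1252")    # the same bytes interpreted as 1252 text

print(len(text), len(misread))       # one character becomes two
print([ch.isalpha() for ch in misread])  # the second "character" is not a letter

# Lowercasing under the wrong interpretation changes the first byte, and the
# result is no longer a valid UTF-8 sequence.
corrupted = misread.lower().encode("cp1252")
try:
    corrupted.decode("utf-8")
    still_valid = True
except UnicodeDecodeError:
    still_valid = False
print(still_valid)
```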

Fwd: convert(USING utf8_to_iso_8859_15) on Windows

From
"Pierre Thibaudeau"
Date:
Thanks, Tom, for the comment.  (Sorry for emailing directly to you:
pressed "send" too quickly!)

Although that raises further questions:

* Is there a text that documents all that is known about the encoding
issues between PostgreSQL and Windows?  Surely, this is likely to be a
"fairly" widespread issue!  So far, everything I've read had to do
with mysterious bad omens, never with specific statements about what's
what, and what can (or cannot) be done to solve the situation
satisfactorily...

* Windows XP does support UTF8, yet it is not possible (as far as I
know) to define one's locale to have anything to do with UTF8
(presumably in the sense that UTF8 isn't an aspect of a specific
locale):  there are no en_US.UTF8 or fr_CA.UTF8 locales, for instance.
But why should this matter?  Say I am entering the data through a
piece of software that works with UTF8, via the ODBC driver.  Say
again that I output the data with another software that expects UTF8,
via the JDBC driver.  Why does it matter that my system should be
localized in another encoding?


2007/1/29, Tom Lane <tgl@sss.pgh.pa.us>:
> "Pierre Thibaudeau" <pierdeux@gmail.com> writes:
> > My setup is as follows:
> > PostgreSQL 8.2.1 on WindowsXP
> > The database has UTF8 encoding.
> > SHOW lc_ctype; gives: "French_Canada.1252"
>
> I'm not sure about any Windows-specific issues, but in general it's a
> really bad idea to be using lc_collate or lc_ctype that is incompatible
> with the database encoding.
>

Re: Fwd: convert(USING utf8_to_iso_8859_15) on Windows

From
Martijn van Oosterhout
Date:
On Mon, Jan 29, 2007 at 03:46:34PM -0500, Pierre Thibaudeau wrote:
> * Windows XP does support UTF8, yet it is not possible (as far as I
> know) to define one's locale to have anything to do with UTF8
> (presumably in the sense that UTF8 isn't an aspect of a specific
> locale):  there is no en_US.UTF8 or fr_CA.UTF8 locales, for instance.
> But why should this matter?  Say I am entering the data through a
> piece of software that works with UTF8, via the ODBC driver.  Say
> again that I output the data with another software that expects UTF8,
> via the JDBC driver.  Why does it matter that my system should be
> localized in another encoding?

Because postgresql relies on OS support to do things like string
comparison. Since Windows does not support UTF-8 locales, sorting there
with UTF-8 is a bit of a hack, whereas many (some?) Unixes can handle
it natively.

At some point postgresql will know how to do string comparisons itself
and thus the problem will be solved, but it hasn't happened yet.
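
To make the dependence on collation rules concrete, here is a small sketch comparing a plain code point sort with a crude accent-insensitive ordering. The key function `base_letters_key` is a hypothetical stand-in for what a locale-aware comparison provides; it is not how PostgreSQL or any OS locale actually sorts.

```python
import unicodedata

names = ["Zoé", "Émile", "Gregoire", "Grégoire"]


def base_letters_key(s: str):
    """Crude stand-in for locale-aware collation: compare on unaccented,
    case-folded letters, using the original string as a tiebreaker."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (stripped.casefold(), s)


by_code_point = sorted(names)                      # no locale knowledge at all
by_base_letters = sorted(names, key=base_letters_key)

print(by_code_point)     # accented capital sorts after every ASCII letter
print(by_base_letters)   # accented capital sorts with its base letter
```

Without rules of this kind, "Émile" lands after "Zoé", which is rarely what a French-speaking user expects.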

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
