Thread: Missing Bug-Report #5904?

Missing Bug-Report #5904?

From
Torsten Zühlsdorff
Date:
Hello,

I've written a bug report, which got the ID #5904, but I'm not able to
find it on the mailing list. Is it lost? Maybe there is a problem with
the umlaut in my name?

Now for the problem: the English word "March" is translated incorrectly
into German. Instead of "März" I get "MäRz" (with an uppercase "r").

You can reproduce it as follows:
# SET lc_time = "de_DE.UTF-8";
# SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
   to_char
-----------
  MäRz 2011

I did not find the translation file for this, so I can't add a patch or
check for other misspellings.

My System:
PostgreSQL 9.0.3
FreeBSD 8.1-RELEASE

Greetings from Germany,
Torsten


Re: Missing Bug-Report #5904?

From
Andres Freund
Date:
Hi,

On Friday, March 04, 2011 10:16:24 AM Torsten Zühlsdorff wrote:
> Now for the Problem: There is a problem with the translation of the
> english word "March" to the german "März". Instead of "März" i get
> "MäRz" (with uppercase "r").
>
> You can reproduce it as follow:
> # SET lc_time = "de_DE.UTF-8";
> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>    to_char
> -----------
>   MäRz 2011
>
> I did not find the translation file for this, so i can't add a patch or
> check for other misspellings.
>
> My System:
> PostgreSQL 9.0.3
> FreeBSD 8.1-RELEASE

That's very likely a problem with your operating system's locales. What
spelling does the month have if you produce it with `date` or similar?

Andres

Re: Missing Bug-Report #5904?

From
Torsten Zühlsdorff
Date:
Hello Andres,

>> Now for the Problem: There is a problem with the translation of the
>> english word "March" to the german "März". Instead of "März" i get
>> "MäRz" (with uppercase "r").
>>
>> You can reproduce it as follow:
>> # SET lc_time = "de_DE.UTF-8";
>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>    to_char
>> -----------
>>   MäRz 2011
>>
>> I did not find the translation file for this, so i can't add a patch or
>> check for other misspellings.
>>
>> My System:
>> PostgreSQL 9.0.3
>> FreeBSD 8.1-RELEASE
> Thats very likely a problem of your operating systems locales. What spelling
> does the month have if you construct it with `date` or such?

Run directly in bash on the same system:
$ date +%B
March
$ export LC_TIME=de_DE.UTF-8
$ date +%B
März

And in PostgreSQL:
# SET lc_time = "de_DE.UTF-8";
SET
# SELECT to_char(current_date, 'TMMonth YYYY');
  to_char
-----------
 MäRz 2011

I can also reproduce this on FreeBSD 7.0-STABLE.

Greetings,
Torsten



Re: Missing Bug-Report #5904?

From
Magnus Hagander
Date:
On Fri, Mar 4, 2011 at 14:34, Torsten Zühlsdorff
<foo@meisterderspiele.de> wrote:
> Hello Andres,
>
>>> Now for the Problem: There is a problem with the translation of the
>>> english word "March" to the german "März". Instead of "März" i get "MäRz"
>>> (with uppercase "r").
>>>
>>> You can reproduce it as follow:
>>> # SET lc_time = "de_DE.UTF-8";
>>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>>   to_char
>>> -----------
>>>  MäRz 2011
>>>
>>> I did not find the translation file for this, so i can't add a patch or
>>> check for other misspellings.
>>>
>>> My System:
>>> PostgreSQL 9.0.3
>>> FreeBSD 8.1-RELEASE
>>
>> Thats very likely a problem of your operating systems locales. What
>> spelling does the month have if you construct it with `date` or such?
>
> Done directly at the bash on the same system:
> $ date +%B
> March
> $ export LC_TIME=de_DE.UTF-8
> $ date +%B
> März
>
> And in PostgreSQL:
> # SET lc_time = "de_DE.UTF-8";
> SET
> # SELECT to_char(current_date, 'TMMonth YYYY');
>  to_char
> -----------
>  MäRz 2011
>
> I also can reproduce this at a FreeBSD 7.0-STABLE.

IIRC, the FreeBSD locales at least used to be pretty much broken for
UTF8. Can you try and see if you get the same problem in a non-UTF8
locale?

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: Missing Bug-Report #5904?

From
Torsten Zühlsdorff
Date:
Hello,

>>>> Now for the Problem: There is a problem with the translation of the
>>>> english word "March" to the german "März". Instead of "März" i get "MäRz"
>>>> (with uppercase "r").
>>>>
>>>> You can reproduce it as follow:
>>>> # SET lc_time = "de_DE.UTF-8";
>>>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>>>   to_char
>>>> -----------
>>>>  MäRz 2011
>>>>
>>>> I did not find the translation file for this, so i can't add a patch or
>>>> check for other misspellings.
>>>>
>>>> My System:
>>>> PostgreSQL 9.0.3
>>>> FreeBSD 8.1-RELEASE
>>> Thats very likely a problem of your operating systems locales. What
>>> spelling does the month have if you construct it with `date` or such?
>> Done directly at the bash on the same system:
>> $ date +%B
>> March
>> $  export LC_TIME=de_DE.UTF-8
>> $ date +%B
>> März
>>
>> And in PostgreSQL:
>> # SET lc_time = "de_DE.UTF-8";
>> SET
>> # SELECT to_char(current_date, 'TMMonth YYYY');
>>  to_char
>> -----------
>>  MäRz 2011
>>
>> I also can reproduce this at a FreeBSD 7.0-STABLE.
>
> IIRC, the FreeBSD locales at least used to be pretty much broken for
> UTF8. Can you try and see if you get the same problem in a non-UTF8
> locale?

It doesn't work properly in my bash, even the dirty way:
$ export LC_ALL=de_DE.ISO8859-1
$ export LC_PAPER=de_DE.ISO8859-1
$ export LC_ADDRESS=de_DE.ISO8859-1
$ export LC_MONETARY=de_DE.ISO8859-1
$ export LC_NUMERIC=de_DE.ISO8859-1
$ export LC_TELEPHONE=de_DE.ISO8859-1
$ export LC_MESSAGES=de_DE.ISO8859-1
$ export LC_IDENTIFICATION=de_DE.ISO8859-1
$ export LC_COLLATE=de_DE.ISO8859-1
$ export LANG=de_DE.ISO8859-1
$ export LC_MEASUREMENT=de_DE.ISO8859-1
$ export XTERM_LOCALE=de_DE.ISO8859-1
$ export LANGUAGE=de_DE.ISO8859-1:de
$ export LC_CTYPE=de_DE.ISO8859-1
$ export LC_TIME=de_DE.ISO8859-1
$ export LC_NAME=de_DE.ISO8859-1
$ export LC_ALL=de_DE.ISO8859-1
$  date +%B
M�z

I can't figure out why the umlaut is not displayed correctly.

In PostgreSQL it looks interesting:
# SET lc_time = "de_DE.ISO8859-1";
SET
0.3.impos=# SELECT to_char(current_date, 'TMMonth YYYY');
 to_char
----------
 MRz 2011
(1 Zeile)

The missing umlaut could be a problem with bash, but the uppercase "r"
is still there.

Greetings,
Torsten
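
One plausible explanation for the garbled `date` output, assuming the terminal was still interpreting program output as UTF-8 (the thread does not confirm this): in ISO 8859-1 the letter ä is the single byte 0xE4, which is not a valid UTF-8 sequence on its own. The Python sketch below only illustrates the byte-level difference; it does not reproduce what the terminal or PostgreSQL actually did.

```python
name = "März"

# The same month name in the two encodings discussed in the thread.
latin1_bytes = name.encode("iso-8859-1")
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b'M\xe4rz'
print(utf8_bytes)    # b'M\xc3\xa4rz'

# A strict UTF-8 decoder rejects the lone 0xE4 byte.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)

# A lenient decoder substitutes a replacement character instead,
# which is roughly what a terminal does when it shows a placeholder glyph.
print(latin1_bytes.decode("utf-8", errors="replace"))
```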


Re: Missing Bug-Report #5904?

From
Tom Lane
Date:
Torsten Zühlsdorff <foo@meisterderspiele.de> writes:
> Now for the Problem: There is a problem with the translation of the
> english word "March" to the german "März". Instead of "März" i get
> "MäRz" (with uppercase "r").

> You can reproduce it as follow:
> # SET lc_time = "de_DE.UTF-8";
> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>    to_char
> -----------
>   MäRz 2011

I can reproduce the above when the database encoding is not UTF8 or
lc_ctype isn't a UTF8 locale.  The reason is that TMMonth implies
applying an initcap transformation to the month name retrieved from
the locale library.  The only way initcap will make the right choice
of what to do with the "r" is if it thinks that ä is a letter.
Which it won't if the encoding is wrong or lc_ctype isn't set to
classify ä as a letter.  This does not seem like a bug to me
though, just misconfiguration.
        regards, tom lane
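
The initcap behaviour described in this message can be modeled outside PostgreSQL. The following is a minimal Python sketch, not PostgreSQL's actual code: `initcap_with` and the two classifier functions are hypothetical names, and the classifiers merely stand in for lc_ctype = 'C' (only ASCII letters and digits are recognized) and for a UTF-8 lc_ctype (all Unicode letters and digits are recognized). The sketch works on characters, whereas a C-locale classification would really see individual bytes; the effect on the "r" is the same.

```python
import string

ASCII_ALNUM = set(string.ascii_letters + string.digits)


def is_alnum_c_locale(ch: str) -> bool:
    """Stand-in for lc_ctype = 'C': only ASCII letters and digits count."""
    return ch in ASCII_ALNUM


def is_alnum_unicode(ch: str) -> bool:
    """Stand-in for a UTF-8 lc_ctype: any Unicode letter or digit counts."""
    return ch.isalnum()


def initcap_with(is_alnum, text: str) -> str:
    """Uppercase each character that starts a word, lowercase the rest.

    A character starts a word when the previous character was not
    alphanumeric according to the supplied classifier.
    """
    out = []
    prev_alnum = False
    for ch in text:
        if is_alnum(ch):
            out.append(ch.lower() if prev_alnum else ch.upper())
        else:
            out.append(ch)  # unrecognized characters pass through unchanged
        prev_alnum = is_alnum(ch)
    return "".join(out)


print(initcap_with(is_alnum_c_locale, "märz 2011"))  # MäRz 2011
print(initcap_with(is_alnum_unicode, "märz 2011"))   # März 2011
```

With the ASCII-only classifier the "ä" looks like a word separator, so the following "r" is treated as the first letter of a new word.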


Re: Missing Bug-Report #5904?

From
Torsten Zühlsdorff
Date:
Hello Tom,

>> Now for the Problem: There is a problem with the translation of the
>> english word "March" to the german "März". Instead of "März" i get
>> "MäRz" (with uppercase "r").
>
>> You can reproduce it as follow:
>> # SET lc_time = "de_DE.UTF-8";
>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>    to_char
>> -----------
>>   MäRz 2011
>
> I can reproduce the above when the database encoding is not UTF8 or
> lc_ctype isn't a UTF8 locale.  The reason is that TMMonth implies
> applying an initcap transformation to the month name retrieved from
> the locale library.  The only way initcap will make the right choice
> of what to do with the "r" is if it thinks that ä is a letter.
> Which it won't if the encoding is wrong or lc_ctype isn't set to
> classify ä as a letter.  This does not seem like a bug to me
> though, just misconfiguration.

Hm... the encoding of the database is UTF8. The lc_ctype is 'C'.
This may be a misconfiguration, but is there another way to get it to
work right than recreating the complete database with another locale?

But doesn't that mean that the translation of the timestamp into languages
with other umlauts should also be wrong, for example "fr_FR.UTF-8"?

Greetings from Germany,
Torsten


Re: Missing Bug-Report #5904?

From
Tom Lane
Date:
Torsten Zühlsdorff <foo@meisterderspiele.de> writes:
>>> # SET lc_time = "de_DE.UTF-8";
>>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>> to_char
>>> -----------
>>> MäRz 2011

>> I can reproduce the above when the database encoding is not UTF8 or
>> lc_ctype isn't a UTF8 locale.

> Hm... encoding of the database is UTF8. The lc_ctype is 'C'.

Right, that was the same case I checked.  In C locale, ä is not a
letter, so you get the above from the initcap transformation.

> But don't that mean, that the translation of the timestamp to languages
> with other umlauts should also be wrong. For example to "fr_FR.UTF-8"?

Possibly, I haven't checked.  If they have any month names with
non-ASCII characters in the middle, they'd see the same thing.
You would certainly also get undesirable results from TMMONTH, since
it wouldn't know how to uppercase ä.  In my view none of this is
a Postgres bug --- the correct fix is to use locale settings that
correspond to the behavior you want.
        regards, tom lane
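
The two predictions in this message (other languages are affected, and TMMONTH misbehaves as well) can be illustrated with a small model. This is a hedged Python sketch rather than PostgreSQL code: `upper_c_locale` is a hypothetical helper that uppercases only ASCII letters, standing in for lc_ctype = 'C', and the French month names are typed in by hand as an assumption about what a French locale would return.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)


def upper_c_locale(text: str) -> str:
    """Stand-in for uppercasing under lc_ctype = 'C': only ASCII letters change."""
    return "".join(ch.upper() if ch in ASCII_LETTERS else ch for ch in text)


# Hand-typed month names with a non-ASCII letter in the middle (assumption,
# not read from any locale database).
for name in ["märz", "février", "août", "décembre"]:
    # Left: ASCII-only uppercasing. Right: Unicode-aware uppercasing.
    print(upper_c_locale(name), name.upper())
```

Under the ASCII-only model "märz" becomes "MäRZ" and "février" becomes "FéVRIER", while the Unicode-aware `str.upper()` gives "MÄRZ" and "FÉVRIER".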


Re: Missing Bug-Report #5904?

From
Torsten Zühlsdorff
Date:
Hello,

>>>> # SET lc_time = "de_DE.UTF-8";
>>>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>>> to_char
>>>> -----------
>>>> MäRz 2011
>
>>> I can reproduce the above when the database encoding is not UTF8 or
>>> lc_ctype isn't a UTF8 locale.
>
>> Hm... encoding of the database is UTF8. The lc_ctype is 'C'.
>
> Right, that was the same case I checked.  In C locale, ä is not a
> letter, so you get the above from the initcap transformation.
>
>> But don't that mean, that the translation of the timestamp to languages
>> with other umlauts should also be wrong. For example to "fr_FR.UTF-8"?
>
> Possibly, I haven't checked.  If they have any month names with
> non-ASCII characters in the middle, they'd see the same thing.
> You would certainly also get undesirable results from TMMONTH, since
> it wouldn't know how to uppercase ä.  In my view none of this is
> a Postgres bug --- the correct fix is to use locale settings that
> correspond to the behavior you want.

Hm... from my point of view it's a bug, but not necessarily a PG bug. My
desired result is correctly translated output in different languages.
Now I know that this is not possible, because I would have to use the correct
lc_ctype for the entire database, and that can't be changed after the
database is created.
The only workaround seems to be to handle the translation myself.
That's very ugly and makes the use of TMMonth pointless if you have to
take care of the result output before you use the database.

Thanks to all for your time and help,
Torsten
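
The "handle the translation myself" workaround mentioned in this last message could look roughly like the sketch below. It is only an illustration under stated assumptions: `MONTH_NAMES` and `format_month_year` are hypothetical names, the table is typed in by hand and covers German only, and nothing here comes from PostgreSQL or from the poster's application.

```python
from datetime import date

# Hand-maintained month names per language code (hypothetical table;
# a real application would need an entry for every language it supports).
MONTH_NAMES = {
    "de": [
        "Januar", "Februar", "März", "April", "Mai", "Juni",
        "Juli", "August", "September", "Oktober", "November", "Dezember",
    ],
}


def format_month_year(d: date, lang: str) -> str:
    """Return 'Month YYYY' using the application's own translation table."""
    return f"{MONTH_NAMES[lang][d.month - 1]} {d.year}"


print(format_month_year(date(2011, 3, 4), "de"))  # März 2011
```

The database then only has to return the date itself, and the capitalization no longer depends on lc_ctype, at the cost of maintaining the table in the application.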