Re: OK, that's one LOCALE bug report too many...
From | Peter Eisentraut |
---|---|
Subject | Re: OK, that's one LOCALE bug report too many... |
Date | |
Msg-id | Pine.LNX.4.21.0011242345230.791-100000@peter.localdomain |
In response to | Re: OK, that's one LOCALE bug report too many... (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: OK, that's one LOCALE bug report too many... |
List | pgsql-hackers |
Tom Lane writes:

> >> Also, since "LC_COLLATE=en_US" seems to misbehave rather spectacularly
> >> on recent RedHat releases, I propose that initdb change "en_US" to "C"
> >> if it finds that setting.  (Are there any platforms where there are
> >> non-bogus differences between the two?)
>
> > There *should* be differences and it is definitely not okay to mix them
> > up.
>
> I have now received positive proof that en_US sort order on RedHat is
> broken.  For example, it asserts
>     '/root/' < '/root0'
> but
>     '/root/t' > '/root0'
> I defy you to find anyone in the US who will say that that is a
> reasonable definition of string collation.

That's certainly very odd, but Unixware does this too, so it's probably
some sort of standard.  And a few other European/Latin locales I tried
also do this.

But here's another example of why C and en_US are different.

    peter ~$ cat foo
    Delta
    écrire
    Beta
    alpha
    gamma

    peter ~$ LC_COLLATE=C sort foo
    Beta
    Delta
    alpha
    gamma
    écrire

    peter ~$ LC_COLLATE=en_US sort foo
    alpha
    Beta
    Delta
    écrire
    gamma

The C locale sorts strictly by character code.  But in the en_US locale
the accented letter is put into a "natural" position, and the upper and
lower case letters are grouped together.  Intuitively, the en_US order is
the one in which you'd look things up in a dictionary.

This also explains (to me at least) the example you have above: When you
look up words in a dictionary you ignore "funny characters".  My American
Heritage Dictionary explains:

: Entries are listed in alphabetical order without taking into account
: spaces or hyphens.

So at least this concept isn't that far out.

> Do you think there are cases where setlocale(,NULL) will give back
> "POSIX" rather than "C"?  We can certainly test for either.

I know there are (old) systems that reject LANG=C as an invalid locale, but
I don't know what setlocale returns there.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/
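
The two orderings discussed in the message can be approximated in a few lines of Python. This is only a sketch under stated assumptions, not how any libc implements en_US collation: `c_order` emulates the C locale by comparing UTF-8 bytes, and `dictionary_key` is a hypothetical, deliberately simplified model of the "ignore funny characters" rule (strip accents, drop everything that is not a letter or digit, fold case). `locale_order` asks the system's own collation instead; the locale name `en_US.UTF-8` is an assumption and may not be installed, in which case the function returns `None`.

```python
import locale
import unicodedata

# The five words from the `foo` file in the message; "\u00e9" is "é".
WORDS = ["Delta", "\u00e9crire", "Beta", "alpha", "gamma"]


def c_order(strings):
    """Emulate C-locale ordering: compare raw bytes (UTF-8 assumed)."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))


def dictionary_key(s):
    """Hypothetical simplified key: strip accents, drop punctuation, fold case."""
    decomposed = unicodedata.normalize("NFD", s)  # "é" becomes "e" + combining accent
    return "".join(ch for ch in decomposed if ch.isalnum()).casefold()


def dictionary_order(strings):
    """Sort the way the simplified dictionary model would."""
    return sorted(strings, key=dictionary_key)


def locale_order(strings, name="en_US.UTF-8"):
    """Sort with the system collation for `name`, or return None if unavailable."""
    # Calling setlocale without a new value only queries the current setting,
    # like setlocale(LC_COLLATE, NULL) in C.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, name)
    except locale.Error:
        return None
    try:
        return sorted(strings, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)
```

Under this model the keys for '/root/', '/root0' and '/root/t' are "root", "root0" and "roott", so '/root/' sorts before '/root0' while '/root/t' sorts after it, which is the behaviour Tom reported. Comparing bytes instead puts both '/root/' and '/root/t' before '/root0', because '/' has a lower code than '0'.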