Discussion: Issues with German 'Umlaute'
Hello everybody,

I recently found a problem with sorting German 'Umlaute'. I hope the encoding of this mail works ;-)

Postgres puts Umlaute (i.e., ÄäÖöÜü) at the very end of the alphabet, and this is not the way it should be. I didn't check the special character 'ß', but it's probably similar.

The canonical sort order for Umlaute is to treat them as two characters, like this:

    ä -> ae
    ö -> oe
    ü -> ue
    ß -> ss

(and the same for upper case 'ÄÖÜ'; 'ß' does not have an upper case).

Well, I guess this might be difficult to implement and might have quite an impact on performance. The solution I know from other databases consists of inserting ä after a, ö after o, ü after u and ß after s. AFAIK this is generally accepted.

upper() does not handle Umlaute correctly either. It leaves äöü unchanged instead of converting them to upper case.

All this happens with a database created with encoding ='latin1'. If a different encoding gives better results (I haven't tried it yet), I'd suggest adding some information about this to the documentation.

Thanks for your work,
N.Erichsen

--
HSH Soft-und Hardware Vertriebs GmbH
Rudolf-Diesel-Straße 2 - 16321 Lindenberg
Tel. (030) 94004 - 509
Fax (030) 94004 - 400
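To make the two orderings described in the message above concrete, here is a minimal Python sketch of both as sort keys. It only illustrates the rules as stated in the mail; it is not how PostgreSQL or any system locale implements collation, and the names `expansion_key` and `adjacent_key` are hypothetical.

```python
# Illustration only: helper names are hypothetical, not a PostgreSQL API.

# Rule 1: treat each umlaut as two letters (ä -> ae, ö -> oe, ü -> ue, ß -> ss).
EXPANSIONS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})

def expansion_key(word):
    """Sort key that expands umlauts to two letters, ignoring case."""
    return word.translate(EXPANSIONS).lower()

# Rule 2: insert ä directly after a, ö after o, ü after u, ß after s.
BASE = {"ä": "a", "ö": "o", "ü": "u", "ß": "s"}

def adjacent_key(word):
    """Sort key in which each umlaut sorts right after its base letter."""
    # lower() is used instead of casefold() because casefold() turns ß into ss.
    return [(BASE.get(ch, ch), ch in BASE) for ch in word.lower()]

words = ["Zug", "Ärger", "Apfel", "Ofen", "Öl", "Azur"]

by_code_point = sorted(words)
# -> ['Apfel', 'Azur', 'Ofen', 'Zug', 'Ärger', 'Öl']   (umlauts at the end)

by_expansion = sorted(words, key=expansion_key)
# -> ['Ärger', 'Apfel', 'Azur', 'Öl', 'Ofen', 'Zug']   (Ärger sorts as "aerger")

by_adjacent = sorted(words, key=adjacent_key)
# -> ['Apfel', 'Azur', 'Ärger', 'Ofen', 'Öl', 'Zug']   (ä is a letter after a)
```

The plain `sorted(words)` call reproduces the reported behaviour: comparing by code point puts every umlaut after 'Z'.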
Nicolaus Erichsen <nico.erichsen@hsh-berlin.com> writes:
> I recently found a problem with sorting German 'Umlaute'.
Sounds like you did not set the right locale when creating the database.
You need to be careful to run initdb with LANG (or LC_ALL or at least
LC_COLLATE) set to what you want, probably "de_DE".
> All this happens with a database created with encoding ='latin1'.
Encoding is not the issue, locale is.
regards, tom lane
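The point that encoding is not the issue can be illustrated with a small Python sketch. It assumes that, without a matching locale (for example under the "C" locale), strings are compared byte by byte; the helper `bytewise_sorted` is hypothetical and only mimics that comparison.

```python
# Sketch under the assumption that a "C"-locale comparison is a plain
# byte-by-byte comparison of the encoded strings.

def bytewise_sorted(items, encoding):
    """Sort strings by their raw bytes in the given encoding."""
    return sorted(items, key=lambda w: w.encode(encoding))

words = ["Zug", "Ärger", "Apfel", "Öl", "Ofen"]

latin1_order = bytewise_sorted(words, "latin-1")
utf8_order = bytewise_sorted(words, "utf-8")

# Both encodings give the same order, with the umlauts last:
# ['Apfel', 'Ofen', 'Zug', 'Ärger', 'Öl']
print(latin1_order)
print(utf8_order)
```

In Latin-1 the letter Ä is the single byte 0xC4, and in UTF-8 it starts with the byte 0xC3; both are larger than the byte for 'Z' (0x5A), so switching the encoding alone does not move the umlauts. Only the collation rules, which come from the locale, change the order.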
Tom Lane wrote:
>
> Nicolaus Erichsen <nico.erichsen@hsh-berlin.com> writes:
> > I recently found a problem with sorting german 'Umlaute' .
>
> Sounds like you did not set the right locale when creating
> the database.
> You need to be careful to run initdb with LANG (or LC_ALL or at least
> LC_COLLATE) set to what you want, probably "de_DE".
>
> > All this happens with a database created with encoding ='latin1'.
>
> Encoding is not the issue, locale is.

Then what about having German, English, Italian and French words in the same database? Shall we create four databases and place each language in a separate one?

Iavor

--
Iavor Raytchev
very small technologies (a company of CEE Solutions)
www.verysmall.org