Re: Thoughts on multiple simultaneous code page support
From | Randall Parker |
---|---|
Subject | Re: Thoughts on multiple simultaneous code page support |
Date | |
Msg-id | 01501518836812@mail.nls.net |
In response to | Thoughts on multiple simultaneous code page support ("Randall Parker" <randall@nls.net>) |
Responses | Re: Thoughts on multiple simultaneous code page support |
List | pgsql-hackers |
On Thu, 22 Jun 2000 11:17:14 +1000, Giles Lean wrote:
>
>> 1) Make the entire database Unicode
>> ...
>> It also makes sorting and indexing take more time.
>
>Mentioned in my other email, but what collation order were you
>proposing to use? Binary might be OK for unique keys but that doesn't
>help you for '<', '>' etc.

Using Unicode on a field that can have indexes defined on it does require one single big collation order table that determines the relative order of all the characters in Unicode. Surely there must be a standard for this that is part of the Unicode spec? Or part of the ISO/IEC 10646 spec?

One optimization doable on this would be to allow the user to declare to the RDBMS what subset of Unicode he is going to use. So, for instance, someone who is only handling European languages might just say he wants to use 8859-1 through 8859-9. Or a Japanese company might throw in some more code pages but still not bring in code pages for languages for which they do not create manuals. That would make the collation table _much_ smaller.

I don't know anything about the collation order of Asian character sets. My guess, though, is that each in toto is either greater or lesser than the various Euro pages. Though the non-shifted part of Shift-JIS would be equal to its ASCII equivalents.

>My expectation (not the same as I'd like to see, necessarily, and not
>that my opinion counts -- I'm not a developer) would be that each
>database have a locale, and that this locale's collation order be used
>for indexing, LIKE, '<', '>' etc.

Characters like '<' and '>' already have standard collation orders vis-a-vis the other parts of ASCII. I doubt these things vary by locale. But maybe I'm wrong.

>If you want to store data from
>multiple human languages using a locale that has Unicode for its
>character set would be appropriate/necessary.

So you are saying that the same characters can have a different collation order when they appear in different locales, even if they have the same encoding in all of them? If so, then Unicode is really not a locale. It's an encoding, but it is not a locale.

>Regards,
>
>Giles
>
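To make the encoding-versus-locale question concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the two orderings are hand-written and heavily simplified (a "Swedish-like" table where 'ä' is a separate letter after 'z', and a "German-like" rule where 'ä' sorts alongside 'a'), and the names `table_key`, `swedish_like` and `german_like` are hypothetical, not part of any RDBMS or library. The strings have the same encoding in all three cases; only the collation rule changes.

```python
# Illustrative only: tiny hand-written collation rules, not real locale data.

LATIN = "abcdefghijklmnopqrstuvwxyz"

def table_key(order):
    """Build a sort key from a string listing characters in collation order.

    The table covers only the declared subset of characters; anything
    outside it raises KeyError, which mirrors the "declare your subset" idea.
    """
    rank = {ch: i for i, ch in enumerate(order)}
    return lambda word: [rank[ch] for ch in word]

# Assumption: a "Swedish-like" order where å, ä, ö follow z as separate letters.
swedish_like = table_key(LATIN + "åäö")

# Assumption: a "German-like" order where umlauts sort with their base letter,
# using the original spelling only to break ties.
FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})

def german_like(word):
    return (word.translate(FOLD), word)

words = ["zebra", "äpple", "apple"]

binary = sorted(words)                   # raw code point order
sv = sorted(words, key=swedish_like)     # table-driven order
de = sorted(words, key=german_like)      # folded order

print("binary:", binary)
print("swedish-like:", sv)
print("german-like:", de)
```

With these three words, the Swedish-like order happens to agree with the binary order, while the German-like order moves "äpple" ahead of "zebra". So the same encoded strings can legitimately index in different orders depending on which collation rule is in force.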