Re: [PATCHES] Postgres-6.3.2 locale patch
From | Thomas G. Lockhart |
---|---|
Subject | Re: [PATCHES] Postgres-6.3.2 locale patch |
Date | |
Msg-id | 3576B81F.D222AD6A@alumni.caltech.edu |
In reply to | Re: [PATCHES] Postgres-6.3.2 locale patch ("Jose' Soares Da Silva" <sferac@bo.nettuno.it>) |
Responses |
Re: [PATCHES] Postgres-6.3.2 locale patch
(Oleg Broytmann <phd@comus.ru>)
Re: [PATCHES] Postgres-6.3.2 locale patch (t-ishii@sra.co.jp) |
List | pgsql-hackers |
> > Sounds interesting idea... But before going into discussion, Let me
> > make clarify what "character sets" means.
> > An "encoding" is a way to represent set of character sets in
> > computers.
> > I think SQL92 uses a term "character set" as encoding.

I have found the SQL92 terminology confusing, because they do not seem
to make the nice clear distinction between encoding and collation
sequence which you have pointed out. I suppose that there can be an
issue of visual appearance of an alphabet for different locales also.

afaik, SQL92 uses the term "character set" to mean an encoding with an
implicit collation sequence. SQL92 allows alternate collation sequences
to be specified for a "character set" when it can be made meaningful.

I would propose to implement

  VARCHAR(length) WITH CHARACTER SET setname

as a type with a type name of, for example, "VARSETNAME". This type
would have the comparison functions and operators which implement
collation sequences.

I would propose to implement

  VARCHAR(length) WITH CHARACTER SET setname COLLATION collname

as a type with a name of, for example, "VARCOLLNAME". For the EUC-jp
encoding, "collname" could be "Korean" or "Japanese" so the type name
would become "varkorean" or "varjapanese". Don't know for sure yet
whether this is adequate, but other possibilities can be used if
necessary.

When a database is created, it can be specified with a default
character set/collation sequence for the database; this would
correspond to the NCHAR/NVARCHAR/NTEXT types. We could implement a

  SET NATIONAL CHARACTER SET = 'language';

command to determine the default character set for the session when
NCHAR is used.

The SQL92 technique for specifying an encoding/collation sequence in a
literal string is

  _language 'string'

so for example to specify a string in the French language (implying an
encoding, collation, and representation?) you would use

  _FRENCH 'string'

> > I would be able to help you in the Japanese part. For Chinese and
> > Korean, I'm going to find volunteers in the local PostgreSQL mailing
> > list I'm running if necessary.
>
> I may help with Italian, Spanish and Portuguese.

Great, and perhaps Oleg could help test with Cyrillic (I assume I can
steal code from the existing "CYR_LOCALE" blocks in the Postgres
backend).

> > Collation sequences for EUC_JP? How nice it would be! One of a
> > problem for collation sequences for multi-byte encodings is the
> > sequence might become huge. Seems you have a solution for that.
> > Please let me know more details.

Um, no, I just assume we can find a solution :/ I'd like to implement
the infrastructure in the Postgres parser to allow multiple
encodings/collations, and then see where we are. As I mentioned, this
would be done for v6.4 as a transparent add-on, so that existing
capabilities are not touched or damaged. Implementing everything for
some European languages (with the 1-byte Latin-1 encoding?) may be
easiest, but the Asian languages might be more fun :)

                      - Tom
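
The type-naming idea in the message above (a "var" prefix plus the collation name, or the character set name when no collation is given) can be illustrated with a minimal Python sketch. This is not Postgres code: the function name `internal_type_name`, the lowercasing, and the rule for replacing characters that cannot appear in an identifier are all assumptions made for illustration, and the message itself notes that the naming scheme was not settled.

```python
import re


def internal_type_name(charset, collation=None):
    """Hypothetical helper: derive a type name from a charset/collation pair."""
    # Following the proposal, the collation name is used when one is given;
    # otherwise the character set name is used.
    base = collation if collation else charset
    # Assumption: lowercase the name and replace anything that is not a
    # letter, digit, or underscore, so that "EUC-jp" becomes "euc_jp".
    ident = re.sub(r"[^a-z0-9_]", "_", base.lower())
    return "var" + ident


print(internal_type_name("EUC_JP", "Japanese"))  # varjapanese
print(internal_type_name("EUC_JP", "Korean"))    # varkorean
print(internal_type_name("setname"))             # varsetname
```

Under this sketch, two columns that share an encoding but differ in collation map to distinct types, which is what would let each type carry its own comparison functions and operators.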