Re: UTF8 or Unicode
| From | Karel Zak | 
|---|---|
| Subject | Re: UTF8 or Unicode | 
| Date | |
| Msg-id | 1108459323.4044.171.camel@petra | 
| In reply to | Re: UTF8 or Unicode (Bruce Momjian <pgman@candle.pha.pa.us>) | 
| Responses | Re: UTF8 or Unicode | 
| List | pgsql-hackers | 
On Mon, 2005-02-14 at 22:05 -0500, Bruce Momjian wrote:
> Abhijit Menon-Sen wrote:
> > At 2005-02-14 21:14:54 -0500, pgman@candle.pha.pa.us wrote:
> > >
> > > Should our multi-byte encoding be referred to as UTF8 or Unicode?
> >
> > The *encoding* should certainly be referred to as UTF-8. Unicode is a
> > character set, not an encoding; Unicode characters may be encoded with
> > UTF-8, among other things.
> >
> > (One might think of a charset as being a set of integers representing
> > characters, and an encoding as specifying how those integers may be
> > converted to bytes.)
> >
> > > I know UTF8 is a type of unicode but do we need to rename anything
> > > from Unicode to UTF8?
> >
> > I don't know. I'll go through the documentation to see if I can find
> > anything that needs changing.
>
> I looked at encoding.sgml and that mentions Unicode, and then UTF8 as an
> acronym. I am wondering if we need to make UTF8 first and Unicode
> second. Does initdb accept UTF8 as an encoding?

In PG: unicode = utf8 = utf-8

Our internal routines in src/backend/utils/mb/encnames.c accept all
synonyms. The "official" internal PG name for UTF-8 is "UNICODE" :-(

UTF8 = UNICODE for historical reasons: "UNICODE" was there first. It's
the same as with "WIN" for WIN1251 (in the sources it's marked as a
"_dirty_ alias")...

I think initdb uses pg_char_to_encoding() from
src/backend/utils/mb/encnames.c, so it should accept all aliases.

Karel

--
Karel Zak <zakkr@zf.jcu.cz>
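
To illustrate the two points made in the message above, here is a minimal Python sketch. It is not the code in src/backend/utils/mb/encnames.c: the alias table, the canonical names in it, and the helpers `normalize` and `lookup_encoding` are all hypothetical, and the normalization rule (lowercase, keep only letters and digits) is an assumption chosen so that "utf8", "UTF-8" and "Unicode" resolve to one entry. The last lines show the charset/encoding distinction: a code point is an integer, and the encoding turns it into bytes.

```python
# Hypothetical alias table. The entries are illustrative only and are not
# copied from src/backend/utils/mb/encnames.c.
ALIASES = {
    "unicode": "UNICODE",
    "utf8": "UNICODE",
    "win": "WIN1251",
    "win1251": "WIN1251",
}


def normalize(name):
    """Lowercase the name and keep only ASCII letters and digits (assumed rule)."""
    return "".join(ch for ch in name.lower() if ch.isascii() and ch.isalnum())


def lookup_encoding(name):
    """Return the canonical name for an alias, or None if the alias is unknown."""
    return ALIASES.get(normalize(name))


# Charset vs. encoding: the character set assigns an integer to a character,
# and the encoding says how that integer is written as bytes.
code_point = ord("é")              # integer assigned by the Unicode character set
utf8_bytes = "é".encode("utf-8")   # the same character encoded as UTF-8 bytes
```

With this table, `lookup_encoding("UTF-8")`, `lookup_encoding("utf8")` and `lookup_encoding("Unicode")` all return the same canonical name, which mirrors the "unicode = utf8 = utf-8" behaviour described above; an unknown name returns `None`.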