Re: Re: Big 7.1 open items

From Randall Parker
Subject Re: Re: Big 7.1 open items
Date
Msg-id MPG.13b4559da89d333c989813@news.west.net
In reply to Re: Re: Big 7.1 open items  (Thomas Lockhart <lockhart@alumni.caltech.edu>)
List pgsql-hackers

Thomas,

A few (hopefully relevant) comments regarding character sets, code pages, 
I18N, and all that:

1) I've seen databases (DB2 if memory serves) that allowed the client 
side to declare itself to the database back-end engine as being in a 
particular code page. For instance, one could have a CP850 Latin-1 client 
and an ISO 8859-1 database. The database engine did appropriate 
translations in both directions.

2) Mixing code pages in a single column and then having the database 
engine support it is not trivial. Each CHAR/VARCHAR would have to 
have some code page settable per row (e.g. either as a separate column or 
as something like mycolumnname.encoding).

Even if you could handle all that you'd still be faced with the issue 
of collating sequence. Each individual code page will have a collating 
sequence. But how do you collate across code pages? There'd be letters 
that were only in a single code page. Plus, it gets messy with, 
for instance, a simple umlauted a that occurs in CP850, CP1252, and ISO 
8859-1 (and likely in other code pages as well). That letter is really 
the same letter in all those code pages and should be treated as such when 
sorting.
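
A rough sketch of the umlauted-a problem, in Python purely for illustration. The row layout (a code page tag next to the raw bytes) and the helper names to_text and transcode are assumptions made up for this example, not anything a particular RDBMS provides. It shows that the same word gets different bytes depending on the client's code page, and that decoding everything to Unicode gives one common form to sort on. The sort here is plain code point order, not a language-aware collation.

```python
# Hypothetical rows: (code page tag, raw bytes as stored).
rows = [
    ("cp850", b"Zug"),
    ("cp850", b"B\x84r"),       # "Bär" typed on a CP850 client
    ("cp1252", b"B\xe4r"),      # "Bär" typed on a CP1252 client
    ("iso8859-1", b"Apfel"),
]

def to_text(code_page: str, raw: bytes) -> str:
    """Decode stored bytes using the code page recorded for the row."""
    return raw.decode(code_page)

def transcode(raw: bytes, src: str, dst: str) -> bytes:
    """Translate bytes from one code page to another by way of Unicode."""
    return raw.decode(src).encode(dst)

# Sorting the raw bytes treats the two spellings of "Bär" as different values.
by_bytes = sorted(raw for _, raw in rows)

# Sorting the decoded text treats them as the same word.
by_text = sorted(to_text(cp, raw) for cp, raw in rows)

print(by_bytes)
print(by_text)

# The kind of translation an engine would do between a CP850 client
# and an ISO 8859-1 database.
print(transcode(b"B\x84r", "cp850", "iso8859-1"))
```

Letters that exist in only one of the code pages would make transcode fail on encode, which is the other half of the problem.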

3) I think it is more important for a database to support lots of 
languages in the stored data than in the field names and table names. If 
a programmer has to deal with A-Za-z for naming identifiers and that 
person is Korean or Japanese then that is certainly an imposition on 
them. But it's a far, far bigger imposition if that programmer can't build 
a database that will store the letters of his national language and sort 
and index and search them in convenient ways. 

4) The real solution to the multiple code page dilemma is Unicode.

Yes, it's more space. But the can of worms of dealing with multiple 
code pages in a column is really no fun and the result is not great. 
BTDTHTTS.
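
To put a rough number on "more space", here is a small Python sketch comparing byte counts for the same text in a single-byte or national code page against two Unicode encodings. The sample strings and the choice of UTF-8 and UTF-16 are assumptions for illustration only; which Unicode encoding a database would actually use is a separate decision.

```python
def sizes(text: str, legacy_code_page: str) -> dict:
    """Byte counts for one string in a legacy code page and in Unicode encodings."""
    return {
        legacy_code_page: len(text.encode(legacy_code_page)),
        "utf-8": len(text.encode("utf-8")),
        "utf-16-le": len(text.encode("utf-16-le")),
    }

german = sizes("Bär", "iso8859-1")
korean = sizes("한국어", "cp949")

print(german)
print(korean)

# With Unicode both strings can sit in the same column and be compared
# directly, with no per-row code page tag.
mixed_column = sorted(["한국어", "Bär"])
print(mixed_column)
```

The overhead depends heavily on the language and the encoding, so the trade is some extra bytes against not having to track a code page per row.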

5) The problem with enforcing

I've built a database in DB2 where particular columns in it contained 
data from many different code pages (each row had a code page field as 
well as a text field). For some applications that is okay if that field 
is not going to be part of an index.

However, if a database is going to be defined as being in a particular 
code page, and if the database engine is going to reject characters that 
are not recognized as part of that code page, then you can't play the sort 
of game I just described _unless_ there is a different datatype that is 
similar to CHAR/VARCHAR but for which the RDBMS does not enforce code 
page legality on each character. Otherwise you choose some code page for 
a column, you go merrily stuffing in all sorts of rows in all sorts of 
code pages, and then along comes some character whose byte value is not 
a legal value for any character in the code page that the RDBMS thinks 
the column is in.
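
A minimal Python sketch of that failure, assuming a column declared as CP1252 that is being fed bytes from a CP850 client. The is_legal helper is hypothetical and just stands in for whatever per-character check an engine might apply. It relies on Python's cp1252 codec leaving byte 0x81 undefined, while CP850 uses 0x81 for u-umlaut.

```python
def is_legal(raw: bytes, code_page: str) -> bool:
    """Return True if every byte sequence in raw decodes in the given code page."""
    try:
        raw.decode(code_page, errors="strict")
    except UnicodeDecodeError:
        return False
    return True

column_code_page = "cp1252"           # what the RDBMS thinks the column is

fuer = "für".encode("cp850")          # b"f\x81r": 0x81 is undefined in CP1252
baer = "Bär".encode("cp850")          # b"B\x84r": 0x84 is a legal CP1252 byte

print(is_legal(fuer, column_code_page))   # rejected by a strict engine
print(is_legal(baer, column_code_page))   # accepted, but read as the wrong letter
print(baer.decode(column_code_page))      # not "Bär"
```

So a strict engine rejects some rows outright and silently misreads others, which is why the game only works with a datatype that skips the legality check.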

Anyway, I've done lots of I18N database stuff and hopefully a few of my 
comments will be useful to the assembled brethren <g>.

In news:<3948E4D7.A3B722E9@alumni.caltech.edu>, 
lockhart@alumni.caltech.edu says...
> One issue: I can see (or imagine ;) how we can use the Postgres type
> system to manage multiple character sets. But allowing arbitrary
> character sets in, say, table names forces us to cope with allowing a
> mix of character sets in a single column of a system table. afaik this
> general capability is not mandated by SQL9x (the SQL_TEXT character set
> is used for all system resources??). Would it be acceptable to have a
> "default database character set" which is allowed to creep into the
> pg_xxx tables? Even that seems to be a difficult thing to accomplish at
> the moment (we'd need to get some of the text manipulation functions
> from the catalogs, not from hardcoded references as we do now).
> 

