> But how are you going to tell a genuine "type" from a character set? And
> you might have to have three types for each charset. There'd be a lot of
> redundancy and confusion regarding the input and output functions and
> other pg_type attributes. No doubt there's something to be learned from
> the type system, but character sets have different properties -- like
> characters(!), collation rules, encoding "translations" and what not.
> There is no doubt also a need for different error handling. So I think that
> just dumping every character set into pg_type is not a good idea. That's
> almost equivalent to having separate types for char(6), char(7), etc.
>
> Instead, I'd suggest that character sets become separate objects. A
> character entity would carry around its character set in its header
> somehow. Consider a string concatenation function, being invoked with two
> arguments of the same exotic character set. Using the type system only
> you'd have to either provide a function signature for all combinations of
> character sets or you'd have to cast them up to SQL_TEXT, concatenate
> them and cast them back to the original charset. A smarter concatenation
> function instead might notice that both arguments are of the same
> character set and simply paste them together right there.
Interesting idea. But what about collations? SQL allows a collation
other than the default one to be assigned to a character set on the
fly. Should we make collations separate objects as well?
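To make the quoted concatenation example concrete, here is a rough
sketch in Python (chosen only for brevity). Everything in it is
hypothetical: CharString, concat and the use of UTF-8 as a stand-in for
SQL_TEXT are illustration names, not an existing or proposed interface.
It also assumes stateless encodings without a byte-order mark, so that
pasting two byte strings together yields a valid string.

```python
from dataclasses import dataclass

# Assumption: UTF-8 stands in for the universal SQL_TEXT repertoire.
SQL_TEXT = "utf-8"


@dataclass(frozen=True)
class CharString:
    """Hypothetical string value that carries its charset in a header."""
    charset: str  # header: here simply a Python codec name
    data: bytes   # payload, encoded in that charset


def concat(a: CharString, b: CharString) -> CharString:
    if a.charset == b.charset:
        # Fast path: same charset, paste the bytes together as-is.
        return CharString(a.charset, a.data + b.data)
    # Slow path: cast both up to the common charset, then concatenate.
    text = a.data.decode(a.charset) + b.data.decode(b.charset)
    return CharString(SQL_TEXT, text.encode(SQL_TEXT))
```

With this shape only one concat signature is needed, and the charset
check happens at run time rather than in the type system. A collation
could be carried the same way, or passed separately per operation.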
> Here are a couple of "items" I keep wondering about:
>
> * To what extent would we be able to use the operating system's locale
> facilities? Besides the fact that some systems are deficient or broken one
> way or another, POSIX really doesn't provide much besides "given two
> strings, which one is greater", and then only on a per-process basis.
> We'd really need more than that; see also LIKE indexing issues, and indexing in
> general.
Correct. I'd suggest getting rid of the OS's locale completely.
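One possible reading of that, sketched in Python: the ordering rule
becomes an explicit argument to each operation instead of process-wide
state set through setlocale(). The two collations below are toy
examples and all names are hypothetical; real collation rules would be
far richer.

```python
from typing import Callable, Iterable, List

# Hypothetical: a collation is a function mapping a string to a sort key.
Collation = Callable[[str], object]


def binary_collation(s: str) -> object:
    # Orders strings by code point.
    return s


def caseless_collation(s: str) -> object:
    # Toy rule: ignore case differences when ordering.
    return s.casefold()


def sort_with(values: Iterable[str], collation: Collation) -> List[str]:
    # The collation is chosen per call, not per process.
    return sorted(values, key=collation)
```

Two sorts in the same process can then use different collations without
interfering with each other, which a per-process locale cannot offer.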
> * Client support: A lot of language environments provide pretty smooth
> Unicode support these days, e.g., Java, Perl 5.6, and I think that C99 has
> also made some strides. So while "we can store stuff in any character set
> you want" is great, it's really no good if it doesn't work transparently
> with the client interfaces. At least something to keep in mind.
Do you suggest that we should convert everything into Unicode and store
it in the DB?
--
Tatsuo Ishii