On 7/2/06, Agent M <agentm@themactionfaction.com> wrote:
> Certain Japanese characters cannot make a reliable round-trip through
> Unicode. ICU uses UTF-16 as its store, so the Japanese folks won't be
> happy with an ICU-only solution. However, it would still be of great
> benefit to allow ICU to handle as much as possible, leaving the string
> encodings to the encoding experts.

Could you explain what you mean and what's special about those characters?

>
> At the very least, it would be great to have ICU to handle encoding on
> a per-column basis (perhaps extending the text datatype with encoding
> info). Perhaps this would be a decent stopgap solution? The backend
> protocol would also need a version bump- currently, it converts all
> strings to a single encoding.
Could you give an example of what that would look like in your opinion?
I was thinking more along the lines of a setting in pg_hba.conf that
tells the server whether or not to use something like ICU, at least as
an intermediate solution.
Adding a "LOCALE" clause to a column definition (similar to the
"ENCODING" clause of the "CREATE DATABASE" statement) would solve most
(not all) problems with a default locale.
There still might be some non-deterministic behaviour with operations
between strings in different locales but it's far from a showstopper.
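
To make the cross-locale problem concrete, here is a toy sketch in
Python. The two key functions are hand-written stand-ins that I made up
for illustration (they are not ICU or libc collations and not anything
PostgreSQL provides); they only mimic the fact that German sorts "ä"
together with "a" while Swedish sorts it after "z". The word list is
likewise made up.

```python
# Toy collation keys, assumed for illustration only.
# They are NOT real ICU/libc collations.

SWEDISH_EXTRA = "åäö"  # in Swedish these letters sort after "z", in this order


def german_key(s):
    # German dictionary order (simplified): "ä" sorts like "a".
    return [ord("a") if ch == "ä" else ord(ch) for ch in s.lower()]


def swedish_key(s):
    # Swedish order (simplified): "å", "ä", "ö" come after "z".
    return [
        ord("z") + 1 + SWEDISH_EXTRA.index(ch) if ch in SWEDISH_EXTRA else ord(ch)
        for ch in s.lower()
    ]


def less_than(a, b, key):
    # Compare two strings under the given collation key.
    return key(a) < key(b)


words = ["zebra", "äpple", "banan"]

german_order = sorted(words, key=german_key)
swedish_order = sorted(words, key=swedish_key)

print(german_order)   # ['äpple', 'banan', 'zebra']
print(swedish_order)  # ['banan', 'zebra', 'äpple']

# The same comparison gives opposite answers depending on the locale:
print(less_than("äpple", "zebra", german_key))   # True
print(less_than("äpple", "zebra", swedish_key))  # False
```

If one column were declared with a German locale and another with a
Swedish one, a comparison such as col_de < col_se has no single obvious
answer: the server would have to pick one of the two orderings (or
refuse), and that choice is where the non-determinism comes from.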
t.n.a.