On Fri, Sep 08, 2006 at 12:57:29PM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> >> AFAICT, most of the useful operations work on UChar, which is uint16:
> >> http://icu.sourceforge.net/apiref/icu4c/umachine_8h.html#6bb9fad572d65b305324ef288165e2ac
> > Oh, you're confusing UCS-2 with UTF-16,
> Ah, you're right, I did misunderstand that. However, it's still
> apparently the case that ICU works mostly with UTF16 and handles other
> encodings only via conversion to UTF16. That's a pretty serious
> mismatch with our needs --- we'll end up converting to UTF16 all the
> time. We're certainly not going to change to using UTF16 as the actual
> native string representation inside the backend, both because of the
> space penalty and incompatibility with tools like bison.
I think I've been involved in a discussion like this in the past. Was
it mentioned on this list before? Yes, the UTF-8 vs. UTF-16 mismatch
means that UTF-8 applications are at a disadvantage when using the
library, since they have to convert on the way in and out. UTF-16 is
considered more efficient to work with for everybody except ASCII
users. :-)
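
To put rough numbers on the space question, here is a minimal sketch
that compares encoded sizes. The sample strings and the encoded_sizes
helper are made up for illustration; nothing here comes from ICU or
the backend. It only counts bytes and says nothing about conversion
cost.

```python
# Hypothetical sample strings, chosen only for illustration.
samples = {
    "ascii": "SELECT name FROM users",
    "latin": "café crème",
    "cjk": "日本語のテキスト",
    "astral": "\U0001D11E",  # MUSICAL SYMBOL G CLEF, outside the BMP
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, encoded without a BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for label, text in samples.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{label:7s} chars={len(text):2d} utf8={utf8:2d} utf16={utf16:2d}")
```

ASCII text doubles in size under UTF-16, CJK text shrinks, and the
last sample needs two 16-bit units (a surrogate pair) for a single
character, which is the UCS-2 vs UTF-16 distinction from upthread.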
No opinion on the matter though. Changing PostgreSQL to UTF-16 would
be an undertaking... :-)
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
Neighbourhood Coder
Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/