On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote:
> mark@mark.mielke.cc wrote:
> > I think I've been involved in a discussion like this in the past. Was
> > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding
> > means that UTF-8 applications are at a disadvantage when using the
> > library. UTF-16 is considered more efficient to work with for everybody
> > except ASCII users. :-)
> Uh, is it? By whom? And why?
The authors of the library in question? Java? Anybody whose primary
alphabet isn't LATIN1 based? :-)
Only ASCII values are stored more compactly in UTF-8. Code points from
U+0080 through U+07FF take two bytes in either encoding, and those from
U+0800 through U+FFFF take three bytes in UTF-8 but only two in UTF-16.
UTF-16 is easier to process. UTF-8 requires too many bit checks, one
byte at a time, to decode each character. I'm not an expert. I had this
question a year or two ago, and read up on what the experts had to say.
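
For anyone who wants to check the sizes rather than take my word for it,
here is a minimal Python sketch. The function names are mine and purely
illustrative, and the lead-byte check is a simplification of what a real
decoder does, not how any particular library works:

```python
def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes) for a single character."""
    # "utf-16-le" is used so the 2-byte byte-order mark that plain
    # "utf-16" prepends does not inflate the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


def utf8_sequence_length(lead):
    """Length in bytes of the UTF-8 sequence that starts with `lead`.

    Simplified: it only inspects the high bits and does not reject
    overlong or out-of-range lead bytes as a strict decoder would.
    """
    if (lead & 0x80) == 0x00:   # 0xxxxxxx: ASCII
        return 1
    if (lead & 0xE0) == 0xC0:   # 110xxxxx
        return 2
    if (lead & 0xF0) == 0xE0:   # 1110xxxx
        return 3
    if (lead & 0xF8) == 0xF0:   # 11110xxx
        return 4
    raise ValueError("not a UTF-8 lead byte")


if __name__ == "__main__":
    # ASCII, Latin-1 range, Cyrillic, euro sign, CJK, beyond the BMP
    for ch in ["A", "\u00e9", "\u0416", "\u20ac", "\u4e2d", "\U0001d11e"]:
        u8, u16 = encoded_sizes(ch)
        print("U+%04X  utf-8: %d  utf-16: %d" % (ord(ch), u8, u16))
```

The chain of masks in utf8_sequence_length() is the kind of per-byte bit
checking I mean. UTF-16 needs only one range test per 16-bit unit, to
spot surrogate pairs.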
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness
bind them...
http://mark.mielke.cc/