Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence

From Steven Schlansker
Subject Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence
Msg-id 34C92DEC-CD89-403C-BB6D-B21012233F0F@trumpet.io
In reply to Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: [BUGS] COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On Aug 19, 2010, at 3:24 PM, Tom Lane wrote:
> Steven Schlansker <steven@trumpet.io> writes:
>> 
>> I'm not at all experienced with character encodings so I could
>> be totally off base, but isn't it wrong to ever call isspace(0x85), 
>> whatever the result may be, given that the actual character is 0xCF85?
>> (U+03C5, GREEK SMALL LETTER UPSILON)
> 
> We generally assume that in server-safe encodings, the ctype.h functions
> will behave sanely on any single-byte value.  You can argue the wisdom
> of that, but deciding to change that policy would be a rather massive
> code change; I'm not excited about going that direction.

Fair enough.  I presume there are no "server-safe encodings" for which
a multibyte sequence 0xXX20 would be valid; that would break anyway,
since the second byte would look like a real space.

> You need a setlocale() call, else the program acts as though it's in C
> locale regardless of environment.

Sigh.  I hate C sometimes. :-p

Anyway, it looks like this is actually a BSD bug which got copy +
pasted into Apple's Darwin source -

http://lists.freebsd.org/pipermail/freebsd-i18n/2007-September/000157.html

I have a couple of contacts at Apple so I'll see if there's any interest in
backporting a fix, but I wouldn't hope for it to happen quickly if at all...

Thanks for taking a look into fixing this, I hope you guys can reach
consensus on how to get it fixed :)

Best,
Steven Schlansker

