Re: Mac OS: invalid byte sequence for encoding "UTF8"

From: Tom Lane
Subject: Re: Mac OS: invalid byte sequence for encoding "UTF8"
Date:
Msg-id: 17166.1455145239@sss.pgh.pa.us
In reply to: Re: Mac OS: invalid byte sequence for encoding "UTF8"  (Larry Rosenman <ler@lerctr.org>)
Responses: Re: Mac OS: invalid byte sequence for encoding "UTF8"  (Larry Rosenman <ler@lerctr.org>)
List: pgsql-hackers
Larry Rosenman <ler@lerctr.org> writes:
> On 2016-02-10 16:19, Tom Lane wrote:
>> I looked into the OS X sources, and found that indeed you are right:
>> *scanf processes the input a byte at a time, and applies isspace() to
>> each byte separately, even when the locale is such that that's a
>> clearly insane thing to do.  Since this code was derived from FreeBSD,
>> FreeBSD has or once had the same issue.  (A look at the freebsd project
>> on github says it still does, assuming that's the authoritative repo.)
>> Not sure about other BSDen.

> Definitive FreeBSD Sources:
> https://svnweb.freebsd.org/base/

Ah, thanks for the link.  I'm not totally sure which branch is most
current, but at least on this one, it's still clearly wrong:
https://svnweb.freebsd.org/base/stable/10/lib/libc/stdio/vfscanf.c?revision=291336&view=markup
convert_string(), which handles %s, applies isspace() to individual bytes
regardless of locale.  convert_wstring(), which handles %ls, does it more
intelligently ... but as I said upthread, relying on %ls would just give
us a different set of portability problems.

It looks like Artur's patch is indeed what we need to do, along with
looking around for other *scanf() uses that are vulnerable.
        regards, tom lane


