Re: Mac OS: invalid byte sequence for encoding "UTF8"
From | Tom Lane |
---|---|
Subject | Re: Mac OS: invalid byte sequence for encoding "UTF8" |
Date | |
Msg-id | 17166.1455145239@sss.pgh.pa.us |
In reply to | Re: Mac OS: invalid byte sequence for encoding "UTF8" (Larry Rosenman <ler@lerctr.org>) |
Responses | Re: Mac OS: invalid byte sequence for encoding "UTF8" |
List | pgsql-hackers |
Larry Rosenman <ler@lerctr.org> writes:
> On 2016-02-10 16:19, Tom Lane wrote:
>> I looked into the OS X sources, and found that indeed you are right:
>> *scanf processes the input a byte at a time, and applies isspace() to
>> each byte separately, even when the locale is such that that's a
>> clearly insane thing to do. Since this code was derived from FreeBSD,
>> FreeBSD has or once had the same issue. (A look at the freebsd project
>> on github says it still does, assuming that's the authoritative repo.)
>> Not sure about other BSDen.

> Definitive FreeBSD Sources:
> https://svnweb.freebsd.org/base/

Ah, thanks for the link. I'm not totally sure which branch is most current,
but at least on this one, it's still clearly wrong:

https://svnweb.freebsd.org/base/stable/10/lib/libc/stdio/vfscanf.c?revision=291336&view=markup

convert_string(), which handles %s, applies isspace() to individual bytes
regardless of locale. convert_wstring(), which handles %ls, does it more
intelligently ... but as I said upthread, relying on %ls would just give us
a different set of portability problems.

It looks like Artur's patch is indeed what we need to do, along with looking
around for other *scanf() uses that are vulnerable.

			regards, tom lane
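Why per-byte isspace() is a problem for %s: in UTF-8 every byte of a multibyte character is >= 0x80. For example "à" is encoded as C3 A0, and 0xA0 is the no-break space in Latin-1, so a byte-at-a-time isspace() that reports 0xA0 as whitespace would cut that character in half and leave an invalid byte sequence behind. The following is a minimal sketch of one way to avoid sscanf("%s"), assuming UTF-8 (or another ASCII-compatible encoding) input and assuming that only ASCII whitespace should ever delimit tokens. The names next_token and ascii_isspace are hypothetical; this is not Artur's patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treat only the six ASCII whitespace bytes as
 * delimiters. Bytes >= 0x80 (UTF-8 lead and continuation bytes) are never
 * whitespace here, whatever the current locale's isspace() would say. */
static int ascii_isspace(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\r' || c == '\f' || c == '\v';
}

/* Hypothetical replacement for sscanf(src, "%s", dst): skips leading ASCII
 * whitespace, copies the next token into dst (at most dstlen - 1 bytes,
 * always NUL-terminated) and advances *src past the whole token.
 * Returns the full token length in bytes (0 if no token remains), so a
 * return value >= dstlen means the copy was truncated. */
static size_t next_token(const char **src, char *dst, size_t dstlen)
{
    const unsigned char *p = (const unsigned char *) *src;
    size_t len = 0;     /* bytes in the token */
    size_t copied = 0;  /* bytes actually stored in dst */

    assert(dstlen > 0);

    /* Skip leading delimiters. */
    while (*p != '\0' && ascii_isspace(*p))
        p++;

    /* Consume the token; high-bit bytes are always part of it. */
    while (*p != '\0' && !ascii_isspace(*p))
    {
        if (copied + 1 < dstlen)
            dst[copied++] = (char) *p;
        len++;
        p++;
    }
    dst[copied] = '\0';
    *src = (const char *) p;
    return len;
}
```

Because the scanner only compares against ASCII whitespace, a byte such as 0xA0 inside "à" always stays with its character. Returning the full token length, snprintf-style, lets the caller detect truncation; truncation itself can still cut a multibyte character, so a real caller would need to size the buffer or reject overlong tokens.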