Re: Mac OS: invalid byte sequence for encoding "UTF8"

From Artur Zakirov
Subject Re: Mac OS: invalid byte sequence for encoding "UTF8"
Date
Msg-id 56BB3D95.7030502@postgrespro.ru
In response to Re: Mac OS: invalid byte sequence for encoding "UTF8"  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Mac OS: invalid byte sequence for encoding "UTF8"  (Teodor Sigaev <teodor@sigaev.ru>)
Re: Mac OS: invalid byte sequence for encoding "UTF8"  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 09.02.2016 20:13, Tom Lane wrote:
> I do not like this patch much.  It is basically "let's stop using sscanf()
> because it seems to have a bug on one platform".  There are at least two
> things wrong with that approach:
>
> 1. By my count there are about 80 uses of *scanf() in our code.  Are we
> going to replace every one of them with hand-rolled code?  If not, why
> is only this instance vulnerable?  How can we know whether future uses
> will have a problem?

It seems that *scanf() with the %s format occurs only in these places:
- check.c - get_bin_version()
- server.c - get_major_server_version()
- filemap.c - isRelDataFile()
- pg_backup_directory.c - _LoadBlobs()
- xlog.c - do_pg_stop_backup()
- mac.c - macaddr_in()
I think that in these places sscanf() does not have to deal with UTF-8
characters, so this is probably an issue only in spell.c.

I agree that the previous patch is wrong. Instead of adding the new
parse_ooaffentry() function, it may be better to use sscanf() with the
%ls format. The %ls format reads a wide-character string.
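
A minimal sketch of the %ls idea, not the actual spell.c code. The helper
name read_wide_word() and the buffer sizes are hypothetical. Converting
non-ASCII input this way depends on the LC_CTYPE locale, so the caller is
assumed to have selected a suitable (for example UTF-8) locale with
setlocale() beforehand; plain ASCII input also works in the default "C"
locale.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from spell.c): read the first
 * whitespace-delimited word of "line" as a wide-character string.
 * Returns 1 on success, 0 if no word could be read or converted.
 */
int
read_wide_word(const char *line, wchar_t *out, size_t outlen)
{
    wchar_t     buf[64];

    assert(line != NULL && out != NULL && outlen > 0);

    /*
     * %ls converts the multibyte input to wide characters using the
     * current LC_CTYPE locale; the width 63 leaves room for the
     * terminating null in buf.
     */
    if (sscanf(line, "%63ls", buf) != 1)
        return 0;

    /* Copy with truncation so the caller's buffer cannot overflow. */
    wcsncpy(out, buf, outlen - 1);
    out[outlen - 1] = L'\0';
    return 1;
}
```

The result is a wchar_t string, so a caller that needs the word in the
server encoding would still have to convert it back, for example with
wcstombs().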

>
> 2. We're not being very good citizens of the software universe if we
> just install a hack in Postgres rather than nagging Apple to fix the
> bug at its true source.
>
> I think the appropriate next step to take is to dig into the OS X
> sources (see http://www.opensource.apple.com, I think probably the
> relevant code is in the Libc package) and identify exactly what is
> causing the misbehavior.  That would both allow an informed answer
> to point #1 and greatly increase the odds of getting action on a
> bug report to Apple.  Even if we end up applying this patch verbatim,
> I think we need that information first.
>
>             regards, tom lane
>

I don't think this is a bug; it is normal behavior. On Mac OS, sscanf()
with the %s format reads the string one char (byte) at a time. The letter
'х' is 2 bytes long in UTF-8, so sscanf() splits it into two invalid
characters.
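
To illustrate the size claim: the Cyrillic letter 'х' (U+0445) is encoded
in UTF-8 as the two bytes 0xD1 0x85, and neither byte is a valid character
on its own. The sketch below only inspects bit patterns; the helper names
utf8_seq_len() and utf8_char_count() are hypothetical and are not
PostgreSQL functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length in bytes of the UTF-8 sequence announced by lead byte "c",
 * judged from its bit pattern only. Returns 0 for a continuation byte
 * (10xxxxxx) or a byte that cannot start a sequence. Overlong or
 * otherwise invalid sequences are not rejected here.
 */
int
utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((c & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((c & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((c & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 0;                   /* continuation or invalid lead byte */
}

/*
 * Count characters in a UTF-8 string by counting every byte that is
 * not a continuation byte.
 */
size_t
utf8_char_count(const char *s)
{
    size_t      n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
    {
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the string "\xD1\x85" the byte length is 2 while utf8_char_count()
reports 1 character; code that treats each byte as a separate character
sees 0xD1 and 0x85 individually, which is what produces an invalid byte
sequence.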

In conclusion, I think spell.c should use sscanf() with the %ls format.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


