Re: BUG #15548: Unaccent does not remove combining diacritical characters

From: Michael Paquier
Subject: Re: BUG #15548: Unaccent does not remove combining diacritical characters
Msg-id: 20190212041819.GK1475@paquier.xyz
In response to: Re: BUG #15548: Unaccent does not remove combining diacritical characters (Ramanarayana <raam.soft@gmail.com>)
Responses: Re: BUG #15548: Unaccent does not remove combining diacritical characters (Ramanarayana <raam.soft@gmail.com>)
List: pgsql-hackers

On Tue, Feb 12, 2019 at 02:27:31AM +0530, Ramanarayana wrote:
> I tested the script with Python 2.7 and it works perfectly. The problem is
> with Python 3.7 (and maybe only on Windows, as you were not getting the
> issue), where I was getting the following error:
>
> UnicodeEncodeError: 'charmap' codec can't encode character '\u0100' in
> position 0: character maps to <undefined>
>
> I went through the Python script and found that the stdout encoding is set
> to utf-8 only if the Python version is <= 2.
>
> I have made the same change for Python version 3 as well. Please find the
> patch for the same. Let me know if it makes sense.

Isn't that because Windows encoding becomes cp1252, utf16 or such?
FWIW, on Debian SID with Python 3.7, I get the correct output, and no
diffs on HEAD.  Perhaps it would make sense to use open() on the
different files with encoding='utf-8' to avoid any kind of problems?
--
Michael
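
The open()-with-explicit-encoding idea suggested above can be sketched as
follows. This is only an illustration under stated assumptions, not the
code of the script discussed in the thread: the helper names (can_encode,
write_rules, read_rules) and the tab-separated rule layout are hypothetical.
It targets Python 3; on Python 2.7, io.open() accepts the same encoding
argument.

```python
def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def write_rules(path, rules):
    """Write (source, target) pairs as tab-separated lines, always in UTF-8.

    Passing encoding explicitly avoids depending on the platform default,
    which can be a legacy code page such as cp1252 on Windows.
    """
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for src, dst in rules:
            f.write(src + "\t" + dst + "\n")


def read_rules(path):
    """Read the tab-separated pairs back, again decoding as UTF-8."""
    with open(path, "r", encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f]
```

U+0100 has no mapping in cp1252, so can_encode("\u0100", "cp1252") is
False while can_encode("\u0100", "utf-8") is True. That is consistent
with the 'charmap' error quoted above appearing only where the default
encoding is a legacy code page.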
