Re: BUG #15548: Unaccent does not remove combining diacritical characters

From Ramanarayana
Subject Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date
Msg-id CAKm4Xs7CBuCW_XQtrVX6ThwSMiL0WK7Cj3nZx2Jymb9eJ=YdMQ@mail.gmail.com
In response to Re: BUG #15548: Unaccent does not remove combining diacritical characters  (Hugh Ranalli <hugh@whtc.ca>)
Responses Re: BUG #15548: Unaccent does not remove combining diacritical characters  (Michael Paquier <michael@paquier.xyz>)
Re: BUG #15548: Unaccent does not remove combining diacritical characters  (Hugh Ranalli <hugh@whtc.ca>)
List pgsql-hackers
Hi Hugh,

I tested the script under Python 2.7 and it works perfectly. The problem appears under Python 3.7 (and possibly only on Windows, since you did not see the issue), where I get the following error:

UnicodeEncodeError: 'charmap' codec can't encode character '\u0100' in position 0: character maps to <undefined>

I went through the Python script and found that the stdout encoding is set to UTF-8 only if the Python version is 2 or lower.
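To illustrate why the error shows up only in some environments, here is a minimal sketch. It assumes the redirected stdout on the Windows machine falls back to a legacy code page such as cp1252 (an assumption; the actual code page was not reported), and `try_encode` is a hypothetical helper used only for this demonstration.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)


# U+0100 (LATIN CAPITAL LETTER A WITH MACRON) has no slot in cp1252,
# so a cp1252 stdout cannot represent it; UTF-8 can encode it.
print(try_encode("\u0100", "cp1252"))
print(try_encode("\u0100", "utf-8"))
```

On Linux the default stdout encoding is usually UTF-8 already, which would explain why the same script runs cleanly there.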

I have made the same change for Python 3 as well. Please find the patch attached. Let me know if it makes sense.
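For readers without the attachment, the idea can be sketched roughly as follows. This is not the patch itself, only one possible way to force UTF-8 output on Python 3; the helper names `utf8_text_stream` and `install_utf8_stdout` are hypothetical and do not come from generate_unaccent_rules.py.

```python
import io
import sys


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    # newline="\n" keeps line endings untranslated on every platform.
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


def install_utf8_stdout():
    """Replace sys.stdout with a UTF-8 wrapper (Python 3 only)."""
    # On Python 3, sys.stdout.buffer is the underlying binary stream.
    sys.stdout = utf8_text_stream(sys.stdout.buffer)


def demo():
    """Write U+0100 through the wrapper and return the raw bytes produced."""
    buf = io.BytesIO()
    out = utf8_text_stream(buf)
    out.write("\u0100\tA\n")
    out.flush()
    out.detach()  # leave buf open after the wrapper goes away
    return buf.getvalue()
```

Calling `install_utf8_stdout()` once at the start of a script would make later `print` calls emit UTF-8 regardless of the console code page. On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` is a shorter alternative.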

Regards,
Ram.

On Tue, 12 Feb 2019 at 00:50, Hugh Ranalli <hugh@whtc.ca> wrote:

On Sun, 10 Feb 2019 at 15:07, raam narayana <raam.soft@gmail.com> wrote:
Hi,

After the latest commit in the master branch, I tried to test the Python script. Ironically, I still see that the output from the script is completely different from the content of the unaccent.rules file. Am I missing anything? My testing included the following:

Downloaded the following files

http://unicode.org/Public/8.0.0/ucd/UnicodeData.txt

http://unicode.org/cldr/trac/export/14746/tags/release-34/common/transforms/Latin-ASCII.xml

Executed the Python script as follows

python generate_unaccent_rules.py --unicode-data-file UnicodeData.txt --latin-ascii-file Latin-ASCII.xml > unaccent.rules

I am using Python 3.7.1 on Windows 10.

The new status of this patch is: Needs review

Hi Raam,
I just ran generate_unaccent_rules.py under two environments, using the data files given above:
  - Python 3.4.3 on Linux Mint 17.3 (equivalent to Ubuntu 14.04)
  - Python 3.6.7 on Ubuntu 18.04

In both cases, the output was identical to that generated by the program under Python 2.7. So yes, more information would help. Unfortunately I don't have a Windows Python environment readily available, but could set one up if I had to.

Thanks,
Hugh


--
Cheers
Ram 4.0