Re: [HACKERS] Extra Vietnamese unaccent rules
От | Michael Paquier |
---|---|
Тема | Re: [HACKERS] Extra Vietnamese unaccent rules |
Дата | |
Msg-id | CAB7nPqR5GQgJtfckvrX8+kcYd-EV3Y_+Kq_VJRBP2dFGBwDGKQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] Extra Vietnamese unaccent rules (Man Trieu <man.trieu@gmail.com>) |
Ответы |
Re: [HACKERS] Extra Vietnamese unaccent rules
(Dang Minh Huong <kakalot49@gmail.com>)
|
Список | pgsql-hackers |
On Wed, Jun 7, 2017 at 1:06 AM, Man Trieu <man.trieu@gmail.com> wrote: > 2017-06-07 0:31 GMT+09:00 Bruce Momjian <bruce@momjian.us>: >> >> On Wed, Jun 7, 2017 at 12:10:25AM +0900, Dang Minh Huong wrote: >> > > On Jun 4, 29 Heisei, at 00:48, Bruce Momjian <bruce@momjian.us> wrote: >> > >>>> Shouldn't you use "or is_letter_with_marks()", instead of "or >> > >>>> len(...) >> > >>>>> 1"? Your test might catch something that isn't based on a >> > >>>>> 'letter' >> > >>>> (according to is_plain_letter). Otherwise this looks pretty good >> > >>>> to >> > >>>> me. Please add it to the next commitfest. >> > >>> >> > >>> Thanks for confirm, sir. >> > >>> I will add it to the next CF soon. >> > >> >> > >> Sorry for lately response. I attach the update patch. >> > > >> > > Uh, there is no patch attached. >> > > >> > >> > Sorry sir, reattach the patch. >> > I also added it to the next CF and set reviewers to Thomas Munro. Could >> > you confirm for me. >> >> There seems to be a problem. I can't see a patch dated 2017-06-07 on >> the commitfest page: >> >> https://commitfest.postgresql.org/14/1161/ >> >> I added the thread but there was no change. (I think the thread was >> already present.) It appears it is not seeing this patch as the latest >> patch. >> >> Does anyone know why this is happening? > > May be due to my Mac's mailer? Sorry but I try one more time to attach the > patch by webmail. I have finally been able to look at this patch. (Surprised to see that generate_unaccent_rules.py is inconsistent on MacOS, runs fine on Linux). def get_plain_letter(codepoint, table): """Return the base codepoint without marks.""" if is_letter_with_marks(codepoint, table): - return table[codepoint.combining_ids[0]] + if len(table[codepoint.combining_ids[0]].combining_ids) > 1: + # Recursive to find the plain letter + return get_plain_letter(table[codepoint.combining_ids[0]],table) + elif is_plain_letter(table[codepoint.combining_ids[0]]): + return table[codepoint.combining_ids[0]] + else: + return None elif is_plain_letter(codepoint): return codepoint else: - raise "mu" + return None The code paths returning None should not be reached, so I would suggest adding an assertion instead. Callers of get_plain_letter would blow up on None, still that would make future debugging harder. def is_letter_with_marks(codepoint, table): - """Returns true for plain letters combined with one or more marks.""" + """Returns true for letters combined with one or more marks.""" # See http://www.unicode.org/reports/tr44/tr44-14.html#General_Category_Values return len(codepoint.combining_ids) > 1 and \ - is_plain_letter(table[codepoint.combining_ids[0]]) and \ + (is_plain_letter(table[codepoint.combining_ids[0]]) or\ + is_letter_with_marks(table[codepoint.combining_ids[0]],table)) and \ all(is_mark(table[i]) for i in codepoint.combining_ids[1:] This was already hard to follow, and this patch makes its harder. I think that the thing should be refactored with multiple conditions. if is_letter_with_marks(codepoint, table): - charactersSet.add((codepoint.id, + if get_plain_letter(codepoint, table) <> None: + charactersSet.add((codepoint.id, This change is not necessary as a letter with marks is not a plain character anyway. Testing with characters having two accents, the results are produced as wanted. I am attaching an updated patch with all those simplifications. Thoughts? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Ryan MurphyДата:
Сообщение: Re: [HACKERS] CommitFest 2017-09 - How do I know what commit to applypatches to
Следующее
От: Michael PaquierДата:
Сообщение: Re: [HACKERS] Small patch for pg_basebackup argument parsing