Re: [HACKERS] Extra Vietnamese unaccent rules

From Michael Paquier
Subject Re: [HACKERS] Extra Vietnamese unaccent rules
Date
Msg-id CAB7nPqR5GQgJtfckvrX8+kcYd-EV3Y_+Kq_VJRBP2dFGBwDGKQ@mail.gmail.com
In response to Re: [HACKERS] Extra Vietnamese unaccent rules  (Man Trieu <man.trieu@gmail.com>)
Responses Re: [HACKERS] Extra Vietnamese unaccent rules  (Dang Minh Huong <kakalot49@gmail.com>)
List pgsql-hackers
On Wed, Jun 7, 2017 at 1:06 AM, Man Trieu <man.trieu@gmail.com> wrote:
> 2017-06-07 0:31 GMT+09:00 Bruce Momjian <bruce@momjian.us>:
>>
>> On Wed, Jun  7, 2017 at 12:10:25AM +0900, Dang Minh Huong wrote:
>> > > On Jun 4, 29 Heisei, at 00:48, Bruce Momjian <bruce@momjian.us> wrote:
>> > >>>> Shouldn't you use "or is_letter_with_marks()", instead of "or
>> > >>>> len(...)
>> > >>>>> 1"?  Your test might catch something that isn't based on a
>> > >>>>> 'letter'
>> > >>>> (according to is_plain_letter).  Otherwise this looks pretty good
>> > >>>> to
>> > >>>> me.  Please add it to the next commitfest.
>> > >>>
>> > >>> Thanks for confirm, sir.
>> > >>> I will add it to the next CF soon.
>> > >>
>> > >> Sorry for lately response. I attach the update patch.
>> > >
>> > > Uh, there is no patch attached.
>> > >
>> >
>> > Sorry sir, reattach the patch.
>> > I also added it to the next CF and set reviewers to Thomas Munro. Could
>> > you confirm for me.
>>
>> There seems to be a problem.  I can't see a patch dated 2017-06-07 on
>> the commitfest page:
>>
>>         https://commitfest.postgresql.org/14/1161/
>>
>> I added the thread but there was no change.  (I think the thread was
>> already present.)  It appears it is not seeing this patch as the latest
>> patch.
>>
>> Does anyone know why this is happening?
>
> May be due to my Mac's mailer? Sorry but I try one more time to attach the
> patch by webmail.

I have finally been able to look at this patch.
(I was surprised to see that generate_unaccent_rules.py behaves
inconsistently on macOS, while it runs fine on Linux.)

 def get_plain_letter(codepoint, table):
     """Return the base codepoint without marks."""
     if is_letter_with_marks(codepoint, table):
-        return table[codepoint.combining_ids[0]]
+        if len(table[codepoint.combining_ids[0]].combining_ids) > 1:
+            # Recursive to find the plain letter
+            return get_plain_letter(table[codepoint.combining_ids[0]],table)
+        elif is_plain_letter(table[codepoint.combining_ids[0]]):
+            return table[codepoint.combining_ids[0]]
+        else:
+            return None
     elif is_plain_letter(codepoint):
         return codepoint
     else:
-        raise "mu"
+        return None
The code paths returning None should never be reached, so I would
suggest adding an assertion instead. Callers of get_plain_letter would
blow up on None anyway, but that would make future debugging harder.
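
As an illustration of the assertion idea, here is a minimal Python 3
sketch. It is not the attached patch: the Codepoint tuple,
make_codepoint() and the small table built from Python's unicodedata
module are hypothetical stand-ins for the script's own data structures,
and is_plain_letter() is assumed to accept plain ASCII letters only.

```python
import unicodedata
from collections import namedtuple

# Hypothetical stand-in for the script's Codepoint class.
Codepoint = namedtuple("Codepoint", "id general_category combining_ids")

def make_codepoint(cp):
    """Build a Codepoint from Python's unicodedata (illustration only)."""
    decomp = unicodedata.decomposition(chr(cp))
    # Ignore compatibility decompositions such as "<compat> 0020 0301".
    ids = [] if decomp.startswith("<") else [int(x, 16) for x in decomp.split()]
    return Codepoint(cp, unicodedata.category(chr(cp)), ids)

def is_plain_letter(codepoint):
    """Assumed definition: plain ASCII letters only."""
    return (ord('a') <= codepoint.id <= ord('z') or
            ord('A') <= codepoint.id <= ord('Z'))

def is_mark(codepoint):
    """True for the Unicode mark categories."""
    return codepoint.general_category in ("Mn", "Me", "Mc")

def is_letter_with_marks(codepoint, table):
    """True for a letter, possibly already marked, plus one or more marks."""
    if len(codepoint.combining_ids) < 2:
        return False
    base = table[codepoint.combining_ids[0]]
    if not (is_plain_letter(base) or is_letter_with_marks(base, table)):
        return False
    return all(is_mark(table[i]) for i in codepoint.combining_ids[1:])

def get_plain_letter(codepoint, table):
    """Return the base codepoint without marks, asserting on bad input."""
    if is_letter_with_marks(codepoint, table):
        base = table[codepoint.combining_ids[0]]
        if len(base.combining_ids) > 1:
            # The base itself carries marks, so recurse.
            return get_plain_letter(base, table)
        # Guaranteed by is_letter_with_marks(); fail loudly otherwise.
        assert is_plain_letter(base)
        return base
    # Anything else must already be a plain letter.
    assert is_plain_letter(codepoint)
    return codepoint

# U+1EA5 is "a" with circumflex and acute, a two-level decomposition.
table = {cp: make_codepoint(cp) for cp in (0x61, 0xE2, 0x1EA5, 0x301, 0x302)}
plain = get_plain_letter(table[0x1EA5], table)
print(chr(plain.id))
```

With assertions, a caller that passes something unexpected fails at the
point of the mistake instead of later on a None value.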

 def is_letter_with_marks(codepoint, table):
-    """Returns true for plain letters combined with one or more marks."""
+    """Returns true for letters combined with one or more marks."""
     # See http://www.unicode.org/reports/tr44/tr44-14.html#General_Category_Values
     return len(codepoint.combining_ids) > 1 and \
-           is_plain_letter(table[codepoint.combining_ids[0]]) and \
+           (is_plain_letter(table[codepoint.combining_ids[0]]) or\
+            is_letter_with_marks(table[codepoint.combining_ids[0]],table)) and \
            all(is_mark(table[i]) for i in codepoint.combining_ids[1:])
This was already hard to follow, and this patch makes it harder. I
think this should be refactored into multiple conditions.
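
One possible shape for such a refactoring, as a rough Python 3 sketch
rather than the code of the attached patch. The Codepoint tuple and the
hand-built table are hypothetical, and is_plain_letter() is assumed to
accept plain ASCII letters only.

```python
from collections import namedtuple

# Hypothetical stand-in for the script's Codepoint class.
Codepoint = namedtuple("Codepoint", "id general_category combining_ids")

def is_plain_letter(codepoint):
    """Assumed definition: plain ASCII letters only."""
    return (ord('a') <= codepoint.id <= ord('z') or
            ord('A') <= codepoint.id <= ord('Z'))

def is_mark(codepoint):
    """True for the Unicode mark categories."""
    return codepoint.general_category in ("Mn", "Me", "Mc")

def is_letter_with_marks(codepoint, table):
    """True for a letter combined with one or more marks."""
    # Condition 1: a base and at least one mark must be present.
    if len(codepoint.combining_ids) < 2:
        return False

    # Condition 2: the base is a plain letter, or is itself a letter
    # with marks (this covers characters carrying two accents).
    base = table[codepoint.combining_ids[0]]
    if not (is_plain_letter(base) or is_letter_with_marks(base, table)):
        return False

    # Condition 3: everything after the base is a mark.
    return all(is_mark(table[i]) for i in codepoint.combining_ids[1:])

# Hand-built table for U+1EA5 ("a" with circumflex and acute).
table = {
    0x0061: Codepoint(0x0061, "Ll", []),
    0x0301: Codepoint(0x0301, "Mn", []),
    0x0302: Codepoint(0x0302, "Mn", []),
    0x00E2: Codepoint(0x00E2, "Ll", [0x0061, 0x0302]),
    0x1EA5: Codepoint(0x1EA5, "Ll", [0x00E2, 0x0301]),
}
print(is_letter_with_marks(table[0x1EA5], table))
```

Each early return corresponds to one of the clauses of the original
boolean expression, which makes the recursion on the base easier to see.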

             if is_letter_with_marks(codepoint, table):
-                charactersSet.add((codepoint.id,
+                if get_plain_letter(codepoint, table) <> None:
+                    charactersSet.add((codepoint.id,
This change is not necessary as a letter with marks is not a plain
character anyway.

Testing with characters having two accents, the results are produced
as wanted. I am attaching an updated patch with all those
simplifications. Thoughts?
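
For reference, the kind of input meant by "characters having two
accents" can be inspected with Python's standard unicodedata module.
The helper below is a hypothetical sketch, independent of
generate_unaccent_rules.py. It follows the first element of each
canonical decomposition until a character with no decomposition is
reached.

```python
import unicodedata

def decomposition_chain(ch):
    """Follow canonical decompositions of ch down to its base character."""
    chain = [ch]
    while True:
        decomp = unicodedata.decomposition(chain[-1])
        # Stop at characters with no decomposition, or with a
        # compatibility decomposition such as "<compat> ...".
        if not decomp or decomp.startswith("<"):
            return chain
        # The first code point of the decomposition is the base.
        chain.append(chr(int(decomp.split()[0], 16)))

# Vietnamese letters carrying two marks each.
for ch in ("\u1ea5", "\u1ec7", "\u1edf"):
    chain = decomposition_chain(ch)
    print(" -> ".join("U+%04X" % ord(c) for c in chain))
```

Each of these letters needs two steps to reach its ASCII base, which is
the case a single-level lookup misses.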
-- 
Michael

