Re: [HACKERS] Extra Vietnamese unaccent rules

From Dang Minh Huong
Subject Re: [HACKERS] Extra Vietnamese unaccent rules
Msg-id 7a813796-80c5-aa93-8772-bbddf2f6a10f@gmail.com
In response to Re: [HACKERS] Extra Vietnamese unaccent rules  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [HACKERS] Extra Vietnamese unaccent rules  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 2017/07/05 15:28, Michael Paquier wrote:
> I have finally been able to look at this patch.

Thanks for reviewing, and for the new version of the patch.
> (Surprised to see that generate_unaccent_rules.py is inconsistent on
> MacOS, runs fine on Linux).
>
>   def get_plain_letter(codepoint, table):
>       """Return the base codepoint without marks."""
>       if is_letter_with_marks(codepoint, table):
> -        return table[codepoint.combining_ids[0]]
> +        if len(table[codepoint.combining_ids[0]].combining_ids) > 1:
> +            # Recursive to find the plain letter
> +            return get_plain_letter(table[codepoint.combining_ids[0]],table)
> +        elif is_plain_letter(table[codepoint.combining_ids[0]]):
> +            return table[codepoint.combining_ids[0]]
> +        else:
> +            return None
>       elif is_plain_letter(codepoint):
>           return codepoint
>       else:
> -        raise "mu"
> +        return None
> The code paths returning None should not be reached, so I would
> suggest adding an assertion instead. Callers of get_plain_letter would
> blow up on None, still that would make future debugging harder.
>
>   def is_letter_with_marks(codepoint, table):
> -    """Returns true for plain letters combined with one or more marks."""
> +    """Returns true for letters combined with one or more marks."""
>       # See http://www.unicode.org/reports/tr44/tr44-14.html#General_Category_Values
>       return len(codepoint.combining_ids) > 1 and \
> -           is_plain_letter(table[codepoint.combining_ids[0]]) and \
> +           (is_plain_letter(table[codepoint.combining_ids[0]]) or\
> +            is_letter_with_marks(table[codepoint.combining_ids[0]],table)) and \
>              all(is_mark(table[i]) for i in codepoint.combining_ids[1:])
> This was already hard to follow, and this patch makes it harder. I
> think that the thing should be refactored with multiple conditions.
>
>               if is_letter_with_marks(codepoint, table):
> -                charactersSet.add((codepoint.id,
> +                if get_plain_letter(codepoint, table) <> None:
> +                    charactersSet.add((codepoint.id,
> This change is not necessary as a letter with marks is not a plain
> character anyway.
>
> Testing with characters having two accents, the results are produced
> as wanted. I am attaching an updated patch with all those
> simplifications. Thoughts?

Thanks, that is much cleaner. The patch looks fine to me.
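
For readers following the thread, the review points quoted above (splitting is_letter_with_marks into separate conditions, recursing to reach the plain letter, and asserting instead of returning None) can be sketched roughly as below. This is only an illustration in Python 3 under stated assumptions, not the code from the attached patch: the Codepoint tuple, make_codepoint, and the lazily filled Table are hypothetical stand-ins for the script's UnicodeData.txt parsing, and is_plain_letter is assumed to mean ASCII letters only.

```python
import unicodedata
from collections import namedtuple

# Hypothetical stand-in for the script's Codepoint class.
Codepoint = namedtuple("Codepoint", "id general_category combining_ids")


def make_codepoint(cp_id):
    """Build a Codepoint from Python's unicodedata (assumption: the real
    script parses UnicodeData.txt instead)."""
    ch = chr(cp_id)
    decomp = unicodedata.decomposition(ch)
    ids = []
    # Skip compatibility decompositions such as "<fraction> 0031 2044 0032".
    if decomp and not decomp.startswith("<"):
        ids = [int(part, 16) for part in decomp.split()]
    return Codepoint(cp_id, unicodedata.category(ch), ids)


class Table(dict):
    """Codepoint table filled on first access (hypothetical helper)."""

    def __missing__(self, key):
        self[key] = make_codepoint(key)
        return self[key]


def is_mark(codepoint):
    return codepoint.general_category in ("Mn", "Me", "Mc")


def is_plain_letter(codepoint):
    # Assumption: "plain" means an ASCII letter.
    return (ord("a") <= codepoint.id <= ord("z")) or \
           (ord("A") <= codepoint.id <= ord("Z"))


def is_letter_with_marks(codepoint, table):
    """True for a letter combined with one or more marks, written as
    separate conditions rather than one long boolean expression."""
    if len(codepoint.combining_ids) < 2:
        return False
    # Everything after the first component must be a mark.
    if not all(is_mark(table[i]) for i in codepoint.combining_ids[1:]):
        return False
    base = table[codepoint.combining_ids[0]]
    if is_plain_letter(base):
        return True
    # The base may itself be a letter with marks (two accents).
    return is_letter_with_marks(base, table)


def get_plain_letter(codepoint, table):
    """Return the base codepoint without marks."""
    if is_letter_with_marks(codepoint, table):
        base = table[codepoint.combining_ids[0]]
        if is_plain_letter(base):
            return base
        # Recurse to strip the remaining marks.
        return get_plain_letter(base, table)
    # Should not be reached for anything else, so assert instead of
    # silently returning None.
    assert is_plain_letter(codepoint), "unexpected U+%04X" % codepoint.id
    return codepoint


table = Table()
# U+1EA5 decomposes to U+00E2 + U+0301, and U+00E2 to "a" + U+0302.
print(chr(get_plain_letter(table[0x1EA5], table).id))
```

With the assertion in place, a caller that passes something other than a plain letter or a letter with marks fails at the point of the mistake rather than later on a None value.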

---
Thanks and best regards,
Dang Minh Huong
