Re: multibyte charater set in levenshtein function

Поиск
Список
Период
Сортировка
От Itagaki Takahiro
Тема Re: multibyte charater set in levenshtein function
Дата
Msg-id AANLkTiksZIMvgthlIbu5S30OXA3Q2QnUNM0XUAr9ESQ_@mail.gmail.com
обсуждение исходный текст
Ответ на multibyte charater set in levenshtein function  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: multibyte charater set in levenshtein function  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
Hi, I'm reviewing "Multibyte charater set in levenshtein function" patch.
https://commitfest.postgresql.org/action/patch_view?id=304

The main logic seems to be good, but I have some comments about
the coding style and refactoring.

* levenshtein_internal() and levenshtein_less_equal_internal() are very similar. Can you merge the code? We can always
useless_equal_internal() if the overhead is ignorable. Did you compare them? 

* There are many "if (!multibyte_encoding)" in levenshtein_internal(). How about split the function into two funcs for
single-bytechars and multi-byte chars? (ex. levenshtein_internal_mb() ) Or, we can always use multi-byte version if the
overheadis small. 

* I prefer a struct rather than an array.  "4 * m" and "3 * m" look like magic numbers for me. Could you name the
entrieswith definition of a struct?   /*    * For multibyte encoding we'll also store array of lengths of    *
charactersand array with character offsets in first string    * in order to avoid great number of pg_mblen calls.    */
 prev = (int *) palloc(4 * m * sizeof(int)); 

* There are some compiler warnings. Avoid them if possible.
fuzzystrmatch.c: In function ‘levenshtein_less_equal_internal’:
fuzzystrmatch.c:428: warning: ‘char_lens’ may be used uninitialized in
this function
fuzzystrmatch.c:428: warning: ‘offsets’ may be used uninitialized in
this function
fuzzystrmatch.c:430: warning: ‘curr_right’ may be used uninitialized
in this function
fuzzystrmatch.c: In function ‘levenshtein_internal’:
fuzzystrmatch.c:222: warning: ‘char_lens’ may be used uninitialized in
this function

* Coding style: Use "if (m == 0)" instead of "if (!m)" when the type
of 'm' is an integer.

* Need to fix the caution in docs.
http://developer.postgresql.org/pgdocs/postgres/fuzzystrmatch.html
| Caution: At present, fuzzystrmatch does not work well with
| multi-byte encodings (such as UTF-8).
but now levenshtein supports multi-byte encoding!  We should
mention which function supports mbchars not to confuse users.

* (Not an issue for the patch, but...) Could you rewrite PG_GETARG_TEXT_P, VARDATA, and VARSIZE to PG_GETARG_TEXT_PP,
VARDATA_ANY,and VARSIZE_ANY_EXHDR? Unpacking versions make the core a bit faster. 

--
Itagaki Takahiro


В списке pgsql-hackers по дате отправления:

Предыдущее
От: KaiGai Kohei
Дата:
Сообщение: Re: Reworks of DML permission checks
Следующее
От: Tom Lane
Дата:
Сообщение: WIP patch: pass outer-relation Vars as parameters to indexscans