Re: multibyte charater set in levenshtein function

Поиск

Список

Период

Сортировка

От	Alexander Korotkov
Тема	Re: multibyte charater set in levenshtein function
Дата	21 июля 2010 г. 11:44:05
Msg-id	AANLkTiltvmu_iNzyl5ZHBKrdSgVA2yNoDpdgZfy9wOxA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: multibyte charater set in levenshtein function (Robert Haas <robertmhaas@gmail.com>)
Ответы	Re: multibyte charater set in levenshtein function
Список	pgsql-hackers

Дерево обсуждения

On Wed, Jul 21, 2010 at 5:54 AM, Robert Haas <robertmhaas@gmail.com> wrote:

This patch still needs some work. It includes a bunch of stylistic
changes that aren't relevant to the purpose of the patch. There's no
reason that I can see to change the existing levenshtein_internal
function to take text arguments instead of char *, or to change !m to
m == 0 in existing code, or to change the whitespace in the comments
of that function. All of those need to be reverted before we can
consider committing this.

I changed arguments of function from char * to text * in order to avoid text_to_cstring call. Same benefit can be achived by replacing char * with char * and length.
I changed !m to m == 0 because Itagaki asked me to make it conforming coding style. Do you think there is no reason to fix coding style in existing code?

There is a huge amount of duplicated code here. I think it makes
sense to have the multibyte version of the function be separate, but
perhaps we can merge the less-than-or-equal to bits into the main
code, so that we only have two copies instead of four. Perhaps we
can't just add a max_d argument max_distance to levenshtein_internal;
and if this value is >=0 then it represents the max allowable
distance, but if it is <0 then there is no limit. Sure, that might
slow down the existing code a bit, but it might not be significant.
I'd at least like to see some numbers showing that it is significant
before we go to this much trouble.

In these case we should add many checks of max_d in levenshtein_internal function which make code more complex.
Actually, we can merge all four functions into one function. But such function will have many checks about multibyte encoding and max_d. So, I see four cases here:
1) one function with checks for multibyte encoding and max_d
2) two functions with checks for multibyte encoding
3) two functions with checks for max_d
4) four separate functions
If you prefer case number 3 you should argue your position little more.

The code doesn't follow the project coding style. Braces must be
uncuddled. Comment paragraphs will be reflowed unless they begin and
end with ------. Function definitions should have the type
declaration on one line and the function name at the start of the
next.

Freeing memory with pfree is likely a waste of time; is there any
reason not to just rely on the memory context reset, as the original
coding did?

Ok, I'll fix this things.

I think we might need to remove the acknowledgments section from this
code. If everyone who touches this code adds their name, we're
quickly going to have a mess. If we're not going to remove the
acknowledgments section, then please add my name, too, because I've
already patched this code once...

In that case I think we can leave original acknowledgments section.

----
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: multibyte charater set in levenshtein function