Permute underscore separated components of columns before fuzzy matching

From Arne Roland
Subject Permute underscore separated components of columns before fuzzy matching
Date
Msg-id 555aa9ea1043489bae7a41962877bd04@index.de
Responses Re: Permute underscore separated components of columns before fuzzy matching  (Mikhail Gribkov <youzhick@gmail.com>)
List pgsql-hackers
Hello,

We have the great fuzzy string matching, which comes up with suggestions when a column name is mistyped.

Since underscores are the de facto standard for separating words, it would make sense to also generate suggestions if the order of the words gets mixed up. Example: if the user types timstamp_entry instead of entry_timestamp, the suggestion shows up.

The attached patch does that for up to three segments separated by underscores. A permutation of two segments is treated the same way a wrongly typed character would be.
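To make the idea concrete, here is a minimal Python sketch, not the patch's C code. It assumes the typed name is cut at one or two underscores into two or three sections, every reordering of those sections is compared against the column name with a plain unit-cost Levenshtein distance, and any reordering adds a flat penalty of one edit. The mail only states the penalty for the two-segment case. The flat penalty for three sections and the names levenshtein, section_orderings, permuted_distance and swap_cost are assumptions made for illustration.

```python
from itertools import combinations, permutations


def levenshtein(a: str, b: str) -> int:
    """Plain edit distance with unit costs for insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def section_orderings(name: str, max_sections: int = 3):
    """Yield (candidate, reordered) pairs for name cut into 2..max_sections sections."""
    parts = name.split("_")
    gaps = range(1, len(parts))  # cut positions between adjacent words
    yield name, False            # the name exactly as typed
    for k in range(1, max_sections):  # k cuts give k + 1 sections
        for cuts in combinations(gaps, k):
            bounds = (0, *cuts, len(parts))
            sections = ["_".join(parts[lo:hi])
                        for lo, hi in zip(bounds, bounds[1:])]
            for order in permutations(sections):
                if list(order) != sections:
                    yield "_".join(order), True


def permuted_distance(typed: str, column: str, swap_cost: int = 1) -> int:
    """Smallest distance over all orderings; a reordering costs swap_cost (assumed)."""
    return min(
        levenshtein(candidate, column) + (swap_cost if reordered else 0)
        for candidate, reordered in section_orderings(typed)
    )
```

With this sketch, permuted_distance("timstamp_entry", "entry_timestamp") evaluates to 2: one edit for the swap and one for the missing "e". The plain distance between the two strings is larger.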

The permutation is skipped if the typed column name contains more than 6 underscores, to prevent a noticeable slowdown (measured on my development machine) if the user types too many underscores. In terms of the number of underscores m and the lengths of the individual strings n_att and n_col, the trivial upper bound is O(n_att * n_col * m^2). Considering that strings with a lot of underscores are more likely to be long as well, I simply decided to add the limit. I still wonder a bit whether the permutation should be disabled entirely beyond that limit (as this patch does) or only the swap-three-sections part, as the rest would be bounded by O(n_att * n_col * m). But the utility of only swapping two sections seems a bit dubious to me if I have 7 or more of them.
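The growth behind these bounds can be illustrated with a short Python sketch. It assumes that two sections come from choosing one of the m underscores as a cut point, that three sections come from choosing two of them, and that every non-identity ordering of the sections is tried. The enumeration actually used by the patch may differ. The names MAX_UNDERSCORES, reordering_count and should_permute are hypothetical; only the threshold of 6 is taken from the mail.

```python
from math import comb, factorial

MAX_UNDERSCORES = 6  # threshold stated in the mail


def reordering_count(underscores: int) -> tuple[int, int]:
    """Return (two_section, three_section) reordering counts for m underscores."""
    two = comb(underscores, 1) * (factorial(2) - 1)    # grows like m
    three = comb(underscores, 2) * (factorial(3) - 1)  # grows like m^2
    return two, three


def should_permute(typed: str, limit: int = MAX_UNDERSCORES) -> bool:
    """Skip the permutation step when the typed name has more than limit underscores."""
    return typed.count("_") <= limit
```

Under these assumptions a name with 6 underscores yields 6 two-section and 75 three-section reorderings. Each one costs a distance computation of roughly n_att * n_col steps, which is where the linear and quadratic factors in m come from.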

To me this patch seems simple (if string handling in C can be called that) and self-contained. Despite my calculations above, it resides in a non-performance-critical piece of code. I think of it as a quality-of-life thing.
Let me know what you think. Thank you!

Regards
Arne
