Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)
Дата
Msg-id AANLkTi=bzNBFhCVc+D=L5BP+BjHKYqd9PATBA5QMQQHF@mail.gmail.com
обсуждение исходный текст
Ответ на Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
I'll try to illustrate different approaches in examples. <br /><br /><span style="font-family: courier
new,monospace;">     k  i  t  t  e  n<br />   0  1  2  3  4  5  6<br />s  1  1  2  3  4  5  6<br />i  2  2  1  2  3  4 
5<br/> t  3  3  2  1  2  3  4<br />t  4  4  3  2  1  2  3<br />i  5  5  4  3  2  2  3<br />n  6  6  5  4  3  3  2<br
/>g 7  7  6  5  4  4  3<br /><br /></span>The approach mentioned in Wikipedia limits filling of the matrix by
diagonals,which are in k-distance from main diagonal (diagonal which contain left top cell). All other cell is
guaranteedto have value greater than k. This approach can be extended to the case of costs different from 1, in this
caselimit diagonals will be closer to main diagonal proportional to the costs. Here is example of such limited matrix
withk = 3.<br /><span style="font-family: courier new,monospace;"><br />      k  i  t  t  e  n<br />   0  1  2 
3        <br />s  1  1  2  3  4      <br />i  2  2  1  2  3  4   <br />t  3  3  2  1  2  3  4<br />t     4  3  2  1  2 
3<br/>i        4  3  2  2  3<br /> n           4  3  3  2<br />g              4  4  3<br /><br /></span>The first idea
ofmy approach is to use actual cell values to limit matrix filling. For each row the range of columns where cell values
arenot greater than k is stored. And in the next row only cells are caclucated, which use values of cells from previous
rowin stored range. Here is example of this approach with k = 3. There are slightly less filled cells but calculation
aremore complicated than in previoud approach.<br /><span style="font-family: courier new,monospace;"><br />      k  i 
t t  e  n<br />   0  1  2  3         <br />s  1  1  2  3         <br />i  2  2  1  2  3      <br />t  3  3  2  1  2 
3  <br />t        3  2  1  2  3<br />i           3  2  2  3<br /> n              3  3  2<br />g                    3<br
/><br/></span>The second idea is to make values in matrix possible greater. I analyze what exactly is matrix in this
case.It is sum of original matrix, which represent distances between prefixes of strings, and matrix, which represent
costof insertions or deletions for moving to diagonal, which containing bottom right cell. There is an example of
secondmatrix summand:<br /><span style="font-family: courier new,monospace;"><br /></span><span style="font-family:
couriernew,monospace;">      k  i  t  t  e  n<br />    1  2  3  4  5  6  7<br /> s  0  1  2  3  4  5  6<br /> i  1  0 
1 2  3  4  5<br /> t  2  1  0  1  2  3  4<br /> t  3  2  1  0  1  2  3<br /> i  4  3  2  1  0  1  2<br /> n  5  4  3 
2 1  0  1<br /> g  6  5  4  3  2  1  0<br /></span><br />And an example of resulting matrix:<br /><br /><span
style="font-family:courier new,monospace;">      k  i  t  t  e  n<br />   1  3  5  7  9  11 13<br />s  1  2  4  6  8 
1012<br />i  3  2  2  4  6  8  10<br /> t  5  4  2  2  4  6  8<br />t  7  6  4  2  2  4  6<br />i  9  8  6  4  2  3 
5<br/>n  11 10 8  6  4  3  3<br />g  13 12 10 8  6  5  3<br /></span><br />The resulting matrix saves important
propertyof original matrix, that cell value always greater or equal than values, which are used for it's calculation.
That'swhy we can use idea about matrix filling limiting for this matrix, but values in this matrix are greater and it
allowto avoid filling bigger part of matrix. There is an example of matrix filling limiting for this matrix:<br /><span
style="font-family:courier new,monospace;"><br />      k  i  t  t  e  n<br />   1  3                <br />s  1 
2                <br />i     2  2               <br />t        2  2           <br />t           2  2      <br
/>i             2  3   <br /> n                 3  3<br /> g                    3<br /></span><br />----<br />With best
regards,<br/>Alexander Korotkov.<br /> 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Marios Vodas
Дата:
Сообщение: Re: gist access methods parameter types
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: Improving prep_buildtree used in VPATH builds