Обсуждение: BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results
BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results
От
PG Bug reporting form
Дата:
The following bug has been logged on the website:
Bug reference: 18654
Logged by: bjdev
Email address: bjdev.gthb@laposte.net
PostgreSQL version: 15.4
Operating system: Ubuntu 22.04.5 LTS
Description:
Hi,
The extension fuzzystrmatch propose an implementation of levenshtein
function.
There is one version with costs parameters
levenshtein(text source, text target, int ins_cost, int del_cost, int
sub_cost) returns int
But if we use this function with parameters other than 1 (the default) the
result is incorrect
SELECT levenshtein('horses','shorse',1,1,1) => 2 (correct)
SELECT levenshtein('horses','shorse',100,10,1) => 101 (INCORRECT)
The correct result is 6 (all the letter have to be substitute and it's not
possible to have a lower score with others operations)
Here, it's easy to verify manually but you can check that using python
implementation
from Levenshtein import distance
distance("horses","shorse",weights=(100,10,1))
# => 6
SELECT levenshtein('horses','shorse',1,10,100) => 12 (INCORRECT)
The correct result is 11 (insert "s" first (+1) and remove last "s"(+10)
Here, it's easy to verify manually but you can check that using python
implementation
from Levenshtein import distance
distance("horses","shorse",weights=(1,10,100))
# => 11
SELECT levenshtein('horses','shorse',1,10,1) => 2 (INCORRECT)
The correct result is 6
you can check that using python implementation
from Levenshtein import distance
distance("horses","shorse",weights=(1,10,1))
# => 6
The use of cost parameters of the levenshtein function is therefore not
possible, which is a shame.
Regards
PG Bug reporting form <noreply@postgresql.org> writes:
> But if we use this function with parameters other than 1 (the default) the
> result is incorrect
> SELECT levenshtein('horses','shorse',1,1,1) => 2 (correct)
> SELECT levenshtein('horses','shorse',100,10,1) => 101 (INCORRECT)
> The correct result is 6 (all the letter have to be substitute and it's not
> possible to have a lower score with others operations)
Hmm, works for me:
u8=# create extension fuzzystrmatch;
CREATE EXTENSION
u8=# SELECT levenshtein('horses','shorse',1,1,1);
levenshtein
-------------
2
(1 row)
u8=# SELECT levenshtein('horses','shorse',100,10,1);
levenshtein
-------------
6
(1 row)
I confess bafflement about why you're getting wrong answers.
You seem to be using a slightly out of date Postgres, but
none of this code has changed meaningfully since about 2016.
Maybe you hit a compiler bug? Where did you get this copy
of Postgres from --- or if you built it yourself, what build
options did you use?
regards, tom lane
Re: BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results
От
bjdev.gthb@laposte.net
Дата:
Hi Tom,
Thank you for this feedback,
Indeed, it works correctly. The problem comes from the other side of the keyboard...
I went a little fast and did not check the length of my input strings which contained "invisible" characters.
After cleaning up the results obtained are as expected.
Really sorry for this false alarm
Regards
Thank you for this feedback,
Indeed, it works correctly. The problem comes from the other side of the keyboard...
I went a little fast and did not check the length of my input strings which contained "invisible" characters.
After cleaning up the results obtained are as expected.
Really sorry for this false alarm
Regards
Hmm, works for me:
u8=# create extension fuzzystrmatch;
CREATE EXTENSION
u8=# SELECT levenshtein('horses','shorse',1,1,1);
levenshtein
-------------
2
(1 row)
u8=# SELECT levenshtein('horses','shorse',100,10,1);
levenshtein
-------------
6
(1 row)
I confess bafflement about why you're getting wrong answers.
You seem to be using a slightly out of date Postgres, but
none of this code has changed meaningfully since about 2016.
Maybe you hit a compiler bug? Where did you get this copy
of Postgres from --- or if you built it yourself, what build
options did you use?
regards, tom lane
u8=# create extension fuzzystrmatch;
CREATE EXTENSION
u8=# SELECT levenshtein('horses','shorse',1,1,1);
levenshtein
-------------
2
(1 row)
u8=# SELECT levenshtein('horses','shorse',100,10,1);
levenshtein
-------------
6
(1 row)
I confess bafflement about why you're getting wrong answers.
You seem to be using a slightly out of date Postgres, but
none of this code has changed meaningfully since about 2016.
Maybe you hit a compiler bug? Where did you get this copy
of Postgres from --- or if you built it yourself, what build
options did you use?
regards, tom lane