Consider Spaces in pg_trgm for Better Similarity

From	Igal @ Lucee.org
Subject	Consider Spaces in pg_trgm for Better Similarity
Date	January 29, 2018, 11:56:26
Msg-id	fb93cee1-6020-1b8c-4dad-e7f9741db497@lucee.org
List	pgsql-general


Is there a way to consider white space in tri-grams? That would allow for better matches of phrases.

For example, "one two three" and "three two one" currently generate the same tri-grams ({"  o","  t"," on"," th"," tw","ee ","hre","ne ","one","ree","thr","two","wo "}), so the distance from "one two four" to each of them is the same. The query:

SELECT phrase
      ,input
      ,similarity(t1.phrase, t2.input)
      ,word_similarity(t1.phrase, t2.input)
FROM (values('one two three'),('three two one')) t1(phrase)
    ,(values('one two four')) t2(input);

Returns:

phrase        |input        |similarity  |word_similarity |
--------------|-------------|------------|----------------|
one two three |one two four |0.444444448 |0.615384638     |
three two one |one two four |0.444444448 |0.615384638     |
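The identical scores follow from the identical trigram sets, which can be inspected with pg_trgm's show_trgm() function. The following is a minimal sketch, assuming the pg_trgm extension can be installed in the current database; the column aliases (trigrams, same_trigrams) and the aliases a and b are only illustrative names.

```sql
-- Assumption: pg_trgm is available and you may create extensions here.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- show_trgm() returns an array of all the trigrams in the given string.
SELECT phrase
      ,show_trgm(phrase) AS trigrams
FROM (values('one two three'),('three two one'),('one two four')) t(phrase);

-- Compare the two phrases as sets: each array must contain the other.
-- Containment (@>, <@) is used so the check does not depend on element order.
SELECT (a @> b AND a <@ b) AS same_trigrams
FROM (SELECT show_trgm('one two three') AS a
            ,show_trgm('three two one') AS b) s;
```

If the trigram list quoted earlier is right, each phrase yields 13 trigrams and shares 8 of them with "one two four", so similarity works out to 8 / (13 + 13 - 8) = 8/18, about 0.444, which agrees with the similarity column above.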

But surely "one two four" is more similar to "one two three" than to "three two one".

Any thoughts?

Igal Sapir
Lucee Core Developer
Lucee.org
