Обсуждение: [GENERAL] Use full text to rank results higher if they are "closer hit"
Hello,
I think it's easier to explain my question with example code:
------------------------
CREATE TABLE t ( s VARCHAR );
CREATE TABLE
INSERT INTO t VALUES ('hello'), ('hello world');
INSERT 0 2
SELECT * FROM t;
s
-------------
hello
hello world
(2 rows)
SELECT s, ts_rank(vector, query) AS rank
FROM t, to_tsvector(s) vector, to_tsquery('hello') query
WHERE query @@ vector;
s | rank
-------------+-----------
hello | 0.0607927
hello world | 0.0607927
(2 rows)
------------------------
Here both 'hello' and 'hello world' are ranked equally highly when
searching with 'hello'. What I'm wondering is, is there a way within
postgres to have it match higher to just 'hello' than 'hello world'?
I.e. something like it slightly down-weights extraneous terms? Of course
in general I don't know the query or the field strings ahead of time.
Thanks for any help!
Cheers,
Thomas
On 14.02.2017 18:35, Thomas Nyberg wrote:
>
> Here both 'hello' and 'hello world' are ranked equally highly when
> searching with 'hello'. What I'm wondering is, is there a way within
> postgres to have it match higher to just 'hello' than 'hello world'?
> I.e. something like it slightly down-weights extraneous terms? Of course
> in general I don't know the query or the field strings ahead of time.
>
> Thanks for any help!
>
> Cheers,
> Thomas
>
>
Hello,
try the query:
SELECT s, ts_rank(vector, query) AS rank
FROM t, to_tsvector(s) vector, to_tsquery('hello') query
WHERE query @@ vector;
s | rank
-------------+-----------
hello | 0.0607927
hello world | 0.0303964
(2 rows)
And read about **normalization** in [1]
https://www.postgresql.org/docs/current/static/textsearch-controls.html
--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 14.02.2017 18:57, Artur Zakirov wrote:
>
> Hello,
>
> try the query:
>
> SELECT s, ts_rank(vector, query) AS rank
> FROM t, to_tsvector(s) vector, to_tsquery('hello') query
> WHERE query @@ vector;
> s | rank
> -------------+-----------
> hello | 0.0607927
> hello world | 0.0303964
> (2 rows)
Sorry, the query is:
SELECT s, ts_rank(vector, query, 2) AS rank
FROM t, to_tsvector(s) vector, to_tsquery('hello') query
WHERE query @@ vector;
--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 02/14/2017 07:35 AM, Thomas Nyberg wrote:
> Hello,
>
> I think it's easier to explain my question with example code:
>
> ------------------------
> CREATE TABLE t ( s VARCHAR );
> CREATE TABLE
>
> INSERT INTO t VALUES ('hello'), ('hello world');
> INSERT 0 2
>
> SELECT * FROM t;
> s
> -------------
> hello
> hello world
> (2 rows)
>
> SELECT s, ts_rank(vector, query) AS rank
> FROM t, to_tsvector(s) vector, to_tsquery('hello') query
> WHERE query @@ vector;
> s | rank
> -------------+-----------
> hello | 0.0607927
> hello world | 0.0607927
> (2 rows)
> ------------------------
>
> Here both 'hello' and 'hello world' are ranked equally highly when
> searching with 'hello'. What I'm wondering is, is there a way within
> postgres to have it match higher to just 'hello' than 'hello world'?
> I.e. something like it slightly down-weights extraneous terms? Of course
> in general I don't know the query or the field strings ahead of time.
Some digging around found this:
https://www.postgresql.org/docs/9.6/static/textsearch-controls.html#TEXTSEARCH-RANKING
Setting a normalization of 1:
test=# SELECT s, ts_rank(vector, query, 1) AS rank
FROM t, to_tsvector(s) vector, to_tsquery('hello') query
WHERE query @@ vector;
s | rank
-------------+-----------
hello | 0.0607927
hello world | 0.0383559
>
> Thanks for any help!
>
> Cheers,
> Thomas
>
>
--
Adrian Klaver
adrian.klaver@aklaver.com
Thomas Nyberg <tomuxiong@gmx.com> writes:
> Here both 'hello' and 'hello world' are ranked equally highly when
> searching with 'hello'. What I'm wondering is, is there a way within
> postgres to have it match higher to just 'hello' than 'hello world'?
> I.e. something like it slightly down-weights extraneous terms? Of course
> in general I don't know the query or the field strings ahead of time.
Read the documentation for ts_rank --- there's a normalization option
for that.
I'd also suggest you might prefer using ts_rank_cd, which is supposed to
penalize cases where the matching words aren't close together.
regards, tom lane
Excellent great info! To save the extra mailing list pings, thanks to _everyone_ this is exactly what I was looking for. Cheers, Thomas