Обсуждение: Ranking search results using multiple fields in PostgreSQL fulltext search

Поиск
Список
Период
Сортировка

Ranking search results using multiple fields in PostgreSQL fulltext search

От
Gaini Rajeshwar
Дата:
Hi,
is there a way to rank the search results based on multiple fields in postgreSQL?
For example,
i have title, abstract, summary, body as fields/columns in my database. When user searches on title, i want to rank the results based on title field as well as summary field, where importance(summary) > importance(title). But the results should be exactly matching the terms in "title" rather than "title" OR "summary"


Re: Ranking search results using multiple fields in PostgreSQL fulltext search

От
Raymond O'Donnell
Дата:
On 12/10/2009 14:16, Gaini Rajeshwar wrote:
> Hi,
> is there a way to rank the search results based on multiple fields in
> postgreSQL?
> For example,
> i have *title*, *abstract*, *summary*, *body* as fields/columns in my
> database. When user searches on *title*, i want to rank the results based on
> *title* field as well as *summary* field, where importance(summary) >
> importance(title). But the results should be exactly matching the terms in
> "title" rather than "title" OR "summary"
>

Hoe do you define "importance"?

You can order the results alphabetically/numerically on as many columns
as you like - for example,

    select ....
    where title = ....
    order by title, summary

will order the results on title first, then summary... though you
probably know this and I'm misunderstanding what you need.

A little more detail will help.

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

Re: Ranking search results using multiple fields in PostgreSQL fulltext search

От
Raymond O'Donnell
Дата:
On 12/10/2009 14:23, Raymond O'Donnell wrote:

> Hoe do you define "importance"?

Whoops... for "Hoe" read "How".... :-)

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

Re: Ranking search results using multiple fields in PostgreSQL fulltext search

От
Gaini Rajeshwar
Дата:
I think you misunderstood my question. let me give clear idea about what i need.

I am using PostgreSQL fulltext search (tsearch2) feature to implement searching on database. From readings i came to know that we can give weights to different fields in database something like this:

setweight(to_tsvector(title),'A')

Where 'A' is weight given to field title. i can give weights to other fields in the same way. Where the weights 'A', 'B', 'C', 'D' are in will be in the following order A > B > C > D according to defalut fulltext search configuration.

We can rank the search results using ts_rank function something like this,

ts_rank(tsv_title,ts_query('this is my search text'))

But, i want to rank these reults not only based on just title, but also using other fields like summary etc.
Is there a way around to do this?

On Mon, Oct 12, 2009 at 6:53 PM, Raymond O'Donnell <rod@iol.ie> wrote:
On 12/10/2009 14:16, Gaini Rajeshwar wrote:
> Hi,
> is there a way to rank the search results based on multiple fields in
> postgreSQL?
> For example,
> i have *title*, *abstract*, *summary*, *body* as fields/columns in my
> database. When user searches on *title*, i want to rank the results based on
> *title* field as well as *summary* field, where importance(summary) >
> importance(title). But the results should be exactly matching the terms in
> "title" rather than "title" OR "summary"
>

Hoe do you define "importance"?

You can order the results alphabetically/numerically on as many columns
as you like - for example,

   select ....
   where title = ....
   order by title, summary

will order the results on title first, then summary... though you
probably know this and I'm misunderstanding what you need.

A little more detail will help.

Ray.

------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------

Re: Ranking search results using multiple fields in PostgreSQL fulltext search

От
Ivan Sergio Borgonovo
Дата:
On Mon, 12 Oct 2009 18:46:02 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:

> Hi,
> is there a way to rank the search results based on multiple fields
> in postgreSQL?
> For example,
> i have *title*, *abstract*, *summary*, *body* as fields/columns in
> my database. When user searches on *title*, i want to rank the
> results based on *title* field as well as *summary* field, where
> importance(summary) > importance(title). But the results should be
> exactly matching the terms in "title" rather than "title" OR
> "summary"

http://www.postgresql.org/docs/current/interactive/textsearch-controls.html

Basically, as you can read in the docs:
- you create a ts_vector concatenating and giving a weight the
  various fields.
- then you compare your ts_vector with
  plainto_tsquery(config, yourinput) @@
  yourpreviouslycomputedts_vector

and order by ts_rank(yourpreviouslycomputedts_vector, yourinput)
(or ts_rank_cd)


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Ranking search results using multiple fields in PostgreSQL fulltext search

От
Ivan Sergio Borgonovo
Дата:
On Mon, 12 Oct 2009 19:11:01 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:

> I think you misunderstood my question. let me give clear idea
> about what i need.
>
> I am using PostgreSQL fulltext search (tsearch2) feature to
> implement searching on database. From readings i came to know that
> we can give weights to different fields in database something like
> this:
>
> *setweight(to_tsvector(title),'A')*
>
> Where 'A' is weight given to field title. i can give weights to
> other fields in the same way. Where the weights 'A', 'B', 'C', 'D'
> are in will be in the following order *A > B > C > D* according to
> defalut fulltext search configuration.
>
> We can rank the search results using ts_rank function something
> like this,
>
> *ts_rank(tsv_title,ts_query('this is my search text'))*
> **
> But, i want to rank these reults not only based on just title, but
> also using other fields like summary etc.
> Is there a way around to do this?

if you concatenate your fields with different weight in the *same*
ts_vector, ranking will take into account your weight...

Someone more knowledgeable than me chose how to use weight to give a
reasonable ranking.

Of course if you've field a, b and c and you want to search in a and
b only, you'll have to concatenate just a and b.

If you need different assortment in fields groups... you'll have to
add some extra redundancy if you plan to store precomputed
ts_vectors for each record.

If you need to search "separately" in different fields
(eg. title ~ 'gino' AND summary ~ 'pino')
you just need to weight the input query as well

inputquery := setweight(cfg, inputtitle, 'A', '&');
inputquery := inputquery && setweight(cfg, inputsummary, 'B', '&');

...



--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Ranking search results using multiple fields in PostgreSQL fulltext search

От
Gaini Rajeshwar
Дата:
Ivan,
If i create a tsvector as you mentioned with concatenation operator, my search query will search in any of these fields which are concatenated in my tsvector.
For example, if i create tsvector like this,
UPDATE document_table SET search_col = 
  setweight(to_tsvector(coalesce(title,'')), 'A') ||
  setweight(to_tsvector(coalesce(summary,'')), 'B'));

and do a query like this
select title, ts_rank(search_col, to_tsquery('this is my text search') AS rank
FROM search_col @@ to_tsvector('this & is & my & text & search')
ORDER BY rank DESC
the above query will search in title and summary and will give me the results. But i dont want in that way.When a user wants to search in title, it should just search in title but the results should be ranked based on title and summary field.
On Mon, Oct 12, 2009 at 7:16 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Mon, 12 Oct 2009 18:46:02 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:

> Hi,
> is there a way to rank the search results based on multiple fields
> in postgreSQL?
> For example,
> i have *title*, *abstract*, *summary*, *body* as fields/columns in
> my database. When user searches on *title*, i want to rank the
> results based on *title* field as well as *summary* field, where
> importance(summary) > importance(title). But the results should be
> exactly matching the terms in "title" rather than "title" OR
> "summary"

http://www.postgresql.org/docs/current/interactive/textsearch-controls.html

Basically, as you can read in the docs:
- you create a ts_vector concatenating and giving a weight the
 various fields.
- then you compare your ts_vector with
 plainto_tsquery(config, yourinput) @@
 yourpreviouslycomputedts_vector

and order by ts_rank(yourpreviouslycomputedts_vector, yourinput)
(or ts_rank_cd)


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Ranking search results using multiple fields in PostgreSQL fulltext search

От
Ivan Sergio Borgonovo
Дата:
On Mon, 12 Oct 2009 19:26:55 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:

> Ivan,
> If i create a tsvector as you mentioned with concatenation
> operator, my search query will search in any of these fields which
> are concatenated in my tsvector.
> For example, if i create tsvector like this,
> UPDATE document_table SET search_col =
>   setweight(to_tsvector(coalesce(title,'')), 'A') ||
>   setweight(to_tsvector(coalesce(summary,'')), 'B'));
>
> and do a query like this
> select title, ts_rank(search_col, to_tsquery('this is my text
> search') AS rank
> FROM search_col @@ to_tsvector('this & is & my & text & search')
> ORDER BY rank DESC
> the above query will search in title and summary and will give me
> the results. But i dont want in that way.When a user wants to
> search in title, it should just search in title but the results
> should be ranked based on * title* and *summary* field.

Search *just* in title specifying the weight in the input query and
rank on title and summary.

/*
-- somewhere else in your code...
search_col := setweight(cfg, title, 'A', '&');
search_col := search_col && setweight(cfg, summary, 'B', '&');
*/


select rank(search_col, to_tsquery(inputtitle)) as rank
-- rank on both if search_col just contains title and summary
...
where search_col @@ setweight(cfg, inputtitle, 'A', '&')
-- return just matching title
order by ts_rank(...)

is it what you need?

This is just one of the possible way to rank something...

otherwise: really understand how rank is computed, keep
columns/ts_vector separated, compute rank for each column and pass
the result to some magic function that will compute a "cumulative"
ranking...
Or you could write your own ts_rank... but I tend to trust Oleg and
common practice with pg rather than inventing my own ranking
function.

Right now ts_rank* are black boxes for me. I envisioned I may enjoy
some finer tuning on ranking... but currently they really do a good
job.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Ranking search results using multiple fields in PostgreSQL fulltext search

От
Gaini Rajeshwar
Дата:


On Mon, Oct 12, 2009 at 7:26 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Mon, 12 Oct 2009 19:11:01 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:

> I think you misunderstood my question. let me give clear idea
> about what i need.
>
> I am using PostgreSQL fulltext search (tsearch2) feature to
> implement searching on database. From readings i came to know that
> we can give weights to different fields in database something like
> this:
>
> *setweight(to_tsvector(title),'A')*
>
> Where 'A' is weight given to field title. i can give weights to
> other fields in the same way. Where the weights 'A', 'B', 'C', 'D'
> are in will be in the following order *A > B > C > D* according to
> defalut fulltext search configuration.
>
> We can rank the search results using ts_rank function something
> like this,
>
> *ts_rank(tsv_title,ts_query('this is my search text'))*
> **
> But, i want to rank these reults not only based on just title, but
> also using other fields like summary etc.
> Is there a way around to do this?

if you concatenate your fields with different weight in the *same*
ts_vector, ranking will take into account your weight...
yes, ranking will take into account. but how can we specify just one field at the time of searching and specify all the fields at the time of ranking?
 
Someone more knowledgeable than me chose how to use weight to give a
 
reasonable ranking.

Of course if you've field a, b and c and you want to search in a and
b only, you'll have to concatenate just a and b.

If you need different assortment in fields groups... you'll have to
add some extra redundancy if you plan to store precomputed
ts_vectors for each record.

If you need to search "separately" in different fields
(eg. title ~ 'gino' AND summary ~ 'pino')
you just need to weight the input query as well

inputquery := setweight(cfg, inputtitle, 'A', '&');
inputquery := inputquery && setweight(cfg, inputsummary, 'B', '&');
I didn't understand why did u use '&'  operator in setweight function. is that going to help in any way?

...



--
Ivan Sergio Borgonovo
http://www.webthatworks.it


setweight detailed doc was: Ranking search results using multiple fields in PostgreSQL fulltext search

От
Ivan Sergio Borgonovo
Дата:
On Mon, 12 Oct 2009 20:02:16 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:

> > inputquery := setweight(cfg, inputtitle, 'A', '&');
> > inputquery := inputquery && setweight(cfg, inputsummary, 'B',
> > '&');

> I didn't understand why did u use '&'  operator in setweight
> function. is that going to help in any way?

I don't understand it either...
I just copied and pasted from a working function I wrote long ago.

select setweight('pg_catalog.english', 'java', 'A', '&');

I can't remember what was the meaning of that '&' and I can't find
the docs.

Could someone point me to a more detailed doc that explain in more
details setweight?

thanks


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: Ranking search results using multiple fields in PostgreSQL fulltext search

От
Gaini Rajeshwar
Дата:


On Mon, Oct 12, 2009 at 8:02 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Mon, 12 Oct 2009 19:26:55 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:

> Ivan,
> If i create a tsvector as you mentioned with concatenation
> operator, my search query will search in any of these fields which
> are concatenated in my tsvector.
> For example, if i create tsvector like this,
> UPDATE document_table SET search_col =
>   setweight(to_tsvector(coalesce(title,'')), 'A') ||
>   setweight(to_tsvector(coalesce(summary,'')), 'B'));
>
> and do a query like this
> select title, ts_rank(search_col, to_tsquery('this is my text
> search') AS rank
> FROM search_col @@ to_tsvector('this & is & my & text & search')
> ORDER BY rank DESC
> the above query will search in title and summary and will give me
> the results. But i dont want in that way.When a user wants to
> search in title, it should just search in title but the results
> should be ranked based on * title* and *summary* field.

Search *just* in title specifying the weight in the input query and
rank on title and summary.

/*
-- somewhere else in your code...
search_col := setweight(cfg, title, 'A', '&');
search_col := search_col && setweight(cfg, summary, 'B', '&');
*/


select rank(search_col, to_tsquery(inputtitle)) as rank
-- rank on both if search_col just contains title and summary
...
where search_col @@ setweight(cfg, inputtitle, 'A', '&')
-- return just matching title
order by ts_rank(...)
Yes, it is true.but there is bit difficulty in using this method to my application. As i want to rank results based on many fields, if i concatenate all these fields into search_col, it can be a problematic. It will be problematic, because PostgreSQL by default supports 256 positions for lexeme and 1MB for ts_vector() size. If i concatenate in this way, then it can be a very much lossy, and my ranking may not be perfect.
Instead of that way, i am just wondering if i can specify manually more than one fields in the ts_rank() function itself, rather than specifying search_col which is prepared by contactenating other fields.
I hope, i am clear from my side. let me know if i am not making sense.
 
is it what you need?
 

This is just one of the possible way to rank something...

otherwise: really understand how rank is computed, keep
columns/ts_vector separated, compute rank for each column and pass
the result to some magic function that will compute a "cumulative"
ranking...
Or you could write your own ts_rank... but I tend to trust Oleg and
common practice with pg rather than inventing my own ranking
function.

Right now ts_rank* are black boxes for me. I envisioned I may enjoy
some finer tuning on ranking... but currently they really do a good
job.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general