Обсуждение: Ranking search results using multiple fields in PostgreSQL fulltext search
Hi,
is there a way to rank the search results based on multiple fields in postgreSQL?
For example,
i have title, abstract, summary, body as fields/columns in my database. When user searches on title, i want to rank the results based on title field as well as summary field, where importance(summary) > importance(title). But the results should be exactly matching the terms in "title" rather than "title" OR "summary"
Re: Ranking search results using multiple fields in PostgreSQL fulltext search
От
Raymond O'Donnell
Дата:
On 12/10/2009 14:16, Gaini Rajeshwar wrote: > Hi, > is there a way to rank the search results based on multiple fields in > postgreSQL? > For example, > i have *title*, *abstract*, *summary*, *body* as fields/columns in my > database. When user searches on *title*, i want to rank the results based on > *title* field as well as *summary* field, where importance(summary) > > importance(title). But the results should be exactly matching the terms in > "title" rather than "title" OR "summary" > Hoe do you define "importance"? You can order the results alphabetically/numerically on as many columns as you like - for example, select .... where title = .... order by title, summary will order the results on title first, then summary... though you probably know this and I'm misunderstanding what you need. A little more detail will help. Ray. ------------------------------------------------------------------ Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland rod@iol.ie Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals ------------------------------------------------------------------
Re: Ranking search results using multiple fields in PostgreSQL fulltext search
От
Raymond O'Donnell
Дата:
On 12/10/2009 14:23, Raymond O'Donnell wrote: > Hoe do you define "importance"? Whoops... for "Hoe" read "How".... :-) Ray. ------------------------------------------------------------------ Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland rod@iol.ie Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals ------------------------------------------------------------------
Re: Ranking search results using multiple fields in PostgreSQL fulltext search
От
Gaini Rajeshwar
Дата:
I think you misunderstood my question. let me give clear idea about what i need.
I am using PostgreSQL fulltext search (tsearch2) feature to implement searching on database. From readings i came to know that we can give weights to different fields in database something like this:
setweight(to_tsvector(title),'A')
Where 'A' is weight given to field title. i can give weights to other fields in the same way. Where the weights 'A', 'B', 'C', 'D' are in will be in the following order A > B > C > D according to defalut fulltext search configuration.
We can rank the search results using ts_rank function something like this,
ts_rank(tsv_title,ts_query('this is my search text'))
But, i want to rank these reults not only based on just title, but also using other fields like summary etc.
Is there a way around to do this?
On Mon, Oct 12, 2009 at 6:53 PM, Raymond O'Donnell <rod@iol.ie> wrote:
Hoe do you define "importance"?On 12/10/2009 14:16, Gaini Rajeshwar wrote:
> Hi,
> is there a way to rank the search results based on multiple fields in
> postgreSQL?
> For example,
> i have *title*, *abstract*, *summary*, *body* as fields/columns in my
> database. When user searches on *title*, i want to rank the results based on
> *title* field as well as *summary* field, where importance(summary) >
> importance(title). But the results should be exactly matching the terms in
> "title" rather than "title" OR "summary"
>
You can order the results alphabetically/numerically on as many columns
as you like - for example,
select ....
where title = ....
order by title, summary
will order the results on title first, then summary... though you
probably know this and I'm misunderstanding what you need.
A little more detail will help.
Ray.
------------------------------------------------------------------
Raymond O'Donnell, Director of Music, Galway Cathedral, Ireland
rod@iol.ie
Galway Cathedral Recitals: http://www.galwaycathedral.org/recitals
------------------------------------------------------------------
Re: Ranking search results using multiple fields in PostgreSQL fulltext search
От
Ivan Sergio Borgonovo
Дата:
On Mon, 12 Oct 2009 18:46:02 +0530 Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote: > Hi, > is there a way to rank the search results based on multiple fields > in postgreSQL? > For example, > i have *title*, *abstract*, *summary*, *body* as fields/columns in > my database. When user searches on *title*, i want to rank the > results based on *title* field as well as *summary* field, where > importance(summary) > importance(title). But the results should be > exactly matching the terms in "title" rather than "title" OR > "summary" http://www.postgresql.org/docs/current/interactive/textsearch-controls.html Basically, as you can read in the docs: - you create a ts_vector concatenating and giving a weight the various fields. - then you compare your ts_vector with plainto_tsquery(config, yourinput) @@ yourpreviouslycomputedts_vector and order by ts_rank(yourpreviouslycomputedts_vector, yourinput) (or ts_rank_cd) -- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: Ranking search results using multiple fields in PostgreSQL fulltext search
От
Ivan Sergio Borgonovo
Дата:
On Mon, 12 Oct 2009 19:11:01 +0530 Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote: > I think you misunderstood my question. let me give clear idea > about what i need. > > I am using PostgreSQL fulltext search (tsearch2) feature to > implement searching on database. From readings i came to know that > we can give weights to different fields in database something like > this: > > *setweight(to_tsvector(title),'A')* > > Where 'A' is weight given to field title. i can give weights to > other fields in the same way. Where the weights 'A', 'B', 'C', 'D' > are in will be in the following order *A > B > C > D* according to > defalut fulltext search configuration. > > We can rank the search results using ts_rank function something > like this, > > *ts_rank(tsv_title,ts_query('this is my search text'))* > ** > But, i want to rank these reults not only based on just title, but > also using other fields like summary etc. > Is there a way around to do this? if you concatenate your fields with different weight in the *same* ts_vector, ranking will take into account your weight... Someone more knowledgeable than me chose how to use weight to give a reasonable ranking. Of course if you've field a, b and c and you want to search in a and b only, you'll have to concatenate just a and b. If you need different assortment in fields groups... you'll have to add some extra redundancy if you plan to store precomputed ts_vectors for each record. If you need to search "separately" in different fields (eg. title ~ 'gino' AND summary ~ 'pino') you just need to weight the input query as well inputquery := setweight(cfg, inputtitle, 'A', '&'); inputquery := inputquery && setweight(cfg, inputsummary, 'B', '&'); ... -- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: Ranking search results using multiple fields in PostgreSQL fulltext search
От
Gaini Rajeshwar
Дата:
Ivan,
If i create a tsvector as you mentioned with concatenation operator, my search query will search in any of these fields which are concatenated in my tsvector.
For example, if i create tsvector like this,
UPDATE document_table SET search_col =
setweight(to_tsvector(coalesce(title,'')), 'A') ||
setweight(to_tsvector(coalesce(summary,'')), 'B'));
setweight(to_tsvector(coalesce(title,'')), 'A') ||
setweight(to_tsvector(coalesce(summary,'')), 'B'));
and do a query like this
select title, ts_rank(search_col, to_tsquery('this is my text search') AS rank
FROM search_col @@ to_tsvector('this & is & my & text & search')
ORDER BY rank DESC
the above query will search in title and summary and will give me the results. But i dont want in that way.When a user wants to search in title, it should just search in title but the results should be ranked based on title and summary field.
On Mon, Oct 12, 2009 at 7:16 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
http://www.postgresql.org/docs/current/interactive/textsearch-controls.htmlOn Mon, 12 Oct 2009 18:46:02 +0530
Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote:
> Hi,
> is there a way to rank the search results based on multiple fields
> in postgreSQL?
> For example,
> i have *title*, *abstract*, *summary*, *body* as fields/columns in
> my database. When user searches on *title*, i want to rank the
> results based on *title* field as well as *summary* field, where
> importance(summary) > importance(title). But the results should be
> exactly matching the terms in "title" rather than "title" OR
> "summary"
Basically, as you can read in the docs:
- you create a ts_vector concatenating and giving a weight the
various fields.
- then you compare your ts_vector with
plainto_tsquery(config, yourinput) @@
yourpreviouslycomputedts_vector
and order by ts_rank(yourpreviouslycomputedts_vector, yourinput)
(or ts_rank_cd)
--
Ivan Sergio Borgonovo
http://www.webthatworks.it
Re: Ranking search results using multiple fields in PostgreSQL fulltext search
От
Ivan Sergio Borgonovo
Дата:
On Mon, 12 Oct 2009 19:26:55 +0530 Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote: > Ivan, > If i create a tsvector as you mentioned with concatenation > operator, my search query will search in any of these fields which > are concatenated in my tsvector. > For example, if i create tsvector like this, > UPDATE document_table SET search_col = > setweight(to_tsvector(coalesce(title,'')), 'A') || > setweight(to_tsvector(coalesce(summary,'')), 'B')); > > and do a query like this > select title, ts_rank(search_col, to_tsquery('this is my text > search') AS rank > FROM search_col @@ to_tsvector('this & is & my & text & search') > ORDER BY rank DESC > the above query will search in title and summary and will give me > the results. But i dont want in that way.When a user wants to > search in title, it should just search in title but the results > should be ranked based on * title* and *summary* field. Search *just* in title specifying the weight in the input query and rank on title and summary. /* -- somewhere else in your code... search_col := setweight(cfg, title, 'A', '&'); search_col := search_col && setweight(cfg, summary, 'B', '&'); */ select rank(search_col, to_tsquery(inputtitle)) as rank -- rank on both if search_col just contains title and summary ... where search_col @@ setweight(cfg, inputtitle, 'A', '&') -- return just matching title order by ts_rank(...) is it what you need? This is just one of the possible way to rank something... otherwise: really understand how rank is computed, keep columns/ts_vector separated, compute rank for each column and pass the result to some magic function that will compute a "cumulative" ranking... Or you could write your own ts_rank... but I tend to trust Oleg and common practice with pg rather than inventing my own ranking function. Right now ts_rank* are black boxes for me. I envisioned I may enjoy some finer tuning on ranking... but currently they really do a good job. -- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: Ranking search results using multiple fields in PostgreSQL fulltext search
От
Gaini Rajeshwar
Дата:
On Mon, Oct 12, 2009 at 7:26 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Mon, 12 Oct 2009 19:11:01 +0530> I think you misunderstood my question. let me give clear ideaif you concatenate your fields with different weight in the *same*
> about what i need.
>
> I am using PostgreSQL fulltext search (tsearch2) feature to
> implement searching on database. From readings i came to know that
> we can give weights to different fields in database something like
> this:
>
> *setweight(to_tsvector(title),'A')*
>
> Where 'A' is weight given to field title. i can give weights to
> other fields in the same way. Where the weights 'A', 'B', 'C', 'D'
> are in will be in the following order *A > B > C > D* according to
> defalut fulltext search configuration.
>
> We can rank the search results using ts_rank function something
> like this,
>
> *ts_rank(tsv_title,ts_query('this is my search text'))*
> **
> But, i want to rank these reults not only based on just title, but
> also using other fields like summary etc.
> Is there a way around to do this?
ts_vector, ranking will take into account your weight...
yes, ranking will take into account. but how can we specify just one field at the time of searching and specify all the fields at the time of ranking?
Someone more knowledgeable than me chose how to use weight to give a
reasonable ranking.
Of course if you've field a, b and c and you want to search in a and
b only, you'll have to concatenate just a and b.
If you need different assortment in fields groups... you'll have to
add some extra redundancy if you plan to store precomputed
ts_vectors for each record.
If you need to search "separately" in different fields
(eg. title ~ 'gino' AND summary ~ 'pino')
you just need to weight the input query as well
inputquery := setweight(cfg, inputtitle, 'A', '&');
inputquery := inputquery && setweight(cfg, inputsummary, 'B', '&');
I didn't understand why did u use '&' operator in setweight function. is that going to help in any way?
...
setweight detailed doc was: Ranking search results using multiple fields in PostgreSQL fulltext search
От
Ivan Sergio Borgonovo
Дата:
On Mon, 12 Oct 2009 20:02:16 +0530 Gaini Rajeshwar <raja.rajeshwar2006@gmail.com> wrote: > > inputquery := setweight(cfg, inputtitle, 'A', '&'); > > inputquery := inputquery && setweight(cfg, inputsummary, 'B', > > '&'); > I didn't understand why did u use '&' operator in setweight > function. is that going to help in any way? I don't understand it either... I just copied and pasted from a working function I wrote long ago. select setweight('pg_catalog.english', 'java', 'A', '&'); I can't remember what was the meaning of that '&' and I can't find the docs. Could someone point me to a more detailed doc that explain in more details setweight? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: Ranking search results using multiple fields in PostgreSQL fulltext search
От
Gaini Rajeshwar
Дата:
On Mon, Oct 12, 2009 at 8:02 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:
On Mon, 12 Oct 2009 19:26:55 +0530> Ivan,Search *just* in title specifying the weight in the input query and
> If i create a tsvector as you mentioned with concatenation
> operator, my search query will search in any of these fields which
> are concatenated in my tsvector.
> For example, if i create tsvector like this,
> UPDATE document_table SET search_col =
> setweight(to_tsvector(coalesce(title,'')), 'A') ||
> setweight(to_tsvector(coalesce(summary,'')), 'B'));
>
> and do a query like this
> select title, ts_rank(search_col, to_tsquery('this is my text
> search') AS rank
> FROM search_col @@ to_tsvector('this & is & my & text & search')
> ORDER BY rank DESC
> the above query will search in title and summary and will give me
> the results. But i dont want in that way.When a user wants to
> search in title, it should just search in title but the results
> should be ranked based on * title* and *summary* field.
rank on title and summary.
/*
-- somewhere else in your code...
search_col := setweight(cfg, title, 'A', '&');
search_col := search_col && setweight(cfg, summary, 'B', '&');
*/
select rank(search_col, to_tsquery(inputtitle)) as rank
-- rank on both if search_col just contains title and summary
...
where search_col @@ setweight(cfg, inputtitle, 'A', '&')
-- return just matching title
order by ts_rank(...)
Yes, it is true.but there is bit difficulty in using this method to my application. As i want to rank results based on many fields, if i concatenate all these fields into search_col, it can be a problematic. It will be problematic, because PostgreSQL by default supports 256 positions for lexeme and 1MB for ts_vector() size. If i concatenate in this way, then it can be a very much lossy, and my ranking may not be perfect.
Instead of that way, i am just wondering if i can specify manually more than one fields in the ts_rank() function itself, rather than specifying search_col which is prepared by contactenating other fields.
I hope, i am clear from my side. let me know if i am not making sense.
is it what you need?
This is just one of the possible way to rank something...
otherwise: really understand how rank is computed, keep
columns/ts_vector separated, compute rank for each column and pass
the result to some magic function that will compute a "cumulative"
ranking...
Or you could write your own ts_rank... but I tend to trust Oleg and
common practice with pg rather than inventing my own ranking
function.
Right now ts_rank* are black boxes for me. I envisioned I may enjoy
some finer tuning on ranking... but currently they really do a good
job.
----
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general