Обсуждение: Using FULLTEXT search with different weights for various fields

Поиск
Список
Период
Сортировка

Using FULLTEXT search with different weights for various fields

От
Mike Christensen
Дата:
I'm trying to make some improvements to my search results by taking
advantage of Postgres' setweight function, but am having a lot of
problems getting a query to run..  Here's the query that I run now
(I've removed some parts that just make it more complicated than you
need to be bothered with)..

SELECT R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating FROM Recipes R
WHERE (to_tsvector('english', title || ' ' || coalesce(description,
'') || ' ' || coalesce(steps, '')) @@ plainto_tsquery(:search))
ORDER BY R.Rating DESC LIMIT 100;

:search will be something the user types in, such as "cookies eggs".
This runs well, since I have an index on that vector expression.
However, I'd like to do the following:

title should have a weight of A.

description should have a weight of B.

steps should have a weight of C.

I've tried a few things based on the documentation at
http://www.postgresql.org/docs/8.4/static/textsearch-controls.html,
but am not having a lot of luck.  Here's what I've come up with some
far:

SELECT R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating,
   setweight(to_tsvector(title), 'A') ||
   setweight(to_tsvector(coalesce(description, '')), 'B') ||
   setweight(to_tsvector(coalesce(steps, '')), 'C') as vector,
   ts_rank_cd(vector, query) as rank
FROM Recipes R,
   plainto_tsquery('cookies eggs') query
WHERE
 vector @@ query
ORDER BY rank desc LIMIT 100;

This doesn't work due to the error: column 'vector' does not exist,
which I guess is a valid point the way the query is parsed.  The
examples basically assume you have a column called textsearch which is
a tsvector, and you use a trigger or something to update it.  I'm
trying to avoid modifying my table schema for now.  Is there a way to
express this query without pre-computing the tsvector on the table?
Also, is having a tsvector in the table basically the standard
approach and something I should just get used to doing?  Maybe I can
use a view that computers the tsvector and index that?  Sorry, this is
probably a totally brain dead fulltext question, but I'm new to this
whole concept.  I make pretty web pages, and am not as smart as the
people on this list.

Mike

Re: Using FULLTEXT search with different weights for various fields

От
Arjen Nienhuis
Дата:
The simple answer is this:

SELECT 
  R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating,
  setweight(to_tsvector(title), 'A') ||
  setweight(to_tsvector(coalesce(description, '')), 'B') ||
  setweight(to_tsvector(coalesce(steps, '')), 'C') as vector,
  ts_rank_cd(
      setweight(to_tsvector(title), 'A') ||
  
    setweight(to_tsvector(coalesce(description, '')), 'B') ||
      setweight(to_tsvector(coalesce(steps, '')), 'C')
      ,
      query
  ) as rank
FROM Recipes R,
  plainto_tsquery('cookies eggs') query
WHERE
 vector @@ query
ORDER BY rank desc LIMIT 100;

In the end I declared a function. Then you get:

CREATE INDEX ... ON f(title, description, steps);

SELECT C
  R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating,
  f(title, description, steps) as vector,
  ts_rank_cd(
f(title, description, steps)query) as rank
FROM Recipes R,
  plainto_tsquery('cookies eggs') query
WHERE
 vector @@ query
ORDER BY rank desc LIMIT 100;

On Thu, Aug 26, 2010 at 10:44 AM, Mike Christensen <mike@kitchenpc.com> wrote:
I'm trying to make some improvements to my search results by taking
advantage of Postgres' setweight function, but am having a lot of
problems getting a query to run..  Here's the query that I run now
(I've removed some parts that just make it more complicated than you
need to be bothered with)..

SELECT R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating FROM Recipes R
WHERE (to_tsvector('english', title || ' ' || coalesce(description,
'') || ' ' || coalesce(steps, '')) @@ plainto_tsquery(:search))
ORDER BY R.Rating DESC LIMIT 100;

:search will be something the user types in, such as "cookies eggs".
This runs well, since I have an index on that vector expression.
However, I'd like to do the following:

title should have a weight of A.

description should have a weight of B.

steps should have a weight of C.

I've tried a few things based on the documentation at
http://www.postgresql.org/docs/8.4/static/textsearch-controls.html,
but am not having a lot of luck.  Here's what I've come up with some
far:

SELECT R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating,
  setweight(to_tsvector(title), 'A') ||
  setweight(to_tsvector(coalesce(description, '')), 'B') ||
  setweight(to_tsvector(coalesce(steps, '')), 'C') as vector,
  ts_rank_cd(vector, query) as rank
FROM Recipes R,
  plainto_tsquery('cookies eggs') query
WHERE
 vector @@ query
ORDER BY rank desc LIMIT 100;

This doesn't work due to the error: column 'vector' does not exist,
which I guess is a valid point the way the query is parsed.  The
examples basically assume you have a column called textsearch which is
a tsvector, and you use a trigger or something to update it.  I'm
trying to avoid modifying my table schema for now.  Is there a way to
express this query without pre-computing the tsvector on the table?
Also, is having a tsvector in the table basically the standard
approach and something I should just get used to doing?  Maybe I can
use a view that computers the tsvector and index that?  Sorry, this is
probably a totally brain dead fulltext question, but I'm new to this
whole concept.  I make pretty web pages, and am not as smart as the
people on this list.

Mike

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Using FULLTEXT search with different weights for various fields

От
Mike Christensen
Дата:
Thanks, the function idea is a good one.  It makes the query look a lot better.

My next question is are there any obvious advantages of making the
tsvector part of the table and using a trigger to update it (or
refresh the vectors every night or something).  Thanks!

Mike

On Thu, Aug 26, 2010 at 3:16 AM, Arjen Nienhuis <a.g.nienhuis@gmail.com> wrote:
> The simple answer is this:
> SELECT
>   R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating,
>   setweight(to_tsvector(title), 'A') ||
>   setweight(to_tsvector(coalesce(description, '')), 'B') ||
>   setweight(to_tsvector(coalesce(steps, '')), 'C') as vector,
>   ts_rank_cd(
>       setweight(to_tsvector(title), 'A') ||
>       setweight(to_tsvector(coalesce(description, '')), 'B') ||
>       setweight(to_tsvector(coalesce(steps, '')), 'C')
>       ,
>       query
>   ) as rank
> FROM Recipes R,
>   plainto_tsquery('cookies eggs') query
> WHERE
>  vector @@ query
> ORDER BY rank desc LIMIT 100;
>
> In the end I declared a function. Then you get:
> CREATE INDEX ... ON f(title, description, steps);
> SELECT C
>   R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating,
>   f(title, description, steps) as vector,
>   ts_rank_cd(f(title, description, steps), query) as rank
> FROM Recipes R,
>   plainto_tsquery('cookies eggs') query
> WHERE
>  vector @@ query
> ORDER BY rank desc LIMIT 100;
> On Thu, Aug 26, 2010 at 10:44 AM, Mike Christensen <mike@kitchenpc.com>
> wrote:
>>
>> I'm trying to make some improvements to my search results by taking
>> advantage of Postgres' setweight function, but am having a lot of
>> problems getting a query to run..  Here's the query that I run now
>> (I've removed some parts that just make it more complicated than you
>> need to be bothered with)..
>>
>> SELECT R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating FROM
>> Recipes R
>> WHERE (to_tsvector('english', title || ' ' || coalesce(description,
>> '') || ' ' || coalesce(steps, '')) @@ plainto_tsquery(:search))
>> ORDER BY R.Rating DESC LIMIT 100;
>>
>> :search will be something the user types in, such as "cookies eggs".
>> This runs well, since I have an index on that vector expression.
>> However, I'd like to do the following:
>>
>> title should have a weight of A.
>>
>> description should have a weight of B.
>>
>> steps should have a weight of C.
>>
>> I've tried a few things based on the documentation at
>> http://www.postgresql.org/docs/8.4/static/textsearch-controls.html,
>> but am not having a lot of luck.  Here's what I've come up with some
>> far:
>>
>> SELECT R.RecipeId, R.Title, R.Description, R.ImageUrl, R.Rating,
>>   setweight(to_tsvector(title), 'A') ||
>>   setweight(to_tsvector(coalesce(description, '')), 'B') ||
>>   setweight(to_tsvector(coalesce(steps, '')), 'C') as vector,
>>   ts_rank_cd(vector, query) as rank
>> FROM Recipes R,
>>   plainto_tsquery('cookies eggs') query
>> WHERE
>>  vector @@ query
>> ORDER BY rank desc LIMIT 100;
>>
>> This doesn't work due to the error: column 'vector' does not exist,
>> which I guess is a valid point the way the query is parsed.  The
>> examples basically assume you have a column called textsearch which is
>> a tsvector, and you use a trigger or something to update it.  I'm
>> trying to avoid modifying my table schema for now.  Is there a way to
>> express this query without pre-computing the tsvector on the table?
>> Also, is having a tsvector in the table basically the standard
>> approach and something I should just get used to doing?  Maybe I can
>> use a view that computers the tsvector and index that?  Sorry, this is
>> probably a totally brain dead fulltext question, but I'm new to this
>> whole concept.  I make pretty web pages, and am not as smart as the
>> people on this list.
>>
>> Mike
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
>
>