Обсуждение: Question about fulltext search and to_tsvector function

Поиск
Список
Период
Сортировка

Question about fulltext search and to_tsvector function

От
Mike Christensen
Дата:
Okay this one's driving me crazy.

Should there be any difference between the following queries:

SELECT R.Title FROM Recipes R
WHERE (to_tsvector('english', title || description) @@
plainto_tsquery('Cake'));

and

SELECT R.Title FROM Recipes R
WHERE (to_tsvector('english', description || title) @@
plainto_tsquery('Cake'));


This query should search for the word 'Cake' in either the description
OR the title field, correct?  The order should not matter.  However, in
the first query I get 6 results and in the second I get 9.

Furthermore, if I do:

SELECT R.Title FROM Recipes R
WHERE (to_tsvector('english', title) @@ plainto_tsquery('Cake'));

I get 10 results and if I do:

SELECT R.Title FROM Recipes R
WHERE (to_tsvector('english', title || coalesce(description, '')) @@
plainto_tsquery('Cake'));

I get 7 results.  Shouldn't the first query result in a subset of the
second query?  The first query returns several rows the second one
doesn't return, and vice-versa!  This function is completely confusing me!!

I would greatly appreciate anyone who could explain exactly how this
works.  It's most likely something stupid I'm doing that is the result
of me not getting enough sleep..

Mike

Re: Question about fulltext search and to_tsvector function

От
Tom Lane
Дата:
Mike Christensen <imaudi@comcast.net> writes:
> Okay this one's driving me crazy.
> Should there be any difference between the following queries:

> SELECT R.Title FROM Recipes R
> WHERE (to_tsvector('english', title || description) @@
> plainto_tsquery('Cake'));

> and

> SELECT R.Title FROM Recipes R
> WHERE (to_tsvector('english', description || title) @@
> plainto_tsquery('Cake'));

Maybe it would be better to do description || ' ' || title ?
I suspect that 'Cake' is getting run together with some other
word in some of your rows ...

            regards, tom lane

Re: Question about fulltext search and to_tsvector function

От
Mike Christensen
Дата:
Hi guys -

I half figured this out..  The problem is the vector is concatenating
the fields together..  If I have a title of "Iced Lemon Cake" and the
description is "Tangy and sweet!", then the tsvector looks like this:

"'ice':1 'lemon':2 'sweet':5 'caketangi':3"

"Cake" and "Tangy" get parsed as "caketangi" heheh..

So my questions are:

1) What's the proper syntax for this query, and
2) What's the proper syntax to index the vector (I can probably figured
this out once I have the first answer)..

Thanks!!

Mike



Mike Christensen wrote:
> Okay this one's driving me crazy.
>
> Should there be any difference between the following queries:
>
> SELECT R.Title FROM Recipes R
> WHERE (to_tsvector('english', title || description) @@
> plainto_tsquery('Cake'));
>
> and
>
> SELECT R.Title FROM Recipes R
> WHERE (to_tsvector('english', description || title) @@
> plainto_tsquery('Cake'));
>
>
> This query should search for the word 'Cake' in either the description
> OR the title field, correct?  The order should not matter.  However,
> in the first query I get 6 results and in the second I get 9.
>
> Furthermore, if I do:
>
> SELECT R.Title FROM Recipes R
> WHERE (to_tsvector('english', title) @@ plainto_tsquery('Cake'));
>
> I get 10 results and if I do:
>
> SELECT R.Title FROM Recipes R
> WHERE (to_tsvector('english', title || coalesce(description, '')) @@
> plainto_tsquery('Cake'));
>
> I get 7 results.  Shouldn't the first query result in a subset of the
> second query?  The first query returns several rows the second one
> doesn't return, and vice-versa!  This function is completely confusing
> me!!
>
> I would greatly appreciate anyone who could explain exactly how this
> works.  It's most likely something stupid I'm doing that is the result
> of me not getting enough sleep..
>
> Mike
>

Re: Question about fulltext search and to_tsvector function

От
Mike Christensen
Дата:
That was it!

So this is the standard way of doing this?  It seems a bit messy to me, but I've gotten kinda used to Postgresql being almost annoyingly logical <g>

So here's my new query:

SELECT R.RecipeId, R.Title, R.Description FROM Recipes R
WHERE (to_tsvector('english', title || ' ' || coalesce(description, '') || ' ' || coalesce(credit, '') || ' ' || coalesce(steps, '')) @@ plainto_tsquery('cake'));

and my new index:

CREATE INDEX idx_recipes_fullsearch
  ON recipes
  USING gin
  (to_tsvector('english'::regconfig, ((title::text || ' ' || COALESCE(description, ''::character varying)::text) || ' ' || COALESCE(credit, ''::character varying)::text) || ' ' || COALESCE(steps, ''::text)));


This is exactly what you would do as well?  I just want to make sure this isn't considered a hack..

Tom Lane wrote:
Mike Christensen <imaudi@comcast.net> writes: 
Okay this one's driving me crazy.
Should there be any difference between the following queries:   
 
SELECT R.Title FROM Recipes R
WHERE (to_tsvector('english', title || description) @@ 
plainto_tsquery('Cake'));   
 
and   
 
SELECT R.Title FROM Recipes R
WHERE (to_tsvector('english', description || title) @@ 
plainto_tsquery('Cake'));   
Maybe it would be better to do description || ' ' || title ?
I suspect that 'Cake' is getting run together with some other
word in some of your rows ...
		regards, tom lane
 

Re: Question about fulltext search and to_tsvector function

От
Tom Lane
Дата:
Mike Christensen <imaudi@comcast.net> writes:
> This is exactly what you would do as well?  I just want to make sure
> this isn't considered a hack..

It certainly works.  What the docs seem to recommend though is
concatenating the *output* of to_tsvector applied to each field,
along the lines of

   setweight(to_tsvector('pg_catalog.english', coalesce(new.title,'')), 'A') ||
   setweight(to_tsvector('pg_catalog.english', coalesce(new.body,'')), 'D');

(cf. section 12.4.3 in the 8.3 docs).  I'm not sure about whether this
is better or worse from a micro-optimization speed point of view, but
the reason to do it this way is you can attach different weights to
words from different fields, as illustrated above.  That can come in
handy in future searching.

(Note that || applied to tsvectors isn't really just simple
concatenation, but that's the operator name that was chosen.)

            regards, tom lane

Re: Question about fulltext search and to_tsvector function

От
Oleg Bartunov
Дата:
Mike,

1. always separate attributes by blank unless you sure words can span different
   columns, e.g., title||' '||description
2. use coalesce to avoid unexpected NULL string

Oleg

On Thu, 16 Oct 2008, Mike Christensen wrote:

> Okay this one's driving me crazy.
>
> Should there be any difference between the following queries:
>
> SELECT R.Title FROM Recipes R
> WHERE (to_tsvector('english', title || description) @@
> plainto_tsquery('Cake'));
>
> and
>
> SELECT R.Title FROM Recipes R
> WHERE (to_tsvector('english', description || title) @@
> plainto_tsquery('Cake'));
>
>
> This query should search for the word 'Cake' in either the description OR the
> title field, correct?  The order should not matter.  However, in the first
> query I get 6 results and in the second I get 9.
>
> Furthermore, if I do:
>
> SELECT R.Title FROM Recipes R
> WHERE (to_tsvector('english', title) @@ plainto_tsquery('Cake'));
>
> I get 10 results and if I do:
>
> SELECT R.Title FROM Recipes R
> WHERE (to_tsvector('english', title || coalesce(description, '')) @@
> plainto_tsquery('Cake'));
>
> I get 7 results.  Shouldn't the first query result in a subset of the second
> query?  The first query returns several rows the second one doesn't return,
> and vice-versa!  This function is completely confusing me!!
>
> I would greatly appreciate anyone who could explain exactly how this works.
> It's most likely something stupid I'm doing that is the result of me not
> getting enough sleep..
>
> Mike
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83