Обсуждение: Question about fulltext search and to_tsvector function
Okay this one's driving me crazy. Should there be any difference between the following queries: SELECT R.Title FROM Recipes R WHERE (to_tsvector('english', title || description) @@ plainto_tsquery('Cake')); and SELECT R.Title FROM Recipes R WHERE (to_tsvector('english', description || title) @@ plainto_tsquery('Cake')); This query should search for the word 'Cake' in either the description OR the title field, correct? The order should not matter. However, in the first query I get 6 results and in the second I get 9. Furthermore, if I do: SELECT R.Title FROM Recipes R WHERE (to_tsvector('english', title) @@ plainto_tsquery('Cake')); I get 10 results and if I do: SELECT R.Title FROM Recipes R WHERE (to_tsvector('english', title || coalesce(description, '')) @@ plainto_tsquery('Cake')); I get 7 results. Shouldn't the first query result in a subset of the second query? The first query returns several rows the second one doesn't return, and vice-versa! This function is completely confusing me!! I would greatly appreciate anyone who could explain exactly how this works. It's most likely something stupid I'm doing that is the result of me not getting enough sleep.. Mike
Mike Christensen <imaudi@comcast.net> writes: > Okay this one's driving me crazy. > Should there be any difference between the following queries: > SELECT R.Title FROM Recipes R > WHERE (to_tsvector('english', title || description) @@ > plainto_tsquery('Cake')); > and > SELECT R.Title FROM Recipes R > WHERE (to_tsvector('english', description || title) @@ > plainto_tsquery('Cake')); Maybe it would be better to do description || ' ' || title ? I suspect that 'Cake' is getting run together with some other word in some of your rows ... regards, tom lane
Hi guys - I half figured this out.. The problem is the vector is concatenating the fields together.. If I have a title of "Iced Lemon Cake" and the description is "Tangy and sweet!", then the tsvector looks like this: "'ice':1 'lemon':2 'sweet':5 'caketangi':3" "Cake" and "Tangy" get parsed as "caketangi" heheh.. So my questions are: 1) What's the proper syntax for this query, and 2) What's the proper syntax to index the vector (I can probably figured this out once I have the first answer).. Thanks!! Mike Mike Christensen wrote: > Okay this one's driving me crazy. > > Should there be any difference between the following queries: > > SELECT R.Title FROM Recipes R > WHERE (to_tsvector('english', title || description) @@ > plainto_tsquery('Cake')); > > and > > SELECT R.Title FROM Recipes R > WHERE (to_tsvector('english', description || title) @@ > plainto_tsquery('Cake')); > > > This query should search for the word 'Cake' in either the description > OR the title field, correct? The order should not matter. However, > in the first query I get 6 results and in the second I get 9. > > Furthermore, if I do: > > SELECT R.Title FROM Recipes R > WHERE (to_tsvector('english', title) @@ plainto_tsquery('Cake')); > > I get 10 results and if I do: > > SELECT R.Title FROM Recipes R > WHERE (to_tsvector('english', title || coalesce(description, '')) @@ > plainto_tsquery('Cake')); > > I get 7 results. Shouldn't the first query result in a subset of the > second query? The first query returns several rows the second one > doesn't return, and vice-versa! This function is completely confusing > me!! > > I would greatly appreciate anyone who could explain exactly how this > works. It's most likely something stupid I'm doing that is the result > of me not getting enough sleep.. > > Mike >
That was it!
So this is the standard way of doing this? It seems a bit messy to me, but I've gotten kinda used to Postgresql being almost annoyingly logical <g>
So here's my new query:
SELECT R.RecipeId, R.Title, R.Description FROM Recipes R
WHERE (to_tsvector('english', title || ' ' || coalesce(description, '') || ' ' || coalesce(credit, '') || ' ' || coalesce(steps, '')) @@ plainto_tsquery('cake'));
and my new index:
CREATE INDEX idx_recipes_fullsearch
ON recipes
USING gin
(to_tsvector('english'::regconfig, ((title::text || ' ' || COALESCE(description, ''::character varying)::text) || ' ' || COALESCE(credit, ''::character varying)::text) || ' ' || COALESCE(steps, ''::text)));
This is exactly what you would do as well? I just want to make sure this isn't considered a hack..
Tom Lane wrote:
So this is the standard way of doing this? It seems a bit messy to me, but I've gotten kinda used to Postgresql being almost annoyingly logical <g>
So here's my new query:
SELECT R.RecipeId, R.Title, R.Description FROM Recipes R
WHERE (to_tsvector('english', title || ' ' || coalesce(description, '') || ' ' || coalesce(credit, '') || ' ' || coalesce(steps, '')) @@ plainto_tsquery('cake'));
and my new index:
CREATE INDEX idx_recipes_fullsearch
ON recipes
USING gin
(to_tsvector('english'::regconfig, ((title::text || ' ' || COALESCE(description, ''::character varying)::text) || ' ' || COALESCE(credit, ''::character varying)::text) || ' ' || COALESCE(steps, ''::text)));
This is exactly what you would do as well? I just want to make sure this isn't considered a hack..
Tom Lane wrote:
Mike Christensen <imaudi@comcast.net> writes:Okay this one's driving me crazy. Should there be any difference between the following queries:SELECT R.Title FROM Recipes R WHERE (to_tsvector('english', title || description) @@ plainto_tsquery('Cake'));andSELECT R.Title FROM Recipes R WHERE (to_tsvector('english', description || title) @@ plainto_tsquery('Cake'));Maybe it would be better to do description || ' ' || title ? I suspect that 'Cake' is getting run together with some other word in some of your rows ... regards, tom lane
Mike Christensen <imaudi@comcast.net> writes: > This is exactly what you would do as well? I just want to make sure > this isn't considered a hack.. It certainly works. What the docs seem to recommend though is concatenating the *output* of to_tsvector applied to each field, along the lines of setweight(to_tsvector('pg_catalog.english', coalesce(new.title,'')), 'A') || setweight(to_tsvector('pg_catalog.english', coalesce(new.body,'')), 'D'); (cf. section 12.4.3 in the 8.3 docs). I'm not sure about whether this is better or worse from a micro-optimization speed point of view, but the reason to do it this way is you can attach different weights to words from different fields, as illustrated above. That can come in handy in future searching. (Note that || applied to tsvectors isn't really just simple concatenation, but that's the operator name that was chosen.) regards, tom lane
Mike, 1. always separate attributes by blank unless you sure words can span different columns, e.g., title||' '||description 2. use coalesce to avoid unexpected NULL string Oleg On Thu, 16 Oct 2008, Mike Christensen wrote: > Okay this one's driving me crazy. > > Should there be any difference between the following queries: > > SELECT R.Title FROM Recipes R > WHERE (to_tsvector('english', title || description) @@ > plainto_tsquery('Cake')); > > and > > SELECT R.Title FROM Recipes R > WHERE (to_tsvector('english', description || title) @@ > plainto_tsquery('Cake')); > > > This query should search for the word 'Cake' in either the description OR the > title field, correct? The order should not matter. However, in the first > query I get 6 results and in the second I get 9. > > Furthermore, if I do: > > SELECT R.Title FROM Recipes R > WHERE (to_tsvector('english', title) @@ plainto_tsquery('Cake')); > > I get 10 results and if I do: > > SELECT R.Title FROM Recipes R > WHERE (to_tsvector('english', title || coalesce(description, '')) @@ > plainto_tsquery('Cake')); > > I get 7 results. Shouldn't the first query result in a subset of the second > query? The first query returns several rows the second one doesn't return, > and vice-versa! This function is completely confusing me!! > > I would greatly appreciate anyone who could explain exactly how this works. > It's most likely something stupid I'm doing that is the result of me not > getting enough sleep.. > > Mike > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83