Discussion: Identifying which column matches a full text search
Hi all,
The Postgres 8.3 manual gives the following example of how to create a single tsvector column from two existing columns:
ALTER TABLE pgweb ADD COLUMN textsearchable_index_col tsvector;
UPDATE pgweb SET textsearchable_index_col =
to_tsvector('english', coalesce(title,'') || coalesce(body,''));
Then we create a GIN index to speed up the search:
CREATE INDEX textsearch_idx ON pgweb USING gin(textsearchable_index_col);
Now we are ready to perform a fast full text search:
SELECT title
FROM pgweb
WHERE textsearchable_index_col @@ to_tsquery('create & table')
ORDER BY last_mod_date DESC LIMIT 10;
Using this approach, is there any way of retrieving which of the original two columns the match was found in?
Any help would be much appreciated,
Ryan
Ryan Wallace wrote:
>
> UPDATE pgweb SET textsearchable_index_col =
> to_tsvector('english', coalesce(title,'') || coalesce(body,''));
> WHERE textsearchable_index_col @@ to_tsquery('create & table')
> Using this approach. Is there any way of retrieving which of the original
> two columns the match was found in?
Afraid not - you're not indexing two columns, you're indexing one:
textsearchable_index_col.
You can add up to four weights to a tsvector though, typically for
title/body matching. See chapter 12.3 for details.
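A rough sketch of the weighting idea, reusing the pgweb example quoted above. The choice of 'A' for title and 'B' for body, and the title_match / body_match aliases, are assumptions made for illustration. Lexemes in a tsquery can carry a weight label, which restricts them to tsvector lexemes with that weight, so each flag tests one source column:

```sql
-- Label title lexemes 'A' and body lexemes 'B' in the combined column.
UPDATE pgweb SET textsearchable_index_col =
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B');

-- The WHERE clause finds rows as before; the two extra columns report
-- whether the whole query is satisfied by the title alone or the body alone.
SELECT title,
       textsearchable_index_col @@ to_tsquery('english', 'create:A & table:A') AS title_match,
       textsearchable_index_col @@ to_tsquery('english', 'create:B & table:B') AS body_match
FROM pgweb
WHERE textsearchable_index_col @@ to_tsquery('english', 'create & table')
ORDER BY last_mod_date DESC LIMIT 10;
```

Note that a row where "create" appears only in the title and "table" only in the body still matches the WHERE clause, but both flags come back false.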
Failing that, where I've had many (a dozen) different sources but want
to search them all I've built a textsearch_blocks table with columns to
identify the source and have triggers that keep it up to date.
-- Richard Huxton Archonet Ltd
Richard Huxton wrote:
>
> Failing that, where I've had many (a dozen) different sources but want
> to search them all I've built a textsearch_blocks table with columns to
> identify the source and have triggers that keep it up to date.
Once you've built the text search blocks table, how do you search it? Do you
perform twelve separate queries or can you just do one?
Ryan
Ryan Wallace wrote:
> Richard Huxton wrote:
>> Failing that, where I've had many (a dozen) different sources but want
>> to search them all I've built a textsearch_blocks table with columns to
>> identify the source and have triggers that keep it up to date.
>
> Once you've built the text search blocks table, how do you search it? Do you
> perform twelve separate queries or can you just do one?
OK, you have a table something like:
fulltext_blocks (
  section varchar(32),
  itemid int4,
  words tsvector,
  PRIMARY KEY (section, itemid)
)
Now assume two of the things I search are "news" and "faqs". I'm assuming
they've both got a simple serial pkey - if not, "itemid" above needs to be
text and you'll have to cast.
For each target table (news, faqs) add a trigger that updates
fulltext_blocks appropriately. This can include weighting title and body of
a news article.
Then, search the fulltext_blocks table, optionally filtering by section. If
you're going to have lots of results put the ids into a (perhaps temporary)
results-table. Then join your results back to the original tables with the
appropriate UNION (if you need to - it might be you fetch results one at a
time elsewhere in your app).
SELECT n.id, n.title, n.body
FROM news n JOIN results r ON n.id=r.id
WHERE r.section='news'
UNION ALL
SELECT f.id, f.question, f.answer
FROM faqs f JOIN results r ON f.id=r.id
WHERE r.section='faqs'
;
You'll probably want to set ownership/permissions on the triggers /
fulltext_blocks table so you can't accidentally update it directly. In mine
I even had a "documents" section which relied on an external cron-driven
script to strip the first 32k of text out of uploaded documents (pdf, word)
in addition to user-supplied metadata (title, summary).
Note - this is basically simulating what we could do if you could index a
view. The fulltext_blocks table is nothing more than a materialised view.
HTH
-- Richard Huxton Archonet Ltd
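The trigger and search steps described above might look roughly like the following. This is only a sketch under assumptions, not the setup actually used in the thread: it assumes a news table with columns id, title and body, that PL/pgSQL is installed in the database, and the function, trigger and index names are invented for illustration. The results table is given columns (section, id) so that it lines up with the UNION query above.

```sql
-- Index the combined blocks table (index name is hypothetical).
CREATE INDEX fulltext_blocks_words_idx ON fulltext_blocks USING gin(words);

-- Keep the 'news' section of fulltext_blocks in step with the news table.
CREATE FUNCTION news_fulltext_sync() RETURNS trigger AS $$
BEGIN
    -- Drop the old block when a row is changed or removed.
    IF TG_OP IN ('UPDATE', 'DELETE') THEN
        DELETE FROM fulltext_blocks
        WHERE section = 'news' AND itemid = OLD.id;
    END IF;

    -- (Re)build the block, weighting title above body.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        INSERT INTO fulltext_blocks (section, itemid, words)
        VALUES ('news', NEW.id,
                setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
                setweight(to_tsvector('english', coalesce(NEW.body, '')), 'B'));
    END IF;

    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER news_fulltext_sync_trg
AFTER INSERT OR UPDATE OR DELETE ON news
FOR EACH ROW EXECUTE PROCEDURE news_fulltext_sync();

-- One query searches every section at once; add "AND section = 'news'"
-- to restrict it to a single source.
CREATE TEMPORARY TABLE results AS
SELECT section, itemid AS id
FROM fulltext_blocks
WHERE words @@ to_tsquery('english', 'create & table');
```

A matching function and trigger would be needed for faqs and each other source table, changing only the section label and the columns fed to to_tsvector.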