Обсуждение: Identifying which column matches a full text search

Поиск
Список
Период
Сортировка

Identifying which column matches a full text search

От
Ryan Wallace
Дата:

Hi all,

 

The following example is given in the Postgres 8.3 manual regarding how to create a single ts_vector column for two existing columns:

 

ALTER TABLE pgweb ADD COLUMN textsearchable_index_col tsvector;

UPDATE pgweb SET textsearchable_index_col =

     to_tsvector('english', coalesce(title,'') || coalesce(body,''));

Then we create a GIN index to speed up the search:

CREATE INDEX textsearch_idx ON pgweb USING gin(textsearchable_index_col);

Now we are ready to perform a fast full text search:

SELECT title

FROM pgweb

WHERE textsearchable_index_col @@ to_tsquery('create & table')

ORDER BY last_mod_date DESC LIMIT 10;

 

Using this approach. Is there any way of retrieving which of the original two columns the match was found in?

 

Any help would be much appreciated,

Ryan

Re: Identifying which column matches a full text search

От
Richard Huxton
Дата:
Ryan Wallace wrote:
> 
> UPDATE pgweb SET textsearchable_index_col =
>      to_tsvector('english', coalesce(title,'') || coalesce(body,''));

> WHERE textsearchable_index_col @@ to_tsquery('create & table')

> Using this approach. Is there any way of retrieving which of the original
> two columns the match was found in?

Afraid not - you're not indexing two columns, you're indexing one: 
textsearchable_index_col.

You can add up to four weights to a tsvector though, typically for 
title/body matching. See chapter 12.3 for details.

Failing that, where I've had many (a dozen) different sources but want 
to search them all I've built a textsearch_blocks table with columns to 
identify the source and have triggers that keep it up to date.

--   Richard Huxton  Archonet Ltd


Re: Identifying which column matches a full text search

От
Ryan Wallace
Дата:
Richard Huxton wrote:
>
> Failing that, where I've had many (a dozen) different sources but want 
> to search them all I've built a textsearch_blocks table with columns to 
> identify the source and have triggers that keep it up to date.

Once you've built the text search blocks table, how do you search it? Do you
perform
twelve separate queries or can you just do one?

Ryan

Ryan Wallace wrote:
> 
> UPDATE pgweb SET textsearchable_index_col =
>      to_tsvector('english', coalesce(title,'') || coalesce(body,''));

> WHERE textsearchable_index_col @@ to_tsquery('create & table')

> Using this approach. Is there any way of retrieving which of the original
> two columns the match was found in?

Afraid not - you're not indexing two columns, you're indexing one: 
textsearchable_index_col.

You can add up to four weights to a tsvector though, typically for 
title/body matching. See chapter 12.3 for details.

Failing that, where I've had many (a dozen) different sources but want 
to search them all I've built a textsearch_blocks table with columns to 
identify the source and have triggers that keep it up to date.

--   Richard Huxton  Archonet Ltd
No virus found in this incoming message.
Checked by AVG - http://www.avg.com 
Version: 8.0.138 / Virus Database: 270.5.6/1579 - Release Date: 7/29/2008
6:43 AM



Re: Identifying which column matches a full text search

От
Richard Huxton
Дата:
Ryan Wallace wrote:
> Richard Huxton wrote:
>> Failing that, where I've had many (a dozen) different sources but want 
>> to search them all I've built a textsearch_blocks table with columns to 
>> identify the source and have triggers that keep it up to date.
> 
> Once you've built the text search blocks table, how do you search it? Do you
> perform
> twelve separate queries or can you just do one?

OK, you have a table something like:

fulltext_blocks (  section  varchar(32),  itemid   int4,  words    tsvector,them  PRIMARY KEY (section, itemid)
)

Now assume two of the things I search are "news" and "faqs". I'm 
assuming they've both got a simple serial pkey - if not, "itemid" above 
needs to be text and you'll have to cast.

For each target table (news, faqs) add a trigger that updates 
fulltext_blocks appropriately. This can include weighting title and body 
of a news article.

Then, search the fulltext_blocks table, optionally filtering by section. 
If you're going to have lots of results put the ids into a (perhapd 
temporary) results-table. Then join your results back to the original 
tables with the appropriate UNION (if you need to - it might be you 
fetch results one at a time elsewhere in your app).

SELECT n.id, n.title, n.body
FROM news n JOIN results r ON n.id=r.id
WHERE r.section='news'
UNION ALL
SELECT f.id, f,question, f.answer
FROM faqs f JOIN results r ON f.id=r.id
WHERE r.section='faqs'
;

You'll probably want to set ownership/permissions on the triggers / 
fulltext_blocks table so you can't accidentally update it directly.

In mine I even had a "documents" section which relied on an external 
cron-driven script to strip the first 32k of text out of uploaded 
documents (pdf,word) in addition to user-supplied metadata (title, summary).

Note - this is basically simulating what we could do if you could index 
a view. The fulltext_blocks table is nothing more than a materialised view.

HTH

--   Richard Huxton  Archonet Ltd