Re: Full Text Search 101?

Поиск
Список
Период
Сортировка
От Aurynn Shaw
Тема Re: Full Text Search 101?
Дата
Msg-id 4AFC7E4A.7080305@commandprompt.com
обсуждение исходный текст
Ответ на Full Text Search 101?  (Jonathan <jharahush@gmail.com>)
Список pgsql-general
Hi Jonathan;

 >
 > I posted to this forum once before and was able to receive help.
 > Thanks again!
 >
 > I'm trying to implement full text search capabilities.  Basically, I
 > have a very simple "data catalog" type of website (http://
 > gis.drcog.org/datacatalog), where the user can type in a word or words
 > to search for records that match the criteria.  I have a table with a
 > few fields, but I want to be able to search/index two fields -- the
 > "name" of the dataset and the "description."  (or more, if I can index
 > fields from other tables too, that would be great).  I'd like to be
 > able to use a full text search to rank the results in terms of
 > relevance.

You can set up an index per table, but I've found that having a single
vector table with bridge tables to the data I want to search works well.
This would be in the style of:

create table vectors (
  id serial primary key,
  vector tsvector not null
);

create table interesting_data (
  id serial primary key,
  textual text not null
);

CREATE TABLE vector_to_interesting (
  v_id int not null references vectors(id),
  i_id int not null references interesting_data(id)
);

And then

SELECT id.*
   FROM interesting_data itd,
        vectors v,
        vector_to_interesting itv
  WHERE v.id = itv.v_id
    AND itv.i_id = itd.id
    AND v.vector @@ to_tsquery('searchpattern');

Allowing for some procedures around that that return multiple row types,
to the client software.

 >
 > What is the best way to handle this?  I've read through the PostgreSQL
 > documentation and don't quite understand it although I'm trying to
 > understand and am 'playing around' with this on a development server.
 > Is the default text search configuration enough? I've tried setting my
 > own configuration but get errors about not finding .dict dictionary
 > files?  I have a default install of PostgreSQL 8.4.
 >
 > Also, I've created a tsvector column and created indexes, but it
 > didn't seem to delete stop words from the indexes.  Should the stop
 > words be indexed?

The default "english" configuration in PostgreSQL should have done this.
Building your own configuration tends towards being a bit more advanced,
and "english" should suit most needs.

 >
 > Also, I don't quite understand how to create indexes but rank certain
 > words as being more important than others, for instance, maybe having
 > words that come from the "name" column carrying more importance than
 > words coming from the "description" column.

For part of this, there's ts_rank(tsvector, tsquery), as well as the
relative weighting system in the textsearch modules. Unfortunately, I
don't have much experience with the relative weighting, but, ts_rank()
should get you partly there.

I also found an interesting page on relative weighting @
http://www.postgresonline.com/special_feature.php?sf_name=postgresql83tsearch_cheatsheet&outputformat=html

that might be useful for you.

 > Finally, I'd like
 > "watersheds" to come up when someone searches for "water" so I don't
 > think I have this configured properly, because this record is not
 > returned.

Well, there's two ways to achieve this.
The first is to manually the stopwords, which is getting into the custom
configuration and a bit on the more advanced side.
The other is that in PG8.4, you can do

to_tsquery('water:*')

which will tell the search parser to do a partial match, which would
return "watershed", in this instance.
My testing has shown it to be a bit slower (30ms, vs 15ms for a
non-partial search), but not egregiously slow.

 >
 > Is there a good tutorial or maybe documentation that is a bit easier
 > to understand?  I know my database is far from complicated but I can't
 > seem to find a configuration that works well.  When I try ranking my
 > results, most of the results end up with the same rank, so something
 > must be wrong?
 >
 > Can someone point me in the right direction?
 >
 > Thanks for the help.  I appreciate it.
 >

Hope this helps. :)

Regards,
--
Aurynn Shaw

The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 103
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

ashaw@commandprompt.com

В списке pgsql-general по дате отправления:

Предыдущее
От: Michael Nacos
Дата:
Сообщение: Re: Full Text Search 101?
Следующее
От: Hans-Juergen Schoenig
Дата:
Сообщение: Re: [pgeu-general] pgday.eu