FTI, paged, ranked searching and efficiency.

Поиск
Список
Период
Сортировка
От Paul
Тема FTI, paged, ranked searching and efficiency.
Дата
Msg-id 3A36DC72@operamail.com
обсуждение исходный текст
Список pgsql-sql
Hello,

This is going to be a bit long, I hope some of you will take the
trouble to read it :)

I am building a search engine for a section of a (PHP based) website.
I wish the user to be able to a number of words in the search, and the
search results to be ranked by the number of times words occur (both
different words and the same word occuring multiple times are good).

My (simplified) table structure is this:
======        Table "entry_fti"Attribute |    Type     | Modifier
-----------+-------------+----------string    | varchar(25) |id        | oid         |
Index: entry_fti_string_idx
                                  Table "entry"      Attribute       | Type  |                    Modifier
-----------------------+-------+----------------------------------------------
-entry_id              |integer| not null default 
nextval('entry_id_seq'::text)entry_name            |text   |entry_description_html|text   |entry_image_id
|integer|not null default 0entry_tn_image_id     |integer| not null default 0entry_live            |boolean| not null
default't'
 
Index: entry_pkey
                              Table "image"Attribute  |    Type     |                    Modifier
------------+-------------+------------------------------------------------image_id   | integer     | not null default
nextval('image_id_seq'::text)image_name| varchar(32) |height     | integer     | not nullwidth      | integer     | not
null
Indices:  image_pkey
======

And my (simplified) query looks like this:
======
SELECT   COUNT(entry_fti.id) AS rating,        entry.entry_name AS name,        entry.entry_id AS id,
entry.entry_description_htmlAS description_html,        image.image_name AS thumb1_name,        image.height AS
thumb1_height,       image.width AS thumb1_width
 
FROM     entry, entry_fti, image
WHERE    entry_fti.id=entry.oid AND    entry.entrytn_image_id=image.image_id AND    entry.entry_live = 't'::bool AND
(        entry_fti.string ~'^word1'         OR         entry_fti.string ~'^word2'         OR                      .
                .         OR         entry_fti.string ~'^wordn'        ) 
 
GROUP BY entry.entry_id,        entry.entry_name,        entry.entry_description_html,        image.image_name,
image.height,       image.width
 
ORDER BY rating DESC 
======

Now this all works, which is good. My problem now is that I want to 
limit the number of results shown on a page to 20 and show the number
of pages of extra results, much like you'd see on any search engine site.
Naturally I immediatly thought of the LIMIT and OFFSET clauses, but then:
a) I'd need to do an extra query, to find out the total number of results  to show the number of pages on the webpage.
b) I have no idea how to write that query. It'd be a COUNT of 'rating'  in the above, which would be a
COUNT(COUNT(entry_fti.id))which  would probably require some hideous (and not legal?) double GROUP  BY construct.
Ouch.

So basically, LIMIT/OFFSET looks like a no go. This leaves me with just
doing the above query, and using PHP to jump to a particular row in the
results depending on what page you are on and pg_numrows() to 
calculate the number of pages.

Would that be particularly inefficient? 
Should I be looking harder for a LIMIT/OFFSET based solution?

Perhaps I'd be better off splitting it into two queries, one to just
get the entry_id list in order, then another query to pull out the
rest of the information for the 20 of those entry_ids that are on the results 
page I wish to show?
That would stop Postgres from gathering so much information that I am just 
going to throw away anyway without looking at.

Any ideas? Have I missed something obvious that will help me? Or better yet, 
can someone who has done this sort of thing before tell me whether I am on the 
right track?

Paul



В списке pgsql-sql по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Using Array-Values in subselect
Следующее
От: Alvar Freude
Дата:
Сообщение: Re: Using Array-Values in subselect