FTI, paged, ranked searching and efficiency.
От | Paul |
---|---|
Тема | FTI, paged, ranked searching and efficiency. |
Дата | |
Msg-id | 3A36DC72@operamail.com обсуждение исходный текст |
Список | pgsql-sql |
Hello, This is going to be a bit long, I hope some of you will take the trouble to read it :) I am building a search engine for a section of a (PHP based) website. I wish the user to be able to a number of words in the search, and the search results to be ranked by the number of times words occur (both different words and the same word occuring multiple times are good). My (simplified) table structure is this: ====== Table "entry_fti"Attribute | Type | Modifier -----------+-------------+----------string | varchar(25) |id | oid | Index: entry_fti_string_idx Table "entry" Attribute | Type | Modifier -----------------------+-------+---------------------------------------------- -entry_id |integer| not null default nextval('entry_id_seq'::text)entry_name |text |entry_description_html|text |entry_image_id |integer|not null default 0entry_tn_image_id |integer| not null default 0entry_live |boolean| not null default't' Index: entry_pkey Table "image"Attribute | Type | Modifier ------------+-------------+------------------------------------------------image_id | integer | not null default nextval('image_id_seq'::text)image_name| varchar(32) |height | integer | not nullwidth | integer | not null Indices: image_pkey ====== And my (simplified) query looks like this: ====== SELECT COUNT(entry_fti.id) AS rating, entry.entry_name AS name, entry.entry_id AS id, entry.entry_description_htmlAS description_html, image.image_name AS thumb1_name, image.height AS thumb1_height, image.width AS thumb1_width FROM entry, entry_fti, image WHERE entry_fti.id=entry.oid AND entry.entrytn_image_id=image.image_id AND entry.entry_live = 't'::bool AND ( entry_fti.string ~'^word1' OR entry_fti.string ~'^word2' OR . . OR entry_fti.string ~'^wordn' ) GROUP BY entry.entry_id, entry.entry_name, entry.entry_description_html, image.image_name, image.height, image.width ORDER BY rating DESC ====== Now this all works, which is good. My problem now is that I want to limit the number of results shown on a page to 20 and show the number of pages of extra results, much like you'd see on any search engine site. Naturally I immediatly thought of the LIMIT and OFFSET clauses, but then: a) I'd need to do an extra query, to find out the total number of results to show the number of pages on the webpage. b) I have no idea how to write that query. It'd be a COUNT of 'rating' in the above, which would be a COUNT(COUNT(entry_fti.id))which would probably require some hideous (and not legal?) double GROUP BY construct. Ouch. So basically, LIMIT/OFFSET looks like a no go. This leaves me with just doing the above query, and using PHP to jump to a particular row in the results depending on what page you are on and pg_numrows() to calculate the number of pages. Would that be particularly inefficient? Should I be looking harder for a LIMIT/OFFSET based solution? Perhaps I'd be better off splitting it into two queries, one to just get the entry_id list in order, then another query to pull out the rest of the information for the 20 of those entry_ids that are on the results page I wish to show? That would stop Postgres from gathering so much information that I am just going to throw away anyway without looking at. Any ideas? Have I missed something obvious that will help me? Or better yet, can someone who has done this sort of thing before tell me whether I am on the right track? Paul
В списке pgsql-sql по дате отправления: