Обсуждение: SubQuery speed
Hi, I'm writing a indexed web search, stuffing some data into a few pgsql tables. The query involved can look something like: BEGIN; DECLARE CURSOR curs FOR SELECT url,title FROM idx_files WHERE id IN ( SELECT file_id FROM idx_ref WHERE word_id IN ( SELECT id FROM idx_words WHERE word = 'search term 1' OR .. ) ); FETCH FORWARD 10 IN curs; END; The idx_files table holds about 1.000 rows, the idx_words table holds about 50.000 words, and the idx_ref is a massive 250.000+ row cross-reference (file_id, word_id) which links the words and pages together. A query like the above took several hours on my Pentium 200 at work before I killed the stuck daemon. This is kind of strange since the innermost subquery is only suppose to return a single id. :-( Are there any alternate methods of solving this? I don't think it's possible to write this kind of query using a multi-JOIN? Thanks for your input, --- Andreas `dep' Fredriksson andreas.fredriksson@pacer.se Programmerare, Pacer Communication www.pacer.se, 08-665 34 10 Linux: Because rebooting is for adding new hardware.
At 16:45 +0200 on 8/9/98, andreas.fredriksson@pacer.se wrote: > Are there any alternate methods of solving this? I don't think it's > possible to write this kind of query using a multi-JOIN? Why not? So long as the query is "in", and not "not in"? Should be something like: SELECT DISTINCT url,title FROM idx_files, idx_ref, idx_words WHERE idx_files.id = idx_ref.file_id AND idx_ref.word_id = idx_words.id AND (idx_words.word = 'search term 1' OR .. ); In any case, are there indices on the appropriate fields? If the innermost query is supposed to return only one tuple, you should use "=" rather than IN, although I don't know how much that adds. The optimizer probably has an easier time with "=" than with IN. Herouth -- Herouth Maoz, Internet developer. Open University of Israel - Telem project http://telem.openu.ac.il/~herutma
andreas.fredriksson@pacer.se wrote: > > BEGIN; > DECLARE CURSOR curs FOR > SELECT url,title FROM idx_files WHERE id IN ( > SELECT file_id FROM idx_ref WHERE word_id IN ( > SELECT id FROM idx_words WHERE word = 'search term 1' OR .. ) ); > FETCH FORWARD 10 IN curs; > END; > > The idx_files table holds about 1.000 rows, the idx_words table holds > about 50.000 words, and the idx_ref is a massive 250.000+ > row cross-reference (file_id, word_id) which links the words and > pages together. > > A query like the above took several hours on my Pentium 200 at work before > I killed the stuck daemon. This is kind of strange since the innermost > subquery is only suppose to return a single id. :-( > > Are there any alternate methods of solving this? I don't think it's > possible to write this kind of query using a multi-JOIN? Try to rewrite query using EXISTS instead of IN and create indices. Vadim