Обсуждение: PostgreSQL 9.1 : why is this query slow?
Hi All, I have a table with 665605 rows (counted, vacuum-ed): CREATE TABLE unique_words ( filename text NOT NULL, filetype text NOT NULL, word text NOT NULL, count integer,) The query is: select f.word , count(f.word) from unique_words as f, unique_words as s , unique_words as n where (f.word = s.word and s.word = n.word) and (f.filetype = 'f' and s.filetype = 's' and n.filetype = 'n') group by f.word Explain says: "GroupAggregate (cost=0.00..67237557.88 rows=1397 width=6)" " -> Nested Loop (cost=0.00..27856790.31 rows=7876150720 width=6)" " -> Nested Loop (cost=0.00..118722.04 rows=14770776 width=12)" " -> Index Scan using idx_unique_words_filetype_word on unique_words f (cost=0.00..19541.47 rows=92098 width=6)" " Index Cond: (filetype = 'f'::text)" " -> Index Scan using idx_unique_words_filetype_word on unique_words s (cost=0.00..0.91 rows=13 width=6)" " Index Cond: ((filetype = 's'::text) AND (word = f.word))" " -> Index Scan using idx_unique_words_filetype_word on unique_words n (cost=0.00..1.33 rows=44 width=6)" " Index Cond: ((filetype = 'n'::text) AND (word = f.word))" The right answer should be 3808 different words (according to a Java program I wrote). This query takes more than 1 hour (after which I cancelled the query). My questions are: - Is this to be expected? - Especially as the query over just 1 join takes 32 secs? (on f.word = s.word omitting everything for n ) - Why does explain say it takes "7876150720 rows"? - Is there a way to rephrase the query that makes it faster? - Could another table layout help (f,s,n are all possibilities for filetype)? - Anything else????? TIA -- Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 web: www.askesis.nl
On Mon, 2011-11-28 at 17:42 +0100, Joost Kraaijeveld wrote: > - Is there a way to rephrase the query that makes it faster? This query goes faster (6224 ms, but I am not sure it gives the correct answer as the result differs from my Java program): select word, count (word) from unique_words where word in (select word from unique_words where word in ( select word from unique_words where filetype = 'f') and filetype = 's') and filetype = 'n' group by word -- Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 web: www.askesis.nl
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> wrote: > This query goes faster (6224 ms, but I am not sure it gives the > correct answer as the result differs from my Java program): It seems clear that you want to see words which appear with all three types of files, but it's not clear what you want the count to represent. The number of times the word appears in filetype 'n' references (as specified in your second query)? The number of permutations of documents which incorporate one 'f' document, one 's' document, and one 'n' document (as specified in your first query). Something else, like the total number of times the word appears? -Kevin
On Mon, 2011-11-28 at 11:05 -0600, Kevin Grittner wrote: > Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> wrote: > > > This query goes faster (6224 ms, but I am not sure it gives the > > correct answer as the result differs from my Java program): > > It seems clear that you want to see words which appear with all > three types of files, but it's not clear what you want the count to > represent. The number of times the word appears in filetype 'n' > references (as specified in your second query)? The number of > permutations of documents which incorporate one 'f' document, one > 's' document, and one 'n' document (as specified in your first > query). Something else, like the total number of times the word > appears? I would like the answer to be "the number of times the word appears in all three the queries", the intersection of the three queries. -- Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 web: www.askesis.nl
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> wrote: > I would like the answer to be "the number of times the word > appears in all three the queries", the intersection of the three > queries. That's still not entirely clear to me. If there are two 'f' rows, three 's' rows, and four 'n' rows, do you want to see an answer of 2 (which seems like the intersection you request here), 9 (which is the sum), 24 (which is the product), or something else? If you really want the intersection, perhaps: with x as ( select word, count(*) as countall, count(case when filetype = 'f' then 1 else null end) as countf, count(case when filetype = 's' then 1 else null end) as as counts, count(case when filetype = 'n' then 1 else null end) as as countn from unique_words ) select word, least(countf, counts, countn) from x where countf > 0 and counts > 0 and countn > 0 order by word; -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > If you really want the intersection, perhaps: Or maybe closer: with x as ( select word, count(*) as countall, count(case when filetype = 'f' then 1 else null end) as countf, count(case when filetype = 's' then 1 else null end) as counts, count(case when filetype = 'n' then 1 else null end) as countn from unique_words group by word ) select word, least(countf, counts, countn) from x where countf > 0 and counts > 0 and countn > 0 order by word; Cranked out rather quickly and untested. -Kevin
On 28.11.2011 17:42, Joost Kraaijeveld wrote: > - Why does explain say it takes "7876150720 rows"? Any idea where this number came from? No matter what I do, the nested loop row estimates are alway very close to the product of the two estimates (outer rows * inner rows). Tomas
On Mon, 2011-11-28 at 11:36 -0600, Kevin Grittner wrote: > "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote: > > > If you really want the intersection, perhaps: > > Or maybe closer: > > with x as > ( > select > word, > count(*) as countall, > count(case when filetype = 'f' then 1 else null end) > as countf, > count(case when filetype = 's' then 1 else null end) > as counts, > count(case when filetype = 'n' then 1 else null end) > as countn > from unique_words > group by word > ) > select word, least(countf, counts, countn) from x > where countf > 0 and counts > 0 and countn > 0 > order by word; > > Cranked out rather quickly and untested. I tested it and it worked as advertised. Takes ~ 3 secs to complete. Thanks. -- Groeten, Joost Kraaijeveld Askesis B.V. Molukkenstraat 14 6524NB Nijmegen tel: 024-3888063 / 06-51855277 fax: 024-3608416 web: www.askesis.nl