Обсуждение: Slow TSearch2 performance for table with 1 million documents.
Hi, I have very slow performance for a TSearch2 table. I have pasted the EXPLAIN ANALYZE queries below. 12 seconds is slow for almost any purpose. Is there any way to speed this up? # explain analyze select * FROM fulltext_article, to_tsquery ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------ Sort (cost=6576.74..6579.07 rows=933 width=774) (actual time=12969.237..12970.490 rows=5119 loops=1) Sort Key: rank(fulltext_article.idxfti, q.q) -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) (actual time=209.513..12955.498 rows=5119 loops=1) -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.005..0.006 rows=1 loops=1) -> Bitmap Heap Scan on fulltext_article (cost=3069.79..6516.70 rows=933 width=742) (actual time=209.322..234.390 rows=5119 loops=1) Recheck Cond: (fulltext_article.idxfti @@ q.q) -> Bitmap Index Scan on fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373 rows=5119 loops=1) Index Cond: (fulltext_article.idxfti @@ q.q) Total runtime: 12973.035 ms (9 rows) # select count(*) from fulltext_article; count -------- 933001 (1 row) # select COUNT(*) FROM fulltext_article, to_tsquery('simple','blue & green') AS q WHERE idxfti @@ q; count ------- 6308 (1 row) Benjamin
Benjamin Arai wrote: > Hi, > > I have very slow performance for a TSearch2 table. I have pasted the > EXPLAIN ANALYZE queries below. 12 seconds is slow for almost any > purpose. Is there any way to speed this up? > > # explain analyze select * FROM fulltext_article, > to_tsquery('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, > q) DESC; Admittedly I'm kind of new to tsearch, but wouldn't SELECT * FROM fulltext_article WHERE idxfti @@ to_tsquery('simple','dog') ORDER BY rank(idxfti, to_tsquery('simple', 'dog')) DESC; be faster? Quick testing shows a similar query in our database to not use a nested loop and a function scan. For comparison, here are our plans: Your approach: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=4.86..4.87 rows=1 width=164) (actual time=0.151..0.161 rows=5 loops=1) Sort Key: rank(fulltext_article.idxfti, q.q) -> Nested Loop (cost=0.00..4.85 rows=1 width=164) (actual time=0.067..0.119 rows=5 loops=1) -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.010..0.012 rows=1 loops=1) -> Index Scan using fulltext_article_idxfti_idx on fulltext_article (cost=0.00..4.82 rows=1 width=132) (actual time=0.033..0.056 rows=5 loops=1) Index Cond: (fulltext_article.idxfti @@ "outer".q) Filter: (fulltext_article.idxfti @@ "outer".q) Total runtime: 0.242 ms (8 rows) My suggested approach: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=4.84..4.84 rows=1 width=132) (actual time=0.085..0.095 rows=5 loops=1) Sort Key: rank(idxfti, '''dog'''::tsquery) -> Index Scan using fulltext_article_idxfti_idx on fulltext_article (cost=0.00..4.83 rows=1 width=132) (actual time=0.025..0.052 rows=5 loops=1) Index Cond: (idxfti @@ '''dog'''::tsquery) Filter: (idxfti @@ '''dog'''::tsquery) Total runtime: 0.163 ms (6 rows) I hope this helps. -- Alban Hertroys a.hertroys@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
Benjamin Arai <benjamin@araisoft.com> writes: > # explain analyze select * FROM fulltext_article, to_tsquery > ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC; > QUERY PLAN > ------------------------------------------------------------------------ > ------------------------------------------------------------------------ > ------------ > Sort (cost=6576.74..6579.07 rows=933 width=774) (actual > time=12969.237..12970.490 rows=5119 loops=1) > Sort Key: rank(fulltext_article.idxfti, q.q) > -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) > (actual time=209.513..12955.498 rows=5119 loops=1) > -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) > (actual time=0.005..0.006 rows=1 loops=1) > -> Bitmap Heap Scan on fulltext_article > (cost=3069.79..6516.70 rows=933 width=742) (actual > time=209.322..234.390 rows=5119 loops=1) > Recheck Cond: (fulltext_article.idxfti @@ q.q) > -> Bitmap Index Scan on fulltext_article_idxfti_idx > (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373 > rows=5119 loops=1) > Index Cond: (fulltext_article.idxfti @@ q.q) > Total runtime: 12973.035 ms > (9 rows) The time seems all spent at the join step, which is odd because it really hasn't got much to do. AFAICS all it has to do is compute the rank() values that the sort step will use. Is it possible that rank() is really slow? regards, tom lane
On Fri, 5 Oct 2007, Tom Lane wrote: > Benjamin Arai <benjamin@araisoft.com> writes: >> # explain analyze select * FROM fulltext_article, to_tsquery >> ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC; > >> QUERY PLAN >> ------------------------------------------------------------------------ >> ------------------------------------------------------------------------ >> ------------ >> Sort (cost=6576.74..6579.07 rows=933 width=774) (actual >> time=12969.237..12970.490 rows=5119 loops=1) >> Sort Key: rank(fulltext_article.idxfti, q.q) >> -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) >> (actual time=209.513..12955.498 rows=5119 loops=1) >> -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) >> (actual time=0.005..0.006 rows=1 loops=1) >> -> Bitmap Heap Scan on fulltext_article >> (cost=3069.79..6516.70 rows=933 width=742) (actual >> time=209.322..234.390 rows=5119 loops=1) >> Recheck Cond: (fulltext_article.idxfti @@ q.q) >> -> Bitmap Index Scan on fulltext_article_idxfti_idx >> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373 >> rows=5119 loops=1) >> Index Cond: (fulltext_article.idxfti @@ q.q) >> Total runtime: 12973.035 ms >> (9 rows) > > The time seems all spent at the join step, which is odd because it > really hasn't got much to do. AFAICS all it has to do is compute the > rank() values that the sort step will use. Is it possible that > rank() is really slow? can you try rank_cd() instead ? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Oct 5, 2007, at 8:32 AM, Oleg Bartunov wrote: > On Fri, 5 Oct 2007, Tom Lane wrote: > >> Benjamin Arai <benjamin@araisoft.com> writes: >>> # explain analyze select * FROM fulltext_article, to_tsquery >>> ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, q) >>> DESC; >> >>> QUERY PLAN >>> -------------------------------------------------------------------- >>> ---- >>> -------------------------------------------------------------------- >>> ---- >>> ------------ >>> Sort (cost=6576.74..6579.07 rows=933 width=774) (actual >>> time=12969.237..12970.490 rows=5119 loops=1) >>> Sort Key: rank(fulltext_article.idxfti, q.q) >>> -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) >>> (actual time=209.513..12955.498 rows=5119 loops=1) >>> -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) >>> (actual time=0.005..0.006 rows=1 loops=1) >>> -> Bitmap Heap Scan on fulltext_article >>> (cost=3069.79..6516.70 rows=933 width=742) (actual >>> time=209.322..234.390 rows=5119 loops=1) >>> Recheck Cond: (fulltext_article.idxfti @@ q.q) >>> -> Bitmap Index Scan on fulltext_article_idxfti_idx >>> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373 >>> rows=5119 loops=1) >>> Index Cond: (fulltext_article.idxfti @@ q.q) >>> Total runtime: 12973.035 ms >>> (9 rows) >> >> The time seems all spent at the join step, which is odd because it >> really hasn't got much to do. AFAICS all it has to do is compute the >> rank() values that the sort step will use. Is it possible that >> rank() is really slow? > > can you try rank_cd() instead ? > Using Rank: -# ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------ Sort (cost=6576.74..6579.07 rows=933 width=774) (actual time=98083.081..98084.351 rows=5119 loops=1) Sort Key: rank(fulltext_article.idxfti, q.q) -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) (actual time=479.122..98067.594 rows=5119 loops=1) -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.003..0.004 rows=1 loops=1) -> Bitmap Heap Scan on fulltext_article (cost=3069.79..6516.70 rows=933 width=742) (actual time=341.739..37112.110 rows=5119 loops=1) Recheck Cond: (fulltext_article.idxfti @@ q.q) -> Bitmap Index Scan on fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) (actual time=321.443..321.443 rows=5119 loops=1) Index Cond: (fulltext_article.idxfti @@ q.q) Total runtime: 98087.575 ms (9 rows) Using Rank_cd: # explain analyze select * FROM fulltext_article, to_tsquery ('simple','cat') AS q WHERE idxfti @@ q ORDER BY rank_cd(idxfti, q) DESC; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------- Sort (cost=6576.74..6579.07 rows=933 width=774) (actual time=199316.648..199324.631 rows=26054 loops=1) Sort Key: rank_cd(fulltext_article.idxfti, q.q) -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) (actual time=871.428..199244.330 rows=26054 loops=1) -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1) -> Bitmap Heap Scan on fulltext_article (cost=3069.79..6516.70 rows=933 width=742) (actual time=850.674..50146.477 rows=26054 loops=1) Recheck Cond: (fulltext_article.idxfti @@ q.q) -> Bitmap Index Scan on fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) (actual time=838.120..838.120 rows=26054 loops=1) Index Cond: (fulltext_article.idxfti @@ q.q) Total runtime: 199338.297 ms (9 rows) > >> >> regards, tom lane >> >> ---------------------------(end of >> broadcast)--------------------------- >> TIP 5: don't forget to increase your free space map settings >> > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), > Sternberg Astronomical Institute, Moscow University, Russia > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(495)939-16-83, +007(495)939-23-83 >
It appears that the ORDER BY rank operation is the slowing factor. If I remove it then the query is pretty fast. Is there another way to perform ORDER BY such that it does not do a sort? Benjamin On Oct 5, 2007, at 3:57 PM, Benjamin Arai wrote: > > On Oct 5, 2007, at 8:32 AM, Oleg Bartunov wrote: > >> On Fri, 5 Oct 2007, Tom Lane wrote: >> >>> Benjamin Arai <benjamin@araisoft.com> writes: >>>> # explain analyze select * FROM fulltext_article, to_tsquery >>>> ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, >>>> q) DESC; >>> >>>> QUERY PLAN >>>> ------------------------------------------------------------------- >>>> ----- >>>> ------------------------------------------------------------------- >>>> ----- >>>> ------------ >>>> Sort (cost=6576.74..6579.07 rows=933 width=774) (actual >>>> time=12969.237..12970.490 rows=5119 loops=1) >>>> Sort Key: rank(fulltext_article.idxfti, q.q) >>>> -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) >>>> (actual time=209.513..12955.498 rows=5119 loops=1) >>>> -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) >>>> (actual time=0.005..0.006 rows=1 loops=1) >>>> -> Bitmap Heap Scan on fulltext_article >>>> (cost=3069.79..6516.70 rows=933 width=742) (actual >>>> time=209.322..234.390 rows=5119 loops=1) >>>> Recheck Cond: (fulltext_article.idxfti @@ q.q) >>>> -> Bitmap Index Scan on >>>> fulltext_article_idxfti_idx >>>> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373 >>>> rows=5119 loops=1) >>>> Index Cond: (fulltext_article.idxfti @@ q.q) >>>> Total runtime: 12973.035 ms >>>> (9 rows) >>> >>> The time seems all spent at the join step, which is odd because it >>> really hasn't got much to do. AFAICS all it has to do is compute >>> the >>> rank() values that the sort step will use. Is it possible that >>> rank() is really slow? >> >> can you try rank_cd() instead ? >> > Using Rank: > > -# ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, > q) DESC; > > QUERY PLAN > ---------------------------------------------------------------------- > ---------------------------------------------------------------------- > ---------------- > Sort (cost=6576.74..6579.07 rows=933 width=774) (actual > time=98083.081..98084.351 rows=5119 loops=1) > Sort Key: rank(fulltext_article.idxfti, q.q) > -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) > (actual time=479.122..98067.594 rows=5119 loops=1) > -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) > (actual time=0.003..0.004 rows=1 loops=1) > -> Bitmap Heap Scan on fulltext_article > (cost=3069.79..6516.70 rows=933 width=742) (actual > time=341.739..37112.110 rows=5119 loops=1) > Recheck Cond: (fulltext_article.idxfti @@ q.q) > -> Bitmap Index Scan on > fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) > (actual time=321.443..321.443 rows=5119 loops=1) > Index Cond: (fulltext_article.idxfti @@ q.q) > Total runtime: 98087.575 ms > (9 rows) > > Using Rank_cd: > > # explain analyze select * FROM fulltext_article, to_tsquery > ('simple','cat') AS q WHERE idxfti @@ q ORDER BY rank_cd(idxfti, > q) DESC; > > QUERY PLAN > ---------------------------------------------------------------------- > ---------------------------------------------------------------------- > ----------------- > Sort (cost=6576.74..6579.07 rows=933 width=774) (actual > time=199316.648..199324.631 rows=26054 loops=1) > Sort Key: rank_cd(fulltext_article.idxfti, q.q) > -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) > (actual time=871.428..199244.330 rows=26054 loops=1) > -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) > (actual time=0.006..0.007 rows=1 loops=1) > -> Bitmap Heap Scan on fulltext_article > (cost=3069.79..6516.70 rows=933 width=742) (actual > time=850.674..50146.477 rows=26054 loops=1) > Recheck Cond: (fulltext_article.idxfti @@ q.q) > -> Bitmap Index Scan on > fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) > (actual time=838.120..838.120 rows=26054 loops=1) > Index Cond: (fulltext_article.idxfti @@ q.q) > Total runtime: 199338.297 ms > (9 rows) > >> >>> >>> regards, tom lane >>> >>> ---------------------------(end of >>> broadcast)--------------------------- >>> TIP 5: don't forget to increase your free space map settings >>> >> >> Regards, >> Oleg >> _____________________________________________________________ >> Oleg Bartunov, Research Scientist, Head of AstroNet >> (www.astronet.ru), >> Sternberg Astronomical Institute, Moscow University, Russia >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >> phone: +007(495)939-16-83, +007(495)939-23-83 >> >
Benjamin Arai <me@benjaminarai.com> writes: > It appears that the ORDER BY rank operation is the slowing factor. > If I remove it then the query is pretty fast. Is there another way > to perform ORDER BY such that it does not do a sort? I think you misunderstood: it's not the sort that's slow, it's the computation of the rank() values that are inputs to the sort. regards, tom lane
Oh, I see. I didn't look carefully at the EXPLAIN ANALYZE I posted. So, is there a solution to the rank problem? Benjamin On Oct 11, 2007, at 8:53 AM, Tom Lane wrote: > Benjamin Arai <me@benjaminarai.com> writes: >> It appears that the ORDER BY rank operation is the slowing factor. >> If I remove it then the query is pretty fast. Is there another way >> to perform ORDER BY such that it does not do a sort? > > I think you misunderstood: it's not the sort that's slow, it's the > computation of the rank() values that are inputs to the sort. > > regards, tom lane >