Обсуждение: Slow TSearch2 performance for table with 1 million documents.
Hi, I have very slow performance for a TSearch2 table. I have pasted the EXPLAIN ANALYZE queries below. 12 seconds is slow for almost any purpose. Is there any way to speed this up? # explain analyze select * FROM fulltext_article, to_tsquery ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------ Sort (cost=6576.74..6579.07 rows=933 width=774) (actual time=12969.237..12970.490 rows=5119 loops=1) Sort Key: rank(fulltext_article.idxfti, q.q) -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) (actual time=209.513..12955.498 rows=5119 loops=1) -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.005..0.006 rows=1 loops=1) -> Bitmap Heap Scan on fulltext_article (cost=3069.79..6516.70 rows=933 width=742) (actual time=209.322..234.390 rows=5119 loops=1) Recheck Cond: (fulltext_article.idxfti @@ q.q) -> Bitmap Index Scan on fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373 rows=5119 loops=1) Index Cond: (fulltext_article.idxfti @@ q.q) Total runtime: 12973.035 ms (9 rows) # select count(*) from fulltext_article; count -------- 933001 (1 row) # select COUNT(*) FROM fulltext_article, to_tsquery('simple','blue & green') AS q WHERE idxfti @@ q; count ------- 6308 (1 row) Benjamin
Benjamin Arai <benjamin@araisoft.com> writes: > # explain analyze select * FROM fulltext_article, to_tsquery > ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC; > QUERY PLAN > ------------------------------------------------------------------------ > ------------------------------------------------------------------------ > ------------ > Sort (cost=6576.74..6579.07 rows=933 width=774) (actual > time=12969.237..12970.490 rows=5119 loops=1) > Sort Key: rank(fulltext_article.idxfti, q.q) > -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) > (actual time=209.513..12955.498 rows=5119 loops=1) > -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) > (actual time=0.005..0.006 rows=1 loops=1) > -> Bitmap Heap Scan on fulltext_article > (cost=3069.79..6516.70 rows=933 width=742) (actual > time=209.322..234.390 rows=5119 loops=1) > Recheck Cond: (fulltext_article.idxfti @@ q.q) > -> Bitmap Index Scan on fulltext_article_idxfti_idx > (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373 > rows=5119 loops=1) > Index Cond: (fulltext_article.idxfti @@ q.q) > Total runtime: 12973.035 ms > (9 rows) The time seems all spent at the join step, which is odd because it really hasn't got much to do. AFAICS all it has to do is compute the rank() values that the sort step will use. Is it possible that rank() is really slow? regards, tom lane
On Fri, 5 Oct 2007, Tom Lane wrote: > Benjamin Arai <benjamin@araisoft.com> writes: >> # explain analyze select * FROM fulltext_article, to_tsquery >> ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC; > >> QUERY PLAN >> ------------------------------------------------------------------------ >> ------------------------------------------------------------------------ >> ------------ >> Sort (cost=6576.74..6579.07 rows=933 width=774) (actual >> time=12969.237..12970.490 rows=5119 loops=1) >> Sort Key: rank(fulltext_article.idxfti, q.q) >> -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) >> (actual time=209.513..12955.498 rows=5119 loops=1) >> -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) >> (actual time=0.005..0.006 rows=1 loops=1) >> -> Bitmap Heap Scan on fulltext_article >> (cost=3069.79..6516.70 rows=933 width=742) (actual >> time=209.322..234.390 rows=5119 loops=1) >> Recheck Cond: (fulltext_article.idxfti @@ q.q) >> -> Bitmap Index Scan on fulltext_article_idxfti_idx >> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373 >> rows=5119 loops=1) >> Index Cond: (fulltext_article.idxfti @@ q.q) >> Total runtime: 12973.035 ms >> (9 rows) > > The time seems all spent at the join step, which is odd because it > really hasn't got much to do. AFAICS all it has to do is compute the > rank() values that the sort step will use. Is it possible that > rank() is really slow? can you try rank_cd() instead ? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Oct 5, 2007, at 8:32 AM, Oleg Bartunov wrote: > On Fri, 5 Oct 2007, Tom Lane wrote: > >> Benjamin Arai <benjamin@araisoft.com> writes: >>> # explain analyze select * FROM fulltext_article, to_tsquery >>> ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, q) >>> DESC; >> >>> QUERY PLAN >>> -------------------------------------------------------------------- >>> ---- >>> -------------------------------------------------------------------- >>> ---- >>> ------------ >>> Sort (cost=6576.74..6579.07 rows=933 width=774) (actual >>> time=12969.237..12970.490 rows=5119 loops=1) >>> Sort Key: rank(fulltext_article.idxfti, q.q) >>> -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) >>> (actual time=209.513..12955.498 rows=5119 loops=1) >>> -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) >>> (actual time=0.005..0.006 rows=1 loops=1) >>> -> Bitmap Heap Scan on fulltext_article >>> (cost=3069.79..6516.70 rows=933 width=742) (actual >>> time=209.322..234.390 rows=5119 loops=1) >>> Recheck Cond: (fulltext_article.idxfti @@ q.q) >>> -> Bitmap Index Scan on fulltext_article_idxfti_idx >>> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373 >>> rows=5119 loops=1) >>> Index Cond: (fulltext_article.idxfti @@ q.q) >>> Total runtime: 12973.035 ms >>> (9 rows) >> >> The time seems all spent at the join step, which is odd because it >> really hasn't got much to do. AFAICS all it has to do is compute the >> rank() values that the sort step will use. Is it possible that >> rank() is really slow? > > can you try rank_cd() instead ? > Using Rank: -# ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------ Sort (cost=6576.74..6579.07 rows=933 width=774) (actual time=98083.081..98084.351 rows=5119 loops=1) Sort Key: rank(fulltext_article.idxfti, q.q) -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) (actual time=479.122..98067.594 rows=5119 loops=1) -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.003..0.004 rows=1 loops=1) -> Bitmap Heap Scan on fulltext_article (cost=3069.79..6516.70 rows=933 width=742) (actual time=341.739..37112.110 rows=5119 loops=1) Recheck Cond: (fulltext_article.idxfti @@ q.q) -> Bitmap Index Scan on fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) (actual time=321.443..321.443 rows=5119 loops=1) Index Cond: (fulltext_article.idxfti @@ q.q) Total runtime: 98087.575 ms (9 rows) Using Rank_cd: # explain analyze select * FROM fulltext_article, to_tsquery ('simple','cat') AS q WHERE idxfti @@ q ORDER BY rank_cd(idxfti, q) DESC; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------- Sort (cost=6576.74..6579.07 rows=933 width=774) (actual time=199316.648..199324.631 rows=26054 loops=1) Sort Key: rank_cd(fulltext_article.idxfti, q.q) -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) (actual time=871.428..199244.330 rows=26054 loops=1) -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1) -> Bitmap Heap Scan on fulltext_article (cost=3069.79..6516.70 rows=933 width=742) (actual time=850.674..50146.477 rows=26054 loops=1) Recheck Cond: (fulltext_article.idxfti @@ q.q) -> Bitmap Index Scan on fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) (actual time=838.120..838.120 rows=26054 loops=1) Index Cond: (fulltext_article.idxfti @@ q.q) Total runtime: 199338.297 ms (9 rows) > >> >> regards, tom lane >> >> ---------------------------(end of >> broadcast)--------------------------- >> TIP 5: don't forget to increase your free space map settings >> > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), > Sternberg Astronomical Institute, Moscow University, Russia > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(495)939-16-83, +007(495)939-23-83 >
It appears that the ORDER BY rank operation is the slowing factor. If I remove it then the query is pretty fast. Is there another way to perform ORDER BY such that it does not do a sort? Benjamin On Oct 5, 2007, at 3:57 PM, Benjamin Arai wrote: > > On Oct 5, 2007, at 8:32 AM, Oleg Bartunov wrote: > >> On Fri, 5 Oct 2007, Tom Lane wrote: >> >>> Benjamin Arai <benjamin@araisoft.com> writes: >>>> # explain analyze select * FROM fulltext_article, to_tsquery >>>> ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, >>>> q) DESC; >>> >>>> QUERY PLAN >>>> ------------------------------------------------------------------- >>>> ----- >>>> ------------------------------------------------------------------- >>>> ----- >>>> ------------ >>>> Sort (cost=6576.74..6579.07 rows=933 width=774) (actual >>>> time=12969.237..12970.490 rows=5119 loops=1) >>>> Sort Key: rank(fulltext_article.idxfti, q.q) >>>> -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) >>>> (actual time=209.513..12955.498 rows=5119 loops=1) >>>> -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) >>>> (actual time=0.005..0.006 rows=1 loops=1) >>>> -> Bitmap Heap Scan on fulltext_article >>>> (cost=3069.79..6516.70 rows=933 width=742) (actual >>>> time=209.322..234.390 rows=5119 loops=1) >>>> Recheck Cond: (fulltext_article.idxfti @@ q.q) >>>> -> Bitmap Index Scan on >>>> fulltext_article_idxfti_idx >>>> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373 >>>> rows=5119 loops=1) >>>> Index Cond: (fulltext_article.idxfti @@ q.q) >>>> Total runtime: 12973.035 ms >>>> (9 rows) >>> >>> The time seems all spent at the join step, which is odd because it >>> really hasn't got much to do. AFAICS all it has to do is compute >>> the >>> rank() values that the sort step will use. Is it possible that >>> rank() is really slow? >> >> can you try rank_cd() instead ? >> > Using Rank: > > -# ('simple','dog') AS q WHERE idxfti @@ q ORDER BY rank(idxfti, > q) DESC; > > QUERY PLAN > ---------------------------------------------------------------------- > ---------------------------------------------------------------------- > ---------------- > Sort (cost=6576.74..6579.07 rows=933 width=774) (actual > time=98083.081..98084.351 rows=5119 loops=1) > Sort Key: rank(fulltext_article.idxfti, q.q) > -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) > (actual time=479.122..98067.594 rows=5119 loops=1) > -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) > (actual time=0.003..0.004 rows=1 loops=1) > -> Bitmap Heap Scan on fulltext_article > (cost=3069.79..6516.70 rows=933 width=742) (actual > time=341.739..37112.110 rows=5119 loops=1) > Recheck Cond: (fulltext_article.idxfti @@ q.q) > -> Bitmap Index Scan on > fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) > (actual time=321.443..321.443 rows=5119 loops=1) > Index Cond: (fulltext_article.idxfti @@ q.q) > Total runtime: 98087.575 ms > (9 rows) > > Using Rank_cd: > > # explain analyze select * FROM fulltext_article, to_tsquery > ('simple','cat') AS q WHERE idxfti @@ q ORDER BY rank_cd(idxfti, > q) DESC; > > QUERY PLAN > ---------------------------------------------------------------------- > ---------------------------------------------------------------------- > ----------------- > Sort (cost=6576.74..6579.07 rows=933 width=774) (actual > time=199316.648..199324.631 rows=26054 loops=1) > Sort Key: rank_cd(fulltext_article.idxfti, q.q) > -> Nested Loop (cost=3069.79..6530.71 rows=933 width=774) > (actual time=871.428..199244.330 rows=26054 loops=1) > -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) > (actual time=0.006..0.007 rows=1 loops=1) > -> Bitmap Heap Scan on fulltext_article > (cost=3069.79..6516.70 rows=933 width=742) (actual > time=850.674..50146.477 rows=26054 loops=1) > Recheck Cond: (fulltext_article.idxfti @@ q.q) > -> Bitmap Index Scan on > fulltext_article_idxfti_idx (cost=0.00..3069.56 rows=933 width=0) > (actual time=838.120..838.120 rows=26054 loops=1) > Index Cond: (fulltext_article.idxfti @@ q.q) > Total runtime: 199338.297 ms > (9 rows) > >> >>> >>> regards, tom lane >>> >>> ---------------------------(end of >>> broadcast)--------------------------- >>> TIP 5: don't forget to increase your free space map settings >>> >> >> Regards, >> Oleg >> _____________________________________________________________ >> Oleg Bartunov, Research Scientist, Head of AstroNet >> (www.astronet.ru), >> Sternberg Astronomical Institute, Moscow University, Russia >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >> phone: +007(495)939-16-83, +007(495)939-23-83 >> >
Benjamin Arai <me@benjaminarai.com> writes: > It appears that the ORDER BY rank operation is the slowing factor. > If I remove it then the query is pretty fast. Is there another way > to perform ORDER BY such that it does not do a sort? I think you misunderstood: it's not the sort that's slow, it's the computation of the rank() values that are inputs to the sort. regards, tom lane
Oh, I see. I didn't look carefully at the EXPLAIN ANALYZE I posted. So, is there a solution to the rank problem? Benjamin On Oct 11, 2007, at 8:53 AM, Tom Lane wrote: > Benjamin Arai <me@benjaminarai.com> writes: >> It appears that the ORDER BY rank operation is the slowing factor. >> If I remove it then the query is pretty fast. Is there another way >> to perform ORDER BY such that it does not do a sort? > > I think you misunderstood: it's not the sort that's slow, it's the > computation of the rank() values that are inputs to the sort. > > regards, tom lane >