Обсуждение: Slow TSearch2 performance for table with 1 million documents.

Поиск
Список
Период
Сортировка

Slow TSearch2 performance for table with 1 million documents.

От
Benjamin Arai
Дата:
Hi,

I have very slow performance for a TSearch2 table.  I have pasted the
EXPLAIN ANALYZE queries below.  12 seconds is slow for almost any
purpose.  Is there any way to speed this up?

# explain analyze select * FROM fulltext_article, to_tsquery
('simple','dog') AS q  WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC;

   QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
------------
Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
time=12969.237..12970.490 rows=5119 loops=1)
    Sort Key: rank(fulltext_article.idxfti, q.q)
    ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
(actual time=209.513..12955.498 rows=5119 loops=1)
          ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
(actual time=0.005..0.006 rows=1 loops=1)
          ->  Bitmap Heap Scan on fulltext_article
(cost=3069.79..6516.70 rows=933 width=742) (actual
time=209.322..234.390 rows=5119 loops=1)
                Recheck Cond: (fulltext_article.idxfti @@ q.q)
                ->  Bitmap Index Scan on fulltext_article_idxfti_idx
(cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373
rows=5119 loops=1)
                      Index Cond: (fulltext_article.idxfti @@ q.q)
Total runtime: 12973.035 ms
(9 rows)

# select count(*) from fulltext_article;
count
--------
933001
(1 row)

# select COUNT(*) FROM fulltext_article, to_tsquery('simple','blue &
green') AS q  WHERE idxfti @@ q;
count
-------
   6308
(1 row)

Benjamin

Re: Slow TSearch2 performance for table with 1 million documents.

От
Alban Hertroys
Дата:
Benjamin Arai wrote:
> Hi,
>
> I have very slow performance for a TSearch2 table.  I have pasted the
> EXPLAIN ANALYZE queries below.  12 seconds is slow for almost any
> purpose.  Is there any way to speed this up?
>
> # explain analyze select * FROM fulltext_article,
> to_tsquery('simple','dog') AS q  WHERE idxfti @@ q ORDER BY rank(idxfti,
> q) DESC;

Admittedly I'm kind of new to tsearch, but wouldn't

SELECT *
  FROM fulltext_article
 WHERE idxfti @@ to_tsquery('simple','dog')
 ORDER BY rank(idxfti, to_tsquery('simple', 'dog')) DESC;

be faster?

Quick testing shows a similar query in our database to not use a nested
loop and a function scan. For comparison, here are our plans:

Your approach:

QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4.86..4.87 rows=1 width=164) (actual time=0.151..0.161
rows=5 loops=1)
   Sort Key: rank(fulltext_article.idxfti, q.q)
   ->  Nested Loop  (cost=0.00..4.85 rows=1 width=164) (actual
time=0.067..0.119 rows=5 loops=1)
         ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
(actual time=0.010..0.012 rows=1 loops=1)
         ->  Index Scan using fulltext_article_idxfti_idx on
fulltext_article  (cost=0.00..4.82 rows=1 width=132) (actual
time=0.033..0.056 rows=5 loops=1)
               Index Cond: (fulltext_article.idxfti @@ "outer".q)
               Filter: (fulltext_article.idxfti @@ "outer".q)
 Total runtime: 0.242 ms
(8 rows)


My suggested approach:

                                                                   QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4.84..4.84 rows=1 width=132) (actual time=0.085..0.095
rows=5 loops=1)
   Sort Key: rank(idxfti, '''dog'''::tsquery)
   ->  Index Scan using fulltext_article_idxfti_idx on fulltext_article
 (cost=0.00..4.83 rows=1 width=132) (actual time=0.025..0.052 rows=5
loops=1)
         Index Cond: (idxfti @@ '''dog'''::tsquery)
         Filter: (idxfti @@ '''dog'''::tsquery)
 Total runtime: 0.163 ms
(6 rows)

I hope this helps.

--
Alban Hertroys
a.hertroys@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: Slow TSearch2 performance for table with 1 million documents.

От
Tom Lane
Дата:
Benjamin Arai <benjamin@araisoft.com> writes:
> # explain analyze select * FROM fulltext_article, to_tsquery
> ('simple','dog') AS q  WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC;

>    QUERY PLAN
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> ------------
> Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
> time=12969.237..12970.490 rows=5119 loops=1)
>     Sort Key: rank(fulltext_article.idxfti, q.q)
>     ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
> (actual time=209.513..12955.498 rows=5119 loops=1)
>           ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
> (actual time=0.005..0.006 rows=1 loops=1)
>           ->  Bitmap Heap Scan on fulltext_article
> (cost=3069.79..6516.70 rows=933 width=742) (actual
> time=209.322..234.390 rows=5119 loops=1)
>                 Recheck Cond: (fulltext_article.idxfti @@ q.q)
>                 ->  Bitmap Index Scan on fulltext_article_idxfti_idx
> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373
> rows=5119 loops=1)
>                       Index Cond: (fulltext_article.idxfti @@ q.q)
> Total runtime: 12973.035 ms
> (9 rows)

The time seems all spent at the join step, which is odd because it
really hasn't got much to do.  AFAICS all it has to do is compute the
rank() values that the sort step will use.  Is it possible that
rank() is really slow?

            regards, tom lane

Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.

От
Oleg Bartunov
Дата:
On Fri, 5 Oct 2007, Tom Lane wrote:

> Benjamin Arai <benjamin@araisoft.com> writes:
>> # explain analyze select * FROM fulltext_article, to_tsquery
>> ('simple','dog') AS q  WHERE idxfti @@ q ORDER BY rank(idxfti, q) DESC;
>
>>    QUERY PLAN
>> ------------------------------------------------------------------------
>> ------------------------------------------------------------------------
>> ------------
>> Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
>> time=12969.237..12970.490 rows=5119 loops=1)
>>     Sort Key: rank(fulltext_article.idxfti, q.q)
>>     ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
>> (actual time=209.513..12955.498 rows=5119 loops=1)
>>           ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
>> (actual time=0.005..0.006 rows=1 loops=1)
>>           ->  Bitmap Heap Scan on fulltext_article
>> (cost=3069.79..6516.70 rows=933 width=742) (actual
>> time=209.322..234.390 rows=5119 loops=1)
>>                 Recheck Cond: (fulltext_article.idxfti @@ q.q)
>>                 ->  Bitmap Index Scan on fulltext_article_idxfti_idx
>> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373
>> rows=5119 loops=1)
>>                       Index Cond: (fulltext_article.idxfti @@ q.q)
>> Total runtime: 12973.035 ms
>> (9 rows)
>
> The time seems all spent at the join step, which is odd because it
> really hasn't got much to do.  AFAICS all it has to do is compute the
> rank() values that the sort step will use.  Is it possible that
> rank() is really slow?

can you try rank_cd() instead ?


>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.

От
Benjamin Arai
Дата:
On Oct 5, 2007, at 8:32 AM, Oleg Bartunov wrote:

> On Fri, 5 Oct 2007, Tom Lane wrote:
>
>> Benjamin Arai <benjamin@araisoft.com> writes:
>>> # explain analyze select * FROM fulltext_article, to_tsquery
>>> ('simple','dog') AS q  WHERE idxfti @@ q ORDER BY rank(idxfti, q)
>>> DESC;
>>
>>>    QUERY PLAN
>>> --------------------------------------------------------------------
>>> ----
>>> --------------------------------------------------------------------
>>> ----
>>> ------------
>>> Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
>>> time=12969.237..12970.490 rows=5119 loops=1)
>>>     Sort Key: rank(fulltext_article.idxfti, q.q)
>>>     ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
>>> (actual time=209.513..12955.498 rows=5119 loops=1)
>>>           ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
>>> (actual time=0.005..0.006 rows=1 loops=1)
>>>           ->  Bitmap Heap Scan on fulltext_article
>>> (cost=3069.79..6516.70 rows=933 width=742) (actual
>>> time=209.322..234.390 rows=5119 loops=1)
>>>                 Recheck Cond: (fulltext_article.idxfti @@ q.q)
>>>                 ->  Bitmap Index Scan on fulltext_article_idxfti_idx
>>> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373
>>> rows=5119 loops=1)
>>>                       Index Cond: (fulltext_article.idxfti @@ q.q)
>>> Total runtime: 12973.035 ms
>>> (9 rows)
>>
>> The time seems all spent at the join step, which is odd because it
>> really hasn't got much to do.  AFAICS all it has to do is compute the
>> rank() values that the sort step will use.  Is it possible that
>> rank() is really slow?
>
> can you try rank_cd() instead ?
>
Using Rank:

-# ('simple','dog') AS q  WHERE idxfti @@ q ORDER BY rank(idxfti, q)
DESC;

   QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
------------
Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
time=98083.081..98084.351 rows=5119 loops=1)
    Sort Key: rank(fulltext_article.idxfti, q.q)
    ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
(actual time=479.122..98067.594 rows=5119 loops=1)
          ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
(actual time=0.003..0.004 rows=1 loops=1)
          ->  Bitmap Heap Scan on fulltext_article
(cost=3069.79..6516.70 rows=933 width=742) (actual
time=341.739..37112.110 rows=5119 loops=1)
                Recheck Cond: (fulltext_article.idxfti @@ q.q)
                ->  Bitmap Index Scan on fulltext_article_idxfti_idx
(cost=0.00..3069.56 rows=933 width=0) (actual time=321.443..321.443
rows=5119 loops=1)
                      Index Cond: (fulltext_article.idxfti @@ q.q)
Total runtime: 98087.575 ms
(9 rows)

Using Rank_cd:

# explain analyze select * FROM fulltext_article, to_tsquery
('simple','cat') AS q  WHERE idxfti @@ q ORDER BY rank_cd(idxfti, q)
DESC;

   QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
-------------
Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
time=199316.648..199324.631 rows=26054 loops=1)
    Sort Key: rank_cd(fulltext_article.idxfti, q.q)
    ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
(actual time=871.428..199244.330 rows=26054 loops=1)
          ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
(actual time=0.006..0.007 rows=1 loops=1)
          ->  Bitmap Heap Scan on fulltext_article
(cost=3069.79..6516.70 rows=933 width=742) (actual
time=850.674..50146.477 rows=26054 loops=1)
                Recheck Cond: (fulltext_article.idxfti @@ q.q)
                ->  Bitmap Index Scan on fulltext_article_idxfti_idx
(cost=0.00..3069.56 rows=933 width=0) (actual time=838.120..838.120
rows=26054 loops=1)
                      Index Cond: (fulltext_article.idxfti @@ q.q)
Total runtime: 199338.297 ms
(9 rows)

>
>>
>>             regards, tom lane
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 5: don't forget to increase your free space map settings
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>


Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.

От
Benjamin Arai
Дата:
It appears that the ORDER BY rank operation is the slowing factor.
If I remove it then the query is pretty fast.  Is there another way
to perform ORDER BY such that it does not do a sort?

Benjamin

On Oct 5, 2007, at 3:57 PM, Benjamin Arai wrote:

>
> On Oct 5, 2007, at 8:32 AM, Oleg Bartunov wrote:
>
>> On Fri, 5 Oct 2007, Tom Lane wrote:
>>
>>> Benjamin Arai <benjamin@araisoft.com> writes:
>>>> # explain analyze select * FROM fulltext_article, to_tsquery
>>>> ('simple','dog') AS q  WHERE idxfti @@ q ORDER BY rank(idxfti,
>>>> q) DESC;
>>>
>>>>    QUERY PLAN
>>>> -------------------------------------------------------------------
>>>> -----
>>>> -------------------------------------------------------------------
>>>> -----
>>>> ------------
>>>> Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
>>>> time=12969.237..12970.490 rows=5119 loops=1)
>>>>     Sort Key: rank(fulltext_article.idxfti, q.q)
>>>>     ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
>>>> (actual time=209.513..12955.498 rows=5119 loops=1)
>>>>           ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
>>>> (actual time=0.005..0.006 rows=1 loops=1)
>>>>           ->  Bitmap Heap Scan on fulltext_article
>>>> (cost=3069.79..6516.70 rows=933 width=742) (actual
>>>> time=209.322..234.390 rows=5119 loops=1)
>>>>                 Recheck Cond: (fulltext_article.idxfti @@ q.q)
>>>>                 ->  Bitmap Index Scan on
>>>> fulltext_article_idxfti_idx
>>>> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373
>>>> rows=5119 loops=1)
>>>>                       Index Cond: (fulltext_article.idxfti @@ q.q)
>>>> Total runtime: 12973.035 ms
>>>> (9 rows)
>>>
>>> The time seems all spent at the join step, which is odd because it
>>> really hasn't got much to do.  AFAICS all it has to do is compute
>>> the
>>> rank() values that the sort step will use.  Is it possible that
>>> rank() is really slow?
>>
>> can you try rank_cd() instead ?
>>
> Using Rank:
>
> -# ('simple','dog') AS q  WHERE idxfti @@ q ORDER BY rank(idxfti,
> q) DESC;
>
>    QUERY PLAN
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> ----------------
> Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
> time=98083.081..98084.351 rows=5119 loops=1)
>    Sort Key: rank(fulltext_article.idxfti, q.q)
>    ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
> (actual time=479.122..98067.594 rows=5119 loops=1)
>          ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
> (actual time=0.003..0.004 rows=1 loops=1)
>          ->  Bitmap Heap Scan on fulltext_article
> (cost=3069.79..6516.70 rows=933 width=742) (actual
> time=341.739..37112.110 rows=5119 loops=1)
>                Recheck Cond: (fulltext_article.idxfti @@ q.q)
>                ->  Bitmap Index Scan on
> fulltext_article_idxfti_idx  (cost=0.00..3069.56 rows=933 width=0)
> (actual time=321.443..321.443 rows=5119 loops=1)
>                      Index Cond: (fulltext_article.idxfti @@ q.q)
> Total runtime: 98087.575 ms
> (9 rows)
>
> Using Rank_cd:
>
> # explain analyze select * FROM fulltext_article, to_tsquery
> ('simple','cat') AS q  WHERE idxfti @@ q ORDER BY rank_cd(idxfti,
> q) DESC;
>
>    QUERY PLAN
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> -----------------
> Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
> time=199316.648..199324.631 rows=26054 loops=1)
>    Sort Key: rank_cd(fulltext_article.idxfti, q.q)
>    ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
> (actual time=871.428..199244.330 rows=26054 loops=1)
>          ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
> (actual time=0.006..0.007 rows=1 loops=1)
>          ->  Bitmap Heap Scan on fulltext_article
> (cost=3069.79..6516.70 rows=933 width=742) (actual
> time=850.674..50146.477 rows=26054 loops=1)
>                Recheck Cond: (fulltext_article.idxfti @@ q.q)
>                ->  Bitmap Index Scan on
> fulltext_article_idxfti_idx  (cost=0.00..3069.56 rows=933 width=0)
> (actual time=838.120..838.120 rows=26054 loops=1)
>                      Index Cond: (fulltext_article.idxfti @@ q.q)
> Total runtime: 199338.297 ms
> (9 rows)
>
>>
>>>
>>>             regards, tom lane
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 5: don't forget to increase your free space map settings
>>>
>>
>>     Regards,
>>         Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet
>> (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>


Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.

От
Tom Lane
Дата:
Benjamin Arai <me@benjaminarai.com> writes:
> It appears that the ORDER BY rank operation is the slowing factor.
> If I remove it then the query is pretty fast.  Is there another way
> to perform ORDER BY such that it does not do a sort?

I think you misunderstood: it's not the sort that's slow, it's the
computation of the rank() values that are inputs to the sort.

            regards, tom lane

Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.

От
Benjamin Arai
Дата:
Oh, I see.  I didn't look carefully at the EXPLAIN ANALYZE I posted.
So, is there a solution to the rank problem?

Benjamin

On Oct 11, 2007, at 8:53 AM, Tom Lane wrote:

> Benjamin Arai <me@benjaminarai.com> writes:
>> It appears that the ORDER BY rank operation is the slowing factor.
>> If I remove it then the query is pretty fast.  Is there another way
>> to perform ORDER BY such that it does not do a sort?
>
> I think you misunderstood: it's not the sort that's slow, it's the
> computation of the rank() values that are inputs to the sort.
>
>             regards, tom lane
>