Re: Performance regression between 8.3 and 8.4 on heavy text indexing

Поиск
Список
Период
Сортировка
От gael@pilotsystems.net (Gaël Le Mignot)
Тема Re: Performance regression between 8.3 and 8.4 on heavy text indexing
Дата
Msg-id plop87k50qtve1.fsf@aoskar.kilobug.org
обсуждение исходный текст
Ответ на Re: Performance regression between 8.3 and 8.4 on heavy text indexing  (Guillaume Smet <guillaume.smet@gmail.com>)
Ответы Re: Performance regression between 8.3 and 8.4 on heavy text indexing  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-performance
Hello Guillaume!

Sun, 23 Aug 2009 14:49:05 +0200, you wrote:

 > Hi Gaël,
 > On Fri, Aug 21, 2009 at 3:37 PM, Gaël Le Mignot<gael@pilotsystems.net> wrote:
 >> With 8.3 ::
 >>
 >>  Limit  (cost=752.67..752.67 rows=1 width=24)
 >>  (11 rows)
 >>
 >> With 8.4 ::
 >>  (8 rows)

 > Could you provide us the EXPLAIN *ANALYZE* output of both plans?

Sure, here it is :

With 8.3 ::

libearticles=>  explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN
('methode','nica')AND (keywords_tsv @@ plainto_tsquery('french', 'assassinat') AND fulltext_tsv @@ to_tsquery('french',
'claude& duviau')))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50; 
                                                                     QUERY PLAN
                            

----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=760.74..760.74 rows=1 width=24) (actual time=449.057..449.080 rows=9 loops=1)
   ->  Sort  (cost=760.74..760.74 rows=1 width=24) (actual time=449.053..449.061 rows=9 loops=1)
         Sort Key: publicationdate, pagenumber
         Sort Method:  quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on libeindex  (cost=756.71..760.73 rows=1 width=24) (actual time=420.704..448.571 rows=9
loops=1)
               Recheck Cond: ((keywords_tsv @@ '''assassinat'''::tsquery) AND (fulltext_tsv @@ '''claud'' &
''duviau'''::tsquery))
               Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND ((classname)::text = 'article'::text))
               ->  BitmapAnd  (cost=756.71..756.71 rows=1 width=0) (actual time=420.612..420.612 rows=0 loops=1)
                     ->  Bitmap Index Scan on keywords_index  (cost=0.00..48.96 rows=573 width=0) (actual
time=129.338..129.338rows=10225 loops=1) 
                           Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
                     ->  Bitmap Index Scan on fulltext_index  (cost=0.00..707.50 rows=573 width=0) (actual
time=289.775..289.775rows=14 loops=1) 
                           Index Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
 Total runtime: 471.905 ms
(13 rows)

With 8.4 ::

libebench=>  explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN
('methode','nica')AND (keywords_tsv @@ plainto_tsquery('french', 'assassinat') AND fulltext_tsv @@ to_tsquery('french',
'claude& duviau')))) ORDER BY publicationDate DESC,pageNumber ASC LIMIT 50; 
                                                                                 QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=758.51..758.51 rows=1 width=24) (actual time=50816.635..50816.660 rows=9 loops=1)
   ->  Sort  (cost=758.51..758.51 rows=1 width=24) (actual time=50816.628..50816.637 rows=9 loops=1)
         Sort Key: publicationdate, pagenumber
         Sort Method:  quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on libeindex  (cost=14.03..758.50 rows=1 width=24) (actual time=8810.133..50816.484
rows=9loops=1) 
               Recheck Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
               Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND (fulltext_tsv @@ '''claud'' &
''duviau'''::tsquery)AND ((classname)::text = 'article'::text)) 
               ->  Bitmap Index Scan on keywords_index  (cost=0.00..14.03 rows=192 width=0) (actual
time=158.563..158.563rows=10222 loops=1) 
                     Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
 Total runtime: 50817.040 ms
(10 rows)

So it seems it was quite wrong about estimated matching rows (192 predicted, 10222 reals).

 >> From what I can see, one of the difference is that the estimates of
 > the number of rows are / 3 for this part of the query:
 > 8.3 ->  Bitmap Index Scan on keywords_index  (cost=0.00..48.97 rows=574 width=0)
 > 8.4 ->  Bitmap Index Scan on keywords_index  (cost=0.00..14.03 rows=192 width=0)

 > It might be interesting to see if 8.4 is right or not.

 > Before 8.4, the selectivity for full text search was a constant (as
 > you can see it in your 8.3 plan: the number of rows are equal in both
 > bitmap index scan). 8.4 is smarter which might lead to other plans.

I see, thanks  for your answer. What's weird  is that this "smartness"
leads to overall worse results in  our case, is there some tweaking we
can  do?   I  didn't  see  anything in  the  documentation  to  change
weighting inside the text-match heuristic.

--
Gaël Le Mignot - gael@pilotsystems.net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Kevin Grittner"
Дата:
Сообщение: Re: PostgreSQL does CAST implicitely between int and a domain derivedfrom int
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Performance regression between 8.3 and 8.4 on heavy text indexing