Performance regression between 8.3 and 8.4 on heavy text indexing

От: gael@pilotsystems.net (Gaël Le Mignot)
Тема: Performance regression between 8.3 and 8.4 on heavy text indexing
Дата: ,
Msg-id: plop871vn5b84g.fsf@aoskar.kilobug.org
(см: обсуждение, исходный текст)
Ответы: Re: Performance regression between 8.3 and 8.4 on heavy text indexing  (Guillaume Smet)
Список: pgsql-performance

Скрыть дерево обсуждения

Performance regression between 8.3 and 8.4 on heavy text indexing  ( (Gaël Le Mignot), )
 Re: Performance regression between 8.3 and 8.4 on heavy text indexing  (Guillaume Smet, )
  Re: Performance regression between 8.3 and 8.4 on heavy text indexing  ( (Gaël Le Mignot), )
   Re: Performance regression between 8.3 and 8.4 on heavy text indexing  (Tom Lane, )
    Re: Performance regression between 8.3 and 8.4 on heavy text indexing  (Guillaume Smet, )
     Re: Performance regression between 8.3 and 8.4 on heavy text indexing  ( (Gaël Le Mignot), )
      Re: Performance regression between 8.3 and 8.4 on heavy text indexing  (Guillaume Smet, )

Hello,

We are using PostgreSQL to index a huge collection (570 000) of articles for a french daily newspaper (Libération). We
usemassively the full text search feature. I attach to this mail the schema of the database we use. 

Overall, we have very interesting performances, except in a few cases, when combining a full text match with a lot of
matcheswith a date order and a limit (which is a very common use case, asking for the 50 more recent articles speaking
abouta famous person, for example). 

The reason of this mail is what we noticed a performance drop from PostgreSQL 8.3 to PostgreSQL 8.4.

In order to try to locate the performance cost, I changed a few settings in 8.4 to have the same values than in 8.3
(andrerun analyze after) :: 

 cursor_tuple_fraction = 1.0
 default_statistics_target = 10

We the modified settings, the peformance drop is much lower, but still
present. Here are the statistics  on replaying sequentially a bunch of
real-life queries to the two versions of the database :

With 8.3 ::

 7334 queries, average time is 0.20 s
 6 queries above 20.00 s (0.08 %)
 20 queries above 10.00 s (0.27 %)
 116 queries above 2.00 s (1.58 %)
 top ten:  15.09 15.15 15.19 16.60 20.40 63.05 67.89 78.21 90.30 97.56

With 8.4 ::

 7334 queries, average time is 0.23 s
 12 queries above 20.00 s (0.16 %)
 24 queries above 10.00 s (0.33 %)
 112 queries above 2.00 s (1.53 %)
 top ten:  31.76 31.94 32.63 47.21 48.80 63.50 79.57 83.36 96.44 113.61


Here is an example query that is significantly slower in 8.4 (31.76 seconds) than in 8.3 (10.52 seconds) ::

 SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN ('methode','nica') AND (keywords_tsv
@@plainto_tsquery('french', 'assassinat') AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau')))) ORDER BY
publicationDateDESC,pageNumber ASC LIMIT 50 

And the explain on it :

With 8.3 ::

 Limit  (cost=752.67..752.67 rows=1 width=24)
   ->  Sort  (cost=752.67..752.67 rows=1 width=24)
         Sort Key: publicationdate, pagenumber
         ->  Bitmap Heap Scan on libeindex  (cost=748.64..752.66 rows=1 width=24)
               Recheck Cond: ((keywords_tsv @@ '''assassinat'''::tsquery) AND (fulltext_tsv @@ '''claud'' &
''duviau'''::tsquery))
               Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND ((classname)::text = 'article'::text))
               ->  BitmapAnd  (cost=748.64..748.64 rows=1 width=0)
                     ->  Bitmap Index Scan on keywords_index  (cost=0.00..48.97 rows=574 width=0)
                           Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
                     ->  Bitmap Index Scan on fulltext_index  (cost=0.00..699.42 rows=574 width=0)
                           Index Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
 (11 rows)

With 8.4 ::

 Limit  (cost=758.51..758.51 rows=1 width=24)
   ->  Sort  (cost=758.51..758.51 rows=1 width=24)
         Sort Key: publicationdate, pagenumber
         ->  Bitmap Heap Scan on libeindex  (cost=14.03..758.50 rows=1 width=24)
               Recheck Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
               Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND (fulltext_tsv @@ '''claud'' &
''duviau'''::tsquery)AND ((classname)::text = 'article'::text)) 
               ->  Bitmap Index Scan on keywords_index  (cost=0.00..14.03 rows=192 width=0)
                     Index Cond: (keywords_tsv @@ '''assassinat'''::tsquery)
 (8 rows)

More informations on the setup :

- postgresql 8.3.7 from Debian Lenny ;

- postgresql 8.4.0 from Debian Lenny backports ;

- rurnning in a Xen virtual machine, using 64-bits kernel ;

- 2 cores of a 2GHz Core2Quad and 2Gb of RAM dedicated to the VM.

If you need additional informations, we'll gladly provide them. If you have any tips or advises so we could make the
8.4behave as least as good as the 8.3 it would be very nice. 

Hoping this can help you to improve this great software.

Regards,

--
Gaël Le Mignot - 
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com


В списке pgsql-performance по дате сообщения:

От: Alvaro Herrera
Дата:
Сообщение: Re: number of rows estimation for bit-AND operation
От: Jerry Champlin
Дата:
Сообщение: Re: Number of tables