Re: GIN improvements part 1: additional information

From: Tomas Vondra
Subject: Re: GIN improvements part 1: additional information
Msg-id: 51D84912.2080000@fuzzy.cz
In reply to: Re: GIN improvements part 1: additional information  (Alexander Korotkov <aekorotkov@gmail.com>)
List: pgsql-hackers

Hi,

I've done a fair amount of testing by loading pgsql-general archives
into a database and running a bunch of simple ts queries that use a GIN
index.

I've tested this as well as the two other patches, but as I was able to
get meaningful results only from this patch, I'll post the results here
and info about segfaults and other observed errors to the other threads.

First of all - update the commitfest page whenever you submit a new
patch version, please. I've spent two or three hours testing and
debugging patches linked from those pages only to find out that there
are newer versions. I should have checked that initially, but let's keep
that updated.

I wasn't able to apply the patches to the current head, so I've used
b8fd1a09 (from 17/06) as a base commit.

The following table shows these metrics:

* data load
  - how long it took to import ~200k messages from the list archive
  - includes a lot of time spent in Python (parsing), checking FKs ...
  - so unless this is significantly higher, it's probably OK

* index size
  - size of the main GIN index on message body

* 1/2/3-word(s)
  - number of queries in the form

     SELECT id FROM messages
     WHERE body_tsvector @@ plainto_tsquery('english', 'w1 w2')
     LIMIT 100

   (executed over 60 seconds, and 'per second' speed)
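
For readers who want to reproduce the "queries in 60 seconds" metric, here is a minimal sketch of such a counting loop. It is not the script from the archie repository; the names count_queries, run_query and clock are made up for illustration, and the %s placeholder assumes a psycopg-style driver. run_query is whatever callable executes one SELECT against the database.

```python
import time
from typing import Callable, Tuple

# Query shape taken from the description above; the %s placeholder
# (psycopg-style parameter binding) is an assumption.
QUERY = (
    "SELECT id FROM messages "
    "WHERE body_tsvector @@ plainto_tsquery('english', %s) "
    "LIMIT 100"
)


def count_queries(
    run_query: Callable[[], None],
    duration_s: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, float]:
    """Call run_query repeatedly until duration_s has elapsed.

    Returns (number of completed queries, queries per second).
    """
    start = clock()
    count = 0
    while clock() - start < duration_s:
        run_query()
        count += 1
    elapsed = clock() - start
    return count, count / elapsed
```

The clock parameter is there only so the loop can be checked with a fake clock; in a real run the default monotonic clock is what you want, since it is not affected by system time adjustments.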

All the scripts are available at https://bitbucket.org/tvondra/archie

Now, the results:

no patches:
   data load:  710 s
   index size: 545 MB
   1 word:      37500 (630/s)
   2 words:     49800 (800/s)
   3 words:     40000 (660/s)

additional info (ginaddinfo.7.patch):
   data load:  693 s
   index size: 448 MB
   1 word:     135000 (2250/s)
   2 words:     85000 (1430/s)
   3 words:     54000 ( 900/s)

additional info + fast scan (gin_fast_scan.4.patch):
   data load:  720 s
   index size: 455 MB
   1 word:     FAIL
   2 words:    FAIL
   3 words:    FAIL

additional info + fast scan + ordering (gin_ordering.4.patch):
   data load:  FAIL
   index size: N/A
   1 word:     N/A
   2 words:    N/A
   3 words:    N/A

So the speedup after adding info into GIN seems very promising, although
I don't quite understand why searching for two words is so much slower.
Also the index size seems to decrease significantly.
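
To put numbers on "promising" and "significantly", the ratios can be worked out from the totals reported above. This is just arithmetic on the posted figures; the dictionary layout and variable names are made up for illustration.

```python
# Totals copied from the results above (queries completed in 60 seconds).
results = {
    "no patches": {"index_mb": 545, "queries": {1: 37500, 2: 49800, 3: 40000}},
    "ginaddinfo.7": {"index_mb": 448, "queries": {1: 135000, 2: 85000, 3: 54000}},
}

base = results["no patches"]
patched = results["ginaddinfo.7"]

# Throughput ratio per query length (patched / unpatched).
speedup = {n: patched["queries"][n] / base["queries"][n] for n in (1, 2, 3)}

# Fraction by which the GIN index shrank.
index_reduction = 1 - patched["index_mb"] / base["index_mb"]

for n in (1, 2, 3):
    print(f"{n} word(s): {speedup[n]:.2f}x")
print(f"index size: {index_reduction:.1%} smaller")
```

That works out to roughly 3.6x for one word, 1.7x for two words and 1.35x for three words, with the index about 18% smaller.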

After applying 'fast scan', things started to break down: I wasn't able
to run the queries, and with 'ordering' on top even the load failed
consistently.

I'll post the info into the appropriate threads.

Tomas
