Re: WIP: store additional info in GIN index

From Alexander Korotkov
Subject Re: WIP: store additional info in GIN index
Date
Msg-id CAPpHfdt+i0rjVouRNqiGSQBBDgaYsM3UewYLmAvOU-_OfAGkfg@mail.gmail.com
In reply to Re: WIP: store additional info in GIN index  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: WIP: store additional info in GIN index
List pgsql-hackers
Hi!

On Thu, Dec 6, 2012 at 5:44 AM, Tomas Vondra <tv@fuzzy.cz> wrote:
> Then I've run a simple benchmarking script, and the results are not as
> good as I expected; actually, I'm getting much worse performance than
> with the original GIN index.
>
> The following table contains the time of loading the data (not a big
> difference), and the number of queries per minute for various numbers
> of words in the query.
>
> The queries look like this
>
> SELECT id FROM messages
>  WHERE body_tsvector @@ plainto_tsquery('english', 'word1 word2 ...')
>
> so it's really the simplest form of FTS query possible.
>
>            without patch |      with patch
> --------------------------------------------
> loading       750 sec    |         770 sec
> 1 word           1500    |            1100
> 2 words         23000    |            9800
> 3 words         24000    |            9700
> 4 words         16000    |            7200
> --------------------------------------------
>
> I'm not saying this is a perfect benchmark, but the differences (of
> querying) are pretty huge. Not sure where this difference comes from,
> but it seems to be quite consistent (I usually get +-10% results, which
> is negligible considering the huge difference).
>
> Is this an expected behaviour that will be fixed by another patch?
 
Other patches that significantly accelerate index search will follow. This patch changes only the storage of GIN posting lists/trees. However, this patch was not expected to change index scan speed significantly in either direction.
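
To put the quoted figures in proportion, here is a minimal Python sketch that divides the patched throughput by the unpatched throughput for each row of the table above. It uses only the numbers from the quoted table; the function and variable names are illustrative and not part of any benchmark script from this thread.

```python
# Queries per minute from the benchmark table quoted above,
# keyed by the number of words in the query.
without_patch = {1: 1500, 2: 23000, 3: 24000, 4: 16000}
with_patch = {1: 1100, 2: 9800, 3: 9700, 4: 7200}


def relative_throughput(baseline, patched):
    """Return patched/baseline throughput for every key in baseline."""
    return {key: patched[key] / baseline[key] for key in baseline}


ratios = relative_throughput(without_patch, with_patch)
for words, ratio in sorted(ratios.items()):
    print(f"{words} word(s): {ratio:.2f}x of the unpatched throughput")

# Load time in seconds (lower is better), also taken from the table.
load_slowdown = 770 / 750
print(f"loading: {load_slowdown:.3f}x of the unpatched load time")
```

On these numbers the patched index delivers roughly 0.73x, 0.43x, 0.40x and 0.45x of the unpatched queries per minute for 1 to 4 words, while loading takes about 1.03x as long, so the query gap is well outside the reported +-10% run-to-run variation.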

> The database contains ~680k messages from the mailing list archives,
> i.e. about 900 MB of data (in the table), and the GIN index on tsvector
> is about 900 MB too. So the whole dataset nicely fits into memory (8 GB
> RAM), and it seems to be completely CPU bound (no I/O activity at all).
>
> The configuration was exactly the same in both cases
>
>     shared_buffers = 1GB
>     work_mem = 64MB
>     maintenance_work_mem = 256MB
>
> I can either upload the database somewhere, or provide the benchmarking
> script if needed.

Unfortunately, I can't reproduce such a huge slowdown on my test cases. Could you share both the database and the benchmarking script?
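
The benchmarking script itself is not included in this thread. As a rough illustration of how a "queries per minute" figure like the ones quoted above could be measured, here is a hedged Python sketch. Everything in it is an assumption: `queries_per_minute`, `run_query`, and the sample word lists are hypothetical names, and the database call is left as a pluggable callable so the timing logic can be run without a server.

```python
import time


def queries_per_minute(run_query, queries, duration_s=60.0, clock=time.monotonic):
    """Run queries round-robin for about duration_s seconds.

    run_query is any callable taking one query argument (hypothetical;
    in a real run it would execute the FTS query against the database).
    Returns the measured rate scaled to queries per minute.
    """
    if not queries:
        raise ValueError("need at least one query")
    start = clock()
    executed = 0
    while True:
        run_query(queries[executed % len(queries)])
        executed += 1
        elapsed = clock() - start
        if elapsed >= duration_s:
            break
    return executed * 60.0 / elapsed


# Demo with a stand-in for the database call (assumption: no real server).
sample_words = ["index storage", "posting list", "scan speed"]
rate = queries_per_minute(lambda words: None, sample_words, duration_s=0.05)
print(f"measured rate: {rate:.0f} queries/minute (no-op query)")
```

In a real comparison, `run_query` would wrap a database cursor executing the quoted `SELECT ... plainto_tsquery('english', ...)` statement with the word string as a parameter, and the same word lists would be replayed against both the patched and the unpatched build.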

------
With best regards,
Alexander Korotkov.
