Re: [HACKERS] Remove 1MB size limit in tsvector

Поиск
Список
Период
Сортировка
От Ildus Kurbangaliev
Тема Re: [HACKERS] Remove 1MB size limit in tsvector
Дата
Msg-id 20170911123332.38d5853a@wp.localdomain
обсуждение исходный текст
Ответ на Re: [HACKERS] Remove 1MB size limit in tsvector  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: [HACKERS] Remove 1MB size limit in tsvector  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Thu, 7 Sep 2017 23:08:14 +0200
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> Hi,
> 
> On 08/17/2017 12:23 PM, Ildus Kurbangaliev wrote:
> > In my benchmarks when database fits into buffers (so it's
> > measurement of the time required for the tsvectors conversion) it
> > gives me these results:
> > 
> > Without conversion:
> > 
> > $ ./tsbench2 -database test1 -bench_time 300
> > 2017/08/17 12:04:44 Number of connections:  4
> > 2017/08/17 12:04:44 Database:  test1
> > 2017/08/17 12:09:44 Processed: 51419
> > 
> > With conversion:
> > 
> > $ ./tsbench2 -database test1 -bench_time 300
> > 2017/08/17 12:14:31 Number of connections:  4
> > 2017/08/17 12:14:31 Database:  test1
> > 2017/08/17 12:19:31 Processed: 43607
> > 
> > I ran a bunch of these tests, and these results are stable on my
> > machine. So in these specific tests performance regression about
> > 15%.
> > 
> > Same time I think this could be the worst case, because usually data
> > is on disk and conversion will not affect so much to performance.
> >   
> 
> That seems like a fairly significant regression, TBH. I don't quite
> agree we can simply assume in-memory workloads don't matter, plenty of
> databases have 99% cache hit ratio (particularly when considering not
> just shared buffers, but also page cache).

I think part of this regression is caused by better compression of new
format. I can't say exact percent here, need to check with perf.

If you care about performace, you create indexes, which means that
tsvector will no longer be used for text search (except for ORDER BY
rank). Index machinery will only peek into tsquery. Moreover, RUM index
stores positions + lexemes, so it doesn't need tsvectors for ranked
search. As a result, tsvector becomes a storage for
building indexes (indexable type), not something that should be used at
runtime. And the change of the format doesn't affect index creation
time.

> 
> Can you share the benchmarks, so that others can retry running them?

Benchmarks are published at github:
https://github.com/ildus/tsbench . I'm not sure that they are easy to
use.

Best regards,
Ildus Kurbangaliev


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Konstantin Knizhnik
Дата:
Сообщение: Re: [HACKERS] Cached plans and statement generalization
Следующее
От: Aleksander Alekseev
Дата:
Сообщение: Re: [HACKERS] Automatic testing of patches in commit fest