Re: [HACKERS] Remove 1MB size limit in tsvector

From	Ildus Kurbangaliev
Subject	Re: [HACKERS] Remove 1MB size limit in tsvector
Date	11 September 2017, 12:33:32
Msg-id	20170911123332.38d5853a@wp.localdomain
In reply to	Re: [HACKERS] Remove 1MB size limit in tsvector (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Replies	Re: [HACKERS] Remove 1MB size limit in tsvector
List	pgsql-hackers

On Thu, 7 Sep 2017 23:08:14 +0200
Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

> Hi,
> 
> On 08/17/2017 12:23 PM, Ildus Kurbangaliev wrote:
> > In my benchmarks, when the database fits into buffers (so this is a
> > measurement of the time required for the tsvector conversion), I
> > get these results:
> > 
> > Without conversion:
> > 
> > $ ./tsbench2 -database test1 -bench_time 300
> > 2017/08/17 12:04:44 Number of connections:  4
> > 2017/08/17 12:04:44 Database:  test1
> > 2017/08/17 12:09:44 Processed: 51419
> > 
> > With conversion:
> > 
> > $ ./tsbench2 -database test1 -bench_time 300
> > 2017/08/17 12:14:31 Number of connections:  4
> > 2017/08/17 12:14:31 Database:  test1
> > 2017/08/17 12:19:31 Processed: 43607
> > 
> > I ran a bunch of these tests, and these results are stable on my
> > machine. So in these specific tests the performance regression is
> > about 15%.
> > 
> > At the same time, I think this could be the worst case, because
> > usually the data is on disk and the conversion will not affect
> > performance as much.
> >   
> 
> That seems like a fairly significant regression, TBH. I don't quite
> agree that we can simply assume in-memory workloads don't matter;
> plenty of databases have a 99% cache hit ratio (particularly when
> considering not just shared buffers, but also the page cache).

I think part of this regression is caused by the better compression of
the new format. I can't give an exact percentage yet; I need to check
with perf.
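
For reference, the roughly 15% figure follows directly from the two
300-second tsbench2 runs quoted above. A minimal Python sketch of the
arithmetic (the constant and function names are illustrative, not part
of tsbench):

```python
# Counts taken from the two tsbench2 runs quoted above (4 connections, 300 s each).
BENCH_SECONDS = 300
WITHOUT_CONVERSION = 51419  # "Processed" count, run without conversion
WITH_CONVERSION = 43607     # "Processed" count, run with conversion


def regression(baseline: int, candidate: int) -> float:
    """Fractional throughput loss of `candidate` relative to `baseline`."""
    return 1 - candidate / baseline


print(f"without conversion: {WITHOUT_CONVERSION / BENCH_SECONDS:.1f}/s")  # 171.4/s
print(f"with conversion:    {WITH_CONVERSION / BENCH_SECONDS:.1f}/s")     # 145.4/s
print(f"regression: {regression(WITHOUT_CONVERSION, WITH_CONVERSION):.1%}")  # 15.2%
```

Since both runs last the same 300 seconds, the ratio of the processed
counts equals the ratio of throughputs.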

If you care about performance, you create indexes, which means that
tsvector will no longer be used for text search (except for ORDER BY
rank). The index machinery only peeks into the tsquery. Moreover, the
RUM index stores positions along with lexemes, so it doesn't need
tsvectors for ranked search either. As a result, tsvector becomes
storage for building indexes (an indexable type), not something that
should be used at runtime. And the change of format doesn't affect
index creation time.

> 
> Can you share the benchmarks, so that others can retry running them?

The benchmarks are published on GitHub:
https://github.com/ildus/tsbench . I'm not sure they are easy to
use, though.

Best regards,
Ildus Kurbangaliev

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
