Обсуждение: FTS GIN Index Question

Поиск
Список
Период
Сортировка

FTS GIN Index Question

От
Christian Ramseyer
Дата:
Hi List

I have a largish partitioned table, it has ~60 million records in each
of 12 partitions. It appears that a Full Text Index could speed up some
user queries a lot.

A quick test with an additional tsvector column revealed that this would
take up around 35 GB of space for this column and then maybe 5 more for
the gin index on it. As this is a lot of space (~ 480 GB), I'm a bit
tempted to use a gin index without the separate tsvector column.
However, the doc says that this will be slower.

Does anyone have an idea of how much slower we're talking here? The
index defintion would be a concatenation of two setweights(), i.e.:

... using gin(
   (setweight(to_tsvector('config',coalesce(col1,'')), 'A') ||
    setweight(to_tsvector('config',coalesce(col2,'')), 'B')))

Also, general recommendations regarding full text search configurations
of that size are very welcome.

Christian




Re: FTS GIN Index Question

От
Oleg Bartunov
Дата:
Christian,

On Wed, 29 Sep 2010, Christian Ramseyer wrote:

> Hi List
>
> I have a largish partitioned table, it has ~60 million records in each of 12
> partitions. It appears that a Full Text Index could speed up some user
> queries a lot.
>
> A quick test with an additional tsvector column revealed that this would take
> up around 35 GB of space for this column and then maybe 5 more for the gin
> index on it. As this is a lot of space (~ 480 GB), I'm a bit tempted to use a
> gin index without the separate tsvector column. However, the doc says that
> this will be slower.

do you have problem with disk space ? Searching index is usually very fast
operation, only small part of index readed.  Did you checked time to read
index ?

>
> Does anyone have an idea of how much slower we're talking here? The index
> defintion would be a concatenation of two setweights(), i.e.:
>
> ... using gin(
>  (setweight(to_tsvector('config',coalesce(col1,'')), 'A') ||
>   setweight(to_tsvector('config',coalesce(col2,'')), 'B')))
>
> Also, general recommendations regarding full text search configurations of
> that size are very welcome.

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: FTS GIN Index Question

От
Christian Ramseyer
Дата:
On 09/29/2010 12:10 PM, Oleg Bartunov wrote:
> Christian,
>
> On Wed, 29 Sep 2010, Christian Ramseyer wrote:
>
>> Hi List
>>
>> I have a largish partitioned table, it has ~60 million records in each
>> of 12 partitions. It appears that a Full Text Index could speed up
>> some user queries a lot.
>>
>> A quick test with an additional tsvector column revealed that this
>> would take up around 35 GB of space for this column and then maybe 5
>> more for the gin index on it. As this is a lot of space (~ 480 GB),
>> I'm a bit tempted to use a gin index without the separate tsvector
>> column. However, the doc says that this will be slower.
>
> do you have problem with disk space ? Searching index is usually very fast
> operation, only small part of index readed. Did you checked time to read
> index ?
>

Hi Oleg, thanks for your reply,

Well I could get the disk space, but it's on a corporate SAN so it's a
bit tedious. I didn't compare directly so far since even creating both
index versions for only a single partition would bring me close to the
space limit and also take maybe 20 hours or so. So my idea was to ask
here first if I'm looking at a "a few percent" or a "orders of
magnitude" difference. But of course, if there isn't enough feeback for
an informed decision I'll either try both or just go with the
space-intensive variant to be on the safe side, user experience is
certainly more important than disk usage.

Christian