Re: Full text index not being used

Поиск
Список
Период
Сортировка
От Alex Neth
Тема Re: Full text index not being used
Дата
Msg-id B230030F-2C30-4A6B-9E44-9A40A9FA5E84@liivid.com
обсуждение исходный текст
Ответ на Re: Full text index not being used  (Oleg Bartunov <oleg@sai.msu.su>)
Ответы Re: Full text index not being used  (Oleg Bartunov <oleg@sai.msu.su>)
Список pgsql-general
Based on suggestions from this list, I am trying to create a tsvector
column and index that, since it is perhaps the recheck and rebuilding
of all the vectors that is slowing things down.  I don't understand
why a recheck is necessary on a gin index.....

My update statement has been running for 36 hours now and has not
finished.  The statement is:  update source_listings set flv =
to_tsvector('english', full_listing);  I know that it is still working
because it occasionally prints out one of those long word errors.

I have only 1.6M rows and each entry in that column is a standard size
web page with just the text, maybe 3-5K.

For sure I don't have meaningful long words.  Perhaps that is because
it is not handling the HTML well and I should be parsing down the web
page first.  Hopefully that doesn't mean I need to rebuild this column
over the course of 3 days - I didn't expect it to take this long so I
thought I'd just try it out.





On Feb 2, 2009, at 1:49 AM, Oleg Bartunov wrote:

> Alex,
>
> what text you're indexing ? I don't believe you have meaningful
> very long words ( > 2047 characters).
>
> Do you really need multicolumn index ?
>
> I'd recommend to separate problem - create column fts for
> tsvector('english',full_listing), create index on it and try full-text
> query. The way you're doing imply calling to_tsvector every time you
> search, which can be very costly.
>
>
> Olegk
>
> On Sun, 1 Feb 2009, Alex wrote:
>
>> So this seems to be because the result size is too big.  I still
>> don't
>> know why it is looping through every record and printing a warning,
>> but adding a LIMIT makes the queries complete in a reasonable time
>> (although not all that fast).
>>
>> However I need to sort and also have many other facets that may or
>> may
>> not be included in the query.  Adding a sort makes it load every
>> record again and take forever.
>>
>> I tried to create an index including all of the fields I query on to
>> see if that would work, but I get an error the the index row is too
>> large:
>>
>> => create index master_index on source_listings(geo_lat, geo_lon,
>> price, bedrooms, region, city, listing_type, to_tsvector('english',
>> full_listing), post_time);
>> NOTICE:  word is too long to be indexed
>> DETAIL:  Words longer than 2047 characters are ignored.
>> NOTICE:  word is too long to be indexed
>> DETAIL:  Words longer than 2047 characters are ignored.
>> NOTICE:  word is too long to be indexed
>> DETAIL:  Words longer than 2047 characters are ignored.
>> NOTICE:  word is too long to be indexed
>> DETAIL:  Words longer than 2047 characters are ignored.
>> ERROR:  index row requires 13356 bytes, maximum size is 8191
>>
>> Any ideas about how to resolve this?
>>
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83


В списке pgsql-general по дате отправления:

Предыдущее
От: Thomas Markus
Дата:
Сообщение: Re: embedded pgsql media-failure
Следующее
От: Phoenix Kiula
Дата:
Сообщение: Re: Smartest way to resize a column?