Re: Full text index not being used

From: Oleg Bartunov
Subject: Re: Full text index not being used
Date:
Msg-id: Pine.LNX.4.64.0902032016590.4158@sn.sai.msu.ru
In reply to: Re: Full text index not being used  (Alex Neth <alex@liivid.com>)
List: pgsql-general
Alex,

it looks like you need to read the documentation and tune postgresql.conf.
Currently it does not look good. I have no time to guide you, so search
the archives for postgresql.conf tuning advice; this has been discussed
many times.

Oleg
On Wed, 4 Feb 2009, Alex Neth wrote:

> I've also found that other queries which were really fast with MySQL are
> really slow in Postgres.  I'm hoping that is a matter of tuning.  Overall
> I'm finding the query times to be extremely unpredictable.
>
> I added a slow query logger to my application that also does an explain.
> Check these out.  The time in parens is the time for the initial execution
> (before it was cached).  These are not under heavy load.  Note that there are
> around 400 users - not a lot.  22 seconds is ridiculous.
>
> Slow query: (22.0198369026184) [0] SELECT * FROM "users" WHERE
> ("users"."remember_token" = E'26a2b54d2b2e18edc8aa2e1f0e0508b0d858d650')
> LIMIT 1
> Limit  (cost=0.00..33.17 rows=1 width=784) (actual time=0.182..0.183 rows=1 loops=1)
>   ->  Seq Scan on users  (cost=0.00..33.17 rows=1 width=784) (actual time=0.181..0.181 rows=1 loops=1)
>         Filter: ((remember_token)::text = '26a2b54d2b2e18edc8aa2e1f0e0508b0d858d650'::text)
> Total runtime: 0.223 ms
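
The plan shows a sequential scan on users for an equality match on
remember_token, so the column apparently has no index; the 22-second first
run is most likely cold-cache I/O on top of that. A minimal sketch of the
usual fix, assuming the table and column names from the query above (the
index name is illustrative):

    CREATE INDEX users_remember_token_idx ON users (remember_token);

With that in place the LIMIT 1 equality lookup becomes a single index probe
instead of a table scan.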
>
> Slow query: (59.8780090808868) [0] SELECT id FROM "source_listings" WHERE
> (post_time BETWEEN '2009-01-27 08:14:58.261978' AND '2009-02-05
> 08:14:58.262034' AND ((geo_lon BETWEEN 5751555 AND 5759369 AND geo_lat
> BETWEEN 12773379 AND 12776908)))  ORDER BY post_time DESC LIMIT 60 OFFSET 0
> Limit  (cost=89.38..89.38 rows=1 width=12) (actual time=1368.555..1368.644 rows=60 loops=1)
>   ->  Sort  (cost=89.38..89.38 rows=1 width=12) (actual time=1368.552..1368.588 rows=60 loops=1)
>         Sort Key: post_time
>         Sort Method: top-N heapsort  Memory: 19kB
>         ->  Index Scan using x_sl_lat_lon_pt_br_lt_region on source_listings  (cost=0.00..89.37 rows=1 width=12) (actual time=0.097..1365.469 rows=2078 loops=1)
>               Index Cond: ((geo_lat >= 12773379) AND (geo_lat <= 12776908) AND (geo_lon >= 5751555) AND (geo_lon <= 5759369) AND (post_time >= '2009-01-27 08:14:58.261978'::timestamp without time zone) AND (post_time <= '2009-02-05 08:14:58.262034'::timestamp without time zone))
> Total runtime: 1368.722 ms
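
Here the index delivers 2078 matching rows (about 1.3 s of the runtime),
which are then top-N sorted just to keep 60. For an ORDER BY post_time DESC
LIMIT query like this, an index whose leading column is post_time can let
the planner walk the index backwards and stop after 60 matches. A sketch
under that assumption (index name illustrative; whether it actually wins
depends on how selective the geo box is relative to the time window):

    CREATE INDEX source_listings_pt_geo_idx
        ON source_listings (post_time, geo_lat, geo_lon);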
>
>
>
> On Feb 3, 2009, at 8:15 PM, Oleg Bartunov wrote:
>
>> Alex, can you somehow identify the document which has the long-word
>> errors? Also, if you have space on disk I'd recommend trying
>>
>> select *, to_tsvector('english',full_listing) as flv from source_listings;
>
> This is equally slow.
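
If the long-word warnings come from only a few rows, a rough heuristic query
can locate them before rebuilding anything. A sketch, assuming full_listing
holds the raw text; the 255-character run is a loose proxy for the
2047-character limit, since PostgreSQL caps regex repetition bounds at 255:

    SELECT id
    FROM source_listings
    WHERE full_listing ~ '[^[:space:]]{255}';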
>
>>
>>
>> I don't remember if you gave us information about
>> your setup (pg version, OS, memory, what you changed in
>> postgresql.conf...)
>>
>
> Version is 8.3.5.  Ubuntu, kernel 2.6.21.  2GB RAM.  postgresql.conf changes:
>
> shared_buffers = 24MB            # min 128kB or max_connections*16kB
> work_mem = 10MB                  # min 64kB
> max_fsm_pages = 153600           # min max_fsm_relations*16, 6 bytes each
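
For comparison, these values are far below the usual starting points for a
2GB machine. A hedged sketch of commonly suggested settings for PostgreSQL
8.3 on that hardware (illustrative numbers, not a prescription; see the
archive discussions Oleg mentions):

    shared_buffers = 512MB           # ~25% of RAM is a common rule of thumb
    effective_cache_size = 1536MB    # planner hint: roughly RAM minus overhead
    work_mem = 10MB                  # per sort/hash, multiplied across sessions
    checkpoint_segments = 16         # fewer, larger checkpoints under write load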
>
>
>>
>> Oleg
>> On Tue, 3 Feb 2009, Alex Neth wrote:
>>
>>> Based on suggestions from this list, I am trying to create a tsvector
>>> column and index it, since it is perhaps the recheck and rebuilding of
>>> all the vectors that is slowing things down.  I don't understand why a
>>> recheck is necessary on a gin index...
>>>
>>> My update statement has been running for 36 hours now and has not
>>> finished. The statement is:  update source_listings set flv =
>>> to_tsvector('english', full_listing);  I know that it is still working
>>> because it occasionally prints out one of those long word errors.
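
One way to make such a long-running rewrite observable and restartable is to
run it in key-range batches, committing each chunk separately. A sketch,
assuming a monotonic integer primary key named id (hypothetical here):

    UPDATE source_listings
        SET flv = to_tsvector('english', full_listing)
        WHERE id BETWEEN 1 AND 100000;
    -- repeat for the next id range, committing between batches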
>>>
>>> I have only 1.6M rows and each entry in that column is a standard size web
>>> page with just the text, maybe 3-5K.
>>>
>>> I certainly don't have meaningful long words.  Perhaps the parser is not
>>> handling the HTML well and I should strip down the web page first.
>>> Hopefully that doesn't mean I need to rebuild this column over the
>>> course of 3 days - I didn't expect it to take this long, so I thought I'd
>>> just try it out.
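
If the over-long "words" are really HTML tags or attribute blobs, stripping
markup before building the vector avoids both the warnings and wasted index
space. A rough sketch using a crude tag-stripping regex (not a real HTML
parser, and assumed adequate only for simple markup):

    UPDATE source_listings
        SET flv = to_tsvector('english',
                              regexp_replace(full_listing, '<[^>]*>', ' ', 'g'));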
>>>
>>> On Feb 2, 2009, at 1:49 AM, Oleg Bartunov wrote:
>>>
>>>> Alex,
>>>> What text are you indexing? I don't believe you have meaningful
>>>> very long words (> 2047 characters).
>>>> Do you really need a multicolumn index?
>>>> I'd recommend separating the problem: create a column fts for
>>>> to_tsvector('english', full_listing), create an index on it, and try a
>>>> full-text query. The way you're doing it implies calling to_tsvector
>>>> every time you search, which can be very costly.
>>>> Oleg
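
Spelled out, that suggestion amounts to something like the following sketch
(column and index names are illustrative, as are the search terms):

    ALTER TABLE source_listings ADD COLUMN fts tsvector;
    UPDATE source_listings
        SET fts = to_tsvector('english', full_listing);
    CREATE INDEX source_listings_fts_idx
        ON source_listings USING gin (fts);

    SELECT id FROM source_listings
    WHERE fts @@ plainto_tsquery('english', 'two bedroom');

The query then matches against the stored vector through the GIN index
instead of recomputing to_tsvector for every row.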
>>>> On Sun, 1 Feb 2009, Alex wrote:
>>>>> So this seems to be because the result size is too big.  I still don't
>>>>> know why it is looping through every record and printing a warning,
>>>>> but adding a LIMIT makes the queries complete in a reasonable time
>>>>> (although not all that fast).
>>>>> However I need to sort and also have many other facets that may or may
>>>>> not be included in the query.  Adding a sort makes it load every
>>>>> record again and take forever.
>>>>> I tried to create an index including all of the fields I query on to
>>>>> see if that would work, but I get an error that the index row is too
>>>>> large:
>>>>> => create index master_index on source_listings(geo_lat, geo_lon,
>>>>> price, bedrooms, region, city, listing_type, to_tsvector('english',
>>>>> full_listing), post_time);
>>>>> NOTICE:  word is too long to be indexed
>>>>> DETAIL:  Words longer than 2047 characters are ignored.
>>>>> NOTICE:  word is too long to be indexed
>>>>> DETAIL:  Words longer than 2047 characters are ignored.
>>>>> NOTICE:  word is too long to be indexed
>>>>> DETAIL:  Words longer than 2047 characters are ignored.
>>>>> NOTICE:  word is too long to be indexed
>>>>> DETAIL:  Words longer than 2047 characters are ignored.
>>>>> ERROR:  index row requires 13356 bytes, maximum size is 8191
>>>>> Any ideas about how to resolve this?
>>>>
>>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
