Обсуждение: tsearch is slow

Поиск
Список
Период
Сортировка

tsearch is slow

От
Abbath
Дата:
Hi,

Recently I have tried TSearch2 (also on Linux and win with an
Athlon64 3000+ machine).
I have a table and I loaded some text files into it, so I have a row
number, and a text column and there is a ts_vec column for tsvector.
I created the gist index for ts_vec. The table has ~ 1 million
records. It seems that using a regular search query like:

SELECT line_number, headline(line, to_tsquery('keyword'))
FROM tstexts
WHERE ts_vec @@ to_tsquery('keyword')
ORDER BY rank(ts_vec, to_tsquery('keyword')) DESC

is slow for the first time (7-15 sec), but then using the same keyword
next time it is fast (10-100 ms). The reason is, as I read, first time
it is not cached at all, but next time the index pages are
cached so it is fast.

I think in a real word application, in this form, it is useless, because
the 15 sec search time is not allowable for any user. Is there a way
to initially cache the index or tune some postgres parameter? (I tried
to increase "shared_buffers", "effective_cache_size", "work_mem" but
had no effect on it) Or I should look for another search technology? (I
just want to provide a 2 sec maximum search time at 1 million records,
I think it is not a big expectation nowadays)


Abbath




Re: tsearch is slow

От
Richard Huxton
Дата:
Abbath wrote:
> is slow for the first time (7-15 sec), but then using the same keyword
> next time it is fast (10-100 ms). The reason is, as I read, first time
> it is not cached at all, but next time the index pages are
> cached so it is fast.
>
> I think in a real word application, in this form, it is useless, because
> the 15 sec search time is not allowable for any user.

What, never? Even if this facility is only used once a year by one user
and you have 1000 other users who need their queries to complete in 0.2
secs at most? What you mean is that it's not useful for *your*
application - don't assume the same applies to all applications.

 > Is there a way
> to initially cache the index or tune some postgres parameter? (I tried
> to increase "shared_buffers", "effective_cache_size", "work_mem" but
> had no effect on it) Or I should look for another search technology? (I
> just want to provide a 2 sec maximum search time at 1 million records,
> I think it is not a big expectation nowadays)

If you want to force the data to be cached, just put a cron-job in to
run a query for "abc" or whatever once a minute.

Of course, if it turns out that your users aren't running the query very
often then you're wasting resources, and if they are running it often
then again you're wasting resources. But - if you really need to, that's
the solution.

--
   Richard Huxton
   Archonet Ltd

Re: tsearch is slow

От
Abbath
Дата:
Hello Richard,

Wednesday, March 15, 2006, 3:35:26 PM, you wrote:

> Abbath wrote:
>> is slow for the first time (7-15 sec), but then using the same keyword
>> next time it is fast (10-100 ms). The reason is, as I read, first time
>> it is not cached at all, but next time the index pages are
>> cached so it is fast.
>>
>> I think in a real word application, in this form, it is useless, because
>> the 15 sec search time is not allowable for any user.

> What, never? Even if this facility is only used once a year by one user
> and you have 1000 other users who need their queries to complete in 0.2
> secs at most? What you mean is that it's not useful for *your*
> application - don't assume the same applies to all applications.

The search function will be a frequently used one so it shouldn't be
slow. Ok, maybe it is not suitable for me if I want a user friendly
search function.

 >> Is there a way
>> to initially cache the index or tune some postgres parameter? (I tried
>> to increase "shared_buffers", "effective_cache_size", "work_mem" but
>> had no effect on it) Or I should look for another search technology? (I
>> just want to provide a 2 sec maximum search time at 1 million records,
>> I think it is not a big expectation nowadays)

> If you want to force the data to be cached, just put a cron-job in to
> run a query for "abc" or whatever once a minute.

I can't guess what the user want to search.

> Of course, if it turns out that your users aren't running the query very
> often then you're wasting resources, and if they are running it often
> then again you're wasting resources. But - if you really need to, that's
> the solution.

It seems if I periodically use VACUUM ANALYZE, it helps to reduce the search
time...I will see how can I tune it more.

Thanks for the reply.

Abbath


Re: tsearch is slow

От
Scott Marlowe
Дата:
On Wed, 2006-03-15 at 13:36, Abbath wrote:
> Hello Richard,
>
> Wednesday, March 15, 2006, 3:35:26 PM, you wrote:
> > If you want to force the data to be cached, just put a cron-job in to
> > run a query for "abc" or whatever once a minute.
>
> I can't guess what the user want to search.

But that query will likely load up all the index info into memory.

> > Of course, if it turns out that your users aren't running the query very
> > often then you're wasting resources, and if they are running it often
> > then again you're wasting resources. But - if you really need to, that's
> > the solution.
>
> It seems if I periodically use VACUUM ANALYZE, it helps to reduce the search
> time...I will see how can I tune it more.

That statement is pretty telling.  You're new to PostgreSQL I'll
assume.  You'll need to read up on the periodic maintenance section of
the docs.

Here ya go:
http://www.postgresql.org/docs/8.1/interactive/maintenance.html

Re: tsearch is slow

От
Abbath
Дата:
Hello Scott,

Wednesday, March 15, 2006, 8:49:00 PM, you wrote:

> On Wed, 2006-03-15 at 13:36, Abbath wrote:
>> Hello Richard,
>>
>> Wednesday, March 15, 2006, 3:35:26 PM, you wrote:
>> > If you want to force the data to be cached, just put a cron-job in to
>> > run a query for "abc" or whatever once a minute.
>>
>> I can't guess what the user want to search.

> But that query will likely load up all the index info into memory.

Misunderstanding: I experienced that if I run a search for a keyword
first time it is slow, then next time it is fast BUT for that keyword,
not for any keyword.

>> > Of course, if it turns out that your users aren't running the query very
>> > often then you're wasting resources, and if they are running it often
>> > then again you're wasting resources. But - if you really need to, that's
>> > the solution.
>>
>> It seems if I periodically use VACUUM ANALYZE, it helps to reduce the search
>> time...I will see how can I tune it more.

> That statement is pretty telling.  You're new to PostgreSQL I'll
> assume.  You'll need to read up on the periodic maintenance section of
> the docs.

> Here ya go:
> http://www.postgresql.org/docs/8.1/interactive/maintenance.html

Yes, I have just started to use postgres so I need further experience.
Thanks for the link.

> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly



Re: tsearch is slow

От
Scott Marlowe
Дата:
On Wed, 2006-03-15 at 18:09, Abbath wrote:
> Hello Scott,
>
> Wednesday, March 15, 2006, 8:49:00 PM, you wrote:
> >>
> >> I can't guess what the user want to search.
>
> > But that query will likely load up all the index info into memory.
>
> Misunderstanding: I experienced that if I run a search for a keyword
> first time it is slow, then next time it is fast BUT for that keyword,
> not for any keyword.

I think you mean "ONLY for that keyword" there?  If everything else
becomes fast but the keyword becomes slow, then we've got a very
interesting (and possibly difficult) problem.

Full text search is the kind of problem you throw ONE database at on a
machine with LOTS of ram.  It doesn't need lots of CPU horsepower, or
even disk performance, as long as everything can fit into RAM.  Then,
set shared_buffers to 10-15% of the memory size, and let the OS do the
caching.

One of the best performance tuning docs is here:

http://www.varlena.com/GeneralBits/Tidbits/perf.html


> > That statement is pretty telling.  You're new to PostgreSQL I'll
> > assume.  You'll need to read up on the periodic maintenance section of
> > the docs.
>
> > Here ya go:
> > http://www.postgresql.org/docs/8.1/interactive/maintenance.html
>
> Yes, I have just started to use postgres so I need further experience.
> Thanks for the link.

We all started somewhere.  PostgreSQL is a pretty good place to start
learning databases.