Discussion: tsearch is slow
Hi,

Recently I have tried TSearch2 (on Linux and on Windows, on an Athlon64 3000+ machine). I have a table into which I loaded some text files, so there is a row number column, a text column, and a ts_vec column of type tsvector. I created a GiST index on ts_vec. The table has ~1 million records.

It seems that a regular search query like:

  SELECT line_number, headline(line, to_tsquery('keyword'))
  FROM tstexts
  WHERE ts_vec @@ to_tsquery('keyword')
  ORDER BY rank(ts_vec, to_tsquery('keyword')) DESC

is slow the first time (7-15 sec), but running it again with the same keyword it is fast (10-100 ms). The reason, as I read, is that the first time nothing is cached, while the next time the index pages are already cached, so it is fast.

I think that in a real world application, in this form, it is useless, because a 15 sec search time is not acceptable for any user. Is there a way to cache the index in advance, or to tune some Postgres parameter? (I tried increasing "shared_buffers", "effective_cache_size" and "work_mem", but it had no effect.) Or should I look for another search technology? (I just want to provide a 2 sec maximum search time on 1 million records; I don't think that is a big expectation nowadays.)

Abbath
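For reference, the setup described above can be reconstructed roughly as follows. This is a sketch only, assuming tsearch2 on PostgreSQL 8.1: the table and column names (tstexts, line_number, line, ts_vec) are taken from the query in the post; the CREATE statements and the default-configuration calls are assumptions.

```sql
-- Assumed schema, reconstructed from the description above (tsearch2 on 8.1):
CREATE TABLE tstexts (
    line_number integer,
    line        text,
    ts_vec      tsvector
);

-- Populate the tsvector column from the text column:
UPDATE tstexts SET ts_vec = to_tsvector(line);

-- The GiST index the post describes:
CREATE INDEX tstexts_ts_vec_idx ON tstexts USING gist (ts_vec);

-- The search query from the post:
SELECT line_number, headline(line, to_tsquery('keyword'))
FROM tstexts
WHERE ts_vec @@ to_tsquery('keyword')
ORDER BY rank(ts_vec, to_tsquery('keyword')) DESC;
```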
Abbath wrote:
> is slow the first time (7-15 sec), but running it again with the same
> keyword it is fast (10-100 ms). The reason, as I read, is that the
> first time nothing is cached, while the next time the index pages are
> already cached, so it is fast.
>
> I think that in a real world application, in this form, it is useless,
> because a 15 sec search time is not acceptable for any user.

What, never? Even if this facility is only used once a year by one user, and you have 1000 other users who need their queries to complete in 0.2 secs at most? What you mean is that it's not useful for *your* application - don't assume the same applies to all applications.

> Is there a way to cache the index in advance, or to tune some Postgres
> parameter? (I tried increasing "shared_buffers", "effective_cache_size"
> and "work_mem", but it had no effect.) Or should I look for another
> search technology? (I just want to provide a 2 sec maximum search time
> on 1 million records; I don't think that is a big expectation nowadays.)

If you want to force the data to be cached, just put a cron job in to run a query for "abc" or whatever once a minute.

Of course, if it turns out that your users aren't running the query very often then you're wasting resources, and if they are running it often then again you're wasting resources. But - if you really need to, that's the solution.

--
  Richard Huxton
  Archonet Ltd
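The cron-job suggestion above could look something like this. This is a hypothetical sketch: the database name and keyword are placeholders, and on 8.1-era tsearch2 the warming effect comes simply from the query touching the GiST index pages so the OS keeps them cached.

```
# Hypothetical crontab entry: run a cheap tsearch query once a minute to
# keep the GiST index pages warm in the OS / buffer cache.
# "mydb" and 'abc' are placeholders.
* * * * *  psql -d mydb -c "SELECT count(*) FROM tstexts WHERE ts_vec @@ to_tsquery('abc');" >/dev/null 2>&1
```

As Richard notes below it, this trades wasted background work for predictable first-query latency; whether that trade is worth it depends on how often real users search.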
Hello Richard,

Wednesday, March 15, 2006, 3:35:26 PM, you wrote:

> What, never? Even if this facility is only used once a year by one user,
> and you have 1000 other users who need their queries to complete in 0.2
> secs at most? What you mean is that it's not useful for *your*
> application - don't assume the same applies to all applications.

The search function will be used frequently, so it shouldn't be slow. OK, maybe it is not suitable for me if I want a user-friendly search function.

> If you want to force the data to be cached, just put a cron job in to
> run a query for "abc" or whatever once a minute.

I can't guess what the users want to search for.

> Of course, if it turns out that your users aren't running the query very
> often then you're wasting resources, and if they are running it often
> then again you're wasting resources. But - if you really need to, that's
> the solution.

It seems that if I periodically run VACUUM ANALYZE, it helps to reduce the search time... I will see how I can tune it further.

Thanks for the reply.

Abbath
On Wed, 2006-03-15 at 13:36, Abbath wrote:
> Wednesday, March 15, 2006, 3:35:26 PM, you wrote:
> > If you want to force the data to be cached, just put a cron job in to
> > run a query for "abc" or whatever once a minute.
>
> I can't guess what the users want to search for.

But that query will likely load up all the index info into memory.

> It seems that if I periodically run VACUUM ANALYZE, it helps to reduce
> the search time... I will see how I can tune it further.

That statement is pretty telling. You're new to PostgreSQL, I'll assume. You'll need to read up on the periodic maintenance section of the docs. Here ya go:

http://www.postgresql.org/docs/8.1/interactive/maintenance.html
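The routine maintenance the linked docs describe boils down to something like the following. A sketch only, for the 8.1-era setup discussed in this thread; the table name comes from the original post and the nightly schedule is an illustrative assumption.

```sql
-- Reclaim dead row versions and refresh the planner's statistics for the
-- table from the original post:
VACUUM ANALYZE tstexts;

-- Or database-wide from the shell, e.g. nightly via cron:
--   vacuumdb --analyze --all
```

On 8.1 this can also be left to the autovacuum daemon (autovacuum = on in postgresql.conf) instead of a manual schedule.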
Hello Scott,

Wednesday, March 15, 2006, 8:49:00 PM, you wrote:

> On Wed, 2006-03-15 at 13:36, Abbath wrote:
> > I can't guess what the users want to search for.
>
> But that query will likely load up all the index info into memory.

Misunderstanding: what I experienced is that if I run a search for a keyword, the first time it is slow, then the next time it is fast BUT for that keyword, not for any keyword.

> That statement is pretty telling. You're new to PostgreSQL, I'll
> assume. You'll need to read up on the periodic maintenance section of
> the docs.
>
> Here ya go:
> http://www.postgresql.org/docs/8.1/interactive/maintenance.html

Yes, I have just started to use Postgres, so I need more experience. Thanks for the link.

Abbath
On Wed, 2006-03-15 at 18:09, Abbath wrote:
> Misunderstanding: what I experienced is that if I run a search for a
> keyword, the first time it is slow, then the next time it is fast BUT
> for that keyword, not for any keyword.

I think you mean "ONLY for that keyword" there? If everything else becomes fast but that keyword stays slow, then we've got a very interesting (and possibly difficult) problem.

Full text search is the kind of problem you throw ONE database at, on a machine with LOTS of RAM. It doesn't need lots of CPU horsepower, or even disk performance, as long as everything can fit into RAM. Then set shared_buffers to 10-15% of the memory size, and let the OS do the caching.

One of the best performance tuning docs is here:

http://www.varlena.com/GeneralBits/Tidbits/perf.html

> Yes, I have just started to use Postgres, so I need more experience.
> Thanks for the link.

We all started somewhere. PostgreSQL is a pretty good place to start learning databases.
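The 10-15% shared_buffers guideline above might translate into a postgresql.conf fragment like this. The numbers are purely illustrative (assuming a dedicated box with 4 GB of RAM), and in 8.1 both settings are counted in 8 kB buffers/pages rather than in memory units:

```
# Hypothetical postgresql.conf fragment for a dedicated full-text search
# box with ~4 GB RAM (8.1-era syntax; values are examples, not advice):
shared_buffers = 50000          # ~400 MB, i.e. roughly 10% of RAM
effective_cache_size = 375000   # ~3 GB: how much the OS is expected to cache
```

The point of keeping shared_buffers modest is exactly what Scott describes: let the operating system's page cache, not PostgreSQL's own buffer pool, do most of the caching.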