Re: Tsearch2 performance on big database

Поиск
Список
Период
Сортировка
От Oleg Bartunov
Тема Re: Tsearch2 performance on big database
Дата
Msg-id Pine.GSO.4.62.0503221840480.5508@ra.sai.msu.su
обсуждение исходный текст
Ответ на Re: Tsearch2 performance on big database  (Mike Rylander <mrylander@gmail.com>)
Ответы Re: Tsearch2 performance on big database  (Rick Jansen <rick@rockingstone.nl>)
Список pgsql-performance
Mike,

no comments before Rick post tsearch configs and increased buffers !
Union shouldn't be faster than (term1|term2).
tsearch2 internals description might help you understanding tsearch2 limitations.
See  http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
Also, don't miss my notes:
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes

Oleg
On Tue, 22 Mar 2005, Mike Rylander wrote:

> On Tue, 22 Mar 2005 14:25:19 +0100, Rick Jansen <rick@rockingstone.nl> wrote:
>>
>> ilab=# explain analyze select count(titel) from books where idxfti @@
>> to_tsquery('default', 'buckingham | palace');
>>                                                                 QUERY PLAN
>>
----------------------------------------------------------------------------------------------------------------------------------------
>>   Aggregate  (cost=35547.99..35547.99 rows=1 width=56) (actual
>> time=125968.119..125968.120 rows=1 loops=1)
>>     ->  Index Scan using idxfti_idx on books  (cost=0.00..35525.81
>> rows=8869 width=56) (actual time=0.394..125958.245 rows=3080 loops=1)
>>           Index Cond: (idxfti @@ '\'buckingham\' | \'palac\''::tsquery)
>>   Total runtime: 125968.212 ms
>> (4 rows)
>>
>> Time: 125969.264 ms
>> ilab=#
>
> Ahh... I should have qualified my claim.  I am creating a google-esqe
> search interface and almost every query uses '&' as the term joiner.
> 'AND' queries and one-term queries are orders of magnitude faster than
> 'OR' queries, and fortunately are the expected default for most users.
> (Think, "I typed in these words, therefore I want to match these
> words"...)  An interesting test may be to time multiple queries
> independently, one for each search term, and see if the combined cost
> is less than a single 'OR' search.  If so, you could use UNION to join
> the results.
>
> However, the example you originally gave ('terminology') should be
> very fast.  On a comparable query ("select count(value) from
> metabib.full_rec where index_vector @@ to_tsquery('default','jane');")
> I get 12ms.
>
> Oleg, do you see anything else on the surface here?
>
> Try:
>
> EXPLAIN ANALYZE
>  SELECT titel FROM books WHERE idxfti @@
>    to_tsquery('default', 'buckingham')
>  UNION
>  SELECT titel FROM books WHERE idxfti @@
>    to_tsquery('default', 'palace');
>
> and see if using '&' instead of '|' where you can helps out.  I
> imagine you'd be surprised by the speed of:
>
>  SELECT titel FROM books WHERE idxfti @@
>    to_tsquery('default', 'buckingham&palace');
>
>
>>
>> > As an example of what I think you *should* be seeing, I have a similar
>> > box (4 procs, but that doesn't matter for one query) and I can search
>> > a column with tens of millions of rows in around a second.
>> >
>>
>> That sounds very promising, I'd love to get those results.. could you
>> tell me what your settings are, howmuch memory you have and such?
>
> 16G of RAM on a dedicated machine.
>
>
> shared_buffers = 15000          # min 16, at least max_connections*2, 8KB each
> work_mem = 10240                # min 64, size in KB
> maintenance_work_mem = 1000000  # min 1024, size in KB
> # big m_w_m for loading data...
>
> random_page_cost = 2.5          # units are one sequential page fetch cost
> # fast drives, and tons of RAM
>
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

В списке pgsql-performance по дате отправления:

Предыдущее
От: Mike Rylander
Дата:
Сообщение: Re: Tsearch2 performance on big database
Следующее
От: Hannu Krosing
Дата:
Сообщение: Re: What needs to be done for real Partitioning?