tsearch2 headline and postgresql.conf

From: pgsql-performance@nullmx.com
Subject: tsearch2 headline and postgresql.conf
Msg-id: 43D3386A.8000004@nullmx.com
List: pgsql-performance

Hi folks,

I'm not sure if this is the right place for this but thought I'd ask.
I'm relatively new to postgres, having only used it on 3 projects, and am
just delving into the setup and admin for the second time.

I decided to try tsearch2 for this project's search requirements but am
having trouble attaining adequate performance.  I think I've nailed it
down to trouble with the headline() function in tsearch2.

In short, there is a crawler that grabs HTML docs and places them in a
database.  The search is done using tsearch2 pretty much installed
according to instructions.  I have read a couple online guides suggested
by this list for tuning the postgresql.conf file.  I only made modest
adjustments because I'm not working with top-end hardware and am still
uncertain of the actual impact of the different parameters.

I've been learning 'explain' and over the course of reading I have done
enough query tweaking to discover the source of my headache seems to be
headline().

On a query of 429 documents, of which the avg size of the stripped down
document as stored is 21KB, and the max is 518KB (an anomaly), tsearch2
performs exceptionally well returning most queries in about 100ms.

On the other hand, following the tsearch2 guide which suggests returning
that first portion as a subquery and then generating the headline() from
those results, I see the query increase to 4 seconds!
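
A minimal sketch of that query shape, for reference. The table name docs, the columns id and vectors, the search term, and the LIMIT are placeholders assumed for illustration; only stripped_text comes from the setup described here. The functions are the ones shipped with the tsearch2 contrib module, called with its default configuration.

```sql
-- Sketch only: docs, id, vectors, 'term' and the LIMIT are hypothetical.
-- The inner query does the cheap part (index match + ranking);
-- headline() then runs only on the rows that survive the LIMIT.
SELECT id,
       headline(stripped_text, q) AS snippet,
       score
FROM (
    SELECT id, stripped_text, q, rank(vectors, q) AS score
    FROM docs, to_tsquery('term') AS q
    WHERE vectors @@ q
    ORDER BY score DESC
    LIMIT 10
) AS top_hits;
```

Because headline() has to re-parse the full document text for every row it is given, its cost in a query like this grows with the size of the returned documents rather than with the size of the table.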

This seems to be directly related to document size.  If I filter out
that 518KB doc along with some 100KB docs by returning "substring(
stripped_text FROM 0 FOR 50000) AS stripped_text" I decrease the time to
1.4 seconds, but increase the risk of not getting a headline.
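
The truncated variant might look roughly like the sketch below, with the same hypothetical names as before (docs, id, vectors, 'term'). One detail worth noting: SQL string positions start at 1, so FROM 0 FOR 50000 actually yields the first 49,999 characters; the sketch uses FROM 1 instead.

```sql
-- Sketch only: docs, id, vectors and 'term' are hypothetical names.
-- headline() sees at most the first 50000 characters of each document,
-- so a match that occurs only past the cutoff will not show up in the snippet.
SELECT id,
       headline(substring(stripped_text FROM 1 FOR 50000), q) AS snippet,
       score
FROM (
    SELECT id, stripped_text, q, rank(vectors, q) AS score
    FROM docs, to_tsquery('term') AS q
    WHERE vectors @@ q
    ORDER BY score DESC
    LIMIT 10
) AS top_hits;
```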

Seeing as how this problem is directly tied to document size, I'm
wondering if there are any specific settings in postgresql.conf that may
help, or is this just a fact of life for the headline() function?  Or,
does anyone know what the problem is and how to overcome it?
