Re: Ways to speed up ts_rank

Поиск
Список
Период
Сортировка
От Oleg Bartunov
Тема Re: Ways to speed up ts_rank
Дата
Msg-id Pine.LNX.4.64.1210101854460.1008@sn.sai.msu.ru
обсуждение исходный текст
Ответ на Re: Ways to speed up ts_rank  (François Beausoleil <francois@teksol.info>)
Ответы Re: Ways to speed up ts_rank
Список pgsql-performance
We'll present in Prague some improvements in FTS. Unfortunately, we have
only several minutes during lighting talk. In short, we improved GIN to
store additional information, coordinates for fts, for example and return
ordered by rank search results, which gave us performance better than
sphynx. It's just a prototype, but we already got median at 8 msec for
6 mln classifieds.

We didn't tested for long documents yet.

Regards,
Oleg

On Wed, 10 Oct 2012, Fran?ois Beausoleil wrote:

>
> Le 2012-10-09 ? 17:38, Shane Hathaway a ?crit :
>
>> Hello,
>>
>> The database has a text index of around 200,000 documents. Investigation revealed that text queries are slow only
whenusing ts_rank or ts_rank_cd.  Without a ts_rank function, any query is answered within 200ms or so; with ts_rank
function,queries take up to 30 seconds.  Deeper investigation using gprof showed that the problem is probably not
ts_rankor ts_rank_cd, but the fact that those functions retrieve thousands of TOASTed tsvectors. 
>
> Is the query perhaps doing something like this:
>
> SELECT ...
> FROM table
> WHERE tsvectorcol @@ plainto_tsquery('...')
> ORDER BY ts_rank(...)
>
> If so, ts_rank() is run for every document. What you should do instead is:
>
> SELECT *
> FROM (
>    SELECT ...
>    FROM table
>    WHERE tsvectorcol @@ plainto_tsquery('...')) AS t1
> ORDER BY ts_rank(...)
>
> Notice the ts_rank() is on the outer query, which means it'll only run on the subset of documents which match the
query.This is explicitly mentioned in the docs: 
>
> """Ranking can be expensive since it requires consulting the tsvector of each matching document, which can be I/O
boundand therefore slow. Unfortunately, it is almost impossible to avoid since practical queries often result in large
numbersof matches.""" 
>
> (last paragraph of) http://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-RANKING
>
> Hope that helps!
> Fran?ois Beausoleil
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


В списке pgsql-performance по дате отправления:

Предыдущее
От: Shaun Thomas
Дата:
Сообщение: Re: shared_buffers/effective_cache_size on 96GB server
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: Hyperthreading (was: Two identical systems, radically different performance)