Re: Ways to speed up ts_rank
| От | Shane Hathaway |
|---|---|
| Тема | Re: Ways to speed up ts_rank |
| Дата | |
| Msg-id | 5075BDB2.2030507@hathawaymix.org обсуждение |
| Ответ на | Re: Ways to speed up ts_rank (François Beausoleil <francois@teksol.info>) |
| Список | pgsql-performance |
On 10/10/2012 06:38 AM, François Beausoleil wrote: > Notice the ts_rank() is on the outer query, which means it'll only > run on the subset of documents which match the query. This is > explicitly mentioned in the docs: > > """Ranking can be expensive since it requires consulting the tsvector > of each matching document, which can be I/O bound and therefore slow. > Unfortunately, it is almost impossible to avoid since practical > queries often result in large numbers of matches.""" > > (last paragraph of) > http://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-RANKING Indeed, I have studied that paragraph in depth, trying to gather as much possible meaning from it as I can. :-) However, the following two queries take exactly the same time, suggesting to me that ts_rank_cd is really only looking at matching rows, not all rows: SELECT docid, coefficient * ts_rank_cd('{0.1, 0.2, 0.5, 1.0}', text_vector, to_tsquery('english', 'stuff')) AS rank FROM pgtextindex WHERE (text_vector @@ to_tsquery('english', 'stuff')) ORDER BY rank DESC limit 3; SELECT docid, coefficient * ts_rank_cd('{0.1, 0.2, 0.5, 1.0}', text_vector, to_tsquery('english', 'stuff')) AS rank FROM (SELECT * FROM pgtextindex WHERE (text_vector @@ to_tsquery('english', 'stuff'))) AS filtered ORDER BY rank DESC limit 3; Thanks for the suggestion though. By the way, all the tsvectors are already loaded into the kernel cache when I execute the queries, so ranking large documents is in fact CPU bound rather than I/O bound. The CPU is pegged for the whole time. Shane
В списке pgsql-performance по дате отправления: