Re: [GENERAL] Fragments in tsearch2 headline

From Sushant Sinha
Subject Re: [GENERAL] Fragments in tsearch2 headline
Date
Msg-id 9fb559330807142150m75fa325fv52f161e6857a712d@mail.gmail.com
In reply to Re: [GENERAL] Fragments in tsearch2 headline  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: [GENERAL] Fragments in tsearch2 headline  (Teodor Sigaev <teodor@sigaev.ru>)
Re: [GENERAL] Fragments in tsearch2 headline  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
Attached is a new patch that:

1. fixes the previous bug
2. handles the case where the cover size is greater than MaxWords better. It divides a cover larger than MaxWords into fragments of MaxWords words, resizes each fragment so that both ends contain a query word, and then picks the best fragments by the number of query words in each. In case of a tie it picks the smaller fragment. This allows more query words to be shown across multiple fragments when a single cover is larger than MaxWords.

Resizing a fragment so that each end is a query word leaves room to stretch both sides of the fragment. This (hopefully) better presents the context in which query words appear in the document. If a cover is smaller than MaxWords, the cover itself is treated as a fragment.
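To make the splitting and ranking steps concrete, here is a minimal Python sketch. It is not the patch itself (the patch is C code inside PostgreSQL); the function names, the word-list representation of the document, and the (start, end, hits) tuples are all assumptions made for illustration. The later step of stretching a trimmed fragment back out towards MaxWords is not modelled here.

```python
def split_cover(words, query, start, end, max_words):
    """Split the cover words[start..end] (inclusive) into chunks of at most
    max_words, then trim each chunk so it begins and ends on a query word.

    Returns a list of (first_index, last_index, query_word_count) tuples.
    Hypothetical helper for illustration only.
    """
    fragments = []
    pos = start
    while pos <= end:
        chunk_end = min(pos + max_words - 1, end)
        lo, hi = pos, chunk_end
        # Move both ends inward until each one sits on a query word.
        while lo <= hi and words[lo] not in query:
            lo += 1
        while hi >= lo and words[hi] not in query:
            hi -= 1
        if lo <= hi:  # the chunk contained at least one query word
            hits = sum(1 for w in words[lo:hi + 1] if w in query)
            fragments.append((lo, hi, hits))
        pos = chunk_end + 1
    return fragments


def best_fragments(fragments, max_fragments):
    """Keep the max_fragments best fragments: more query words first,
    and on a tie the shorter fragment. Result is in document order."""
    ranked = sorted(fragments, key=lambda f: (-f[2], f[1] - f[0]))
    return sorted(ranked[:max_fragments])


# Example input (made up): one cover spanning words 1..13, MaxWords = 5.
words = ("the black hole at the center of the galaxy "
         "is a massive black hole").split()
query = {"black", "hole"}
fragments = split_cover(words, query, start=1, end=13, max_words=5)
chosen = best_fragments(fragments, max_fragments=1)
```

A cover that is already no longer than max_words goes through the same loop as a single chunk, which matches the rule that a small cover is treated as one fragment.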

Let me know if you have any more suggestions or anything is not clear.

I have not yet added the regression tests. The regression test suite seems only to check that the function works. How many tests should I add? Is there anywhere else I need to add test cases for the function?

-Sushant.


Nice. But it would be good to resolve the following issues:
1) The patch contains mistakes; I didn't investigate them or read the patch carefully. Get http://www.sai.msu.su/~megera/postgres/fts/apod.dump.gz and load it into a database.

Queries
# select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1') from apod where to_tsvector(body) @@ plainto_tsquery('black hole');

and

# select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1') from apod;

crash PostgreSQL :(

2) Please include documentation and regression tests in your patch.



Another change that I was thinking of:

Right now if cover size > max_words then I just cut the trailing words.
Instead I was thinking that we should split the cover into more
fragments such that each fragment contains a few query words. Then each
fragment will not contain all query words but will show more occurrences
of query words in the headline. I would like to know your opinion
on this.

Agreed.


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                  WWW: http://www.sigaev.ru/
