Re: [GENERAL] Fragments in tsearch2 headline
From | Sushant Sinha |
---|---|
Subject | Re: [GENERAL] Fragments in tsearch2 headline |
Date | |
Msg-id | 1214056853.8689.10.camel@dragflick |
In reply to | Re: [GENERAL] Fragments in tsearch2 headline (Teodor Sigaev <teodor@sigaev.ru>) |
Responses | Re: [GENERAL] Fragments in tsearch2 headline |
List | pgsql-hackers |
I have attached an updated patch with the following changes:

1. Respects ShortWord and MinWords
2. Uses hlCover instead of Cover
3. Does not store norm (or lexeme) for headline marking
4. Removes ts_rank.h
5. Earlier it was counting even NONWORDTOKEN in the headline. Now it only counts the actual words and excludes spaces etc.

I have also changed the NumFragments option to MaxFragments, as there may not be enough covers to display NumFragments.

Another change that I was thinking about: right now, if cover size > max_words, I just cut the trailing words. Instead, I was thinking that we should split the cover into more fragments such that each fragment contains a few query words. Then each fragment will not contain all query words, but the headline will show more occurrences of query words. I would like to know your opinion on this.

-Sushant.

On Thu, 2008-06-05 at 20:21 +0400, Teodor Sigaev wrote:
> > A couple of caveats:
> >
> > 1. ts_headline testing was done with current cvs head where as
> > headline_with_fragments was done with postgres 8.3.1.
> > 2. For headline_with_fragments, TSVector for the document was obtained
> > by joining with another table.
> > Are these differences understandable?
>
> That is possible situation because ts_headline has several criterias of 'best'
> covers - length, number of words from query, good words at the begin and at the
> end of headline while your fragment's algorithm takes care only on total number
> of words in all covers. It's not very good, but it's acceptable, I think.
> Headline (and ranking too) hasn't any formal rules to define is it good or bad?
> Just a people's opinions.
>
> Next possible reason: original algorithm had a look on all covers trying to find
> the best one while your algorithm tries to find just the shortest covers to fill
> a headline.
>
> But it's very desirable to use ShortWord - it's not very comfortable for user if
> one option produces unobvious side effect with another one.
>
> > If you think these caveats are the reasons or there is something I am
> > missing, then I can repeat the entire experiments with exactly the same
> > conditions.
>
> Interesting for me test is a comparing hlCover with Cover in your patch, i.e.
> develop a patch which uses hlCover instead of Cover and compare old patch with
> new one.
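
To make the truncate-versus-split question from the message above concrete, here is a rough sketch in Python. It is not code from the patch: the function names `truncate_cover` and `split_cover`, the `frag_words` parameter, and the rule "start a fragment at each query word" are all assumptions chosen for illustration. A cover is modelled as an inclusive index range over a list of words, with non-word tokens already dropped.

```python
def truncate_cover(words, start, end, max_words):
    """Behaviour described in the message: keep the cover's first
    max_words words and drop the trailing ones."""
    stop = min(end + 1, start + max_words)
    return [words[start:stop]]


def split_cover(words, start, end, query, frag_words, max_fragments):
    """Hypothetical alternative: scan the cover and open a short fragment
    at each query word that the previous fragment did not already include.
    Stops once max_fragments fragments have been produced."""
    fragments = []
    i = start
    while i <= end and len(fragments) < max_fragments:
        if words[i].lower() in query:
            stop = min(end + 1, i + frag_words)
            fragments.append(words[i:stop])
            i = stop  # continue after this fragment
        else:
            i += 1
    return fragments


text = "the quick brown fox jumps over the lazy dog near the quiet fox den"
words = text.split()
query = {"fox", "dog"}

# The cover runs from the first "fox" (index 3) to the last "fox" (index 12).
truncated = truncate_cover(words, 3, 12, max_words=4)
split = split_cover(words, 3, 12, query, frag_words=2, max_fragments=3)

print(" ... ".join(" ".join(f) for f in truncated))
print(" ... ".join(" ".join(f) for f in split))
```

With truncation the headline shows only the first query word of the cover, while the split version surfaces every occurrence at the cost of never showing all query words in one fragment, which is the trade-off the message asks about.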