Chris Gamache wrote:
> For my particular case, word repetition shouldn't be relevant in determining
> the rank of a document. If I strip() the vector, I lose what relevance
> proximity and weight add to the rank. It seems impossible, yet I ask anyway: Is
> it possible to eliminate the second (third, fourth, fifth, etc.) occurrence of
> any given word when its presence in the document is being scored, yet kept in
> the equation for modifications to the score when proximity is being considered?
I don't see a way to do that other than modifying the strip or rank functions...
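
To make concrete what such a modified rank function would have to do, here is
a minimal sketch in Python. It is not tsearch2 code and does not use its
formula: rank_once, the position dictionary, and the scoring (one point per
distinct query word found, plus 1/distance for the closest pair of adjacent
query words) are all hypothetical, chosen only for illustration.

```python
from itertools import product


def rank_once(doc_positions, query_terms):
    """Hypothetical rank: repeats do not add to the presence score,
    but every occurrence is still considered for proximity.

    doc_positions maps each word to the list of positions where it occurs.
    """
    # Presence: each distinct query word counts once, however often it occurs.
    found = {t for t in query_terms if doc_positions.get(t)}
    presence = float(len(found))

    # Proximity: for each pair of adjacent query words, use the smallest
    # distance over ALL occurrences, so repeats can still help here.
    proximity = 0.0
    for a, b in zip(query_terms, query_terms[1:]):
        pos_a, pos_b = doc_positions.get(a), doc_positions.get(b)
        if a == b or not pos_a or not pos_b:
            continue
        best = min(abs(x - y) for x, y in product(pos_a, pos_b))
        if best > 0:
            proximity += 1.0 / best

    return presence + proximity
```

With this sketch, a document containing "cat" three times scores the same on
presence as one containing it once, while a second "cat" that happens to sit
next to "dog" still raises the proximity part of the score.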
--
Teodor Sigaev E-mail: teodor@sigaev.ru