Обсуждение: Ts_rank internals

Поиск
Список
Период
Сортировка

Ts_rank internals

От
"Heikki Linnakangas"
Дата:
Hi,

I tried to understand how ts_rank works, but I failed. What does Cover
function do? How does it work? What is the DocRepresentation data
structure like? I can see the definition of the struct, and the
get_docrep function to convert to that format, but by reading those I
can't figure out what the resulting DocRepresentation looks like.

I wonder if we could get rid of the istrue flag in QueryOperand, and use
a local BitmapSet variable instead? It seems wrong to have a temporary
flag that's only used in one function, in a struct that's used everywhere.

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Ts_rank internals

От
Teodor Sigaev
Дата:
> I tried to understand how ts_rank works, but I failed. What does Cover
> function do? How does it work? What is the DocRepresentation data
> structure like? I can see the definition of the struct, and the
> get_docrep function to convert to that format, but by reading those I
> can't figure out what the resulting DocRepresentation looks like.
> I wonder if we could get rid of the istrue flag in QueryOperand, and use
> a local BitmapSet variable instead? It seems wrong to have a temporary
> flag that's only used in one function, in a struct that's used everywhere.
It's a play around CDR algorithms (Cover Density Ranking).

Based on paper Clarke et al., “Relevance Ranking for One to Three Term Queries.”  "
(http://citeseer.ist.psu.edu/clarke00relevance.html.Sorry, I lost the 
 
article itself, but may be Oleg has it. Simple and short description is placed 
at http://www2002.org/CDROM/refereed/643/node7.html.

We change original algorithm to support weight of lexeme, details are on Oleg's 
site: http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking

Array of DocRepresentation is a representation of document, it contains only 
lexemes from both tsvector and tsquery, and lexemes are ordered by position - as 
in original doc. Each DocRepresentation has links to corresponding QueryOperand   to optimize query execution while
extentsearch. When we enlarge current 
 
extent for one word then we set istrue flag for corresponding QueryOperand and 
execution tsquery from cover becomes very simple task.

It's possible to eliminate istrue flag, but it's needed to implement algorithm 
to execute tsquery over continuos part of document, not over whole document.



-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: Ts_rank internals

От
Oleg Bartunov
Дата:
On Tue, 11 Sep 2007, Teodor Sigaev wrote:

>> I tried to understand how ts_rank works, but I failed. What does Cover
>> function do? How does it work? What is the DocRepresentation data
>> structure like? I can see the definition of the struct, and the
>> get_docrep function to convert to that format, but by reading those I
>> can't figure out what the resulting DocRepresentation looks like.
>> I wonder if we could get rid of the istrue flag in QueryOperand, and use
>> a local BitmapSet variable instead? It seems wrong to have a temporary
>> flag that's only used in one function, in a struct that's used everywhere.
> It's a play around CDR algorithms (Cover Density Ranking).
>
> Based on paper Clarke et al., Relevance Ranking for One to Three Term 
> Queries.  " (http://citeseer.ist.psu.edu/clarke00relevance.html. Sorry, I 
> lost the article itself, but may be Oleg has it. Simple and short description 
> is placed at http://www2002.org/CDROM/refereed/643/node7.html.
>
> We change original algorithm to support weight of lexeme, details are on 
> Oleg's site: http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking

Actually, we used two papers
http://citeseer.ist.psu.edu/clarke00relevance.html
and 
http://portal.acm.org/ft_gateway.cfm?id=333137&type=pdf&dl=GUIDE&dl=ACM
I can send you the latter if you have no access to the ACM.


>
> Array of DocRepresentation is a representation of document, it contains only 
> lexemes from both tsvector and tsquery, and lexemes are ordered by position - 
> as in original doc. Each DocRepresentation has links to corresponding 
> QueryOperand   to optimize query execution while extent search. When we 
> enlarge current extent for one word then we set istrue flag for corresponding 
> QueryOperand and execution tsquery from cover becomes very simple task.
>
> It's possible to eliminate istrue flag, but it's needed to implement 
> algorithm to execute tsquery over continuos part of document, not over whole 
> document.
>
>
>
>
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83