Discussion: TSearch and rankings

TSearch and rankings

From
"Bas Scheffers"
Date:
Hi,

Is there a way to use tsearch so that it returns documents that have fewer
than all the required keywords? The idea is that if a document has only 3
out of 4 terms, it is still returned, but with a lower ranking.

So far I haven't found a way to do this in the documentation. Is there
something like a "maybe" operator? (i.e. 'foo&bar&~doh', meaning documents
with foo and bar, and optionally doh, with those that contain doh ranked
higher)

Cheers,
Bas.

Re: TSearch and rankings

From
Teodor Sigaev
Date:

Bas Scheffers wrote:
> Hi,
>
> Is there a way to use tsearch so that it returns documents that have fewer
> than all the required keywords? The idea is that if a document has only 3
> out of 4 terms, it is still returned, but with a lower ranking.
>
> So far I haven't found a way to do this in the documentation. Is there
> something like a "maybe" operator? (i.e. 'foo&bar&~doh', meaning documents
> with foo and bar, and optionally doh, with those that contain doh ranked
> higher)



(foo&bar)|(foo&bar&doh)

I think, it's what you want.
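
A minimal sketch of how such a query string could be assembled in application code. The helper name `maybe_query` is hypothetical and not part of tsearch; it only builds the text that would then be handed to the search, with each optional term added cumulatively to the required ones.

```python
def maybe_query(required, optional):
    """Build a query string where optional terms only add stricter variants.

    Hypothetical helper: produces (req)|(req&opt1)|(req&opt1&opt2)|...
    """
    if not required:
        raise ValueError("at least one required term is needed")
    terms = list(required)
    variants = ["(" + "&".join(terms) + ")"]
    for word in optional:
        # Each variant is the previous one plus one more optional term.
        terms.append(word)
        variants.append("(" + "&".join(terms) + ")")
    return "|".join(variants)


print(maybe_query(["foo", "bar"], ["doh"]))  # (foo&bar)|(foo&bar&doh)
```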


--
Teodor Sigaev                                  E-mail: teodor@sigaev.ru

Re: TSearch and rankings

From
"Bas Scheffers"
Date:
Teodor Sigaev said:
> (foo&bar)|(foo&bar&doh)
> I think, it's what you want.
That simple, huh? It can become a bit complicated, doing an OR for all the
different combinations, but a quick test I just ran did show a higher
ranking for the documents that matched the larger query. And it is quite
usable in my application.
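
To illustrate how quickly "an OR for all the different combinations" grows, here is a rough sketch for the "3 out of 4 terms" case from the first message. The function `at_least_k_query` is a hypothetical helper, not a tsearch feature; it simply ORs together every AND-combination of k or more terms.

```python
from itertools import combinations


def at_least_k_query(terms, k):
    """OR together every AND-combination of k or more terms (hypothetical helper)."""
    if not 1 <= k <= len(terms):
        raise ValueError("k must be between 1 and len(terms)")
    clauses = []
    for size in range(k, len(terms) + 1):
        for combo in combinations(terms, size):
            clauses.append("(" + "&".join(combo) + ")")
    return "|".join(clauses)


query = at_least_k_query(["a", "b", "c", "d"], 3)
print(query)  # (a&b&c)|(a&b&d)|(a&c&d)|(b&c&d)|(a&b&c&d)
print(query.count("|") + 1, "clauses")  # 5 clauses
```

With six terms and k=3 the same helper already emits 42 clauses, which is why the size of the generated query matters.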

Do big queries have a significant impact on search performance? (This is
something that is important!)

Thanks,
Bas.


Re: TSearch and rankings

From
Oleg Bartunov
Date:
On Mon, 9 Feb 2004, Bas Scheffers wrote:

> Teodor Sigaev said:
> > (foo&bar)|(foo&bar&doh)
> > I think, it's what you want.
> That simple, huh? It can become a bit complicated, doing an OR for all the
> different combinations, but a quick test I just ran did show a higher
> ranking for the documents that matched the larger query. And it is quite
> usable in my application.
>
> Do big queries have a significant impact on search performance? (This is
> something that is important!)

Sure :( In the degenerate case you end up with a query like
(word1|word2|word3|..|wordN), which is equivalent to running N single-word
searches, and that isn't efficient. Intrinsically, tsearch2 is much faster
for long AND queries, which is the opposite of standard search engines
based on inverted indexes.
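
The equivalence between one long OR query and N single-word searches can be seen in a toy model. This is plain Python over a made-up in-memory corpus and says nothing about how the tsearch2 index works internally; all names and data here are assumptions for illustration.

```python
# Toy corpus: document id -> set of terms (made-up data).
DOCS = {
    1: {"word1", "alpha"},
    2: {"word2"},
    3: {"beta"},
    4: {"word1", "word3"},
}


def search_single(word):
    """Ids of documents containing one word."""
    return {doc_id for doc_id, terms in DOCS.items() if word in terms}


def search_or(words):
    """Ids of documents containing at least one of the words."""
    wanted = set(words)
    return {doc_id for doc_id, terms in DOCS.items() if terms & wanted}


def search_and(words):
    """Ids of documents containing all of the words."""
    wanted = set(words)
    return {doc_id for doc_id, terms in DOCS.items() if wanted <= terms}


words = ["word1", "word2", "word3"]
union_of_singles = set().union(*(search_single(w) for w in words))
print(search_or(words) == union_of_singles)    # True
print(sorted(search_or(words)))                # [1, 2, 4]
print(sorted(search_and(["word1", "word3"])))  # [4]
```

Every extra ORed word can only widen the result set, while every extra ANDed word can only narrow it.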

>
> Thanks,
> Bas.

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: TSearch and rankings

From
Teodor Sigaev
Date:

Oleg Bartunov wrote:
> On Mon, 9 Feb 2004, Bas Scheffers wrote:
>
>
>>Teodor Sigaev said:
>>
>>>(foo&bar)|(foo&bar&doh)
>>>I think, it's what you want.
>>
>>That simple, huh? It can become a bit complicated, doing an OR for all the
>>different combinations, but a quick test I just ran did show a higher
>>ranking for the documents that matched the larger query. And it is quite
>>usable in my application.
>>
>>Do big queries have a significant impact on search performance? (This is
>>something that is important!)
>
>
> Sure :( In the degenerate case you end up with a query like
> (word1|word2|word3|..|wordN), which is equivalent to running N single-word
> searches, and that isn't efficient. Intrinsically, tsearch2 is much faster
> for long AND queries, which is the opposite of standard search engines
> based on inverted indexes.

Ugh. The performance of a complex query such as
(foo&bar)|(foo&bar&doh)|(foo&bar&doh&other)
will be equal to that of the simple query foo&bar, because the other
variants are stricter than the simplest variant. Performance is determined
by the number of pages read (we assume that the CPU is much faster than the
disks), and the more ANDed words there are in a query, the fewer pages are
read.
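
Why the stricter variants add no new matches can be checked on a toy corpus: any document satisfying foo&bar&doh already satisfies foo&bar, so the OR of all variants selects the same documents as foo&bar alone. The sketch below uses made-up data and a hypothetical `search` function in plain Python; it models only the boolean matching, not ranking or page reads.

```python
# Toy corpus: document id -> set of terms (made-up data).
DOCS = {
    1: {"foo", "bar"},
    2: {"foo", "bar", "doh"},
    3: {"foo", "bar", "doh", "other"},
    4: {"foo", "doh"},
}


def matches(terms, clauses):
    """True if the document satisfies at least one AND-clause."""
    return any(set(clause) <= terms for clause in clauses)


def search(clauses):
    """Ids of documents matching an OR of AND-clauses."""
    return {doc_id for doc_id, terms in DOCS.items() if matches(terms, clauses)}


simple = [["foo", "bar"]]
layered = [
    ["foo", "bar"],
    ["foo", "bar", "doh"],
    ["foo", "bar", "doh", "other"],
]
print(search(simple) == search(layered))  # True
print(sorted(search(layered)))            # [1, 2, 3]
```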




--
Teodor Sigaev                                  E-mail: teodor@sigaev.ru

Re: TSearch and rankings

From
"Bas Scheffers"
Date:
Teodor Sigaev said:
> (foo&bar)|(foo&bar&doh)|(foo&bar&doh&other)
> will be equal to that of the simple query foo&bar, because the other
> variants are stricter
That sounds encouraging. My "documents" are actually quite small. That is
because they are not documents, but just keywords for a user's profile
(like age28, height187, countryuk, etc.), so my documents won't have much
more than 20-25 terms to begin with.

But you do get queries like
'(age25|age26|...|age35)&(height180|...|height200)&countryuk'
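
A query of that shape could be generated from numeric ranges along these lines. The helper names `range_terms` and `profile_query` are hypothetical, and the keyword scheme (prefix plus number) is assumed from the examples above; the sketch only builds the query text.

```python
def range_terms(prefix, low, high):
    """OR group covering every integer value in [low, high], e.g. (age25|age26)."""
    return "(" + "|".join(f"{prefix}{n}" for n in range(low, high + 1)) + ")"


def profile_query(age, height, country):
    """AND together an age range, a height range and a single country keyword."""
    return "&".join([
        range_terms("age", *age),
        range_terms("height", *height),
        country,
    ])


print(profile_query((25, 27), (180, 182), "countryuk"))
# (age25|age26|age27)&(height180|height181|height182)&countryuk
```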

I tested this with a 10,000-user database last night on my Athlon 850/384MB,
and queries returned accurately in <150ms (and this included a normal WHERE
clause on the base table that I need as well). So far I am impressed.

I'll test with a 100K user set later this week, using the "maybe" query
style and let you know my results.

Thanks again,
Bas.