Re: Google Summer of Code 2008

From: Oleg Bartunov
Subject: Re: Google Summer of Code 2008
Msg-id Pine.LNX.4.64.0803090532050.10010@sn.sai.msu.ru
In reply to: Re: Google Summer of Code 2008  (Jan Urbański <j.urbanski@students.mimuw.edu.pl>)
Responses: Text search selectivity improvements (was Re: Google Summer of Code 2008)  (Jan Urbański <j.urbanski@students.mimuw.edu.pl>)
List: pgsql-hackers
On Sat, 8 Mar 2008, Jan Urbański wrote:

>
>> Unfortunately, selectivity estimation for a query is much more difficult
>> than just estimating the frequency of an individual word.
>
> Sure, given something like 'cats & dogs'::tsquery the frequency of 'cat' and 
> 'dog' won't suffice. But at least it's a starting point and if we estimate 
> that 80% of the documents have 'dog' and 70% have 'cat' then we can tell for 
> sure that at least 50% have both and that's a lot better than 0.1% that's 
> being returned now.

Certainly, yes. And given that the most popular queries are single-word
queries, this would be very helpful in most cases.
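
To make the arithmetic in the quoted example concrete, here is a minimal
sketch of the bounds one can derive for an AND query from per-term document
frequencies alone. The function name and the "independent" estimate are
illustrative assumptions, not anything that exists in the planner today.

```python
def and_selectivity_bounds(freqs):
    """Bounds on the fraction of documents that contain ALL terms.

    freqs: per-term document frequencies, each in [0, 1].
    Returns (lower, independent, upper):
      lower       - guaranteed minimum overlap (inclusion-exclusion bound)
      independent - estimate assuming terms occur independently (an assumption)
      upper       - no more documents can match than contain the rarest term
    """
    if not freqs:
        raise ValueError("need at least one term frequency")
    n = len(freqs)
    lower = max(0.0, sum(freqs) - (n - 1))
    upper = min(freqs)
    independent = 1.0
    for f in freqs:
        independent *= f
    return lower, independent, upper


# 80% of documents contain 'dog', 70% contain 'cat'
lower, independent, upper = and_selectivity_bounds([0.8, 0.7])
print(lower, independent, upper)
```

For the 80% / 70% case the lower bound comes out at 50%, matching the figure
quoted above; the true selectivity lies somewhere between the lower and upper
bounds.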

The reason I thought about improving ts_stat() is that we could use its
statistics for the incomplete-search feature people have requested, where
an AND query like ( a & b & c ) is rewritten into a set of AND/OR queries
depending on how often the terms occur.
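
One possible shape for such a rewrite, purely as a sketch: the relaxation
policy below (drop the most common terms first, then fall back to a plain OR)
is an assumption made for illustration, and relaxed_queries() is a
hypothetical helper. The per-term document counts are assumed to come from
something like the ndoc column of ts_stat().

```python
def relaxed_queries(term_ndoc):
    """Turn an AND query into progressively looser tsquery strings.

    term_ndoc: mapping of lexeme -> number of documents containing it
               (assumed to be taken from ts_stat() output).
    Policy (an assumption, not a settled design): keep the rarest, most
    discriminating terms longest; finish with an OR of all terms.
    """
    if not term_ndoc:
        return []
    # Rarest terms first; ties keep the caller's original order.
    ranked = sorted(term_ndoc, key=lambda t: term_ndoc[t])
    queries = []
    # Full AND first, then drop the most common remaining term each step.
    for k in range(len(ranked), 1, -1):
        queries.append(" & ".join(ranked[:k]))
    # Last resort: match any of the terms.
    queries.append(" | ".join(ranked))
    return queries


for q in relaxed_queries({"a": 10, "b": 500, "c": 50}):
    print(q)
```

A caller would try the queries in order and stop at the first one that
returns enough rows. The strings are only valid as tsquery input if the terms
are already normalized lexemes.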
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

