Re: Google Summer of Code 2008

From Jan Urbański
Subject Re: Google Summer of Code 2008
Date
Msg-id 47D2DFDA.5010302@students.mimuw.edu.pl
In reply to Re: Google Summer of Code 2008  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: Google Summer of Code 2008  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
Oleg Bartunov wrote:
> Jan,
>
> the problem is known and well requested. From your promotion it's not
> clear what's an idea ?
>> Tom Lane wrote:
>>> Jan Urbański <j.urbanski@students.mimuw.edu.pl> writes:
>>>> 2. Implement better selectivity estimates for FTS.

OK, after reading through some of the code, the idea is to write a
custom typanalyze function for tsvector columns. It could look inside
the tsvectors, compute the most commonly appearing lexemes and store
that information in pg_statistic. Then there should be a custom
selectivity function for @@ and friends that would look at the lexemes
in pg_statistic, see whether the tsquery it got matches some/any of
them and return a result based on that.
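
To make the idea a bit more concrete, here is a rough sketch in Python
rather than backend C. Everything in it is an assumption made for
illustration: the function names, the nested-tuple query form, the
default of 0.001 for unknown lexemes, and the independence assumption
used to combine AND/OR terms. It is not how the backend represents
tsvectors or tsqueries.

```python
from collections import Counter


def analyze_lexemes(documents, top_n=100):
    """Hypothetical stand-in for a tsvector typanalyze function.

    documents: iterable of lexeme collections, one per sampled row.
    Returns {lexeme: fraction of sampled rows containing it} for the
    top_n most common lexemes.
    """
    docs = [set(d) for d in documents]
    if not docs:
        return {}
    counts = Counter()
    for doc in docs:
        counts.update(doc)  # a lexeme counts once per row
    total = len(docs)
    return {lex: n / total for lex, n in counts.most_common(top_n)}


def estimate_selectivity(query, stats, default=0.001):
    """Hypothetical selectivity estimate for a match against `query`.

    query is a lexeme string or a nested tuple:
    ("and", a, b), ("or", a, b) or ("not", a).
    Terms are treated as independent, which is an assumption.
    """
    if isinstance(query, str):
        return stats.get(query, default)
    op = query[0]
    if op == "not":
        return 1.0 - estimate_selectivity(query[1], stats, default)
    left = estimate_selectivity(query[1], stats, default)
    right = estimate_selectivity(query[2], stats, default)
    if op == "and":
        return left * right
    if op == "or":
        return left + right - left * right
    raise ValueError(f"unknown operator: {op!r}")


sample = [{"cat", "dog"}, {"cat"}, {"cat", "fish"}, {"dog"}]
stats = analyze_lexemes(sample, top_n=2)
print(estimate_selectivity(("and", "cat", "dog"), stats))
```

With this sample only "cat" and "dog" make it into the stored list, so a
query for "fish" falls back to the default.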

I have a feeling that in many cases identifying the top 50 to 300
lexemes would be enough to talk about text search selectivity with a
degree of confidence. At least we wouldn't give overly low estimates for
queries looking for very popular words, which I believe is worse than
giving an overly high estimate for an obscure query (am I wrong here?).
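
One way to picture the asymmetry, again only as a sketch: a lexeme that
did not make the list can hardly be more common in the sample than the
least common lexeme that did, so that minimum gives an upper bound for
obscure words. The helper below is hypothetical, and halving the bound
(as well as the 0.001 default for an empty list) is an arbitrary choice
of mine, not something taken from the existing code.

```python
def lexeme_frequency(lexeme, tracked, empty_default=0.001):
    """Estimated fraction of rows containing `lexeme`.

    tracked: {lexeme: row fraction} for the most common lexemes only.
    Tracked lexemes use their measured frequency; anything else is
    assumed to be rarer than the least common tracked lexeme.
    """
    if lexeme in tracked:
        return tracked[lexeme]
    if not tracked:
        return empty_default  # no statistics at all: arbitrary guess
    return min(tracked.values()) / 2  # assumed: half of the upper bound


tracked = {"postgres": 0.6, "index": 0.3, "vacuum": 0.1}
print(lexeme_frequency("postgres", tracked))  # popular word: measured value
print(lexeme_frequency("zymurgy", tracked))   # obscure word: bounded guess
```

A longer list lowers that bound, so the error for obscure words shrinks
as more lexemes are tracked.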

Regards,
Jan

--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

