Re: BUG #8354: stripped positions can generate nonzero rank in ts_rank_cd

From: Alexander Hill
Subject: Re: BUG #8354: stripped positions can generate nonzero rank in ts_rank_cd
Msg-id: CA+KBOKxsaU7Q-Qc6-YV99AY1U_Rb0SbR78Bs5xM6=12PRMKJKA@mail.gmail.com
In reply to: Re: BUG #8354: stripped positions can generate nonzero rank in ts_rank_cd  (Bruce Momjian <bruce@momjian.us>)
Responses: Re: BUG #8354: stripped positions can generate nonzero rank in ts_rank_cd  (Bruce Momjian <bruce@momjian.us>)
List: pgsql-bugs

Hi Bruce, all,

I think this can be solved (if it's agreed that it's a bug) in a pretty
straightforward way: when creating the document representation used in
calculating cover density rank, we can just skip lexemes with no position
entirely.
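
To illustrate the idea, here is a minimal sketch in Python rather than the actual C change. It is a toy model under stated assumptions: a tsvector is represented as a dict from lexeme to a list of positions, a stripped lexeme has an empty list, and the names build_doc_representation and has_cover are hypothetical and do not appear in the PostgreSQL source.

```python
# Toy model only: a tsvector is a dict mapping each lexeme to its list of
# positions; a stripped lexeme has an empty list. None of these names come
# from the PostgreSQL source.

def build_doc_representation(tsvector, query_terms):
    """Return (position, lexeme) pairs for the query terms, sorted by position.

    Lexemes without positions are skipped entirely, which is the behaviour
    proposed in this message.
    """
    doc = []
    for lexeme in query_terms:
        positions = tsvector.get(lexeme, [])
        if not positions:
            continue  # absent or stripped: contributes nothing
        doc.extend((pos, lexeme) for pos in positions)
    return sorted(doc)


def has_cover(doc, query_terms):
    """True if the positioned entries contain every term of an AND query."""
    return {lexeme for _, lexeme in doc} == set(query_terms)


query = ["text", "search"]
mixed = {"text": [1], "search": []}        # 'search' was stripped
positioned = {"text": [1], "search": [2]}  # nothing stripped

mixed_doc = build_doc_representation(mixed, query)
positioned_doc = build_doc_representation(positioned, query)
```

In this model the mixed vector yields a document containing only 'text', so no cover satisfies 'text' & 'search' and there is nothing to rank, while the fully positioned vector still produces a cover.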

Fix and tests here: https://github.com/AlexHill/postgres/compare/bug_8354

As a patch file here:
https://github.com/AlexHill/postgres/commit/cd522b254d166d569b86803115f0f499864e949b.patch

Cheers,
Alex





On Sat, Feb 1, 2014 at 5:22 AM, Bruce Momjian <bruce@momjian.us> wrote:

>
> Would someone please comment on this text search bug report?  Thanks.
>
> ---------------------------------------------------------------------------
>
> On Fri, Aug  2, 2013 at 07:03:42AM +0000, alex@hill.net.au wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference:      8354
> > Logged by:          Alex Hill
> > Email address:      alex@hill.net.au
> > PostgreSQL version: 9.2.4
> > Operating system:   OS X 10.8.4 Mountain Lion
> > Description:
> >
> > Hi all,
> >
> >
> > The docs for ts_rank_cd state:
> >
> >
> > "This function requires positional information in its input. Therefore it
> > will not work on "stripped" tsvector values -- it will always return zero."
> >
> >
> > However if a tsvector contains some stripped lexemes and some non-stripped,
> > ts_rank_cd will rank extents including the non-stripped values.
> >
> >
> > For example, this evaluates to zero as expected:
> >
> >
> >     SELECT ts_rank_cd(strip(to_tsvector('text search')),
> >                       plainto_tsquery('text search'))
> >
> >
> >
> >
> > But this doesn't:
> >
> >
> >     SELECT ts_rank_cd(to_tsvector('text') || strip(to_tsvector('search')),
> >                       plainto_tsquery('text search'))
> >
> >
> >
> >
> > I think this is a bug, if not in the code then in the documentation, which
> > isn't clear on what happens when stripped and positioned lexemes are mixed
> > in one tsvector.
> >
> >
> > I would prefer that stripped lexemes were completely ignored by ts_rank_cd:
> > my use case is using this as a fifth pseudo-weight, which matches a @@ query
> > but doesn't add to a ts_rank_cd ranking.
> >
> >
> > What do you think?
> >
> >
> > Cheers,
> > Alex
> >
> >
> >
> > --
> > Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-bugs
>
> --
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
>   + Everyone has their own god. +
>
