Re: PG 14 release notes, first draft

From Alexander Korotkov
Subject Re: PG 14 release notes, first draft
Date
Msg-id CAPpHfdsfm7OLha65gW+y25RsVaVRM=CmawZPRFswDVdBkbUhGA@mail.gmail.com
In reply to Re: PG 14 release notes, first draft  (Bruce Momjian <bruce@momjian.us>)
Replies Re: PG 14 release notes, first draft  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Tue, May 11, 2021 at 11:31 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, May 11, 2021 at 01:16:38PM +0300, Alexander Korotkov wrote:
> > > OK, what symbols trigger this change?  Underscore?  What else?
> >
> > Any symbol that is recognized as a separator by the full-text parser
> > but not by the tsquery parser.  Full-text search is extensible and
> > allows pluggable parsers.  In principle, we could dig out the exact
> > set of symbols, but I'm not sure this is worth the effort.
> >
> > >  You are
> > > saying the previous code allowed 'pg' and 'class' anywhere in the
> > > string, while the new code requires them to be adjacent, which more
> > > closely matches the pattern.
> >
> > Yes, that's it.
> >
> > > >  * Fix extra distance in phrase operators for quoted text in
> > > > websearch_to_tsquery() (Alexander Korotkov)
> > > > For example, websearch_to_tsquery('english', '"aaa: bbb"') becomes
> > > > 'aaa <-> bbb' instead of 'aaa <2> bbb'.
> > >
> > > So colon and space were considered to be two tokens between 'aaa' and
> > > 'bbb', while there is really only one because both tokens are discarded?
> > > Is this true of any discarded tokens, e.g. '"aaa ?:, bbb"'?
> >
> > Yes, that's true for any discarded tokens.
>
> I came up with this text for these two items.  I think it still needs to
> be more specific:
>
>         <listitem>
>         <!--
>         Author: Alexander Korotkov <akorotkov@postgresql.org>
>         2021-01-31 [0c4f355c6] Fix parsing of complex morphs to tsquery
>         -->
>
>         <para>
>         Fix to_tsquery() and websearch_to_tsquery() to properly parse
>         certain discarded tokens in quotes (Alexander Korotkov)
>         </para>

This is not just about quotes.  The original problem concerned quotes
in websearch_to_tsquery() and the phrase operator in to_tsquery().  But
the fix changes the output for all query operands containing
discarded tokens.

Could we try this?

Make to_tsquery() and websearch_to_tsquery() produce stricter
output for query parts containing discarded tokens.  In particular,
this makes to_tsquery() and websearch_to_tsquery() properly parse
discarded tokens in phrase search operands and in quotes, respectively.

>         <para>
>         Certain discarded tokens, like underscore, caused these
>         functions to produce incorrect tsquery output, e.g.,
>         websearch_to_tsquery('"pg_class pg"') used to output '( pg &
>         class ) <-> pg', but now outputs 'pg <-> class <-> pg'.
>         </para>
>         </listitem>

This part looks good to me.  I'd just suggest extending the example to
to_tsquery() as well.

Certain discarded tokens, like underscore, caused these functions to
produce incorrect tsquery output, e.g., both
websearch_to_tsquery('"pg_class pg"') and to_tsquery('pg_class <->
pg') used to output '( pg & class ) <-> pg', but now both output 'pg
<-> class <-> pg'.

>         <listitem>
>         <!--
>         Author: Alexander Korotkov <akorotkov@postgresql.org>
>         2021-05-03 [eb086056f] Make websearch_to_tsquery() parse text in quotes as a si
>         -->
>
>         <para>
>         Fix websearch_to_tsquery() to properly parse multiple adjacent
>         discarded tokens in quotes (Alexander Korotkov)
>         </para>
>
>         <para>
>         Previously, quoted text that contained multiple adjacent discarded
>         tokens was treated as multiple tokens, causing incorrect tsquery
>         output, e.g., websearch_to_tsquery('"aaa: bbb"') used to output
>         'aaa <2> bbb', but now outputs 'aaa <-> bbb'.
>         </para>
>         </listitem>

This item looks good to me.
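
A minimal sketch for checking this item, assuming the 'english'
configuration used earlier in the thread.  The first query and its
before/after results are taken from the text above.  The second query
is the other string raised in the thread; no output was posted for it,
so none is given here.

```sql
-- Two adjacent discarded tokens (colon and space) inside quotes:
SELECT websearch_to_tsquery('english', '"aaa: bbb"');
-- As stated in this thread:
--   used to output:  aaa <2> bbb
--   now outputs:     aaa <-> bbb

-- A longer run of discarded tokens; per the reply above, the same
-- rule applies to any discarded tokens:
SELECT websearch_to_tsquery('english', '"aaa ?:, bbb"');
```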

------
Regards,
Alexander Korotkov


