Thread: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Jean-Pierre Pelletier
Date:
Hi,

I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
matching consecutive words but it won't work for us if it cannot handle
consecutive *duplicate* words.

For example, the following returns true:    select
phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');

Is this expected ?

Thanks,
Jean-Pierre Pelletier



Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
> matching consecutive words but it won't work for us if it cannot handle
> consecutive *duplicate* words.

> For example, the following returns true:    select
> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');

> Is this expected ?

I concur that that seems like a rather useless behavior.  If we have
"x <-> y" it is not possible to match at distance zero, while if we
have "x <-> x" it seems unlikely that the user is expecting us to
treat that identically to "x".  So phrase search simply should not
consider distance-zero matches.

The attached one-liner patch seems to fix this problem, though I am
uncertain whether any other places need to be changed to match.
Also, there is a regression test case that changes:

*** /home/postgres/pgsql/src/test/regress/expected/tstypes.out  Thu May  5 19:21:17 2016
--- /home/postgres/pgsql/src/test/regress/results/tstypes.out   Tue Jun  7 17:55:41 2016
***************
*** 897,903 ****
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
   ts_rank_cd
  ------------
!   0.0714286
  (1 row)

  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
--- 897,903 ----
  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
   ts_rank_cd
  ------------
!           0
  (1 row)

  SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');


I'm not sure if this case is intentionally exhibiting the behavior that
both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
result simply wasn't thought about carefully.

            regards, tom lane

diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 591e59c..95ad69b 100644
*** a/src/backend/utils/adt/tsvector_op.c
--- b/src/backend/utils/adt/tsvector_op.c
*************** TS_phrase_execute(QueryItem *curitem,
*** 1409,1415 ****
          {
              while (Lpos < Ldata.pos + Ldata.npos)
              {
!                 if (WEP_GETPOS(*Lpos) <= WEP_GETPOS(*Rpos))
                  {
                      /*
                       * Lpos is behind the Rpos, so we have to check the
--- 1409,1415 ----
          {
              while (Lpos < Ldata.pos + Ldata.npos)
              {
!                 if (WEP_GETPOS(*Lpos) < WEP_GETPOS(*Rpos))
                  {
                      /*
                       * Lpos is behind the Rpos, so we have to check the
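
To make the effect of the one-character change concrete, here is a minimal
standalone sketch. It is not the code in tsvector_op.c: the function name
at_most_match and the allow_zero flag are hypothetical, positions are plain
ints rather than WordEntryPos values, and it assumes the distance test that
follows this comparison is "Rpos - Lpos <= distance", as described in this
thread. Under those assumptions, the old "<=" comparison lets
'blue' <-> 'blue' match a single 'blue' at position 1, and "<" does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model of an "at most N apart" phrase check (hypothetical helper).
 * allow_zero = true  models the old "Lpos <= Rpos" comparison,
 * allow_zero = false models the patched "Lpos < Rpos" comparison.
 */
bool
at_most_match(const int *lpos, size_t nl,
              const int *rpos, size_t nr,
              int distance, bool allow_zero)
{
    assert(distance >= 0);

    for (size_t j = 0; j < nr; j++)
    {
        for (size_t i = 0; i < nl; i++)
        {
            int  gap = rpos[j] - lpos[i];
            bool behind = allow_zero ? (gap >= 0) : (gap > 0);

            /* Left operand must be behind the right one, within the distance. */
            if (behind && gap <= distance)
                return true;
        }
    }
    return false;
}
```

For the query 'blue' <-> 'blue' both operands share one position list, so a
document holding a single 'blue' passes {1} on both sides with distance 1.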

Another thing I noticed: if you test with tsvectors that don't contain
position info, <-> seems to reduce to &, that is it doesn't enforce
relative position:

regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
 ?column?
----------
 t
(1 row)

regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
 ?column?
----------
 t
(1 row)

I'm doubtful that this is a good behavior, because it seems like it can
silently mask mistakes.  That is, applying <-> to a stripped tsvector
seems like user error to me.  Actually throwing an error might be too
much, but perhaps we should make such cases return false not true?

(This is against HEAD, without the patch I suggested yesterday.
It strikes me that that patch might change this behavior, if the
lexemes are all being treated as having position zero, but I have
not checked.)
        regards, tom lane



Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Oleg Bartunov
Date:
On Wed, Jun 8, 2016 at 9:01 PM, Jean-Pierre Pelletier
<jppelletier@e-djuster.com> wrote:
> If instead of casts, functions to_tsvector() and to_tsquery() are used,
> then the result is (I think ?) as expected:

because to_tsvector() function returns positions of words.

>
> select to_tsvector('simple', 'cat bat fat rat') @@ to_tsquery('simple',
> 'cat <-> rat');
> or
> select to_tsvector('simple', 'rat cat bat fat') @@ to_tsquery('simple',
> 'cat <-> rat');
> returns "false"
>
> select to_tsvector('simple', 'cat rat bat fat') @@ to_tsquery('simple',
> 'cat <-> rat');
> returns "true"
>
> Jean-Pierre Pelletier
>
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Wednesday, June 8, 2016 1:12 PM
> To: Teodor Sigaev; Oleg Bartunov
> Cc: Jean-Pierre Pelletier; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@
> to_tsvector('simple', 'blue') be true ?
>
> Another thing I noticed: if you test with tsvectors that don't contain
> position info, <-> seems to reduce to &, that is it doesn't enforce
> relative position:
>
> regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
> ?column?
> ----------
>  t
> (1 row)
>
> regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
> ?column?
> ----------
>  t
> (1 row)
>
> I'm doubtful that this is a good behavior, because it seems like it can
> silently mask mistakes.  That is, applying <-> to a stripped tsvector
> seems like user error to me.  Actually throwing an error might be too
> much, but perhaps we should make such cases return false not true?
>
> (This is against HEAD, without the patch I suggested yesterday.
> It strikes me that that patch might change this behavior, if the lexemes
> are all being treated as having position zero, but I have not checked.)
>
>                         regards, tom lane



Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Jean-Pierre Pelletier
Date:
If instead of casts, functions to_tsvector() and to_tsquery() are used,
then the result is (I think ?) as expected:

select to_tsvector('simple', 'cat bat fat rat') @@ to_tsquery('simple',
'cat <-> rat');
or
select to_tsvector('simple', 'rat cat bat fat') @@ to_tsquery('simple',
'cat <-> rat');
returns "false"

select to_tsvector('simple', 'cat rat bat fat') @@ to_tsquery('simple',
'cat <-> rat');
returns "true"

Jean-Pierre Pelletier

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, June 8, 2016 1:12 PM
To: Teodor Sigaev; Oleg Bartunov
Cc: Jean-Pierre Pelletier; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Should phraseto_tsquery('simple', 'blue blue') @@
to_tsvector('simple', 'blue') be true ?

Another thing I noticed: if you test with tsvectors that don't contain
position info, <-> seems to reduce to &, that is it doesn't enforce
relative position:

regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
 ?column?
----------
 t
(1 row)

regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
 ?column?
----------
 t
(1 row)

I'm doubtful that this is a good behavior, because it seems like it can
silently mask mistakes.  That is, applying <-> to a stripped tsvector
seems like user error to me.  Actually throwing an error might be too
much, but perhaps we should make such cases return false not true?

(This is against HEAD, without the patch I suggested yesterday.
It strikes me that that patch might change this behavior, if the lexemes
are all being treated as having position zero, but I have not checked.)
        regards, tom lane



Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Oleg Bartunov
Date:
On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Another thing I noticed: if you test with tsvectors that don't contain
> position info, <-> seems to reduce to &, that is it doesn't enforce
> relative position:
>
> regression=# select 'cat bat fat rat'::tsvector @@ 'cat <-> rat'::tsquery;
>  ?column?
> ----------
>  t
> (1 row)
>
> regression=# select 'rat cat bat fat'::tsvector @@ 'cat <-> rat'::tsquery;
>  ?column?
> ----------
>  t
> (1 row)

yes, that's documented behaviour.


>
> I'm doubtful that this is a good behavior, because it seems like it can
> silently mask mistakes.  That is, applying <-> to a stripped tsvector
> seems like user error to me.  Actually throwing an error might be too
> much, but perhaps we should make such cases return false not true?

It's a question of convention. Returning false will probably alert the
user to the error quickly, so that behaviour looks better.

>
> (This is against HEAD, without the patch I suggested yesterday.
> It strikes me that that patch might change this behavior, if the
> lexemes are all being treated as having position zero, but I have
> not checked.)

I didn't see the patch yet.

>
>                         regards, tom lane



Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Oleg Bartunov
Date:
On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
>> I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
>> matching consecutive words but it won't work for us if it cannot handle
>> consecutive *duplicate* words.
>
>> For example, the following returns true:    select
>> phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
>
>> Is this expected ?
>
> I concur that that seems like a rather useless behavior.  If we have
> "x <-> y" it is not possible to match at distance zero, while if we
> have "x <-> x" it seems unlikely that the user is expecting us to
> treat that identically to "x".  So phrase search simply should not
> consider distance-zero matches.

What about a word with several infinitives?

select to_tsvector('en', 'leavings');
      to_tsvector
------------------------
 'leave':1 'leavings':1
(1 row)

select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
 ?column?
----------
 t
(1 row)


>
> The attached one-liner patch seems to fix this problem, though I am
> uncertain whether any other places need to be changed to match.
> Also, there is a regression test case that changes:
>
> *** /home/postgres/pgsql/src/test/regress/expected/tstypes.out  Thu May  5 19:21:17 2016
> --- /home/postgres/pgsql/src/test/regress/results/tstypes.out   Tue Jun  7 17:55:41 2016
> ***************
> *** 897,903 ****
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
>    ts_rank_cd
>   ------------
> !   0.0714286
>   (1 row)
>
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
> --- 897,903 ----
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:A');
>    ts_rank_cd
>   ------------
> !           0
>   (1 row)
>
>   SELECT ts_rank_cd(' a:1 sa:2A sb:2D g'::tsvector, 'a <-> s:* <-> sa:B');
>
>
> I'm not sure if this case is intentionally exhibiting the behavior that
> both parts of "s:* <-> sa:A" can be matched to the same lexeme, or if the
> result simply wasn't thought about carefully.
>
>                         regards, tom lane
>



Oleg Bartunov <obartunov@gmail.com> writes:
> On Wed, Jun 8, 2016 at 1:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I concur that that seems like a rather useless behavior.  If we have
>> "x <-> y" it is not possible to match at distance zero, while if we
>> have "x <-> x" it seems unlikely that the user is expecting us to
>> treat that identically to "x".  So phrase search simply should not
>> consider distance-zero matches.

> What about a word with several infinitives?

> select to_tsvector('en', 'leavings');
>       to_tsvector
> ------------------------
>  'leave':1 'leavings':1
> (1 row)

> select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
>  ?column?
> ----------
>  t
> (1 row)

Hmm.  I can grant that there might be some cases where you want to see
if two separate patterns match the same lexeme, but that seems like an
extremely specialized use-case that you would only invoke very
intentionally.  It should not be built in as part of the default behavior
of every phrase search, because 99% of the time this would be an
unexpected and unwanted match.  I'm not even convinced that the operator
for this should be spelled <0> --- that seems more like a hack than a
natural extension of phrase search.  But if we do spell it like that,
then I think it should be called out as a special case that only applies
to <0>; that is, for any other value of N, the match has to be to separate
lexemes.

This brings up something else that I am not very sold on: to wit,
do we really want the "less than or equal" distance behavior at all?
The documentation gives the example that
    phraseto_tsquery('cat ate some rats')
produces
    ( 'cat' <-> 'ate' ) <2> 'rat'
because "some" is a stopword.  However, that pattern will also match
"cat ate rats", which seems surprising and unexpected to me; certainly
it would surprise a user who did not realize that "some" is a stopword.

So I think there's a reasonable case for decreeing that <N> should only
match lexemes *exactly* N apart.  If we did that, we would no longer have
the misbehavior that Jean-Pierre is complaining about, and we'd not need
to argue about whether <0> needs to be treated specially.

Or maybe we need two operators, one for exactly-N-apart and one for
at-most-N-apart.
        regards, tom lane
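
The difference between the two rules can be sketched for a single pair of
positions. This is an illustration only: gap_at_most and gap_exactly are
hypothetical names, and the positions assume that "cat ate rats" yields
ate:2 rat:3 while "cat ate some rats" yields ate:2 rat:4, because the stopword
"some" still occupies position 3.

```c
#include <assert.h>
#include <stdbool.h>

/* "At most N apart": the left lexeme is at or before the right one, within n. */
bool
gap_at_most(int lpos, int rpos, int n)
{
    int gap = rpos - lpos;

    assert(n >= 0);
    return gap >= 0 && gap <= n;
}

/* "Exactly N apart": the right lexeme sits precisely n positions later. */
bool
gap_exactly(int lpos, int rpos, int n)
{
    assert(n >= 0);
    return rpos - lpos == n;
}
```

With n = 2, gap_at_most accepts both ate:2 rat:3 and ate:2 rat:4, while
gap_exactly accepts only ate:2 rat:4. With n = 1 and both lexemes at position 1
(the 'blue' <-> 'blue' case), gap_exactly rejects the pair without any special
handling of distance zero.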



Oleg Bartunov <obartunov@gmail.com> writes:
> On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Another thing I noticed: if you test with tsvectors that don't contain
>> position info, <-> seems to reduce to &, that is it doesn't enforce
>> relative position:

> yes, that's documented behaviour.

Oh?  Where?  I've been going through the phrase-search documentation and
copy-editing it today, and I have not found this stated anywhere.
        regards, tom lane



Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Oleg Bartunov
Date:
On Thu, Jun 9, 2016 at 12:47 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Oleg Bartunov <obartunov@gmail.com> writes:
>> On Wed, Jun 8, 2016 at 8:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Another thing I noticed: if you test with tsvectors that don't contain
>>> position info, <-> seems to reduce to &, that is it doesn't enforce
>>> relative position:
>
>> yes, that's documented behaviour.
>
> Oh?  Where?  I've been going through the phrase-search documentation and
> copy-editing it today, and I have not found this stated anywhere.

Hmm, it looks like it is missing.  We have been talking about this since 2008.
I just found
http://www.sai.msu.su/~megera/postgres/talks/2009.pdf (slide 5) and
http://www.sai.msu.su/~megera/postgres/talks/pgcon-2016-fts.pdf (slide 27)

We need to reach a consensus here, since there is no way to say "I don't know".
I am inclined to agree with you that returning false is better in such a
case. That will point the user to the source of the problem.


>
>                         regards, tom lane



On Tue, Jun 07, 2016 at 06:05:10PM -0400, Tom Lane wrote:
> Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
> > I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
> > matching consecutive words but it won't work for us if it cannot handle
> > consecutive *duplicate* words.
> 
> > For example, the following returns true:    select
> > phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
> 
> > Is this expected ?
> 
> I concur that that seems like a rather useless behavior.  If we have
> "x <-> y" it is not possible to match at distance zero, while if we
> have "x <-> x" it seems unlikely that the user is expecting us to
> treat that identically to "x".  So phrase search simply should not
> consider distance-zero matches.

[Action required within 72 hours.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
efforts toward speedy resolution.  Thanks.

[1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com



On Fri, Jun 10, 2016 at 03:10:40AM -0400, Noah Misch wrote:
> On Tue, Jun 07, 2016 at 06:05:10PM -0400, Tom Lane wrote:
> > Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
> > > I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
> > > matching consecutive words but it won't work for us if it cannot handle
> > > consecutive *duplicate* words.
> > 
> > > For example, the following returns true:    select
> > > phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
> > 
> > > Is this expected ?
> > 
> > I concur that that seems like a rather useless behavior.  If we have
> > "x <-> y" it is not possible to match at distance zero, while if we
> > have "x <-> x" it seems unlikely that the user is expecting us to
> > treat that identically to "x".  So phrase search simply should not
> > consider distance-zero matches.
> 
> [Action required within 72 hours.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> efforts toward speedy resolution.  Thanks.
> 
> [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com

This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com



On Mon, Jun 13, 2016 at 10:44:06PM -0400, Noah Misch wrote:
> On Fri, Jun 10, 2016 at 03:10:40AM -0400, Noah Misch wrote:
> > On Tue, Jun 07, 2016 at 06:05:10PM -0400, Tom Lane wrote:
> > > Jean-Pierre Pelletier <jppelletier@e-djuster.com> writes:
> > > > I wanted to test if phraseto_tsquery(), new with 9.6 could be used for
> > > > matching consecutive words but it won't work for us if it cannot handle
> > > > consecutive *duplicate* words.
> > > 
> > > > For example, the following returns true:    select
> > > > phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue');
> > > 
> > > > Is this expected ?
> > > 
> > > I concur that that seems like a rather useless behavior.  If we have
> > > "x <-> y" it is not possible to match at distance zero, while if we
> > > have "x <-> x" it seems unlikely that the user is expecting us to
> > > treat that identically to "x".  So phrase search simply should not
> > > consider distance-zero matches.
> > 
> > [Action required within 72 hours.  This is a generic notification.]
> > 
> > The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
> > since you committed the patch believed to have created it, you own this open
> > item.  If some other commit is more relevant or if this does not belong as a
> > 9.6 open item, please let us know.  Otherwise, please observe the policy on
> > open item ownership[1] and send a status update within 72 hours of this
> > message.  Include a date for your subsequent status update.  Testers may
> > discover new open items at any time, and I want to plan to get them all fixed
> > well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> > efforts toward speedy resolution.  Thanks.
> > 
> > [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
> 
> This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
> a status update within 24 hours, and include a date for your subsequent status
> update.  Refer to the policy on open item ownership:
> http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com

IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 9.6 open item is long past due
for your status update.  Please reacquaint yourself with the policy on open
item ownership[1] and then reply immediately.  If I do not hear from you by
2016-06-16 07:00 UTC, I will transfer this item to release management team
ownership without further notice.

[1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com



Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Teodor Sigaev
Date:
> IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 9.6 open item is long past due
> for your status update.  Please reacquaint yourself with the policy on open
> item ownership[1] and then reply immediately.  If I do not hear from you by
> 2016-06-16 07:00 UTC, I will transfer this item to release management team
> ownership without further notice.
>
> [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com

I'm working on it right now.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



On Wed, Jun 15, 2016 at 03:02:15PM +0300, Teodor Sigaev wrote:
> On Wed, Jun 15, 2016 at 02:54:33AM -0400, Noah Misch wrote:
> > On Mon, Jun 13, 2016 at 10:44:06PM -0400, Noah Misch wrote:
> > > On Fri, Jun 10, 2016 at 03:10:40AM -0400, Noah Misch wrote:
> > > > [Action required within 72 hours.  This is a generic notification.]
> > > > 
> > > > The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
> > > > since you committed the patch believed to have created it, you own this open
> > > > item.  If some other commit is more relevant or if this does not belong as a
> > > > 9.6 open item, please let us know.  Otherwise, please observe the policy on
> > > > open item ownership[1] and send a status update within 72 hours of this
> > > > message.  Include a date for your subsequent status update.  Testers may
> > > > discover new open items at any time, and I want to plan to get them all fixed
> > > > well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> > > > efforts toward speedy resolution.  Thanks.
> > > > 
> > > > [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
> > > 
> > > This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
> > > a status update within 24 hours, and include a date for your subsequent status
> > > update.  Refer to the policy on open item ownership:
> > > http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
> > 
> > IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 9.6 open item is long past due
> > for your status update.  Please reacquaint yourself with the policy on open
> > item ownership[1] and then reply immediately.  If I do not hear from you by
> > 2016-06-16 07:00 UTC, I will transfer this item to release management team
> > ownership without further notice.
> >
> > [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
> 
> I'm working on it right now.

That is good news, but it is not a valid status update.  In particular, it
does not specify a date for your next update.



Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Teodor Sigaev
Date:
>> What about a word with several infinitives?
>
>> select to_tsvector('en', 'leavings');
>>        to_tsvector
>> ------------------------
>>   'leave':1 'leavings':1
>> (1 row)
>
>> select to_tsvector('en', 'leavings') @@ 'leave <0> leavings'::tsquery;
>>   ?column?
>> ----------
>>   t
>> (1 row)

The second example is not correct:

select phraseto_tsquery('en', 'leavings')
will produce 'leave | leavings'

and

select phraseto_tsquery('en', 'leavings cats')
will produce 'leave <-> cat | leavings <-> cat'

which seems correct, and we don't need special treatment of <0>.

> This brings up something else that I am not very sold on: to wit,
> do we really want the "less than or equal" distance behavior at all?
> The documentation gives the example that
>     phraseto_tsquery('cat ate some rats')
> produces
>     ( 'cat' <-> 'ate' ) <2> 'rat'
> because "some" is a stopword.  However, that pattern will also match
> "cat ate rats", which seems surprising and unexpected to me; certainly
> it would surprise a user who did not realize that "some" is a stopword.
>
> So I think there's a reasonable case for decreeing that <N> should only
> match lexemes *exactly* N apart.  If we did that, we would no longer have
> the misbehavior that Jean-Pierre is complaining about, and we'd not need
> to argue about whether <0> needs to be treated specially.

Agreed, that seems easy to change. I thought I had seen an issue with
hyphenated words but, fortunately, I had forgotten that the parts of a
hyphenated word don't share a position:
# select to_tsvector('foo-bar');
          to_tsvector
-----------------------------
  'bar':3 'foo':2 'foo-bar':1
# select phraseto_tsquery('foo-bar');
          phraseto_tsquery
-----------------------------------
  ( 'foo-bar' <-> 'foo' ) <-> 'bar'
and
# select to_tsvector('foo-bar') @@ phraseto_tsquery('foo-bar');
  ?column?
----------
  t


Patch is attached

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachments

Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Teodor Sigaev
Date:
> We need to reach a consensus here, since there is no way to say "I don't know".
> I am inclined to agree with you that returning false is better in such a
> case. That will point the user to the source of the problem.

Here is a patch: the phrase operation now returns false if there is no position
information. If this behavior looks more reasonable, I'll commit it.
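
The behaviour described here can be sketched as follows. This is not the
attached patch: LexemePositions and phrase_match_strict are hypothetical
names, positions are plain ints, and the exact-distance rule discussed
elsewhere in this thread is assumed for the non-stripped case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for one lexeme's position list.
 * npos == 0 models a lexeme taken from a stripped tsvector.
 */
typedef struct
{
    const int *pos;
    size_t     npos;
} LexemePositions;

bool
phrase_match_strict(const LexemePositions *l,
                    const LexemePositions *r,
                    int distance)
{
    assert(distance >= 0);

    /* No position information: report "no match" rather than acting like &. */
    if (l->npos == 0 || r->npos == 0)
        return false;

    /* Otherwise require some pair of positions exactly 'distance' apart. */
    for (size_t i = 0; i < l->npos; i++)
        for (size_t j = 0; j < r->npos; j++)
            if (r->pos[j] - l->pos[i] == distance)
                return true;

    return false;
}
```

In this model a stripped operand makes the phrase test false even when both
lexemes are present, which is what makes a misuse of <-> visible to the user.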


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachments



Teodor Sigaev <teodor@sigaev.ru> writes:
>> So I think there's a reasonable case for decreeing that <N> should only
>> match lexemes *exactly* N apart.  If we did that, we would no longer have
>> the misbehavior that Jean-Pierre is complaining about, and we'd not need
>> to argue about whether <0> needs to be treated specially.

> Agree, seems that's easy to change.
> ...
> Patch is attached

Hmm, couldn't the loop logic be simplified a great deal if this is the
definition?  Or are you leaving it like that with the idea that we might
later introduce another operator with the less-than-or-equal behavior?
        regards, tom lane



Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Teodor Sigaev
Date:

Tom Lane wrote:
> Teodor Sigaev <teodor@sigaev.ru> writes:
>>> So I think there's a reasonable case for decreeing that <N> should only
>>> match lexemes *exactly* N apart.  If we did that, we would no longer have
>>> the misbehavior that Jean-Pierre is complaining about, and we'd not need
>>> to argue about whether <0> needs to be treated specially.
>
>> Agree, seems that's easy to change.
>> ...
>> Patch is attached
>
> Hmm, couldn't the loop logic be simplified a great deal if this is the
> definition?  Or are you leaving it like that with the idea that we might
> later introduce another operator with the less-than-or-equal behavior?

Do you suggest something like a merge join of two sorted lists? I.e.:

while (Rpos < Rdata.pos + Rdata.npos && Lpos < Ldata.pos + Ldata.npos)
{
    if (*Lpos > *Rpos)
        Rpos++;
    else if (*Lpos < *Rpos)
    {
        if (*Rpos - *Lpos == distance)
            match!
        Lpos++;
    }
    else
    {
        if (distance == 0)
            match!
        Lpos++; Rpos++;
    }
}

Such an algorithm finds the closest pair of (Lpos, Rpos), but a satisfying pair
need not be the closest one. Example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';
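
The counterexample can be checked with a small standalone sketch. The function
names are hypothetical and positions are plain ints, so this is not the
tsvector_op.c code. merge_scan_match follows the loop above;
shifted_scan_match is one possible alternative (an assumption of this sketch,
not something proposed in the thread) that compares each left position with
the shifted target Rpos - distance. For '1 2 1 2' the lexeme '1' sits at
positions {1, 3} and '2' at {2, 4}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Merge-join style scan, as in the loop above: only neighbouring pairs meet. */
bool
merge_scan_match(const int *l, size_t nl, const int *r, size_t nr, int distance)
{
    size_t i = 0, j = 0;

    assert(distance >= 0);
    while (i < nl && j < nr)
    {
        if (l[i] > r[j])
            j++;
        else if (l[i] < r[j])
        {
            if (r[j] - l[i] == distance)
                return true;
            i++;
        }
        else
        {
            if (distance == 0)
                return true;
            i++;
            j++;
        }
    }
    return false;
}

/* Two-pointer scan on shifted targets: looks for l[i] == r[j] - distance. */
bool
shifted_scan_match(const int *l, size_t nl, const int *r, size_t nr, int distance)
{
    size_t i = 0, j = 0;

    assert(distance >= 0);
    while (i < nl && j < nr)
    {
        int target = r[j] - distance;

        if (l[i] < target)
            i++;                /* left position too early for this target */
        else if (l[i] > target)
            j++;                /* target already passed, try the next Rpos */
        else
            return true;
    }
    return false;
}
```

For distance 3 the merge scan steps past the pair (1, 4) without ever comparing
those two positions, while the shifted scan finds it. Both assume strictly
ascending position lists.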

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Teodor Sigaev <teodor@sigaev.ru> writes:
> Tom Lane wrote:
>> Hmm, couldn't the loop logic be simplified a great deal if this is the
>> definition?  Or are you leaving it like that with the idea that we might
>> later introduce another operator with the less-than-or-equal behavior?

> Do you suggest something like merge join of two sorted lists? ie:
> ...
> Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair could be 
> not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';

Oh ... the indexes in the lists don't have much to do with the distances,
do they.  OK, maybe it's not quite as easy as I was thinking.  I'm
okay with the patch as presented.
        regards, tom lane



Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Teodor Sigaev
Date:
>> Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair could be
>> not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';
>
> Oh ... the indexes in the lists don't have much to do with the distances,
> do they.  OK, maybe it's not quite as easy as I was thinking.  I'm
> okay with the patch as presented.

Huh, I found that my patch isn't correct for the example I showed :(. A reworked
patch is attached.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachments

Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Robert Haas
Date:
On Fri, Jun 17, 2016 at 11:07 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
>>> Such algorithm finds closest pair of (Lpos, Rpos) but satisfying pair
>>> could be
>>> not closest, example: to_tsvector('simple', '1 2 1 2') @@ '1 <3> 2';
>>
>>
>> Oh ... the indexes in the lists don't have much to do with the distances,
>> do they.  OK, maybe it's not quite as easy as I was thinking.  I'm
>> okay with the patch as presented.
>
>
> Huh, I found that my patch isn't correct for the example I showed :(. A reworked
> patch is attached.

We're really quickly running out of time to get this done before
beta2.  Please don't commit anything that's going to break the tree
because we only have about 72 hours before the wrap, but if it's
correct then it should go in.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Teodor Sigaev
Date:
> We're really quickly running out of time to get this done before
> beta2.  Please don't commit anything that's going to break the tree
> because we only have about 72 hours before the wrap, but if it's
> correct then it should go in.

Isn't it too late now? Or should we wait until beta2 is out?

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Teodor Sigaev <teodor@sigaev.ru> writes:
>> We're really quickly running out of time to get this done before
>> beta2.  Please don't commit anything that's going to break the tree
>> because we only have about 72 hours before the wrap, but if it's
>> correct then it should go in.

> Isn't it too late now? Or should I wait until beta2 is out?

Let's wait till after beta2.
        regards, tom lane



On Wed, Jun 15, 2016 at 11:08:54AM -0400, Noah Misch wrote:
> On Wed, Jun 15, 2016 at 03:02:15PM +0300, Teodor Sigaev wrote:
> > On Wed, Jun 15, 2016 at 02:54:33AM -0400, Noah Misch wrote:
> > > On Mon, Jun 13, 2016 at 10:44:06PM -0400, Noah Misch wrote:
> > > > On Fri, Jun 10, 2016 at 03:10:40AM -0400, Noah Misch wrote:
> > > > > [Action required within 72 hours.  This is a generic notification.]
> > > > > 
> > > > > The above-described topic is currently a PostgreSQL 9.6 open item.  Teodor,
> > > > > since you committed the patch believed to have created it, you own this open
> > > > > item.  If some other commit is more relevant or if this does not belong as a
> > > > > 9.6 open item, please let us know.  Otherwise, please observe the policy on
> > > > > open item ownership[1] and send a status update within 72 hours of this
> > > > > message.  Include a date for your subsequent status update.  Testers may
> > > > > discover new open items at any time, and I want to plan to get them all fixed
> > > > > well in advance of shipping 9.6rc1.  Consequently, I will appreciate your
> > > > > efforts toward speedy resolution.  Thanks.
> > > > > 
> > > > > [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
> > > > 
> > > > This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
> > > > a status update within 24 hours, and include a date for your subsequent status
> > > > update.  Refer to the policy on open item ownership:
> > > > http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
> > > 
> > > IMMEDIATE ATTENTION REQUIRED.  This PostgreSQL 9.6 open item is long past due
> > > for your status update.  Please reacquaint yourself with the policy on open
> > > item ownership[1] and then reply immediately.  If I do not hear from you by
> > > 2016-06-16 07:00 UTC, I will transfer this item to release management team
> > > ownership without further notice.
> > >
> > > [1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com
> > 
> > I'm working on it right now.
> 
> That is good news, but it is not a valid status update.  In particular, it
> does not specify a date for your next update.

You still have not delivered the status update due thirteen days ago.  If I do
not receive a fully-conforming status update from you by 2016-06-28 03:00 UTC, or
if this item ever again becomes overdue for a status update, I will transfer
the item to release management team ownership.



On Sun, Jun 26, 2016 at 10:22:26PM -0400, Noah Misch wrote:
> You still have not delivered the status update due thirteen days ago.  If I do
> not receive a fully-conforming status update from you by 2016-06-28 03:00 UTC, or
> if this item ever again becomes overdue for a status update, I will transfer
> the item to release management team ownership.

This PostgreSQL 9.6 open item now needs a permanent owner.  Would any other
committer like to take ownership?  I see Teodor committed some things relevant
to this item just today, so the task may be as simple as verifying that those
commits resolve the item.  If this role interests you, please read this thread
and the policy linked above, then send an initial status update bearing a date
for your subsequent status update.  If the item does not have a permanent
owner by 2016-07-01 07:00 UTC, I will resolve the item by reverting all phrase
search commits.

Thanks,
nm



Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Oleg Bartunov
Date:
On Tue, Jun 28, 2016 at 9:32 AM, Noah Misch <noah@leadboat.com> wrote:
> This PostgreSQL 9.6 open item now needs a permanent owner.  Would any other
> committer like to take ownership?  I see Teodor committed some things relevant
> to this item just today, so the task may be as simple as verifying that those
> commits resolve the item.  If this role interests you, please read this thread
> and the policy linked above, then send an initial status update bearing a date
> for your subsequent status update.  If the item does not have a permanent
> owner by 2016-07-01 07:00 UTC, I will resolve the item by reverting all phrase
> search commits.

Teodor pushed three patches. Two of them fix the issues discussed in
this thread (handling of duplicates, and disabling the fallback to & for
stripped tsvectors). The third concerns the precedence of the phrase search
tsquery operator, which was discussed in a separate thread
(https://www.postgresql.org/message-id/flat/576AB63C.7090504%40sigaev.ru#576AB63C.7090504@sigaev.ru).

They all look good, but they need a small documentation patch. I will provide it later.






Re: Should phraseto_tsquery('simple', 'blue blue') @@ to_tsvector('simple', 'blue') be true ?

From
Oleg Bartunov
Date:
On Tue, Jun 28, 2016 at 7:00 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
> On Tue, Jun 28, 2016 at 9:32 AM, Noah Misch <noah@leadboat.com> wrote:
>> This PostgreSQL 9.6 open item now needs a permanent owner.  Would any other
>> committer like to take ownership?  I see Teodor committed some things relevant
>> to this item just today, so the task may be as simple as verifying that those
>> commits resolve the item.  If this role interests you, please read this thread
>> and the policy linked above, then send an initial status update bearing a date
>> for your subsequent status update.  If the item does not have a permanent
>> owner by 2016-07-01 07:00 UTC, I will resolve the item by reverting all phrase
>> search commits.
>
> Teodor pushed three patches. Two of them fix the issues discussed in
> this thread (handling of duplicates, and disabling the fallback to & for
> stripped tsvectors). The third concerns the precedence of the phrase search
> tsquery operator, which was discussed in a separate thread
> (https://www.postgresql.org/message-id/flat/576AB63C.7090504%40sigaev.ru#576AB63C.7090504@sigaev.ru).
>
> They all look good, but they need a small documentation patch. I will provide it later.

I attached a little documentation patch to textsearch.sgml.


Attachments

Oleg Bartunov <obartunov@gmail.com> writes:
>> On Tue, Jun 28, 2016 at 9:32 AM, Noah Misch <noah@leadboat.com> wrote:
> This PostgreSQL 9.6 open item now needs a permanent owner.  Would any other
> committer like to take ownership?  I see Teodor committed some things relevant
> to this item just today, so the task may be as simple as verifying that those
> commits resolve the item.

> I attached a little documentation patch to textsearch.sgml.

That didn't cover all the places that needed to be fixed, but I have
re-read the docs and believe I've made things good now.

I have reviewed this thread and verified that all the cases raised in it
now work as desired, so I have marked the open item closed.
        regards, tom lane



> That didn't cover all the places that needed to be fixed, but I have
> re-read the docs and believe I've made things good now.
>
> I have reviewed this thread and verified that all the cases raised in it
> now work as desired, so I have marked the open item closed.

Thank you very much!
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/