Thread: [HACKERS] Rethinking our fulltext phrase-search implementation

[HACKERS] Rethinking our fulltext phrase-search implementation

From: Tom Lane
Date:

I've been thinking about how to fix the problem Andreas Seltenreich
reported at
https://postgr.es/m/87eg1y2s3x.fsf@credativ.de

The core of that problem is that the phrase-search patch attempts to
restructure tsquery trees so that there are no operators underneath a
PHRASE operator, except possibly other PHRASE operators.  The goal is
to not have to deal with identifying specific match locations except
while processing a PHRASE operator.  Well, that's an OK idea if it can
be implemented reasonably, but there are several problems with the code
as it stands:

1. The transformation is done by normalize_phrase_tree(), which is
currently invoked (via cleanup_fakeval_and_phrase()) in a rather
ad-hoc set of places including tsqueryin() and the various variants
of to_tsquery().  This leaves lots of scope for errors of omission,
which is exactly the proximate cause of Andreas' bug: ts_rewrite() is
neglecting to re-normalize its result tsquery.  I have little faith
that there aren't other similar errors of omission today, and even
less that we won't introduce more in future.

2. Because the transformation is invoked as early as tsqueryin(),
it's user-visible:

regression=# select 'a <-> (b | c)'::tsquery;
          tsquery
---------------------------
 'a' <-> 'b' | 'a' <-> 'c'
(1 row)

This is confusing to users, and I do not think it's a good idea to
expose what's basically an optimization detail this way.  For stored
tsqueries, we're frozen into this approach forever even if we later
decide that checking for 'a' twice isn't such a hot idea.

3. Worse, early application of the transformation creates problems for
operations such as ts_rewrite: a rewrite might fail to match, or produce
surprising results, because the query trees actually being operated on
aren't what the user probably thinks they are.  At a minimum, to get
consistent results I think we'd have to re-normalize after *each step*
of ts_rewrite, not only at the end.  That would be expensive, and it's
far from clear that it would eliminate all the surprises.

4. The transformations are wrong anyway.  The OR case I showed above is
all right, but as I argued in <24331.1480199636@sss.pgh.pa.us>, the AND
case is not:

regression=# select 'a <-> (b & c)'::tsquery;
          tsquery
---------------------------
 'a' <-> 'b' & 'a' <-> 'c'
(1 row)

This matches 'a b a c', because 'a <-> b' and 'a <-> c' can each be
matched at different places in that text; but it seems highly unlikely to
me that that's what the writer of such a query wanted.  (If she did want
that, she would write it that way to start with.)  NOT is not very nice
either:

regression=# select '!a <-> !b'::tsquery;
              tsquery
------------------------------------
 !'a' & !( !( 'a' <-> 'b' ) & 'b' )
(1 row)

If you dig through that, you realize that the <-> test is pointless;
the query can't match any vector containing 'a', so certainly the <->
condition can't succeed, making this an expensive way to spell "!a & !b".
And that's not the right semantics anyway: if 'x y' matches this query,
which it does and should, why doesn't 'x y a' match?

5. The case with only one NOT under <-> looks like this:

regression=# select '!a <-> b'::tsquery;
        tsquery
------------------------
 !( 'a' <-> 'b' ) & 'b'
(1 row)

This is more or less in line with the naive view of what its semantics
should be, although I notice that it will match a 'b' at position 1,
which might be a surprising result.  We're not out of the woods though:
this will (and should) match, eg, 'c b a'.  But that means that '!a & b'
is not a safe lossy approximation to '!a <-> b', which is an assumption
that is wired into a number of places.  Simple testing shows that indeed
GIN and GIST index searches get the wrong answers, different from what
you get in a non-indexed search, for queries like this.


So we have a mess here, which needs to be cleaned up quite aside from the
fact that it's capable of provoking Assert failures and/or crashes.

I thought for a while about moving the normalize_phrase_tree() work
to occur at the start of tsquery execution, rather than in tsqueryin()
et al.  That addresses points 1..3, but doesn't by itself do anything
for points 4 or 5.  Also, it would be expensive because right now
execution works directly from the flat tsquery representation; there's no
conversion to an explicit tree structure on which normalize_phrase_tree()
could conveniently be applied.  Nor do we know at the start whether the
tsquery contains any PHRASE operators.

On the whole, it seems like the best fix would be to get rid of
normalize_phrase_tree() altogether, and instead fix the TS_execute()
engine so it can cope with regular operators underneath phrase operators.
We can still have the optimization of not worrying about lexeme locations
when no phrase operator has been seen, but we have to change
TS_phrase_execute to cope with plain AND/OR/NOT operators and calculate
proper location information for the result of one.

Here is a design for doing that.  The hard part seems to be dealing with
NOT: merging position lists during AND or OR is pretty clear, but how do
we represent the positions where a lexeme isn't?  I briefly considered
explicitly negating the position list, eg [2,5,7] goes to [1,3,4,6,8,...],
but the problem is where to stop.  If we knew the largest position present
in the current document, we could stop there, but AFAICS we generally
don't know that --- and even if we did, this approach could be quite
expensive for large documents.  So my design is based on keeping the
original position list and adding a "negate" flag that says these are
the positions where the query pattern does NOT occur.

Hence, I propose adding a field to ExecPhraseData:

typedef struct ExecPhraseData
{
    int         npos;           /* number of positions reported */
    bool        allocated;      /* pos points to palloc'd data? */
+   bool        negate;         /* positions are where value is NOT */
    WordEntryPos *pos;          /* ordered, non-duplicate lexeme positions */
} ExecPhraseData;

It's already the case that TS_phrase_execute is always responsible for
initializing this struct, so adding this field and initializing it to
false doesn't break any existing TSExecuteCallback functions (it's even
ABI-compatible because the field is going into a padding hole).  I've
worked out the following specification for the meaning of this field
given that a TSExecuteCallback function, or recursive execution of
TS_phrase_execute, has returned true or false:

if npos > 0:
    func result = false, negate = false: disallowed
    func result = true, negate = false: asserts that query is matched at
        specified position(s) (and only those positions)
    func result = false, negate = true: disallowed
    func result = true, negate = true: asserts that query is matched at all
        positions *except* specified position(s)

if npos == 0:
    func result = false, negate = false: asserts that query is not matched
        anywhere
    func result = true, negate = false: query is (possibly) matched, matching
        position(s) are unknown
    func result = false, negate = true: disallowed
    func result = true, negate = true: asserts that query is matched at all
        positions

The negate = false cases agree with the existing semantics, so that
TSExecuteCallback functions do not need to know about the field; there
is no case where they'd need to set it to true.
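
To make the table above concrete, here is a small standalone sketch that classifies a (result, npos, negate) triple according to that specification.  It is not code from the patch: PhraseResult, PhraseMeaning and classify_phrase_result() are names invented for this illustration, and the real ExecPhraseData carries more than these fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three values the specification talks about. */
typedef struct PhraseResult
{
    bool        result;         /* function return value */
    int         npos;           /* number of positions reported */
    bool        negate;         /* positions are where the value is NOT */
} PhraseResult;

/* Sketch-only names for the meanings listed in the specification. */
typedef enum PhraseMeaning
{
    PM_DISALLOWED,              /* combination must never be produced */
    PM_NO_MATCH,                /* query is not matched anywhere */
    PM_MAYBE_UNKNOWN_POS,       /* possibly matched, positions unknown */
    PM_MATCH_AT_LISTED,         /* matched at exactly the listed positions */
    PM_MATCH_EXCEPT_LISTED,     /* matched everywhere except listed positions */
    PM_MATCH_EVERYWHERE         /* matched at all positions */
} PhraseMeaning;

static PhraseMeaning
classify_phrase_result(PhraseResult r)
{
    assert(r.npos >= 0);

    if (r.npos > 0)
    {
        /* a position list only makes sense together with result = true */
        if (!r.result)
            return PM_DISALLOWED;
        return r.negate ? PM_MATCH_EXCEPT_LISTED : PM_MATCH_AT_LISTED;
    }

    /* npos == 0 */
    if (r.negate)
        return r.result ? PM_MATCH_EVERYWHERE : PM_DISALLOWED;
    return r.result ? PM_MAYBE_UNKNOWN_POS : PM_NO_MATCH;
}
```

Note that both negate = false outcomes with npos == 0 are exactly what an existing callback can already report, which is why callbacks never have to touch the new field.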

Given this definition, the tsquery operators can be implemented as follows
in TS_phrase_execute:

OP_NOT:

if npos > 0 (implying its subquery returned true), invert the negate
flag and return true, keeping the same list of positions

if npos == 0, the possible cases are:

subquery result    subquery's negate    return    result negate
false              false                true      true
true               false                true      false
true               true                 false     false

The correctness of these rules can be seen from the specification of
the flag meanings.  Notice that NOT atop NOT is a no-op, as expected.
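
A minimal sketch of the OP_NOT rules, again using an invented PhraseResult type rather than the real ExecPhraseData (the position array itself is left out, since NOT never changes it):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in; the position array is omitted because NOT keeps it. */
typedef struct PhraseResult
{
    bool        result;         /* function return value */
    int         npos;           /* number of positions reported */
    bool        negate;         /* positions are where the value is NOT */
} PhraseResult;

/* Apply the OP_NOT rules to the result of the subquery. */
static PhraseResult
apply_not(PhraseResult in)
{
    PhraseResult out = in;

    if (in.npos > 0)
    {
        /* positions are known: same list, opposite sense */
        assert(in.result);
        out.result = true;
        out.negate = !in.negate;
        return out;
    }

    /* npos == 0; (result = false, negate = true) is a disallowed input */
    assert(in.result || !in.negate);

    if (!in.result)
    {
        /* matched nowhere, so NOT is matched at all positions */
        out.result = true;
        out.negate = true;
    }
    else if (!in.negate)
    {
        /* possibly matched, positions unknown: NOT is equally unknown */
        out.result = true;
        out.negate = false;
    }
    else
    {
        /* matched at all positions, so NOT is matched nowhere */
        out.result = false;
        out.negate = false;
    }
    return out;
}
```

Applying apply_not() twice gives back the input in every allowed case, which is the "NOT atop NOT is a no-op" property mentioned above.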

OP_AND, OP_OR:

"not" notations here indicate that L or R input has the negate flag set:

a & b: emit positions listed in both inputs
a & !b: emit positions listed in a but not b
!a & b: emit positions listed in b but not a
!a & !b: treat as !(a | b), ie emit positions listed in either input,
        then set negate flag on output
a | b: emit positions listed in either input
a | !b: treat as !(!a & b), ie emit positions listed in b but not a,
        then set negate flag on output
!a | b: treat as !(a & !b), ie emit positions listed in a but not b,
        then set negate flag on output
!a | !b: treat as !(a & b), ie emit positions listed in both inputs,
        then set negate flag on output

AND/OR function result is always true when output negate flag is set,
else it is true if output npos > 0
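
The eight AND/OR cases reduce to one sorted-list merge plus a choice of which outcomes to emit.  The following is only a sketch under simplifying assumptions: positions are plain ints rather than WordEntryPos, both inputs are assumed to carry position information, and PosList, the EMIT_* flags, merge_positions() and combine_and_or() are names made up here, not identifiers from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Which merge outcomes produce an output position (sketch-only names). */
#define EMIT_BOTH   0x01        /* position present in both inputs */
#define EMIT_LONLY  0x02        /* present in the left input only */
#define EMIT_RONLY  0x04        /* present in the right input only */
#define EMIT_ANY    (EMIT_BOTH | EMIT_LONLY | EMIT_RONLY)

typedef struct PosList
{
    const int  *pos;            /* ordered, duplicate-free positions */
    int         npos;
    bool        negate;         /* positions are where the value is NOT */
} PosList;

/* Merge two sorted lists, emitting per "emit"; returns the output count. */
static int
merge_positions(const PosList *l, const PosList *r, int emit, int *out)
{
    int         li = 0, ri = 0, n = 0;

    while (li < l->npos || ri < r->npos)
    {
        if (ri >= r->npos || (li < l->npos && l->pos[li] < r->pos[ri]))
        {
            if (emit & EMIT_LONLY)
                out[n++] = l->pos[li];
            li++;
        }
        else if (li >= l->npos || r->pos[ri] < l->pos[li])
        {
            if (emit & EMIT_RONLY)
                out[n++] = r->pos[ri];
            ri++;
        }
        else
        {
            if (emit & EMIT_BOTH)
                out[n++] = l->pos[li];
            li++;
            ri++;
        }
    }
    return n;
}

/*
 * Combine two inputs with AND (is_or = false) or OR (is_or = true).
 * outbuf must have room for l->npos + r->npos entries.
 */
static bool
combine_and_or(bool is_or, const PosList *l, const PosList *r,
               int *outbuf, PosList *res)
{
    bool        ln = l->negate, rn = r->negate;
    int         emit;

    if (!ln && !rn)
        emit = is_or ? EMIT_ANY : EMIT_BOTH;    /* a | b,   a & b   */
    else if (!ln)
        emit = is_or ? EMIT_RONLY : EMIT_LONLY; /* a | !b,  a & !b  */
    else if (!rn)
        emit = is_or ? EMIT_LONLY : EMIT_RONLY; /* !a | b,  !a & b  */
    else
        emit = is_or ? EMIT_BOTH : EMIT_ANY;    /* !a | !b, !a & !b */

    /* AND output is negated only if both inputs are; OR if either is */
    res->negate = is_or ? (ln || rn) : (ln && rn);
    res->npos = merge_positions(l, r, emit, outbuf);
    res->pos = outbuf;

    /* result is always true when negated, else true iff npos > 0 */
    return res->negate || res->npos > 0;
}
```

For example, with a = [1,3,5] and b = [3,4], "a & !b" yields [1,5] with negate unset, while "!a | !b" yields [3] with negate set, ie a match everywhere except position 3.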

OP_PHRASE:

Works like OP_AND except we allow for a possibly-nonzero offset between
L and R positions, ie we compare an R position of X to an L position of
X minus offset while deciding whether to emit position X.  Note in
particular that <0> acts *exactly* like OP_AND.

We could accept a match only if X minus offset is greater than zero, so
that "!a <-> b" doesn't match b at start of document.  But I see no way
to do the equivalent for "a <-> !b" at the end of the document, so I'm
inclined not to do that, leaving the existing semantics for these cases
alone.
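
To illustrate the offset comparison for the simplest case, where neither input is negated, here is a sketch with plain int positions.  phrase_positions() is a made-up name for this example and is not the patch's routine; the negated cases would follow the AND rules given earlier.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Emit each R position X for which X - offset appears in L.  Both lists
 * must be sorted ascending and duplicate-free; out needs room for nr
 * entries.  Returns the number of positions written.
 */
static int
phrase_positions(const int *lpos, int nl, const int *rpos, int nr,
                 int offset, int *out)
{
    int         li = 0, ri = 0, n = 0;

    while (li < nl && ri < nr)
    {
        int         want = rpos[ri] - offset;   /* L position pairing with R */

        if (lpos[li] < want)
            li++;
        else if (lpos[li] > want)
            ri++;
        else
        {
            out[n++] = rpos[ri];
            li++;
            ri++;
        }
    }
    return n;
}
```

Take the 'a b a c' document from point 4: a is at [1,3], b at [2], c at [4].  With offset 1, a <-> b gives [2] and a <-> c gives [4], so the normalized form 'a' <-> 'b' & 'a' <-> 'c' matches.  Evaluated per these rules, b & c (offset 0) gives an empty list, and so does a <-> (b & c), which is the behavior argued for above.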


The above rules for AND/OR/PHRASE work only when we have position
information for both inputs (ie, neither input returned true with npos = 0
and negate = false); otherwise we just do the dumb thing and return
true/false with npos = 0, negate = false, to indicate either "there might
be a match" or "there definitely is not a match".

It's worth noting that with these rules, phrase searches will act as
though "!x" always matches somewhere; for instance "!a <-> !b" will match
any tsvector.  I argue that this is not wrong, not even if the tsvector is
empty: there could have been adjacent stopwords matching !a and !b in the
original text.  Since we've adjusted the phrase matching rules to treat
stopwords as unknown-but-present words in a phrase, I think this is
consistent.  It's also pretty hard to assert this is wrong and at the same
time accept "!a <-> b" matching b at the start of the document.


This sounds like it would be a lot of code, but I think that all of the
AND/OR/PHRASE cases can be implemented by a single subroutine that's
told whether to emit a position depending on whether it finds that
position in both inputs/neither input/left only/right only.  That
subroutine otherwise won't be much more complicated than the existing
position-finding loop for OP_PHRASE.  I'm guessing that it'll end up
being roughly a wash once you allow for removing normalize_phrase_tree().

I haven't yet looked into point 5 (wrong GIN/GIST search results),
but I'm hopeful that removing the assumption that <-> approximates
as AND will fix it.  In any case we need to make the base tsquery
engine right before we try to fix the index approximations to it.

Thoughts, objections?  Anybody see errors in the logic?
        regards, tom lane



Re: [HACKERS] Rethinking our fulltext phrase-search implementation

From: Tom Lane
Date:

I wrote:
> It's worth noting that with these rules, phrase searches will act as
> though "!x" always matches somewhere; for instance "!a <-> !b" will match
> any tsvector.  I argue that this is not wrong, not even if the tsvector is
> empty: there could have been adjacent stopwords matching !a and !b in the
> original text.  Since we've adjusted the phrase matching rules to treat
> stopwords as unknown-but-present words in a phrase, I think this is
> consistent.  It's also pretty hard to assert this is wrong and at the same
> time accept "!a <-> b" matching b at the start of the document.

To clarify this point, I'm imagining that the patch would include
documentation changes like the attached.

            regards, tom lane

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 67d0c34..464ce83 100644
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*************** SELECT 'fat & rat & ! cat'::tsqu
*** 3959,3973 ****
          tsquery
  ------------------------
   'fat' & 'rat' & !'cat'
-
- SELECT '(fat | rat) <-> cat'::tsquery;
-               tsquery
- -----------------------------------
-  'fat' <-> 'cat' | 'rat' <-> 'cat'
  </programlisting>
-
-      The last example demonstrates that <type>tsquery</type> sometimes
-      rearranges nested operators into a logically equivalent formulation.
      </para>

      <para>
--- 3959,3965 ----
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 2da7595..bc33a70 100644
*** a/doc/src/sgml/textsearch.sgml
--- b/doc/src/sgml/textsearch.sgml
*************** text @@ text
*** 323,328 ****
--- 323,330 ----
      at least one of its arguments must appear, while the <literal>!</> (NOT)
      operator specifies that its argument must <emphasis>not</> appear in
      order to have a match.
+     For example, the query <literal>fat & ! rat</> matches documents that
+     contain <literal>fat</> but not <literal>rat</>.
     </para>

     <para>
*************** SELECT phraseto_tsquery('the cats ate th
*** 377,382 ****
--- 379,401 ----
      then <literal>&</literal>, then <literal><-></literal>,
      and <literal>!</literal> most tightly.
     </para>
+
+    <para>
+     It's worth noticing that the AND/OR/NOT operators mean something subtly
+     different when they are within the arguments of a FOLLOWED BY operator
+     than when they are not, because then the position of the match is
+     significant.  Normally, <literal>!x</> matches only documents that do not
+     contain <literal>x</> anywhere.  But <literal>x <-> !y</>
+     matches <literal>x</> if it is not immediately followed by <literal>y</>;
+     an occurrence of <literal>y</> elsewhere in the document does not prevent
+     a match.  Another example is that <literal>x & y</> normally only
+     requires that <literal>x</> and <literal>y</> both appear somewhere in the
+     document, but <literal>(x & y) <-> z</> requires <literal>x</>
+     and <literal>y</> to match at the same place, immediately before
+     a <literal>z</>.  Thus this query behaves differently from <literal>x
+     <-> z & y <-> z</>, which would match a document
+     containing two separate sequences <literal>x z</> and <literal>y z</>.
+    </para>
    </sect2>

    <sect2 id="textsearch-intro-configurations">


Re: [HACKERS] Rethinking our fulltext phrase-search implementation

From: Tom Lane
Date:

I wrote:
> I've been thinking about how to fix the problem Andreas Seltenreich
> reported at
> https://postgr.es/m/87eg1y2s3x.fsf@credativ.de

Attached is a proposed patch that deals with the problems discussed
here and in <26706.1482087250@sss.pgh.pa.us>.  Is anyone interested
in reviewing this, or should I just push it?

BTW, I noticed that ts_headline() seems to not behave all that nicely
for phrase searches, eg

regression=# SELECT ts_headline('simple', '1 2 3 1 3'::text, '2 <-> 3', 'ShortWord=0');
          ts_headline
--------------------------------
 1 <b>2</b> <b>3</b> 1 <b>3</b>
(1 row)

Highlighting the second "3", which is not a match, seems pretty dubious.
Negative items are even worse, they don't change the results at all:

regression=# SELECT ts_headline('simple', '1 2 3 1 3'::text, '!2 <-> 3', 'ShortWord=0');
          ts_headline
--------------------------------
 1 <b>2</b> <b>3</b> 1 <b>3</b>
(1 row)

However, the code involved seems unrelated to the present patch, and
it's also about as close to completely uncommented as I've seen anywhere
in the PG code base.  So I'm not excited about touching it.

            regards, tom lane

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 67d0c34..464ce83 100644
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*************** SELECT 'fat & rat & ! cat'::tsqu
*** 3959,3973 ****
          tsquery
  ------------------------
   'fat' & 'rat' & !'cat'
-
- SELECT '(fat | rat) <-> cat'::tsquery;
-               tsquery
- -----------------------------------
-  'fat' <-> 'cat' | 'rat' <-> 'cat'
  </programlisting>
-
-      The last example demonstrates that <type>tsquery</type> sometimes
-      rearranges nested operators into a logically equivalent formulation.
      </para>

      <para>
--- 3959,3965 ----
diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index 2da7595..bc33a70 100644
*** a/doc/src/sgml/textsearch.sgml
--- b/doc/src/sgml/textsearch.sgml
*************** SELECT 'fat & cow'::tsquery @@ 'a fa
*** 264,270 ****
      text, any more than a <type>tsvector</type> is.  A <type>tsquery</type>
      contains search terms, which must be already-normalized lexemes, and
      may combine multiple terms using AND, OR, NOT, and FOLLOWED BY operators.
!     (For details see <xref linkend="datatype-tsquery">.)  There are
      functions <function>to_tsquery</>, <function>plainto_tsquery</>,
      and <function>phraseto_tsquery</>
      that are helpful in converting user-written text into a proper
--- 264,270 ----
      text, any more than a <type>tsvector</type> is.  A <type>tsquery</type>
      contains search terms, which must be already-normalized lexemes, and
      may combine multiple terms using AND, OR, NOT, and FOLLOWED BY operators.
!     (For syntax details see <xref linkend="datatype-tsquery">.)  There are
      functions <function>to_tsquery</>, <function>plainto_tsquery</>,
      and <function>phraseto_tsquery</>
      that are helpful in converting user-written text into a proper
*************** text @@ text
*** 323,328 ****
--- 323,330 ----
      at least one of its arguments must appear, while the <literal>!</> (NOT)
      operator specifies that its argument must <emphasis>not</> appear in
      order to have a match.
+     For example, the query <literal>fat & ! rat</> matches documents that
+     contain <literal>fat</> but not <literal>rat</>.
     </para>

     <para>
*************** SELECT phraseto_tsquery('the cats ate th
*** 377,382 ****
--- 379,401 ----
      then <literal>&</literal>, then <literal><-></literal>,
      and <literal>!</literal> most tightly.
     </para>
+
+    <para>
+     It's worth noticing that the AND/OR/NOT operators mean something subtly
+     different when they are within the arguments of a FOLLOWED BY operator
+     than when they are not, because then the position of the match is
+     significant.  Normally, <literal>!x</> matches only documents that do not
+     contain <literal>x</> anywhere.  But <literal>x <-> !y</>
+     matches <literal>x</> if it is not immediately followed by <literal>y</>;
+     an occurrence of <literal>y</> elsewhere in the document does not prevent
+     a match.  Another example is that <literal>x & y</> normally only
+     requires that <literal>x</> and <literal>y</> both appear somewhere in the
+     document, but <literal>(x & y) <-> z</> requires <literal>x</>
+     and <literal>y</> to match at the same place, immediately before
+     a <literal>z</>.  Thus this query behaves differently from <literal>x
+     <-> z & y <-> z</>, which would match a document
+     containing two separate sequences <literal>x z</> and <literal>y z</>.
+    </para>
    </sect2>

    <sect2 id="textsearch-intro-configurations">
diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
index efc111e..3e0a444 100644
*** a/src/backend/utils/adt/tsginidx.c
--- b/src/backend/utils/adt/tsginidx.c
*************** checkcondition_gin(void *checkval, Query
*** 212,218 ****
   * Evaluate tsquery boolean expression using ternary logic.
   */
  static GinTernaryValue
! TS_execute_ternary(GinChkVal *gcv, QueryItem *curitem)
  {
      GinTernaryValue val1,
                  val2,
--- 212,218 ----
   * Evaluate tsquery boolean expression using ternary logic.
   */
  static GinTernaryValue
! TS_execute_ternary(GinChkVal *gcv, QueryItem *curitem, bool in_phrase)
  {
      GinTernaryValue val1,
                  val2,
*************** TS_execute_ternary(GinChkVal *gcv, Query
*** 230,236 ****
      switch (curitem->qoperator.oper)
      {
          case OP_NOT:
!             result = TS_execute_ternary(gcv, curitem + 1);
              if (result == GIN_MAYBE)
                  return result;
              return !result;
--- 230,239 ----
      switch (curitem->qoperator.oper)
      {
          case OP_NOT:
!             /* In phrase search, always return MAYBE since we lack positions */
!             if (in_phrase)
!                 return GIN_MAYBE;
!             result = TS_execute_ternary(gcv, curitem + 1, in_phrase);
              if (result == GIN_MAYBE)
                  return result;
              return !result;
*************** TS_execute_ternary(GinChkVal *gcv, Query
*** 238,254 ****
          case OP_PHRASE:

              /*
!              * GIN doesn't contain any information about positions, treat
               * OP_PHRASE as OP_AND with recheck requirement
               */
!             *gcv->need_recheck = true;
              /* FALL THRU */

          case OP_AND:
!             val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left);
              if (val1 == GIN_FALSE)
                  return GIN_FALSE;
!             val2 = TS_execute_ternary(gcv, curitem + 1);
              if (val2 == GIN_FALSE)
                  return GIN_FALSE;
              if (val1 == GIN_TRUE && val2 == GIN_TRUE)
--- 241,261 ----
          case OP_PHRASE:

              /*
!              * GIN doesn't contain any information about positions, so treat
               * OP_PHRASE as OP_AND with recheck requirement
               */
!             *(gcv->need_recheck) = true;
!             /* Pass down in_phrase == true in case there's a NOT below */
!             in_phrase = true;
!
              /* FALL THRU */

          case OP_AND:
!             val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left,
!                                       in_phrase);
              if (val1 == GIN_FALSE)
                  return GIN_FALSE;
!             val2 = TS_execute_ternary(gcv, curitem + 1, in_phrase);
              if (val2 == GIN_FALSE)
                  return GIN_FALSE;
              if (val1 == GIN_TRUE && val2 == GIN_TRUE)
*************** TS_execute_ternary(GinChkVal *gcv, Query
*** 257,266 ****
                  return GIN_MAYBE;

          case OP_OR:
!             val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left);
              if (val1 == GIN_TRUE)
                  return GIN_TRUE;
!             val2 = TS_execute_ternary(gcv, curitem + 1);
              if (val2 == GIN_TRUE)
                  return GIN_TRUE;
              if (val1 == GIN_FALSE && val2 == GIN_FALSE)
--- 264,274 ----
                  return GIN_MAYBE;

          case OP_OR:
!             val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left,
!                                       in_phrase);
              if (val1 == GIN_TRUE)
                  return GIN_TRUE;
!             val2 = TS_execute_ternary(gcv, curitem + 1, in_phrase);
              if (val2 == GIN_TRUE)
                  return GIN_TRUE;
              if (val1 == GIN_FALSE && val2 == GIN_FALSE)
*************** gin_tsquery_consistent(PG_FUNCTION_ARGS)
*** 307,313 ****

          res = TS_execute(GETQUERY(query),
                           &gcv,
!                          TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_AS_AND,
                           checkcondition_gin);
      }

--- 315,321 ----

          res = TS_execute(GETQUERY(query),
                           &gcv,
!                          TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_NO_POS,
                           checkcondition_gin);
      }

*************** gin_tsquery_triconsistent(PG_FUNCTION_AR
*** 343,349 ****
          gcv.map_item_operand = (int *) (extra_data[0]);
          gcv.need_recheck = &recheck;

!         res = TS_execute_ternary(&gcv, GETQUERY(query));

          if (res == GIN_TRUE && recheck)
              res = GIN_MAYBE;
--- 351,357 ----
          gcv.map_item_operand = (int *) (extra_data[0]);
          gcv.need_recheck = &recheck;

!         res = TS_execute_ternary(&gcv, GETQUERY(query), false);

          if (res == GIN_TRUE && recheck)
              res = GIN_MAYBE;
diff --git a/src/backend/utils/adt/tsgistidx.c b/src/backend/utils/adt/tsgistidx.c
index 6cdfb13..a4c2bb9 100644
*** a/src/backend/utils/adt/tsgistidx.c
--- b/src/backend/utils/adt/tsgistidx.c
*************** gtsvector_consistent(PG_FUNCTION_ARGS)
*** 359,370 ****
          if (ISALLTRUE(key))
              PG_RETURN_BOOL(true);

!         PG_RETURN_BOOL(TS_execute(
!                                   GETQUERY(query),
                                    (void *) GETSIGN(key),
!                                   TS_EXEC_PHRASE_AS_AND,
!                                   checkcondition_bit
!                                   ));
      }
      else
      {                            /* only leaf pages */
--- 359,369 ----
          if (ISALLTRUE(key))
              PG_RETURN_BOOL(true);

!         /* since signature is lossy, cannot specify CALC_NOT here */
!         PG_RETURN_BOOL(TS_execute(GETQUERY(query),
                                    (void *) GETSIGN(key),
!                                   TS_EXEC_PHRASE_NO_POS,
!                                   checkcondition_bit));
      }
      else
      {                            /* only leaf pages */
*************** gtsvector_consistent(PG_FUNCTION_ARGS)
*** 372,383 ****

          chkval.arrb = GETARR(key);
          chkval.arre = chkval.arrb + ARRNELEM(key);
!         PG_RETURN_BOOL(TS_execute(
!                                   GETQUERY(query),
                                    (void *) &chkval,
!                                   TS_EXEC_PHRASE_AS_AND | TS_EXEC_CALC_NOT,
!                                   checkcondition_arr
!                                   ));
      }
  }

--- 371,380 ----

          chkval.arrb = GETARR(key);
          chkval.arre = chkval.arrb + ARRNELEM(key);
!         PG_RETURN_BOOL(TS_execute(GETQUERY(query),
                                    (void *) &chkval,
!                                   TS_EXEC_PHRASE_NO_POS | TS_EXEC_CALC_NOT,
!                                   checkcondition_arr));
      }
  }

diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index 3d11a1c..f0bd528 100644
*** a/src/backend/utils/adt/tsquery.c
--- b/src/backend/utils/adt/tsquery.c
*************** findoprnd_recurse(QueryItem *ptr, uint32
*** 557,569 ****
                     curitem->oper == OP_OR ||
                     curitem->oper == OP_PHRASE);

-             if (curitem->oper == OP_PHRASE)
-                 *needcleanup = true;    /* push OP_PHRASE down later */
-
              (*pos)++;

              /* process RIGHT argument */
              findoprnd_recurse(ptr, pos, nnodes, needcleanup);
              curitem->left = *pos - tmp; /* set LEFT arg's offset */

              /* process LEFT argument */
--- 557,567 ----
                     curitem->oper == OP_OR ||
                     curitem->oper == OP_PHRASE);

              (*pos)++;

              /* process RIGHT argument */
              findoprnd_recurse(ptr, pos, nnodes, needcleanup);
+
              curitem->left = *pos - tmp; /* set LEFT arg's offset */

              /* process LEFT argument */
*************** findoprnd_recurse(QueryItem *ptr, uint32
*** 574,581 ****


  /*
!  * Fills in the left-fields previously left unfilled. The input
!  * QueryItems must be in polish (prefix) notation.
   */
  static void
  findoprnd(QueryItem *ptr, int size, bool *needcleanup)
--- 572,580 ----


  /*
!  * Fill in the left-fields previously left unfilled.
!  * The input QueryItems must be in polish (prefix) notation.
!  * Also, set *needcleanup to true if there are any QI_VALSTOP nodes.
   */
  static void
  findoprnd(QueryItem *ptr, int size, bool *needcleanup)
*************** parse_tsquery(char *buf,
*** 687,701 ****
      memcpy((void *) GETOPERAND(query), (void *) state.op, state.sumlen);
      pfree(state.op);

!     /* Set left operand pointers for every operator. */
      findoprnd(ptr, query->size, &needcleanup);

      /*
!      * QI_VALSTOP nodes should be cleaned and OP_PHRASE should be pushed
!      * down
       */
      if (needcleanup)
!         return cleanup_fakeval_and_phrase(query);

      return query;
  }
--- 686,702 ----
      memcpy((void *) GETOPERAND(query), (void *) state.op, state.sumlen);
      pfree(state.op);

!     /*
!      * Set left operand pointers for every operator.  While we're at it,
!      * detect whether there are any QI_VALSTOP nodes.
!      */
      findoprnd(ptr, query->size, &needcleanup);

      /*
!      * If there are QI_VALSTOP nodes, delete them and simplify the tree.
       */
      if (needcleanup)
!         query = cleanup_tsquery_stopwords(query);

      return query;
  }
*************** tsqueryrecv(PG_FUNCTION_ARGS)
*** 1088,1093 ****
--- 1089,1097 ----
       */
      findoprnd(item, size, &needcleanup);

+     /* Can't have found any QI_VALSTOP nodes */
+     Assert(!needcleanup);
+
      /* Copy operands to output struct */
      for (i = 0; i < size; i++)
      {
*************** tsqueryrecv(PG_FUNCTION_ARGS)
*** 1105,1113 ****

      SET_VARSIZE(query, len + datalen);

-     if (needcleanup)
-         PG_RETURN_TSQUERY(cleanup_fakeval_and_phrase(query));
-
      PG_RETURN_TSQUERY(query);
  }

--- 1109,1114 ----
diff --git a/src/backend/utils/adt/tsquery_cleanup.c b/src/backend/utils/adt/tsquery_cleanup.c
index 330664d..c10c7ef 100644
*** a/src/backend/utils/adt/tsquery_cleanup.c
--- b/src/backend/utils/adt/tsquery_cleanup.c
*************** typedef struct NODE
*** 26,44 ****
  } NODE;

  /*
-  * To simplify walking on query tree and pushing down of phrase operator
-  * we define some fake priority here: phrase operator has highest priority
-  * of any other operators (and we believe here that OP_PHRASE is a highest
-  * code of operations) and value node has ever highest priority.
-  * Priority values of other operations don't matter until they are less than
-  * phrase operator and value node.
-  */
- #define VALUE_PRIORITY            (OP_COUNT + 1)
- #define NODE_PRIORITY(x) \
-     ( ((x)->valnode->qoperator.type == QI_OPR) ? \
-         (x)->valnode->qoperator.oper : VALUE_PRIORITY )
-
- /*
   * make query tree from plain view of query
   */
  static NODE *
--- 26,31 ----
*************** clean_stopword_intree(NODE *node, int *l
*** 368,594 ****
      return node;
  }

- static NODE *
- copyNODE(NODE *node)
- {
-     NODE       *cnode = palloc(sizeof(NODE));
-
-     /* since this function recurses, it could be driven to stack overflow. */
-     check_stack_depth();
-
-     cnode->valnode = palloc(sizeof(QueryItem));
-     *(cnode->valnode) = *(node->valnode);
-
-     if (node->valnode->type == QI_OPR)
-     {
-         cnode->right = copyNODE(node->right);
-         if (node->valnode->qoperator.oper != OP_NOT)
-             cnode->left = copyNODE(node->left);
-     }
-
-     return cnode;
- }
-
- static NODE *
- makeNODE(int8 op, NODE *left, NODE *right)
- {
-     NODE       *node = palloc(sizeof(NODE));
-
-     /* zeroing allocation to prevent difference in unused bytes */
-     node->valnode = palloc0(sizeof(QueryItem));
-
-     node->valnode->qoperator.type = QI_OPR;
-     node->valnode->qoperator.oper = op;
-
-     node->left = left;
-     node->right = right;
-
-     return node;
- }
-
- /*
-  * Move operation with high priority to the leaves. This guarantees
-  * that the phrase operator will be near the bottom of the tree.
-  * An idea behind is do not store position of lexemes during execution
-  * of ordinary operations (AND, OR, NOT) because it could be expensive.
-  * Actual transformation will be performed only on subtrees under the
-  * <-> (<n>) operation since it's needed solely for the phrase operator.
-  *
-  * Rules:
-  *      a  <->  (b | c)    =>    (a <-> b)  |   (a <-> c)
-  *     (a | b)  <->     c       =>    (a <-> c)  |   (b <-> c)
-  *      a  <->    !b       =>        a      &  !(a <-> b)
-  *     !a  <->     b       =>        b      &  !(a <-> b)
-  *
-  * Warnings for readers:
-  *          a <-> b       !=       b <-> a
-  *
-  *      a <n> (b <n> c)    !=     (a <n> b) <n> c since the phrase lengths are:
-  *             n                    2n-1
-  */
- static NODE *
- normalize_phrase_tree(NODE *node)
- {
-     /* there should be no stop words at this point */
-     Assert(node->valnode->type != QI_VALSTOP);
-
-     if (node->valnode->type == QI_VAL)
-         return node;
-
-     /* since this function recurses, it could be driven to stack overflow. */
-     check_stack_depth();
-
-     Assert(node->valnode->type == QI_OPR);
-
-     if (node->valnode->qoperator.oper == OP_NOT)
-     {
-         NODE       *orignode = node;
-
-         /* eliminate NOT sequence */
-         while (node->valnode->type == QI_OPR &&
-         node->valnode->qoperator.oper == node->right->valnode->qoperator.oper)
-         {
-             node = node->right->right;
-         }
-
-         if (orignode != node)
-             /* current node isn't checked yet */
-             node = normalize_phrase_tree(node);
-         else
-             node->right = normalize_phrase_tree(node->right);
-     }
-     else if (node->valnode->qoperator.oper == OP_PHRASE)
-     {
-         int16        distance;
-         NODE       *X;
-
-         node->left = normalize_phrase_tree(node->left);
-         node->right = normalize_phrase_tree(node->right);
-
-         /*
-          * if subtree contains only nodes with higher "priority" then we are
-          * done. See comment near NODE_PRIORITY()
-          */
-         if (NODE_PRIORITY(node) <= NODE_PRIORITY(node->right) &&
-             NODE_PRIORITY(node) <= NODE_PRIORITY(node->left))
-             return node;
-
-         /*
-          * We can't swap left-right and works only with left child because of
-          * a <-> b    !=    b <-> a
-          */
-
-         distance = node->valnode->qoperator.distance;
-
-         if (node->right->valnode->type == QI_OPR)
-         {
-             switch (node->right->valnode->qoperator.oper)
-             {
-                 case OP_AND:
-                     /* a <-> (b & c)  =>  (a <-> b) & (a <-> c) */
-                     node = makeNODE(OP_AND,
-                                     makeNODE(OP_PHRASE,
-                                              node->left,
-                                              node->right->left),
-                                     makeNODE(OP_PHRASE,
-                                              copyNODE(node->left),
-                                              node->right->right));
-                     node->left->valnode->qoperator.distance =
-                         node->right->valnode->qoperator.distance = distance;
-                     break;
-                 case OP_OR:
-                     /* a <-> (b | c)  =>  (a <-> b) | (a <-> c) */
-                     node = makeNODE(OP_OR,
-                                     makeNODE(OP_PHRASE,
-                                              node->left,
-                                              node->right->left),
-                                     makeNODE(OP_PHRASE,
-                                              copyNODE(node->left),
-                                              node->right->right));
-                     node->left->valnode->qoperator.distance =
-                         node->right->valnode->qoperator.distance = distance;
-                     break;
-                 case OP_NOT:
-                     /* a <-> !b  =>  a & !(a <-> b) */
-                     X = node->right;
-                     node->right = node->right->right;
-                     X->right = node;
-                     node = makeNODE(OP_AND,
-                                     copyNODE(node->left),
-                                     X);
-                     break;
-                 case OP_PHRASE:
-                     /* no-op */
-                     break;
-                 default:
-                     elog(ERROR, "Wrong type of tsquery node: %d",
-                          node->right->valnode->qoperator.oper);
-             }
-         }
-
-         if (node->left->valnode->type == QI_OPR &&
-             node->valnode->qoperator.oper == OP_PHRASE)
-         {
-             /*
-              * if the node is still OP_PHRASE, check the left subtree,
-              * otherwise the whole node will be transformed later.
-              */
-             switch (node->left->valnode->qoperator.oper)
-             {
-                 case OP_AND:
-                     /* (a & b) <-> c  =>  (a <-> c) & (b <-> c) */
-                     node = makeNODE(OP_AND,
-                                     makeNODE(OP_PHRASE,
-                                              node->left->left,
-                                              node->right),
-                                     makeNODE(OP_PHRASE,
-                                              node->left->right,
-                                              copyNODE(node->right)));
-                     node->left->valnode->qoperator.distance =
-                         node->right->valnode->qoperator.distance = distance;
-                     break;
-                 case OP_OR:
-                     /* (a | b) <-> c  =>  (a <-> c) | (b <-> c) */
-                     node = makeNODE(OP_OR,
-                                     makeNODE(OP_PHRASE,
-                                              node->left->left,
-                                              node->right),
-                                     makeNODE(OP_PHRASE,
-                                              node->left->right,
-                                              copyNODE(node->right)));
-                     node->left->valnode->qoperator.distance =
-                         node->right->valnode->qoperator.distance = distance;
-                     break;
-                 case OP_NOT:
-                     /* !a <-> b  =>  b & !(a <-> b) */
-                     X = node->left;
-                     node->left = node->left->right;
-                     X->right = node;
-                     node = makeNODE(OP_AND,
-                                     X,
-                                     copyNODE(node->right));
-                     break;
-                 case OP_PHRASE:
-                     /* no-op */
-                     break;
-                 default:
-                     elog(ERROR, "Wrong type of tsquery node: %d",
-                          node->left->valnode->qoperator.oper);
-             }
-         }
-
-         /* continue transformation */
-         node = normalize_phrase_tree(node);
-     }
-     else    /* AND or OR */
-     {
-         node->left = normalize_phrase_tree(node->left);
-         node->right = normalize_phrase_tree(node->right);
-     }
-
-     return node;
- }
-
  /*
   * Number of elements in query tree
   */
--- 355,360 ----
*************** calcstrlen(NODE *node)
*** 613,620 ****
      return size;
  }

  TSQuery
! cleanup_fakeval_and_phrase(TSQuery in)
  {
      int32        len,
                  lenstr,
--- 379,389 ----
      return size;
  }

+ /*
+  * Remove QI_VALSTOP (stopword) nodes from TSQuery.
+  */
  TSQuery
! cleanup_tsquery_stopwords(TSQuery in)
  {
      int32        len,
                  lenstr,
*************** cleanup_fakeval_and_phrase(TSQuery in)
*** 642,650 ****
          return out;
      }

-     /* push OP_PHRASE nodes down */
-     root = normalize_phrase_tree(root);
-
      /*
       * Build TSQuery from plain view
       */
--- 411,416 ----
diff --git a/src/backend/utils/adt/tsquery_op.c b/src/backend/utils/adt/tsquery_op.c
index a574b4b..8f90ce9 100644
*** a/src/backend/utils/adt/tsquery_op.c
--- b/src/backend/utils/adt/tsquery_op.c
*************** tsquery_or(PG_FUNCTION_ARGS)
*** 104,110 ****
      PG_FREE_IF_COPY(a, 0);
      PG_FREE_IF_COPY(b, 1);

!     PG_RETURN_POINTER(query);
  }

  Datum
--- 104,110 ----
      PG_FREE_IF_COPY(a, 0);
      PG_FREE_IF_COPY(b, 1);

!     PG_RETURN_TSQUERY(query);
  }

  Datum
*************** tsquery_phrase_distance(PG_FUNCTION_ARGS
*** 140,146 ****
      PG_FREE_IF_COPY(a, 0);
      PG_FREE_IF_COPY(b, 1);

!     PG_RETURN_POINTER(cleanup_fakeval_and_phrase(query));
  }

  Datum
--- 140,146 ----
      PG_FREE_IF_COPY(a, 0);
      PG_FREE_IF_COPY(b, 1);

!     PG_RETURN_TSQUERY(query);
  }

  Datum
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 36cc10c..a2272fe 100644
*** a/src/backend/utils/adt/tsvector_op.c
--- b/src/backend/utils/adt/tsvector_op.c
***************
*** 11,19 ****
   *
   *-------------------------------------------------------------------------
   */
-
  #include "postgres.h"

  #include "access/htup_details.h"
  #include "catalog/namespace.h"
  #include "catalog/pg_type.h"
--- 11,20 ----
   *
   *-------------------------------------------------------------------------
   */
  #include "postgres.h"

+ #include <limits.h>
+
  #include "access/htup_details.h"
  #include "catalog/namespace.h"
  #include "catalog/pg_type.h"
*************** checkcondition_str(void *checkval, Query
*** 1405,1551 ****
  }

  /*
   * Execute tsquery at or below an OP_PHRASE operator.
   *
!  * This handles the recursion at levels where we need to care about
!  * match locations.  In addition to the same arguments used for TS_execute,
!  * the caller may pass a preinitialized-to-zeroes ExecPhraseData struct to
!  * be filled with lexeme match positions on success.  data == NULL if no
!  * match data need be returned.  (In practice, outside callers pass NULL,
!  * and only the internal recursion cases pass a data pointer.)
   */
  static bool
  TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
!                   ExecPhraseData *data,
!                   TSExecuteCallback chkcond)
  {
      /* since this function recurses, it could be driven to stack overflow */
      check_stack_depth();

      if (curitem->type == QI_VAL)
-     {
          return chkcond(arg, (QueryOperand *) curitem, data);
-     }
-     else
-     {
-         ExecPhraseData Ldata = {0, false, NULL},
-                     Rdata = {0, false, NULL};
-         WordEntryPos *Lpos,
-                    *LposStart,
-                    *Rpos,
-                    *pos_iter = NULL;
-
-         Assert(curitem->qoperator.oper == OP_PHRASE);
-
-         if (!TS_phrase_execute(curitem + curitem->qoperator.left,
-                                arg, flags, &Ldata, chkcond))
-             return false;

!         if (!TS_phrase_execute(curitem + 1, arg, flags, &Rdata, chkcond))
!             return false;

!         /*
!          * If either operand has no position information, then we normally
!          * return false.  But if TS_EXEC_PHRASE_AS_AND flag is set then we
!          * return true, treating OP_PHRASE as if it were OP_AND.
!          */
!         if (Ldata.npos == 0 || Rdata.npos == 0)
!             return (flags & TS_EXEC_PHRASE_AS_AND) ? true : false;

-         /*
-          * Prepare output position array if needed.
-          */
-         if (data)
-         {
              /*
!              * We can recycle the righthand operand's result array if it was
!              * palloc'd, else must allocate our own.  The number of matches
!              * couldn't be more than the smaller of the two operands' matches.
               */
!             if (!Rdata.allocated)
!                 data->pos = palloc(sizeof(WordEntryPos) * Min(Ldata.npos, Rdata.npos));
              else
!                 data->pos = Rdata.pos;
!
!             data->allocated = true;
!             data->npos = 0;
!             pos_iter = data->pos;
!         }

!         /*
!          * Find matches by distance.  WEP_GETPOS() is needed because
!          * ExecPhraseData->data can point to a tsvector's WordEntryPosVector.
!          *
!          * Note that the output positions are those of the matching RIGHT
!          * operands.
!          */
!         Rpos = Rdata.pos;
!         LposStart = Ldata.pos;
!         while (Rpos < Rdata.pos + Rdata.npos)
!         {
!             /*
!              * We need to check all possible distances, so reset Lpos to
!              * guaranteed not yet satisfied position.
!              */
!             Lpos = LposStart;
!             while (Lpos < Ldata.pos + Ldata.npos)
              {
!                 if (WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) ==
!                     curitem->qoperator.distance)
!                 {
!                     /* MATCH! */
!                     if (data)
!                     {
!                         /* Store position for upper phrase operator */
!                         *pos_iter = WEP_GETPOS(*Rpos);
!                         pos_iter++;

!                         /*
!                          * Set left start position to next, because current
!                          * one could not satisfy distance for any other right
!                          * position
!                          */
!                         LposStart = Lpos + 1;
!                         break;
!                     }
!                     else
!                     {
!                         /*
!                          * We are at the root of the phrase tree and hence we
!                          * don't have to identify all the match positions.
!                          * Just report success.
!                          */
!                         return true;
!                     }

                  }
!                 else if (WEP_GETPOS(*Rpos) <= WEP_GETPOS(*Lpos) ||
!                          WEP_GETPOS(*Rpos) - WEP_GETPOS(*Lpos) <
!                          curitem->qoperator.distance)
                  {
                      /*
!                      * Go to the next Rpos, because Lpos is ahead or on less
!                      * distance than required by current operator
                       */
!                     break;
!
                  }

!                 Lpos++;
              }

!             Rpos++;
!         }

!         if (data)
!         {
!             data->npos = pos_iter - data->pos;

!             if (data->npos > 0)
!                 return true;
!         }
      }

      return false;
  }

--- 1406,1807 ----
  }

  /*
+  * Compute output position list for a tsquery operator in phrase mode.
+  *
+  * Merge the position lists in Ldata and Rdata as specified by "emit",
+  * returning the result list into *data.  The input position lists must be
+  * sorted and unique, and the output will be as well.
+  *
+  * data: pointer to initially-all-zeroes output struct, or NULL
+  * Ldata, Rdata: input position lists
+  * emit: bitmask of TSPO_XXX flags
+  * Loffset: offset to be added to Ldata positions before comparing/outputting
+  * Roffset: offset to be added to Rdata positions before comparing/outputting
+  * max_npos: maximum possible required size of output position array
+  *
+  * Loffset and Roffset should not be negative, else we risk trying to output
+  * negative positions, which won't fit into WordEntryPos.
+  *
+  * Returns true if any positions were emitted to *data; or if data is NULL,
+  * returns true if any positions would have been emitted.
+  */
+ #define TSPO_L_ONLY        0x01    /* emit positions appearing only in L */
+ #define TSPO_R_ONLY        0x02    /* emit positions appearing only in R */
+ #define TSPO_BOTH        0x04    /* emit positions appearing in both L&R */
+
+ static bool
+ TS_phrase_output(ExecPhraseData *data,
+                  ExecPhraseData *Ldata,
+                  ExecPhraseData *Rdata,
+                  int emit,
+                  int Loffset,
+                  int Roffset,
+                  int max_npos)
+ {
+     int            Lindex,
+                 Rindex;
+
+     /* Loop until both inputs are exhausted */
+     Lindex = Rindex = 0;
+     while (Lindex < Ldata->npos || Rindex < Rdata->npos)
+     {
+         int            Lpos,
+                     Rpos;
+         int            output_pos = 0;
+
+         /*
+          * Fetch current values to compare.  WEP_GETPOS() is needed because
+          * ExecPhraseData->pos can point to a tsvector's WordEntryPosVector.
+          */
+         if (Lindex < Ldata->npos)
+             Lpos = WEP_GETPOS(Ldata->pos[Lindex]) + Loffset;
+         else
+         {
+             /* L array exhausted, so we're done if R_ONLY isn't set */
+             if (!(emit & TSPO_R_ONLY))
+                 break;
+             Lpos = INT_MAX;
+         }
+         if (Rindex < Rdata->npos)
+             Rpos = WEP_GETPOS(Rdata->pos[Rindex]) + Roffset;
+         else
+         {
+             /* R array exhausted, so we're done if L_ONLY isn't set */
+             if (!(emit & TSPO_L_ONLY))
+                 break;
+             Rpos = INT_MAX;
+         }
+
+         /* Merge-join the two input lists */
+         if (Lpos < Rpos)
+         {
+             /* Lpos is not matched in Rdata, should we output it? */
+             if (emit & TSPO_L_ONLY)
+                 output_pos = Lpos;
+             Lindex++;
+         }
+         else if (Lpos == Rpos)
+         {
+             /* Lpos and Rpos match ... should we output it? */
+             if (emit & TSPO_BOTH)
+                 output_pos = Rpos;
+             Lindex++;
+             Rindex++;
+         }
+         else    /* Lpos > Rpos */
+         {
+             /* Rpos is not matched in Ldata, should we output it? */
+             if (emit & TSPO_R_ONLY)
+                 output_pos = Rpos;
+             Rindex++;
+         }
+
+         if (output_pos > 0)
+         {
+             if (data)
+             {
+                 /* Store position, first allocating output array if needed */
+                 if (data->pos == NULL)
+                 {
+                     data->pos = (WordEntryPos *)
+                         palloc(max_npos * sizeof(WordEntryPos));
+                     data->allocated = true;
+                 }
+                 data->pos[data->npos++] = output_pos;
+             }
+             else
+             {
+                 /*
+                  * Exact positions not needed, so return true as soon as we
+                  * know there is at least one.
+                  */
+                 return true;
+             }
+         }
+     }
+
+     if (data && data->npos > 0)
+     {
+         /* Let's assert we didn't overrun the array */
+         Assert(data->npos <= max_npos);
+         return true;
+     }
+     return false;
+ }
+
+ /*
   * Execute tsquery at or below an OP_PHRASE operator.
   *
!  * This handles tsquery execution at recursion levels where we need to care
!  * about match locations.
!  *
!  * In addition to the same arguments used for TS_execute, the caller may pass
!  * a preinitialized-to-zeroes ExecPhraseData struct, to be filled with lexeme
!  * match position info on success.  data == NULL if no position data need be
!  * returned.  (In practice, outside callers pass NULL, and only the internal
!  * recursion cases pass a data pointer.)
!  *
!  * The detailed semantics of the match data, given that the function returned
!  * "true" (successful match, or possible match), are:
!  *
!  * npos > 0, negate = false:
!  *     query is matched at specified position(s) (and only those positions)
!  * npos > 0, negate = true:
!  *     query is matched at all positions *except* specified position(s)
!  * npos = 0, negate = false:
!  *     query is possibly matched, matching position(s) are unknown
!  *     (this should only be returned when TS_EXEC_PHRASE_NO_POS flag is set)
!  * npos = 0, negate = true:
!  *     query is matched at all positions
!  *
!  * Successful matches also return a "width" value which is the match width in
!  * lexemes, less one.  Hence, "width" is zero for simple one-lexeme matches,
!  * and is the sum of the phrase operator distances for phrase matches.  Note
!  * that when width > 0, the listed positions represent the ends of matches not
!  * the starts.  (This unintuitive rule is needed to avoid possibly generating
!  * negative positions, which wouldn't fit into the WordEntryPos arrays.)
!  *
!  * When the function returns "false" (no match), it must return npos = 0,
!  * negate = false (which is the state initialized by the caller); but the
!  * "width" output in such cases is undefined.
   */
  static bool
  TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
!                   TSExecuteCallback chkcond,
!                   ExecPhraseData *data)
  {
+     int            Loffset,
+                 Roffset,
+                 maxwidth;
+
      /* since this function recurses, it could be driven to stack overflow */
      check_stack_depth();

      if (curitem->type == QI_VAL)
          return chkcond(arg, (QueryOperand *) curitem, data);

!     /* Note: we assume data != NULL for operators other than OP_PHRASE */

!     switch (curitem->qoperator.oper)
!     {
!         case OP_NOT:

              /*
!              * Because a "true" result with no specific positions is taken as
!              * uncertain, we need no special care here for !TS_EXEC_CALC_NOT.
!              * If it's a false positive, the right things happen anyway.
!              *
!              * Also, we need not touch data->width, since a NOT operation does
!              * not change the match width.
               */
!             if (TS_phrase_execute(curitem + 1, arg, flags, chkcond, data))
!             {
!                 if (data->npos > 0)
!                 {
!                     /* we have some positions, invert negate flag */
!                     data->negate = !data->negate;
!                     return true;
!                 }
!                 else if (data->negate)
!                 {
!                     /* change "match everywhere" to "match nowhere" */
!                     data->negate = false;
!                     return false;
!                 }
!                 /* match positions are, and remain, uncertain */
!                 return true;
!             }
              else
!             {
!                 /* change "match nowhere" to "match everywhere" */
!                 Assert(data->npos == 0 && !data->negate);
!                 data->negate = true;
!                 return true;
!             }

!         case OP_PHRASE:
!         case OP_AND:
              {
!                 ExecPhraseData Ldata,
!                             Rdata;

!                 memset(&Ldata, 0, sizeof(Ldata));
!                 memset(&Rdata, 0, sizeof(Rdata));
!
!                 if (!TS_phrase_execute(curitem + curitem->qoperator.left,
!                                        arg, flags, chkcond, &Ldata))
!                     return false;
!
!                 if (!TS_phrase_execute(curitem + 1,
!                                        arg, flags, chkcond, &Rdata))
!                     return false;

+                 /*
+                  * If either operand has no position information, then we
+                  * can't return position data, only a "possible match" result.
+                  * "Possible match" answers are only wanted when
+                  * TS_EXEC_PHRASE_NO_POS flag is set, otherwise return false.
+                  */
+                 if ((Ldata.npos == 0 && !Ldata.negate) ||
+                     (Rdata.npos == 0 && !Rdata.negate))
+                     return (flags & TS_EXEC_PHRASE_NO_POS) ? true : false;
+
+                 if (curitem->qoperator.oper == OP_PHRASE)
+                 {
+                     /*
+                      * Compute Loffset and Roffset suitable for phrase match,
+                      * and compute overall width of whole phrase match.
+                      */
+                     Loffset = curitem->qoperator.distance + Rdata.width;
+                     Roffset = 0;
+                     if (data)
+                         data->width = curitem->qoperator.distance +
+                             Ldata.width + Rdata.width;
                  }
!                 else
                  {
                      /*
!                      * For OP_AND, set output width and alignment like OP_OR
!                      * (see comment below)
                       */
!                     maxwidth = Max(Ldata.width, Rdata.width);
!                     Loffset = maxwidth - Ldata.width;
!                     Roffset = maxwidth - Rdata.width;
!                     if (data)
!                         data->width = maxwidth;
                  }

!                 if (Ldata.negate && Rdata.negate)
!                 {
!                     /* !L & !R: treat as !(L | R) */
!                     (void) TS_phrase_output(data, &Ldata, &Rdata,
!                                        TSPO_BOTH | TSPO_L_ONLY | TSPO_R_ONLY,
!                                             Loffset, Roffset,
!                                             Ldata.npos + Rdata.npos);
!                     if (data)
!                         data->negate = true;
!                     return true;
!                 }
!                 else if (Ldata.negate)
!                 {
!                     /* !L & R */
!                     return TS_phrase_output(data, &Ldata, &Rdata,
!                                             TSPO_R_ONLY,
!                                             Loffset, Roffset,
!                                             Rdata.npos);
!                 }
!                 else if (Rdata.negate)
!                 {
!                     /* L & !R */
!                     return TS_phrase_output(data, &Ldata, &Rdata,
!                                             TSPO_L_ONLY,
!                                             Loffset, Roffset,
!                                             Ldata.npos);
!                 }
!                 else
!                 {
!                     /* straight AND */
!                     return TS_phrase_output(data, &Ldata, &Rdata,
!                                             TSPO_BOTH,
!                                             Loffset, Roffset,
!                                             Min(Ldata.npos, Rdata.npos));
!                 }
              }

!         case OP_OR:
!             {
!                 ExecPhraseData Ldata,
!                             Rdata;
!                 bool        lmatch,
!                             rmatch;

!                 memset(&Ldata, 0, sizeof(Ldata));
!                 memset(&Rdata, 0, sizeof(Rdata));

!                 lmatch = TS_phrase_execute(curitem + curitem->qoperator.left,
!                                            arg, flags, chkcond, &Ldata);
!                 rmatch = TS_phrase_execute(curitem + 1,
!                                            arg, flags, chkcond, &Rdata);
!
!                 if (!lmatch && !rmatch)
!                     return false;
!
!                 /*
!                  * If a valid operand has no position information, then we
!                  * can't return position data, only a "possible match" result.
!                  * "Possible match" answers are only wanted when
!                  * TS_EXEC_PHRASE_NO_POS flag is set, otherwise return false.
!                  */
!                 if ((lmatch && Ldata.npos == 0 && !Ldata.negate) ||
!                     (rmatch && Rdata.npos == 0 && !Rdata.negate))
!                     return (flags & TS_EXEC_PHRASE_NO_POS) ? true : false;
!
!                 /*
!                  * Cope with undefined output width from failed submatch.
!                  * (This takes less code than trying to ensure that all
!                  * failure returns set data->width to zero.)
!                  */
!                 if (!lmatch)
!                     Ldata.width = 0;
!                 if (!rmatch)
!                     Rdata.width = 0;
!
!                 /*
!                  * For OP_AND and OP_OR, report the width of the wider of the
!                  * two inputs, and align the narrower input's positions to the
!                  * right end of that width.  This rule deals at least somewhat
!                  * reasonably with cases like "x <-> (y | z <-> q)".
!                  */
!                 maxwidth = Max(Ldata.width, Rdata.width);
!                 Loffset = maxwidth - Ldata.width;
!                 Roffset = maxwidth - Rdata.width;
!                 data->width = maxwidth;
!
!                 if (Ldata.negate && Rdata.negate)
!                 {
!                     /* !L | !R: treat as !(L & R) */
!                     (void) TS_phrase_output(data, &Ldata, &Rdata,
!                                             TSPO_BOTH,
!                                             Loffset, Roffset,
!                                             Min(Ldata.npos, Rdata.npos));
!                     data->negate = true;
!                     return true;
!                 }
!                 else if (Ldata.negate)
!                 {
!                     /* !L | R: treat as !(L & !R) */
!                     (void) TS_phrase_output(data, &Ldata, &Rdata,
!                                             TSPO_L_ONLY,
!                                             Loffset, Roffset,
!                                             Ldata.npos);
!                     data->negate = true;
!                     return true;
!                 }
!                 else if (Rdata.negate)
!                 {
!                     /* L | !R: treat as !(!L & R) */
!                     (void) TS_phrase_output(data, &Ldata, &Rdata,
!                                             TSPO_R_ONLY,
!                                             Loffset, Roffset,
!                                             Rdata.npos);
!                     data->negate = true;
!                     return true;
!                 }
!                 else
!                 {
!                     /* straight OR */
!                     return TS_phrase_output(data, &Ldata, &Rdata,
!                                        TSPO_BOTH | TSPO_L_ONLY | TSPO_R_ONLY,
!                                             Loffset, Roffset,
!                                             Ldata.npos + Rdata.npos);
!                 }
!             }
!
!         default:
!             elog(ERROR, "unrecognized operator: %d", curitem->qoperator.oper);
      }

+     /* not reachable, but keep compiler quiet */
      return false;
  }

*************** TS_execute(QueryItem *curitem, void *arg
*** 1594,1605 ****
                  return TS_execute(curitem + 1, arg, flags, chkcond);

          case OP_PHRASE:
!
!             /*
!              * do not check TS_EXEC_PHRASE_AS_AND here because chkcond() could
!              * do something more if it's called from TS_phrase_execute()
!              */
!             return TS_phrase_execute(curitem, arg, flags, NULL, chkcond);

          default:
              elog(ERROR, "unrecognized operator: %d", curitem->qoperator.oper);
--- 1850,1856 ----
                  return TS_execute(curitem + 1, arg, flags, chkcond);

          case OP_PHRASE:
!             return TS_phrase_execute(curitem, arg, flags, chkcond, NULL);

          default:
              elog(ERROR, "unrecognized operator: %d", curitem->qoperator.oper);
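
Reviewer's note (illustration only, not part of the patch): the merge that
TS_phrase_output performs above can be tried outside the backend with the
standalone sketch below.  It assumes plain int arrays in place of
ExecPhraseData/WordEntryPos, and the names merge_positions and EMIT_* are
hypothetical stand-ins for TS_phrase_output and the TSPO_* flags.  As in the
patch, both inputs must be sorted and unique, and the offsets are added to
each side before comparing.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins for the patch's TSPO_* flags */
#define EMIT_L_ONLY 0x01        /* emit positions appearing only in L */
#define EMIT_R_ONLY 0x02        /* emit positions appearing only in R */
#define EMIT_BOTH   0x04        /* emit positions appearing in both */

/*
 * Merge two sorted, unique position lists after shifting each by its
 * offset.  Writes the selected positions to out[] and returns how many
 * were written.  Simplified sketch: no "negate" or "width" bookkeeping.
 */
static int
merge_positions(const int *L, int nL, int Loffset,
                const int *R, int nR, int Roffset,
                int emit, int *out, int max_npos)
{
    int         li = 0,
                ri = 0,
                n = 0;

    while (li < nL || ri < nR)
    {
        int         lpos,
                    rpos;
        int         output_pos = 0;

        if (li < nL)
            lpos = L[li] + Loffset;
        else
        {
            /* L exhausted: nothing more to emit unless R_ONLY is wanted */
            if (!(emit & EMIT_R_ONLY))
                break;
            lpos = INT_MAX;
        }
        if (ri < nR)
            rpos = R[ri] + Roffset;
        else
        {
            /* R exhausted: nothing more to emit unless L_ONLY is wanted */
            if (!(emit & EMIT_L_ONLY))
                break;
            rpos = INT_MAX;
        }

        if (lpos < rpos)
        {
            if (emit & EMIT_L_ONLY)
                output_pos = lpos;
            li++;
        }
        else if (lpos == rpos)
        {
            if (emit & EMIT_BOTH)
                output_pos = rpos;
            li++;
            ri++;
        }
        else
        {
            if (emit & EMIT_R_ONLY)
                output_pos = rpos;
            ri++;
        }

        if (output_pos > 0)
        {
            assert(n < max_npos);   /* caller sized out[] correctly */
            out[n++] = output_pos;
        }
    }
    return n;
}
```

With 'a' at positions {1, 5} and 'b' at {2, 9}, the query a <-> b uses
Loffset = 1 (the distance plus a right-hand width of zero), Roffset = 0 and
EMIT_BOTH, which yields the single end-of-match position 2.  The same inputs
with EMIT_L_ONLY model a <-> !b and yield 6: the 'a' at 5 is not followed by
a 'b'.
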
diff --git a/src/include/tsearch/ts_utils.h b/src/include/tsearch/ts_utils.h
index 1fbd983..d74853b 100644
*** a/src/include/tsearch/ts_utils.h
--- b/src/include/tsearch/ts_utils.h
*************** extern text *generateHeadline(HeadlinePa
*** 113,120 ****
   * struct ExecPhraseData is passed to a TSExecuteCallback function if we need
   * lexeme position data (because of a phrase-match operator in the tsquery).
   * The callback should fill in position data when it returns true (success).
!  * If it cannot return position data, it may ignore its "data" argument, but
!  * then the caller of TS_execute() must pass the TS_EXEC_PHRASE_AS_AND flag
   * and must arrange for a later recheck with position data available.
   *
   * The reported lexeme positions must be sorted and unique.  Callers must only
--- 113,120 ----
   * struct ExecPhraseData is passed to a TSExecuteCallback function if we need
   * lexeme position data (because of a phrase-match operator in the tsquery).
   * The callback should fill in position data when it returns true (success).
!  * If it cannot return position data, it may leave "data" unchanged, but
!  * then the caller of TS_execute() must pass the TS_EXEC_PHRASE_NO_POS flag
   * and must arrange for a later recheck with position data available.
   *
   * The reported lexeme positions must be sorted and unique.  Callers must only
*************** extern text *generateHeadline(HeadlinePa
*** 123,135 ****
--- 123,143 ----
   * portion of a tsvector value.  If "allocated" is true then the pos array
   * is palloc'd workspace and caller may free it when done.
   *
+  * "negate" means that the pos array contains positions where the query does
+  * not match, rather than positions where it does.  "width" is positive when
+  * the match is wider than one lexeme.  Neither of these fields normally need
+  * to be touched by TSExecuteCallback functions; they are used for
+  * phrase-search processing within TS_execute.
+  *
   * All fields of the ExecPhraseData struct are initially zeroed by caller.
   */
  typedef struct ExecPhraseData
  {
      int            npos;            /* number of positions reported */
      bool        allocated;        /* pos points to palloc'd data? */
+     bool        negate;            /* positions are where query is NOT matched */
      WordEntryPos *pos;            /* ordered, non-duplicate lexeme positions */
+     int            width;            /* width of match in lexemes, less 1 */
  } ExecPhraseData;

  /*
*************** typedef struct ExecPhraseData
*** 139,145 ****
   * val: lexeme to test for presence of
   * data: to be filled with lexeme positions; NULL if position data not needed
   *
!  * Return TRUE if lexeme is present in data, else FALSE
   */
  typedef bool (*TSExecuteCallback) (void *arg, QueryOperand *val,
                                                 ExecPhraseData *data);
--- 147,155 ----
   * val: lexeme to test for presence of
   * data: to be filled with lexeme positions; NULL if position data not needed
   *
!  * Return TRUE if lexeme is present in data, else FALSE.  If data is not
!  * NULL, it should be filled with lexeme positions, but function can leave
!  * it as zeroes if position data is not available.
   */
  typedef bool (*TSExecuteCallback) (void *arg, QueryOperand *val,
                                                 ExecPhraseData *data);
*************** typedef bool (*TSExecuteCallback) (void
*** 151,165 ****
  /*
   * If TS_EXEC_CALC_NOT is not set, then NOT expressions are automatically
   * evaluated to be true.  Useful in cases where NOT cannot be accurately
!  * computed (GiST) or it isn't important (ranking).
   */
  #define TS_EXEC_CALC_NOT        (0x01)
  /*
!  * Treat OP_PHRASE as OP_AND.  Used when positional information is not
!  * accessible, like in consistent methods of GIN/GiST indexes; rechecking
!  * must occur later.
   */
! #define TS_EXEC_PHRASE_AS_AND    (0x02)

  extern bool TS_execute(QueryItem *curitem, void *arg, uint32 flags,
             TSExecuteCallback chkcond);
--- 161,180 ----
  /*
   * If TS_EXEC_CALC_NOT is not set, then NOT expressions are automatically
   * evaluated to be true.  Useful in cases where NOT cannot be accurately
!  * computed (GiST) or it isn't important (ranking).  From TS_execute's
!  * perspective, !CALC_NOT means that the TSExecuteCallback function might
!  * return false-positive indications of a lexeme's presence.
   */
  #define TS_EXEC_CALC_NOT        (0x01)
  /*
!  * If TS_EXEC_PHRASE_NO_POS is set, allow OP_PHRASE to be executed lossily
!  * in the absence of position information: a TRUE result indicates that the
!  * phrase might be present.  Without this flag, OP_PHRASE always returns
!  * false if lexeme position information is not available.
   */
! #define TS_EXEC_PHRASE_NO_POS    (0x02)
! /* Obsolete spelling of TS_EXEC_PHRASE_NO_POS: */
! #define TS_EXEC_PHRASE_AS_AND    TS_EXEC_PHRASE_NO_POS

  extern bool TS_execute(QueryItem *curitem, void *arg, uint32 flags,
             TSExecuteCallback chkcond);
*************** extern Datum gin_tsquery_consistent_olds
*** 228,234 ****
   * TSQuery Utilities
   */
  extern QueryItem *clean_NOT(QueryItem *ptr, int32 *len);
! extern TSQuery cleanup_fakeval_and_phrase(TSQuery in);

  typedef struct QTNode
  {
--- 243,249 ----
   * TSQuery Utilities
   */
  extern QueryItem *clean_NOT(QueryItem *ptr, int32 *len);
! extern TSQuery cleanup_tsquery_stopwords(TSQuery in);

  typedef struct QTNode
  {
diff --git a/src/test/regress/expected/tsdicts.out b/src/test/regress/expected/tsdicts.out
index c55591a..8ed64d3 100644
*** a/src/test/regress/expected/tsdicts.out
--- b/src/test/regress/expected/tsdicts.out
*************** SELECT to_tsquery('hunspell_tst', 'footb
*** 470,484 ****
  (1 row)

  SELECT to_tsquery('hunspell_tst', 'footballyklubber:b <-> sky');
!                            to_tsquery
! -----------------------------------------------------------------
!  'foot':B <-> 'sky' & 'ball':B <-> 'sky' & 'klubber':B <-> 'sky'
  (1 row)

  SELECT phraseto_tsquery('hunspell_tst', 'footballyklubber sky');
!                      phraseto_tsquery
! -----------------------------------------------------------
!  'foot' <-> 'sky' & 'ball' <-> 'sky' & 'klubber' <-> 'sky'
  (1 row)

  -- Test ispell dictionary with hunspell affix with FLAG long in configuration
--- 470,484 ----
  (1 row)

  SELECT to_tsquery('hunspell_tst', 'footballyklubber:b <-> sky');
!                    to_tsquery
! -------------------------------------------------
!  ( 'foot':B & 'ball':B & 'klubber':B ) <-> 'sky'
  (1 row)

  SELECT phraseto_tsquery('hunspell_tst', 'footballyklubber sky');
!              phraseto_tsquery
! -------------------------------------------
!  ( 'foot' & 'ball' & 'klubber' ) <-> 'sky'
  (1 row)

  -- Test ispell dictionary with hunspell affix with FLAG long in configuration
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index cf3beb3..0681d43 100644
*** a/src/test/regress/expected/tsearch.out
--- b/src/test/regress/expected/tsearch.out
*************** SELECT plainto_tsquery('english', 'foo b
*** 556,570 ****

  -- Check stop word deletion, a and s are stop-words
  SELECT to_tsquery('english', '!(a & !b) & c');
!  to_tsquery
! ------------
!  'b' & 'c'
  (1 row)

  SELECT to_tsquery('english', '!(a & !b)');
   to_tsquery
  ------------
!  'b'
  (1 row)

  SELECT to_tsquery('english', '(1 <-> 2) <-> a');
--- 556,570 ----

  -- Check stop word deletion, a and s are stop-words
  SELECT to_tsquery('english', '!(a & !b) & c');
!  to_tsquery
! -------------
!  !!'b' & 'c'
  (1 row)

  SELECT to_tsquery('english', '!(a & !b)');
   to_tsquery
  ------------
!  !!'b'
  (1 row)

  SELECT to_tsquery('english', '(1 <-> 2) <-> a');
*************** SELECT ts_rewrite('1 & (2 <2> 3)', 'SELE
*** 1240,1254 ****
  (1 row)

  SELECT ts_rewrite('5 <-> (1 & (2 <-> 3))', 'SELECT keyword, sample FROM test_tsquery'::text );
!               ts_rewrite
! ---------------------------------------
!  '5' <-> '1' & '5' <-> ( '2' <-> '3' )
  (1 row)

  SELECT ts_rewrite('5 <-> (6 | 8)', 'SELECT keyword, sample FROM test_tsquery'::text );
!         ts_rewrite
! ---------------------------
!  '5' <-> '7' | '5' <-> '8'
  (1 row)

  -- Check empty substitution
--- 1240,1254 ----
  (1 row)

  SELECT ts_rewrite('5 <-> (1 & (2 <-> 3))', 'SELECT keyword, sample FROM test_tsquery'::text );
!        ts_rewrite
! -------------------------
!  '5' <-> ( '2' <-> '4' )
  (1 row)

  SELECT ts_rewrite('5 <-> (6 | 8)', 'SELECT keyword, sample FROM test_tsquery'::text );
!       ts_rewrite
! -----------------------
!  '5' <-> ( '6' | '8' )
  (1 row)

  -- Check empty substitution
*************** SELECT ts_rewrite( query, 'SELECT keywor
*** 1386,1391 ****
--- 1386,1411 ----
   'citi' & 'foo' & ( 'bar' | 'qq' ) & ( 'nyc' | 'big' & 'appl' | 'new' & 'york' )
  (1 row)

+ SELECT ts_rewrite(tsquery_phrase('foo', 'foo'), 'foo', 'bar | baz');
+                ts_rewrite
+ -----------------------------------------
+  ( 'bar' | 'baz' ) <-> ( 'bar' | 'baz' )
+ (1 row)
+
+ SELECT to_tsvector('foo bar') @@
+   ts_rewrite(tsquery_phrase('foo', 'foo'), 'foo', 'bar | baz');
+  ?column?
+ ----------
+  f
+ (1 row)
+
+ SELECT to_tsvector('bar baz') @@
+   ts_rewrite(tsquery_phrase('foo', 'foo'), 'foo', 'bar | baz');
+  ?column?
+ ----------
+  t
+ (1 row)
+
  RESET enable_seqscan;
  --test GUC
  SET default_text_search_config=simple;
diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out
index 8d9290c..d107001 100644
*** a/src/test/regress/expected/tstypes.out
--- b/src/test/regress/expected/tstypes.out
*************** SELECT '!!a & !!b'::tsquery;
*** 366,498 ****
   !!'a' & !!'b'
  (1 row)

- -- phrase transformation
- SELECT 'a <-> (b|c)'::tsquery;
-           tsquery
- ---------------------------
-  'a' <-> 'b' | 'a' <-> 'c'
- (1 row)
-
- SELECT '(a|b) <-> c'::tsquery;
-           tsquery
- ---------------------------
-  'a' <-> 'c' | 'b' <-> 'c'
- (1 row)
-
- SELECT '(a|b) <-> (d|c)'::tsquery;
-                         tsquery
- -------------------------------------------------------
-  'a' <-> 'd' | 'b' <-> 'd' | 'a' <-> 'c' | 'b' <-> 'c'
- (1 row)
-
- SELECT 'a <-> (b&c)'::tsquery;
-           tsquery
- ---------------------------
-  'a' <-> 'b' & 'a' <-> 'c'
- (1 row)
-
- SELECT '(a&b) <-> c'::tsquery;
-           tsquery
- ---------------------------
-  'a' <-> 'c' & 'b' <-> 'c'
- (1 row)
-
- SELECT '(a&b) <-> (d&c)'::tsquery;
-                         tsquery
- -------------------------------------------------------
-  'a' <-> 'd' & 'b' <-> 'd' & 'a' <-> 'c' & 'b' <-> 'c'
- (1 row)
-
- SELECT 'a <-> !b'::tsquery;
-         tsquery
- ------------------------
-  'a' & !( 'a' <-> 'b' )
- (1 row)
-
- SELECT '!a <-> b'::tsquery;
-         tsquery
- ------------------------
-  !( 'a' <-> 'b' ) & 'b'
- (1 row)
-
- SELECT '!a <-> !b'::tsquery;
-               tsquery
- ------------------------------------
-  !'a' & !( !( 'a' <-> 'b' ) & 'b' )
- (1 row)
-
- SELECT 'a <-> !(b&c)'::tsquery;
-                tsquery
- --------------------------------------
-  'a' & !( 'a' <-> 'b' & 'a' <-> 'c' )
- (1 row)
-
- SELECT 'a <-> !(b|c)'::tsquery;
-                tsquery
- --------------------------------------
-  'a' & !( 'a' <-> 'b' | 'a' <-> 'c' )
- (1 row)
-
- SELECT  '!(a&b) <-> c'::tsquery;
-                tsquery
- --------------------------------------
-  !( 'a' <-> 'c' & 'b' <-> 'c' ) & 'c'
- (1 row)
-
- SELECT  '!(a|b) <-> c'::tsquery;
-                tsquery
- --------------------------------------
-  !( 'a' <-> 'c' | 'b' <-> 'c' ) & 'c'
- (1 row)
-
- SELECT  '(!a|b) <-> c'::tsquery;
-                tsquery
- --------------------------------------
-  !( 'a' <-> 'c' ) & 'c' | 'b' <-> 'c'
- (1 row)
-
- SELECT  '(!a&b) <-> c'::tsquery;
-                tsquery
- --------------------------------------
-  !( 'a' <-> 'c' ) & 'c' & 'b' <-> 'c'
- (1 row)
-
- SELECT  'c <-> (!a|b)'::tsquery;
-                tsquery
- --------------------------------------
-  'c' & !( 'c' <-> 'a' ) | 'c' <-> 'b'
- (1 row)
-
- SELECT  'c <-> (!a&b)'::tsquery;
-                tsquery
- --------------------------------------
-  'c' & !( 'c' <-> 'a' ) & 'c' <-> 'b'
- (1 row)
-
- SELECT  '(a|b) <-> !c'::tsquery;
-                     tsquery
- ------------------------------------------------
-  ( 'a' | 'b' ) & !( 'a' <-> 'c' | 'b' <-> 'c' )
- (1 row)
-
- SELECT  '(a&b) <-> !c'::tsquery;
-                   tsquery
- --------------------------------------------
-  'a' & 'b' & !( 'a' <-> 'c' & 'b' <-> 'c' )
- (1 row)
-
- SELECT  '!c <-> (a|b)'::tsquery;
-                      tsquery
- -------------------------------------------------
-  !( 'c' <-> 'a' ) & 'a' | !( 'c' <-> 'b' ) & 'b'
- (1 row)
-
- SELECT  '!c <-> (a&b)'::tsquery;
-                      tsquery
- -------------------------------------------------
-  !( 'c' <-> 'a' ) & 'a' & !( 'c' <-> 'b' ) & 'b'
- (1 row)
-
  --comparisons
  SELECT 'a' < 'b & c'::tsquery as "true";
   true
--- 366,371 ----
*************** SELECT 'foo & bar'::tsquery && 'asd | fg
*** 568,600 ****
  (1 row)

  SELECT 'a' <-> 'b & d'::tsquery;
!          ?column?
! ---------------------------
!  'a' <-> 'b' & 'a' <-> 'd'
  (1 row)

  SELECT 'a & g' <-> 'b & d'::tsquery;
!                        ?column?
! -------------------------------------------------------
!  'a' <-> 'b' & 'g' <-> 'b' & 'a' <-> 'd' & 'g' <-> 'd'
  (1 row)

  SELECT 'a & g' <-> 'b | d'::tsquery;
!                        ?column?
! -------------------------------------------------------
!  'a' <-> 'b' & 'g' <-> 'b' | 'a' <-> 'd' & 'g' <-> 'd'
  (1 row)

  SELECT 'a & g' <-> 'b <-> d'::tsquery;
!                      ?column?
! ---------------------------------------------------
!  'a' <-> ( 'b' <-> 'd' ) & 'g' <-> ( 'b' <-> 'd' )
  (1 row)

  SELECT tsquery_phrase('a <3> g', 'b & d', 10);
!                tsquery_phrase
! ---------------------------------------------
!  'a' <3> 'g' <10> 'b' & 'a' <3> 'g' <10> 'd'
  (1 row)

  -- tsvector-tsquery operations
--- 441,473 ----
  (1 row)

  SELECT 'a' <-> 'b & d'::tsquery;
!        ?column?
! -----------------------
!  'a' <-> ( 'b' & 'd' )
  (1 row)

  SELECT 'a & g' <-> 'b & d'::tsquery;
!             ?column?
! ---------------------------------
!  ( 'a' & 'g' ) <-> ( 'b' & 'd' )
  (1 row)

  SELECT 'a & g' <-> 'b | d'::tsquery;
!             ?column?
! ---------------------------------
!  ( 'a' & 'g' ) <-> ( 'b' | 'd' )
  (1 row)

  SELECT 'a & g' <-> 'b <-> d'::tsquery;
!              ?column?
! -----------------------------------
!  ( 'a' & 'g' ) <-> ( 'b' <-> 'd' )
  (1 row)

  SELECT tsquery_phrase('a <3> g', 'b & d', 10);
!          tsquery_phrase
! --------------------------------
!  'a' <3> 'g' <10> ( 'b' & 'd' )
  (1 row)

  -- tsvector-tsquery operations
*************** SELECT to_tsvector('simple', '1 2 3 4')
*** 749,773 ****
   t
  (1 row)

! SELECT to_tsvector('simple', '1 2 3 4') @@ '1 <-> (2 <-> 3)' AS "false";
   false
  -------
   f
  (1 row)

! SELECT to_tsvector('simple', '1 2 3 4') @@ '1 <2> (2 <-> 3)' AS "true";
   true
  ------
   t
  (1 row)

! SELECT to_tsvector('simple', '1 2 1 2 3 4') @@ '(1 <-> 2) <-> 3' AS "true";
   true
  ------
   t
  (1 row)

! SELECT to_tsvector('simple', '1 2 1 2 3 4') @@ '1 <-> 2 <-> 3' AS "true";
   true
  ------
   t
--- 622,773 ----
   t
  (1 row)

! SELECT to_tsvector('simple', '1 2 3 4') @@ '1 <-> (2 <-> 3)' AS "true";
!  true
! ------
!  t
! (1 row)
!
! SELECT to_tsvector('simple', '1 2 3 4') @@ '1 <2> (2 <-> 3)' AS "false";
   false
  -------
   f
  (1 row)

! SELECT to_tsvector('simple', '1 2 1 2 3 4') @@ '(1 <-> 2) <-> 3' AS "true";
   true
  ------
   t
  (1 row)

! SELECT to_tsvector('simple', '1 2 1 2 3 4') @@ '1 <-> 2 <-> 3' AS "true";
   true
  ------
   t
  (1 row)

! -- without position data, phrase search does not match
! SELECT strip(to_tsvector('simple', '1 2 3 4')) @@ '1 <-> 2 <-> 3' AS "false";
!  false
! -------
!  f
! (1 row)
!
! select to_tsvector('simple', 'q x q y') @@ 'q <-> (x & y)' AS "false";
!  false
! -------
!  f
! (1 row)
!
! select to_tsvector('simple', 'q x') @@ 'q <-> (x | y <-> z)' AS "true";
!  true
! ------
!  t
! (1 row)
!
! select to_tsvector('simple', 'q y') @@ 'q <-> (x | y <-> z)' AS "false";
!  false
! -------
!  f
! (1 row)
!
! select to_tsvector('simple', 'q y z') @@ 'q <-> (x | y <-> z)' AS "true";
!  true
! ------
!  t
! (1 row)
!
! select to_tsvector('simple', 'q y x') @@ 'q <-> (x | y <-> z)' AS "false";
!  false
! -------
!  f
! (1 row)
!
! select to_tsvector('simple', 'q x y') @@ 'q <-> (x | y <-> z)' AS "true";
!  true
! ------
!  t
! (1 row)
!
! select to_tsvector('simple', 'q x') @@ '(x | y <-> z) <-> q' AS "false";
!  false
! -------
!  f
! (1 row)
!
! select to_tsvector('simple', 'x q') @@ '(x | y <-> z) <-> q' AS "true";
!  true
! ------
!  t
! (1 row)
!
! select to_tsvector('simple', 'x y q') @@ '(x | y <-> z) <-> q' AS "false";
!  false
! -------
!  f
! (1 row)
!
! select to_tsvector('simple', 'x y z') @@ '(x | y <-> z) <-> q' AS "false";
!  false
! -------
!  f
! (1 row)
!
! select to_tsvector('simple', 'x y z q') @@ '(x | y <-> z) <-> q' AS "true";
!  true
! ------
!  t
! (1 row)
!
! select to_tsvector('simple', 'y z q') @@ '(x | y <-> z) <-> q' AS "true";
!  true
! ------
!  t
! (1 row)
!
! select to_tsvector('simple', 'y y q') @@ '(x | y <-> z) <-> q' AS "false";
!  false
! -------
!  f
! (1 row)
!
! select to_tsvector('simple', 'y y q') @@ '(!x | y <-> z) <-> q' AS "true";
!  true
! ------
!  t
! (1 row)
!
! select to_tsvector('simple', 'x y q') @@ '(!x | y <-> z) <-> q' AS "true";
!  true
! ------
!  t
! (1 row)
!
! select to_tsvector('simple', 'y y q') @@ '(x | y <-> !z) <-> q' AS "true";
!  true
! ------
!  t
! (1 row)
!
! select to_tsvector('simple', 'x q') @@ '(x | y <-> !z) <-> q' AS "true";
!  true
! ------
!  t
! (1 row)
!
! select to_tsvector('simple', 'x q') @@ '(!x | y <-> z) <-> q' AS "false";
!  false
! -------
!  f
! (1 row)
!
! select to_tsvector('simple', 'z q') @@ '(!x | y <-> z) <-> q' AS "true";
!  true
! ------
!  t
! (1 row)
!
! select to_tsvector('simple', 'x y q y') @@ '!x <-> y' AS "true";
   true
  ------
   t
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index de43860..1255f69 100644
*** a/src/test/regress/sql/tsearch.sql
--- b/src/test/regress/sql/tsearch.sql
*************** SELECT ts_rewrite( query, 'SELECT keywor
*** 447,452 ****
--- 447,458 ----
  SELECT ts_rewrite( query, 'SELECT keyword, sample FROM test_tsquery' ) FROM to_tsquery('english', 'moscow & hotel') AS query;
  SELECT ts_rewrite( query, 'SELECT keyword, sample FROM test_tsquery' ) FROM to_tsquery('english', 'bar &  new & qq & foo & york') AS query;

+ SELECT ts_rewrite(tsquery_phrase('foo', 'foo'), 'foo', 'bar | baz');
+ SELECT to_tsvector('foo bar') @@
+   ts_rewrite(tsquery_phrase('foo', 'foo'), 'foo', 'bar | baz');
+ SELECT to_tsvector('bar baz') @@
+   ts_rewrite(tsquery_phrase('foo', 'foo'), 'foo', 'bar | baz');
+
  RESET enable_seqscan;

  --test GUC
diff --git a/src/test/regress/sql/tstypes.sql b/src/test/regress/sql/tstypes.sql
index 9ea93a2..d593395 100644
*** a/src/test/regress/sql/tstypes.sql
--- b/src/test/regress/sql/tstypes.sql
*************** SELECT 'a & !!b'::tsquery;
*** 64,97 ****
  SELECT '!!a & b'::tsquery;
  SELECT '!!a & !!b'::tsquery;

- -- phrase transformation
- SELECT 'a <-> (b|c)'::tsquery;
- SELECT '(a|b) <-> c'::tsquery;
- SELECT '(a|b) <-> (d|c)'::tsquery;
-
- SELECT 'a <-> (b&c)'::tsquery;
- SELECT '(a&b) <-> c'::tsquery;
- SELECT '(a&b) <-> (d&c)'::tsquery;
-
- SELECT 'a <-> !b'::tsquery;
- SELECT '!a <-> b'::tsquery;
- SELECT '!a <-> !b'::tsquery;
-
- SELECT 'a <-> !(b&c)'::tsquery;
- SELECT 'a <-> !(b|c)'::tsquery;
- SELECT  '!(a&b) <-> c'::tsquery;
- SELECT  '!(a|b) <-> c'::tsquery;
-
- SELECT  '(!a|b) <-> c'::tsquery;
- SELECT  '(!a&b) <-> c'::tsquery;
- SELECT  'c <-> (!a|b)'::tsquery;
- SELECT  'c <-> (!a&b)'::tsquery;
-
- SELECT  '(a|b) <-> !c'::tsquery;
- SELECT  '(a&b) <-> !c'::tsquery;
- SELECT  '!c <-> (a|b)'::tsquery;
- SELECT  '!c <-> (a&b)'::tsquery;
-
  --comparisons
  SELECT 'a' < 'b & c'::tsquery as "true";
  SELECT 'a' > 'b & c'::tsquery as "false";
--- 64,69 ----
*************** SELECT to_tsvector('simple', '1 2 11 3')
*** 146,155 ****

  SELECT to_tsvector('simple', '1 2 3 4') @@ '1 <-> 2 <-> 3' AS "true";
  SELECT to_tsvector('simple', '1 2 3 4') @@ '(1 <-> 2) <-> 3' AS "true";
! SELECT to_tsvector('simple', '1 2 3 4') @@ '1 <-> (2 <-> 3)' AS "false";
! SELECT to_tsvector('simple', '1 2 3 4') @@ '1 <2> (2 <-> 3)' AS "true";
  SELECT to_tsvector('simple', '1 2 1 2 3 4') @@ '(1 <-> 2) <-> 3' AS "true";
  SELECT to_tsvector('simple', '1 2 1 2 3 4') @@ '1 <-> 2 <-> 3' AS "true";

  --ranking
  SELECT ts_rank(' a:1 s:2C d g'::tsvector, 'a | s');
--- 118,150 ----

  SELECT to_tsvector('simple', '1 2 3 4') @@ '1 <-> 2 <-> 3' AS "true";
  SELECT to_tsvector('simple', '1 2 3 4') @@ '(1 <-> 2) <-> 3' AS "true";
! SELECT to_tsvector('simple', '1 2 3 4') @@ '1 <-> (2 <-> 3)' AS "true";
! SELECT to_tsvector('simple', '1 2 3 4') @@ '1 <2> (2 <-> 3)' AS "false";
  SELECT to_tsvector('simple', '1 2 1 2 3 4') @@ '(1 <-> 2) <-> 3' AS "true";
  SELECT to_tsvector('simple', '1 2 1 2 3 4') @@ '1 <-> 2 <-> 3' AS "true";
+ -- without position data, phrase search does not match
+ SELECT strip(to_tsvector('simple', '1 2 3 4')) @@ '1 <-> 2 <-> 3' AS "false";
+
+ select to_tsvector('simple', 'q x q y') @@ 'q <-> (x & y)' AS "false";
+ select to_tsvector('simple', 'q x') @@ 'q <-> (x | y <-> z)' AS "true";
+ select to_tsvector('simple', 'q y') @@ 'q <-> (x | y <-> z)' AS "false";
+ select to_tsvector('simple', 'q y z') @@ 'q <-> (x | y <-> z)' AS "true";
+ select to_tsvector('simple', 'q y x') @@ 'q <-> (x | y <-> z)' AS "false";
+ select to_tsvector('simple', 'q x y') @@ 'q <-> (x | y <-> z)' AS "true";
+ select to_tsvector('simple', 'q x') @@ '(x | y <-> z) <-> q' AS "false";
+ select to_tsvector('simple', 'x q') @@ '(x | y <-> z) <-> q' AS "true";
+ select to_tsvector('simple', 'x y q') @@ '(x | y <-> z) <-> q' AS "false";
+ select to_tsvector('simple', 'x y z') @@ '(x | y <-> z) <-> q' AS "false";
+ select to_tsvector('simple', 'x y z q') @@ '(x | y <-> z) <-> q' AS "true";
+ select to_tsvector('simple', 'y z q') @@ '(x | y <-> z) <-> q' AS "true";
+ select to_tsvector('simple', 'y y q') @@ '(x | y <-> z) <-> q' AS "false";
+ select to_tsvector('simple', 'y y q') @@ '(!x | y <-> z) <-> q' AS "true";
+ select to_tsvector('simple', 'x y q') @@ '(!x | y <-> z) <-> q' AS "true";
+ select to_tsvector('simple', 'y y q') @@ '(x | y <-> !z) <-> q' AS "true";
+ select to_tsvector('simple', 'x q') @@ '(x | y <-> !z) <-> q' AS "true";
+ select to_tsvector('simple', 'x q') @@ '(!x | y <-> z) <-> q' AS "false";
+ select to_tsvector('simple', 'z q') @@ '(!x | y <-> z) <-> q' AS "true";
+ select to_tsvector('simple', 'x y q y') @@ '!x <-> y' AS "true";

  --ranking
  SELECT ts_rank(' a:1 s:2C d g'::tsvector, 'a | s');

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Rethinking our fulltext phrase-search implementation

From
Artur Zakirov
Date:
Hello Tom,

On 17.12.2016 21:36, Tom Lane wrote:
>
> 4. The transformations are wrong anyway.  The OR case I showed above is
> all right, but as I argued in <24331.1480199636@sss.pgh.pa.us>, the AND
> case is not:
>
> regression=# select 'a <-> (b & c)'::tsquery;
>           tsquery
> ---------------------------
>  'a' <-> 'b' & 'a' <-> 'c'
> (1 row)
>
> This matches 'a b a c', because 'a <-> b' and 'a <-> c' can each be
> matched at different places in that text; but it seems highly unlikely to
> me that that's what the writer of such a query wanted.  (If she did want
> that, she would write it that way to start with.)  NOT is not very nice
> either:

If I'm not mistaken, PostgreSQL 9.6 and master with the patch
"fix-phrase-search.patch" both return false for this query:

select 'a b a c' @@ 'a <-> (b & c)'::tsquery;
 ?column?
----------
 f
(1 row)

I agree that such a query is confusing. Maybe it would be better to return
true for such queries?
Otherwise it seems that queries like 'a <-> (b & c)' will always return 
false. Then we need maybe some warning message.
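
The difference between the two readings of 'a <-> (b & c)' can be sketched with plain position lists. This is a toy model written for illustration, not code from the patch: the function names, the MAXPOS limit, and the flat int arrays are all assumptions made here, and real tsvector positions are WordEntryPos values handled inside TS_phrase_execute().

```c
#include <assert.h>
#include <stdbool.h>

#define MAXPOS 16   /* toy capacity (assumption): at most MAXPOS positions per list */

/* Copy into out the positions present in both sorted lists; return the count. */
static int
intersect_positions(const int *x, int nx, const int *y, int ny, int *out)
{
    int i = 0, j = 0, n = 0;

    assert(nx <= MAXPOS && ny <= MAXPOS);
    while (i < nx && j < ny)
    {
        if (x[i] < y[j])
            i++;
        else if (x[i] > y[j])
            j++;
        else
        {
            out[n++] = x[i];
            i++;
            j++;
        }
    }
    return n;
}

/* Toy model of "l <dist> r": keep each position p of r where p - dist is in l. */
static int
phrase_positions(const int *l, int nl, const int *r, int nr, int dist, int *out)
{
    int n = 0;

    assert(nr <= MAXPOS);
    for (int j = 0; j < nr; j++)
    {
        for (int i = 0; i < nl; i++)
        {
            if (r[j] - l[i] == dist)
            {
                out[n++] = r[j];
                break;
            }
        }
    }
    return n;
}

/* Distributed form ('a' <-> 'b') & ('a' <-> 'c'): each phrase is matched on its own. */
bool
match_distributed(const int *a, int na, const int *b, int nb, const int *c, int nc)
{
    int tmp[MAXPOS];

    return phrase_positions(a, na, b, nb, 1, tmp) > 0 &&
           phrase_positions(a, na, c, nc, 1, tmp) > 0;
}

/* Position-aware form 'a' <-> ('b' & 'c'): the AND is resolved on positions first. */
bool
match_position_aware(const int *a, int na, const int *b, int nb, const int *c, int nc)
{
    int both[MAXPOS];
    int tmp[MAXPOS];
    int nboth = intersect_positions(b, nb, c, nc, both);

    return phrase_positions(a, na, both, nboth, 1, tmp) > 0;
}
```

For the text 'a b a c' the position lists are a = {1, 3}, b = {2}, c = {4}. In this model match_distributed() returns true, because 'a' <-> 'b' matches at 1-2 and 'a' <-> 'c' matches at 3-4, while match_position_aware() returns false, because b and c never share a position.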

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Rethinking our fulltext phrase-search implementation

From
Tom Lane
Date:
Artur Zakirov <a.zakirov@postgrespro.ru> writes:
> Otherwise it seems that queries like 'a <-> (b & c)' will always return 
> false. Then we need maybe some warning message.

Well, the query as written is pointless, but it could be useful with
something other than "b" and "c" as the AND-ed terms.  In this usage
"&" is equivalent to "<0>", which we know has corner-case uses.
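
A minimal sketch of that equivalence, assuming a toy position-list model rather than the patch's actual code: "&" under a phrase operator is treated as "<0>", so it only succeeds where both operands occur at the same position. The function names, MAXPOS, and the sample positions are hypothetical; the same-position case is meant to resemble a dictionary that emits several lexemes for one word, as in the hunspell regression test in the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXPOS 16   /* toy capacity (assumption): at most MAXPOS positions per list */

/* Toy model of "l <dist> r": keep each position p of r where p - dist is in l. */
static int
phrase_positions(const int *l, int nl, const int *r, int nr, int dist, int *out)
{
    int n = 0;

    assert(nr <= MAXPOS);
    for (int j = 0; j < nr; j++)
    {
        for (int i = 0; i < nl; i++)
        {
            if (r[j] - l[i] == dist)
            {
                out[n++] = r[j];
                break;
            }
        }
    }
    return n;
}

/* "&" below a phrase operator, modeled as "<0>": positions shared by l and r. */
int
and_positions(const int *l, int nl, const int *r, int nr, int *out)
{
    return phrase_positions(l, nl, r, nr, 0, out);
}

/* (x & y) <-> w: w must directly follow a position where x and y coincide. */
bool
match_and_then_next(const int *x, int nx, const int *y, int ny,
                    const int *w, int nw)
{
    int both[MAXPOS];
    int tmp[MAXPOS];
    int nboth = and_positions(x, nx, y, ny, both);

    return phrase_positions(both, nboth, w, nw, 1, tmp) > 0;
}
```

With x = {1}, y = {1}, w = {2} (two lexemes sharing position 1, followed by a third at position 2), match_and_then_next() returns true. With x = {2}, y = {4}, w = {5} it returns false, since distinct words never share a position.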

I'm not inclined to issue any sort of warning for unsatisfiable queries.
We don't issue a warning when a SQL WHERE condition collapses to constant
FALSE, and that seems like exactly the same sort of situation.

It strikes me though that the documentation should point this out.
        regards, tom lane