Re: [PATCH] fix GIN index search sometimes losing results

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [PATCH] fix GIN index search sometimes losing results
Дата
Msg-id 28738.1589670849@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [PATCH] fix GIN index search sometimes losing results  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [PATCH] fix GIN index search sometimes losing results  (Pavel Borisov <pashkin.elfe@gmail.com>)
Список pgsql-hackers
I wrote:
> I think the root of the problem is that if we have a query using
> weights, and we are testing tsvector data that lacks positions/weights,
> we can never say there's definitely a match.  I don't see any decently
> clean way to fix this without redefining the TSExecuteCallback API
> to return a tri-state YES/NO/MAYBE result, because really we need to
> decide that it's MAYBE at the level of processing the QI_VAL node,
> not later on.  I'd tried to avoid that in e81e5741a, but maybe we
> should just bite that bullet, and not worry about whether there's
> any third-party code providing its own TSExecuteCallback routine.
> codesearch.debian.net suggests that there are no external callers
> of TS_execute, so maybe we can get away with that.

0001 attached is a proposed patch that does it that way.  Given the
API break involved, it's not quite clear what to do with this.
ISTM we have three options:

1. Ignore the API issue and back-patch.  Given the apparent lack of
external callers of TS_execute, maybe we can get away with that;
but I wonder if we'd get pushback from distros that have automatic
ABI-break detectors in place.

2. Assume we can't backpatch, but it's still OK to slip this into
v13.  (This option clearly has a limited shelf life, but I think
we could get away with it until late beta.)

3. Assume we'd better hold this till v14.

I find #3 unduly conservative, seeing that this is clearly a bug
fix, but on the other hand #1 is a bit scary.  Aside from the API
issue, it's not impossible that this has introduced some corner
case behavioral changes that we'd consider to be new bugs rather
than bug fixes.

Anyway, some notes for reviewers:

* The core idea of the patch is to make the TS_execute callbacks
have ternary results and to insist they return TS_MAYBE in any
case where the correct result is uncertain.

* That fixes the bug at hand, and it also allows getting rid of
some kluges at higher levels.  The GIN code no longer needs its
own TS_execute_ternary implementation, and the GIST code no longer
needs to suppose that it can't trust NOT results.

* I put some effort into not leaking memory within tsvector_op.c's
checkclass_str and checkcondition_str.  (The final output array
can still get leaked, I believe.  Fixing that seems like material
for a different patch, and it might not be worth any trouble.)

* The new test cases in tstypes.sql are to verify that we didn't
change behavior of the basic tsvector @@ tsquery code.  There wasn't
any coverage of these cases before, and the logic for checkclass_str
without position info had to be tweaked to preserve this behavior.

* The new cases in tsearch verify that the GIN and GIST code gives
the same results as the basic operator.

Now, as for the 0002 patch attached: after 0001, the only TS_execute()
callers that are not specifying TS_EXEC_CALC_NOT are hlCover(),
which I'd already complained is probably a bug, and the first of
the two calls in tsrank.c's Cover().  It seems difficult to me to
argue that it's not a bug for Cover() to process NOT in one call
but not the other --- moreover, if there was any argument for that
once upon a time, it probably falls to the ground now that (a) we
have a less buggy implementation of NOT and (b) the presence of
phrase queries significantly raises the importance of not taking
short-cuts.  Therefore, 0002 attached rips out the TS_EXEC_CALC_NOT
flag and has TS_execute compute NOT expressions accurately all the
time.

As it stands, 0002 changes no regression test results, which I'm
afraid speaks more to our crummy test coverage than anything else;
tests that exercise those two functions with NOT-using queries
would easily show that there is a difference.

Even if we decide to back-patch 0001, I would not suggest
back-patching 0002, as it's more nearly a definitional change
than a bug fix.  But I think it's a good idea anyway.

I'll stick this in the queue for the July commitfest, in case
anybody wants to review it.

            regards, tom lane

diff --git a/src/backend/tsearch/wparser_def.c b/src/backend/tsearch/wparser_def.c
index 48e55e1..bfbc8d5 100644
--- a/src/backend/tsearch/wparser_def.c
+++ b/src/backend/tsearch/wparser_def.c
@@ -1969,7 +1969,7 @@ typedef struct
 /*
  * TS_execute callback for matching a tsquery operand to headline words
  */
-static bool
+static TSTernaryValue
 checkcondition_HL(void *opaque, QueryOperand *val, ExecPhraseData *data)
 {
     hlCheck    *checkval = (hlCheck *) opaque;
@@ -1982,7 +1982,7 @@ checkcondition_HL(void *opaque, QueryOperand *val, ExecPhraseData *data)
         {
             /* if data == NULL, don't need to report positions */
             if (!data)
-                return true;
+                return TS_YES;

             if (!data->pos)
             {
@@ -1999,9 +1999,9 @@ checkcondition_HL(void *opaque, QueryOperand *val, ExecPhraseData *data)
     }

     if (data && data->npos > 0)
-        return true;
+        return TS_YES;

-    return false;
+    return TS_NO;
 }

 /*
diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
index 2d65616..3128f0a 100644
--- a/src/backend/utils/adt/tsginidx.c
+++ b/src/backend/utils/adt/tsginidx.c
@@ -178,9 +178,13 @@ typedef struct
     bool       *need_recheck;
 } GinChkVal;

-static GinTernaryValue
-checkcondition_gin_internal(GinChkVal *gcv, QueryOperand *val, ExecPhraseData *data)
+/*
+ * TS_execute callback for matching a tsquery operand to GIN index data
+ */
+static TSTernaryValue
+checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
 {
+    GinChkVal  *gcv = (GinChkVal *) checkval;
     int            j;

     /*
@@ -193,112 +197,22 @@ checkcondition_gin_internal(GinChkVal *gcv, QueryOperand *val, ExecPhraseData *d
     /* convert item's number to corresponding entry's (operand's) number */
     j = gcv->map_item_operand[((QueryItem *) val) - gcv->first_item];

-    /* return presence of current entry in indexed value */
-    return gcv->check[j];
-}
-
-/*
- * Wrapper of check condition function for TS_execute.
- */
-static bool
-checkcondition_gin(void *checkval, QueryOperand *val, ExecPhraseData *data)
-{
-    return checkcondition_gin_internal((GinChkVal *) checkval,
-                                       val,
-                                       data) != GIN_FALSE;
-}
-
-/*
- * Evaluate tsquery boolean expression using ternary logic.
- *
- * Note: the reason we can't use TS_execute() for this is that its API
- * for the checkcondition callback doesn't allow a MAYBE result to be
- * returned, but we might have MAYBEs in the gcv->check array.
- * Perhaps we should change that API.
- */
-static GinTernaryValue
-TS_execute_ternary(GinChkVal *gcv, QueryItem *curitem, bool in_phrase)
-{
-    GinTernaryValue val1,
-                val2,
-                result;
-
-    /* since this function recurses, it could be driven to stack overflow */
-    check_stack_depth();
-
-    if (curitem->type == QI_VAL)
-        return
-            checkcondition_gin_internal(gcv,
-                                        (QueryOperand *) curitem,
-                                        NULL /* don't have position info */ );
-
-    switch (curitem->qoperator.oper)
+    /*
+     * return presence of current entry in indexed value; but TRUE becomes
+     * MAYBE in the presence of a query requiring recheck
+     */
+    if (gcv->check[j] == GIN_TRUE)
     {
-        case OP_NOT:
-
-            /*
-             * Below a phrase search, force NOT's result to MAYBE.  We cannot
-             * invert a TRUE result from the subexpression to FALSE, since
-             * TRUE only says that the subexpression matches somewhere, not
-             * that it matches everywhere, so there might be positions where
-             * the NOT will match.  We could invert FALSE to TRUE, but there's
-             * little point in distinguishing TRUE from MAYBE, since a recheck
-             * will have been forced already.
-             */
-            if (in_phrase)
-                return GIN_MAYBE;
-
-            result = TS_execute_ternary(gcv, curitem + 1, in_phrase);
-            if (result == GIN_MAYBE)
-                return result;
-            return !result;
-
-        case OP_PHRASE:
-
-            /*
-             * GIN doesn't contain any information about positions, so treat
-             * OP_PHRASE as OP_AND with recheck requirement, and always
-             * reporting MAYBE not TRUE.
-             */
-            *(gcv->need_recheck) = true;
-            /* Pass down in_phrase == true in case there's a NOT below */
-            in_phrase = true;
-
-            /* FALL THRU */
-
-        case OP_AND:
-            val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left,
-                                      in_phrase);
-            if (val1 == GIN_FALSE)
-                return GIN_FALSE;
-            val2 = TS_execute_ternary(gcv, curitem + 1, in_phrase);
-            if (val2 == GIN_FALSE)
-                return GIN_FALSE;
-            if (val1 == GIN_TRUE && val2 == GIN_TRUE &&
-                curitem->qoperator.oper != OP_PHRASE)
-                return GIN_TRUE;
-            else
-                return GIN_MAYBE;
-
-        case OP_OR:
-            val1 = TS_execute_ternary(gcv, curitem + curitem->qoperator.left,
-                                      in_phrase);
-            if (val1 == GIN_TRUE)
-                return GIN_TRUE;
-            val2 = TS_execute_ternary(gcv, curitem + 1, in_phrase);
-            if (val2 == GIN_TRUE)
-                return GIN_TRUE;
-            if (val1 == GIN_FALSE && val2 == GIN_FALSE)
-                return GIN_FALSE;
-            else
-                return GIN_MAYBE;
-
-        default:
-            elog(ERROR, "unrecognized operator: %d", curitem->qoperator.oper);
+        if (val->weight != 0 || data != NULL)
+            return TS_MAYBE;
     }

-    /* not reachable, but keep compiler quiet */
-    return false;
+    /*
+     * We rely on GinTernaryValue and TSTernaryValue using equivalent value
+     * assignments.  We could use a switch statement to map the values if that
+     * ever stops being true, but it seems unlikely to happen.
+     */
+    return (TSTernaryValue) gcv->check[j];
 }

 Datum
@@ -370,10 +284,11 @@ gin_tsquery_triconsistent(PG_FUNCTION_ARGS)
         gcv.map_item_operand = (int *) (extra_data[0]);
         gcv.need_recheck = &recheck;

-        res = TS_execute_ternary(&gcv, GETQUERY(query), false);
-
-        if (res == GIN_TRUE && recheck)
-            res = GIN_MAYBE;
+        if (TS_execute(GETQUERY(query),
+                       &gcv,
+                       TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_NO_POS,
+                       checkcondition_gin))
+            res = recheck ? GIN_MAYBE : GIN_TRUE;
     }

     PG_RETURN_GIN_TERNARY_VALUE(res);
diff --git a/src/backend/utils/adt/tsgistidx.c b/src/backend/utils/adt/tsgistidx.c
index c3f2580..927aed9 100644
--- a/src/backend/utils/adt/tsgistidx.c
+++ b/src/backend/utils/adt/tsgistidx.c
@@ -273,9 +273,9 @@ typedef struct
 } CHKVAL;

 /*
- * is there value 'val' in array or not ?
+ * TS_execute callback for matching a tsquery operand to GIST leaf-page data
  */
-static bool
+static TSTernaryValue
 checkcondition_arr(void *checkval, QueryOperand *val, ExecPhraseData *data)
 {
     int32       *StopLow = ((CHKVAL *) checkval)->arrb;
@@ -288,23 +288,26 @@ checkcondition_arr(void *checkval, QueryOperand *val, ExecPhraseData *data)
      * we are not able to find a prefix by hash value
      */
     if (val->prefix)
-        return true;
+        return TS_MAYBE;

     while (StopLow < StopHigh)
     {
         StopMiddle = StopLow + (StopHigh - StopLow) / 2;
         if (*StopMiddle == val->valcrc)
-            return true;
+            return TS_MAYBE;
         else if (*StopMiddle < val->valcrc)
             StopLow = StopMiddle + 1;
         else
             StopHigh = StopMiddle;
     }

-    return false;
+    return TS_NO;
 }

-static bool
+/*
+ * TS_execute callback for matching a tsquery operand to GIST non-leaf data
+ */
+static TSTernaryValue
 checkcondition_bit(void *checkval, QueryOperand *val, ExecPhraseData *data)
 {
     void       *key = (SignTSVector *) checkval;
@@ -313,8 +316,12 @@ checkcondition_bit(void *checkval, QueryOperand *val, ExecPhraseData *data)
      * we are not able to find a prefix in signature tree
      */
     if (val->prefix)
-        return true;
-    return GETBIT(GETSIGN(key), HASHVAL(val->valcrc, GETSIGLEN(key)));
+        return TS_MAYBE;
+
+    if (GETBIT(GETSIGN(key), HASHVAL(val->valcrc, GETSIGLEN(key))))
+        return TS_MAYBE;
+    else
+        return TS_NO;
 }

 Datum
@@ -339,10 +346,9 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
         if (ISALLTRUE(key))
             PG_RETURN_BOOL(true);

-        /* since signature is lossy, cannot specify CALC_NOT here */
         PG_RETURN_BOOL(TS_execute(GETQUERY(query),
                                   key,
-                                  TS_EXEC_PHRASE_NO_POS,
+                                  TS_EXEC_PHRASE_NO_POS | TS_EXEC_CALC_NOT,
                                   checkcondition_bit));
     }
     else
diff --git a/src/backend/utils/adt/tsrank.c b/src/backend/utils/adt/tsrank.c
index 07251dd..cbd97ab 100644
--- a/src/backend/utils/adt/tsrank.c
+++ b/src/backend/utils/adt/tsrank.c
@@ -556,14 +556,18 @@ typedef struct
 #define QR_GET_OPERAND_DATA(q, v) \
     ( (q)->operandData + (((QueryItem*)(v)) - GETQUERY((q)->query)) )

-static bool
-checkcondition_QueryOperand(void *checkval, QueryOperand *val, ExecPhraseData *data)
+/*
+ * TS_execute callback for matching a tsquery operand to QueryRepresentation
+ */
+static TSTernaryValue
+checkcondition_QueryOperand(void *checkval, QueryOperand *val,
+                            ExecPhraseData *data)
 {
     QueryRepresentation *qr = (QueryRepresentation *) checkval;
     QueryRepresentationOperand *opData = QR_GET_OPERAND_DATA(qr, val);

     if (!opData->operandexists)
-        return false;
+        return TS_NO;

     if (data)
     {
@@ -573,7 +577,7 @@ checkcondition_QueryOperand(void *checkval, QueryOperand *val, ExecPhraseData *d
             data->pos += MAXQROPOS - opData->npos;
     }

-    return true;
+    return TS_YES;
 }

 typedef struct
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 51619c3..6df943a 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -67,14 +67,6 @@ typedef struct
     StatEntry  *root;
 } TSVectorStat;

-/* TS_execute requires ternary logic to handle NOT with phrase matches */
-typedef enum
-{
-    TS_NO,                        /* definitely no match */
-    TS_YES,                        /* definitely does match */
-    TS_MAYBE                    /* can't verify match for lack of pos data */
-} TSTernaryValue;
-

 static TSTernaryValue TS_execute_recurse(QueryItem *curitem, void *arg,
                                          uint32 flags,
@@ -1188,13 +1180,15 @@ tsCompareString(char *a, int lena, char *b, int lenb, bool prefix)
 /*
  * Check weight info or/and fill 'data' with the required positions
  */
-static bool
+static TSTernaryValue
 checkclass_str(CHKVAL *chkval, WordEntry *entry, QueryOperand *val,
                ExecPhraseData *data)
 {
-    bool        result = false;
+    TSTernaryValue result = TS_NO;

-    if (entry->haspos && (val->weight || data))
+    Assert(data == NULL || data->npos == 0);
+
+    if (entry->haspos)
     {
         WordEntryPosVector *posvec;

@@ -1232,7 +1226,13 @@ checkclass_str(CHKVAL *chkval, WordEntry *entry, QueryOperand *val,
             data->npos = dptr - data->pos;

             if (data->npos > 0)
-                result = true;
+                result = TS_YES;
+            else
+            {
+                pfree(data->pos);
+                data->pos = NULL;
+                data->allocated = false;
+            }
         }
         else if (val->weight)
         {
@@ -1243,40 +1243,57 @@ checkclass_str(CHKVAL *chkval, WordEntry *entry, QueryOperand *val,
             {
                 if (val->weight & (1 << WEP_GETWEIGHT(*posvec_iter)))
                 {
-                    result = true;
+                    result = TS_YES;
                     break;        /* no need to go further */
                 }

                 posvec_iter++;
             }
         }
-        else                    /* data != NULL */
+        else if (data)
         {
             data->npos = posvec->npos;
             data->pos = posvec->pos;
             data->allocated = false;
-            result = true;
+            result = TS_YES;
+        }
+        else
+        {
+            /* simplest case: no weight check, positions not needed */
+            result = TS_YES;
         }
     }
     else
     {
-        result = true;
+        /*
+         * Position info is lacking, so if the caller requires it, we can only
+         * say that maybe there is a match.
+         *
+         * Notice, however, that we *don't* check val->weight here.
+         * Historically, stripped tsvectors are considered to match queries
+         * whether or not the query has a weight restriction; that's a little
+         * dubious but we'll preserve the behavior.
+         */
+        if (data)
+            result = TS_MAYBE;
+        else
+            result = TS_YES;
     }

     return result;
 }

 /*
- * is there value 'val' in array or not ?
+ * TS_execute callback for matching a tsquery operand to plain tsvector data
  */
-static bool
+static TSTernaryValue
 checkcondition_str(void *checkval, QueryOperand *val, ExecPhraseData *data)
 {
     CHKVAL       *chkval = (CHKVAL *) checkval;
     WordEntry  *StopLow = chkval->arrb;
     WordEntry  *StopHigh = chkval->arre;
     WordEntry  *StopMiddle = StopHigh;
-    bool        res = false;
+    TSTernaryValue res = TS_NO;

     /* Loop invariant: StopLow <= val < StopHigh */
     while (StopLow < StopHigh)
@@ -1302,36 +1319,69 @@ checkcondition_str(void *checkval, QueryOperand *val, ExecPhraseData *data)
             StopHigh = StopMiddle;
     }

-    if ((!res || data) && val->prefix)
+    /*
+     * If it's a prefix search, we should also consider lexemes that the
+     * search term is a prefix of (which will necessarily immediately follow
+     * the place we found in the above loop).  But we can skip them if there
+     * was a definite match on the exact term AND the caller doesn't need
+     * position info.
+     */
+    if (val->prefix && (res != TS_YES || data))
     {
         WordEntryPos *allpos = NULL;
         int            npos = 0,
                     totalpos = 0;

-        /*
-         * there was a failed exact search, so we should scan further to find
-         * a prefix match. We also need to do so if caller needs position info
-         */
+        /* adjust start position for corner case */
         if (StopLow >= StopHigh)
             StopMiddle = StopHigh;

-        while ((!res || data) && StopMiddle < chkval->arre &&
+        /* we don't try to re-use any data from the initial match */
+        if (data)
+        {
+            if (data->allocated)
+                pfree(data->pos);
+            data->pos = NULL;
+            data->allocated = false;
+            data->npos = 0;
+        }
+        res = TS_NO;
+
+        while ((res != TS_YES || data) &&
+               StopMiddle < chkval->arre &&
                tsCompareString(chkval->operand + val->distance,
                                val->length,
                                chkval->values + StopMiddle->pos,
                                StopMiddle->len,
                                true) == 0)
         {
-            if (data)
-            {
-                /*
-                 * We need to join position information
-                 */
-                res = checkclass_str(chkval, StopMiddle, val, data);
+            TSTernaryValue subres;
+
+            subres = checkclass_str(chkval, StopMiddle, val, data);

-                if (res)
+            if (subres != TS_NO)
+            {
+                if (data)
                 {
-                    while (npos + data->npos >= totalpos)
+                    /*
+                     * We need to join position information
+                     */
+                    if (subres == TS_MAYBE)
+                    {
+                        /*
+                         * No position info for this match, so we must report
+                         * MAYBE overall.
+                         */
+                        res = TS_MAYBE;
+                        /* forget any previous positions */
+                        npos = 0;
+                        /* don't leak storage */
+                        if (allpos)
+                            pfree(allpos);
+                        break;
+                    }
+
+                    while (npos + data->npos > totalpos)
                     {
                         if (totalpos == 0)
                         {
@@ -1347,22 +1397,27 @@ checkcondition_str(void *checkval, QueryOperand *val, ExecPhraseData *data)

                     memcpy(allpos + npos, data->pos, sizeof(WordEntryPos) * data->npos);
                     npos += data->npos;
+
+                    /* don't leak storage from individual matches */
+                    if (data->allocated)
+                        pfree(data->pos);
+                    data->pos = NULL;
+                    data->allocated = false;
+                    /* it's important to reset data->npos before next loop */
+                    data->npos = 0;
                 }
                 else
                 {
-                    /* at loop exit, res must be true if we found matches */
-                    res = (npos > 0);
+                    /* Don't need positions, just handle YES/MAYBE */
+                    if (subres == TS_YES || res == TS_NO)
+                        res = subres;
                 }
             }
-            else
-            {
-                res = checkclass_str(chkval, StopMiddle, val, NULL);
-            }

             StopMiddle++;
         }

-        if (res && data)
+        if (data && npos > 0)
         {
             /* Sort and make unique array of found positions */
             data->pos = allpos;
@@ -1370,6 +1425,7 @@ checkcondition_str(void *checkval, QueryOperand *val, ExecPhraseData *data)
             data->npos = qunique(data->pos, npos, sizeof(WordEntryPos),
                                  compareWordEntryPos);
             data->allocated = true;
+            res = TS_YES;
         }
     }

@@ -1561,14 +1617,7 @@ TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
     check_stack_depth();

     if (curitem->type == QI_VAL)
-    {
-        if (!chkcond(arg, (QueryOperand *) curitem, data))
-            return TS_NO;
-        if (data->npos > 0 || data->negate)
-            return TS_YES;
-        /* If we have no position data, we must return TS_MAYBE */
-        return TS_MAYBE;
-    }
+        return chkcond(arg, (QueryOperand *) curitem, data);

     switch (curitem->qoperator.oper)
     {
@@ -1821,7 +1870,7 @@ TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,

     if (curitem->type == QI_VAL)
         return chkcond(arg, (QueryOperand *) curitem,
-                       NULL /* don't need position info */ ) ? TS_YES : TS_NO;
+                       NULL /* don't need position info */ );

     switch (curitem->qoperator.oper)
     {
diff --git a/src/include/tsearch/ts_utils.h b/src/include/tsearch/ts_utils.h
index f78fbd9..609b0c7 100644
--- a/src/include/tsearch/ts_utils.h
+++ b/src/include/tsearch/ts_utils.h
@@ -124,13 +124,21 @@ extern text *generateHeadline(HeadlineParsedText *prs);
  * whether a given primitive tsquery value is matched in the data.
  */

+/* TS_execute requires ternary logic to handle NOT with phrase matches */
+typedef enum
+{
+    TS_NO,                        /* definitely no match */
+    TS_YES,                        /* definitely does match */
+    TS_MAYBE                    /* can't verify match for lack of pos data */
+} TSTernaryValue;
+
 /*
  * struct ExecPhraseData is passed to a TSExecuteCallback function if we need
  * lexeme position data (because of a phrase-match operator in the tsquery).
- * The callback should fill in position data when it returns true (success).
- * If it cannot return position data, it may leave "data" unchanged, but
- * then the caller of TS_execute() must pass the TS_EXEC_PHRASE_NO_POS flag
- * and must arrange for a later recheck with position data available.
+ * The callback should fill in position data when it returns TS_YES (success).
+ * If it cannot return position data, it should leave "data" unchanged and
+ * return TS_MAYBE.  The caller of TS_execute() must then arrange for a later
+ * recheck with position data available.
  *
  * The reported lexeme positions must be sorted and unique.  Callers must only
  * consult the position bits of the pos array, ie, WEP_GETPOS(data->pos[i]).
@@ -162,12 +170,13 @@ typedef struct ExecPhraseData
  * val: lexeme to test for presence of
  * data: to be filled with lexeme positions; NULL if position data not needed
  *
- * Return true if lexeme is present in data, else false.  If data is not
- * NULL, it should be filled with lexeme positions, but function can leave
- * it as zeroes if position data is not available.
+ * Return TS_YES if lexeme is present in data, TS_MAYBE if it might be
+ * present, TS_NO if it definitely is not present.  If data is not NULL,
+ * it must be filled with lexeme positions if available.  If position data
+ * is not available, leave *data as zeroes and return TS_MAYBE, never TS_YES.
  */
-typedef bool (*TSExecuteCallback) (void *arg, QueryOperand *val,
-                                   ExecPhraseData *data);
+typedef TSTernaryValue (*TSExecuteCallback) (void *arg, QueryOperand *val,
+                                             ExecPhraseData *data);

 /*
  * Flag bits for TS_execute
@@ -175,10 +184,7 @@ typedef bool (*TSExecuteCallback) (void *arg, QueryOperand *val,
 #define TS_EXEC_EMPTY            (0x00)
 /*
  * If TS_EXEC_CALC_NOT is not set, then NOT expressions are automatically
- * evaluated to be true.  Useful in cases where NOT cannot be accurately
- * computed (GiST) or it isn't important (ranking).  From TS_execute's
- * perspective, !CALC_NOT means that the TSExecuteCallback function might
- * return false-positive indications of a lexeme's presence.
+ * evaluated to be true.  Useful in cases where NOT isn't important (ranking).
  */
 #define TS_EXEC_CALC_NOT        (0x01)
 /*
diff --git a/src/test/regress/expected/tsearch.out b/src/test/regress/expected/tsearch.out
index 7105c67..0110b4d 100644
--- a/src/test/regress/expected/tsearch.out
+++ b/src/test/regress/expected/tsearch.out
@@ -176,6 +176,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
    507
 (1 row)

+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+    56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+    58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+   452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+   450
+(1 row)
+
 create index wowidx on test_tsvector using gist (a);
 SET enable_seqscan=OFF;
 SET enable_indexscan=ON;
@@ -308,6 +332,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
    507
 (1 row)

+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+    56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+    58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+   452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+   450
+(1 row)
+
 SET enable_indexscan=OFF;
 SET enable_bitmapscan=ON;
 explain (costs off) SELECT count(*) FROM test_tsvector WHERE a @@ 'wr|qh';
@@ -440,6 +488,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
    507
 (1 row)

+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+    56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+    58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+   452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+   450
+(1 row)
+
 -- Test siglen parameter of GiST tsvector_ops
 CREATE INDEX wowidx1 ON test_tsvector USING gist (a tsvector_ops(foo=1));
 ERROR:  unrecognized parameter "foo"
@@ -595,6 +667,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
    507
 (1 row)

+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+    56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+    58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+   452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+   450
+(1 row)
+
 DROP INDEX wowidx2;
 CREATE INDEX wowidx ON test_tsvector USING gist (a tsvector_ops(siglen=484));
 \d test_tsvector
@@ -736,6 +832,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
    507
 (1 row)

+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+    56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+    58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+   452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+   450
+(1 row)
+
 RESET enable_seqscan;
 RESET enable_indexscan;
 RESET enable_bitmapscan;
@@ -873,6 +993,30 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
    507
 (1 row)

+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+ count
+-------
+    56
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+ count
+-------
+    58
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+ count
+-------
+   452
+(1 row)
+
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';
+ count
+-------
+   450
+(1 row)
+
 -- Test optimization of non-empty GIN_SEARCH_MODE_ALL queries
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM test_tsvector WHERE a @@ '!qh';
diff --git a/src/test/regress/expected/tstypes.out b/src/test/regress/expected/tstypes.out
index ee4a249..2601e31 100644
--- a/src/test/regress/expected/tstypes.out
+++ b/src/test/regress/expected/tstypes.out
@@ -551,6 +551,55 @@ SELECT 'wa:1A wb:2D'::tsvector @@ 'w:*D <-> w:*A'::tsquery as "false";
  f
 (1 row)

+SELECT 'wa:1A'::tsvector @@ 'w:*A'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+SELECT 'wa:1A'::tsvector @@ 'w:*D'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
+SELECT 'wa:1A'::tsvector @@ '!w:*A'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
+SELECT 'wa:1A'::tsvector @@ '!w:*D'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+-- historically, a stripped tsvector matches queries ignoring weights:
+SELECT strip('wa:1A'::tsvector) @@ 'w:*A'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+SELECT strip('wa:1A'::tsvector) @@ 'w:*D'::tsquery as "true";
+ true
+------
+ t
+(1 row)
+
+SELECT strip('wa:1A'::tsvector) @@ '!w:*A'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
+SELECT strip('wa:1A'::tsvector) @@ '!w:*D'::tsquery as "false";
+ false
+-------
+ f
+(1 row)
+
 SELECT 'supernova'::tsvector @@ 'super'::tsquery AS "false";
  false
 -------
diff --git a/src/test/regress/sql/tsearch.sql b/src/test/regress/sql/tsearch.sql
index e53e44f..8a27fcd 100644
--- a/src/test/regress/sql/tsearch.sql
+++ b/src/test/regress/sql/tsearch.sql
@@ -61,6 +61,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';

 create index wowidx on test_tsvector using gist (a);

@@ -90,6 +94,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';

 SET enable_indexscan=OFF;
 SET enable_bitmapscan=ON;
@@ -116,6 +124,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';

 -- Test siglen parameter of GiST tsvector_ops
 CREATE INDEX wowidx1 ON test_tsvector USING gist (a tsvector_ops(foo=1));
@@ -152,6 +164,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';

 DROP INDEX wowidx2;

@@ -181,6 +197,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';

 RESET enable_seqscan;
 RESET enable_indexscan;
@@ -215,6 +235,10 @@ SELECT count(*) FROM test_tsvector WHERE a @@ '!qe <2> qt';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(pl <-> yh)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(yh <-> pl)';
 SELECT count(*) FROM test_tsvector WHERE a @@ '!(qe <2> qt)';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ 'wd:D';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:A';
+SELECT count(*) FROM test_tsvector WHERE a @@ '!wd:D';

 -- Test optimization of non-empty GIN_SEARCH_MODE_ALL queries
 EXPLAIN (COSTS OFF)
diff --git a/src/test/regress/sql/tstypes.sql b/src/test/regress/sql/tstypes.sql
index 50b4359..30c8c70 100644
--- a/src/test/regress/sql/tstypes.sql
+++ b/src/test/regress/sql/tstypes.sql
@@ -104,6 +104,15 @@ SELECT 'a b:89  ca:23A,64c cb:80b d:34c'::tsvector @@ 'd:AC & c:*B' as "true";
 SELECT 'wa:1D wb:2A'::tsvector @@ 'w:*D & w:*A'::tsquery as "true";
 SELECT 'wa:1D wb:2A'::tsvector @@ 'w:*D <-> w:*A'::tsquery as "true";
 SELECT 'wa:1A wb:2D'::tsvector @@ 'w:*D <-> w:*A'::tsquery as "false";
+SELECT 'wa:1A'::tsvector @@ 'w:*A'::tsquery as "true";
+SELECT 'wa:1A'::tsvector @@ 'w:*D'::tsquery as "false";
+SELECT 'wa:1A'::tsvector @@ '!w:*A'::tsquery as "false";
+SELECT 'wa:1A'::tsvector @@ '!w:*D'::tsquery as "true";
+-- historically, a stripped tsvector matches queries ignoring weights:
+SELECT strip('wa:1A'::tsvector) @@ 'w:*A'::tsquery as "true";
+SELECT strip('wa:1A'::tsvector) @@ 'w:*D'::tsquery as "true";
+SELECT strip('wa:1A'::tsvector) @@ '!w:*A'::tsquery as "false";
+SELECT strip('wa:1A'::tsvector) @@ '!w:*D'::tsquery as "false";

 SELECT 'supernova'::tsvector @@ 'super'::tsquery AS "false";
 SELECT 'supeanova supernova'::tsvector @@ 'super'::tsquery AS "false";
diff --git a/src/backend/utils/adt/tsginidx.c b/src/backend/utils/adt/tsginidx.c
index 3128f0a..b3e3ffc 100644
--- a/src/backend/utils/adt/tsginidx.c
+++ b/src/backend/utils/adt/tsginidx.c
@@ -248,7 +248,7 @@ gin_tsquery_consistent(PG_FUNCTION_ARGS)

         res = TS_execute(GETQUERY(query),
                          &gcv,
-                         TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_NO_POS,
+                         TS_EXEC_PHRASE_NO_POS,
                          checkcondition_gin);
     }

@@ -286,7 +286,7 @@ gin_tsquery_triconsistent(PG_FUNCTION_ARGS)

         if (TS_execute(GETQUERY(query),
                        &gcv,
-                       TS_EXEC_CALC_NOT | TS_EXEC_PHRASE_NO_POS,
+                       TS_EXEC_PHRASE_NO_POS,
                        checkcondition_gin))
             res = recheck ? GIN_MAYBE : GIN_TRUE;
     }
diff --git a/src/backend/utils/adt/tsgistidx.c b/src/backend/utils/adt/tsgistidx.c
index 927aed9..a601965 100644
--- a/src/backend/utils/adt/tsgistidx.c
+++ b/src/backend/utils/adt/tsgistidx.c
@@ -348,7 +348,7 @@ gtsvector_consistent(PG_FUNCTION_ARGS)

         PG_RETURN_BOOL(TS_execute(GETQUERY(query),
                                   key,
-                                  TS_EXEC_PHRASE_NO_POS | TS_EXEC_CALC_NOT,
+                                  TS_EXEC_PHRASE_NO_POS,
                                   checkcondition_bit));
     }
     else
@@ -359,7 +359,7 @@ gtsvector_consistent(PG_FUNCTION_ARGS)
         chkval.arre = chkval.arrb + ARRNELEM(key);
         PG_RETURN_BOOL(TS_execute(GETQUERY(query),
                                   (void *) &chkval,
-                                  TS_EXEC_PHRASE_NO_POS | TS_EXEC_CALC_NOT,
+                                  TS_EXEC_PHRASE_NO_POS,
                                   checkcondition_arr));
     }
 }
diff --git a/src/backend/utils/adt/tsrank.c b/src/backend/utils/adt/tsrank.c
index cbd97ab..c88ebfc 100644
--- a/src/backend/utils/adt/tsrank.c
+++ b/src/backend/utils/adt/tsrank.c
@@ -697,7 +697,7 @@ Cover(DocRepresentation *doc, int len, QueryRepresentation *qr, CoverExt *ext)
         fillQueryRepresentationData(qr, ptr);

         if (TS_execute(GETQUERY(qr->query), (void *) qr,
-                       TS_EXEC_CALC_NOT, checkcondition_QueryOperand))
+                       TS_EXEC_EMPTY, checkcondition_QueryOperand))
         {
             if (WEP_GETPOS(ptr->pos) < ext->p)
             {
diff --git a/src/backend/utils/adt/tsvector_op.c b/src/backend/utils/adt/tsvector_op.c
index 6df943a..2faba46 100644
--- a/src/backend/utils/adt/tsvector_op.c
+++ b/src/backend/utils/adt/tsvector_op.c
@@ -1627,13 +1627,6 @@ TS_phrase_execute(QueryItem *curitem, void *arg, uint32 flags,
              * We need not touch data->width, since a NOT operation does not
              * change the match width.
              */
-            if (!(flags & TS_EXEC_CALC_NOT))
-            {
-                /* without CALC_NOT, report NOT as "match everywhere" */
-                Assert(data->npos == 0 && !data->negate);
-                data->negate = true;
-                return TS_YES;
-            }
             switch (TS_phrase_execute(curitem + 1, arg, flags, chkcond, data))
             {
                 case TS_NO:
@@ -1875,8 +1868,6 @@ TS_execute_recurse(QueryItem *curitem, void *arg, uint32 flags,
     switch (curitem->qoperator.oper)
     {
         case OP_NOT:
-            if (!(flags & TS_EXEC_CALC_NOT))
-                return TS_YES;
             switch (TS_execute_recurse(curitem + 1, arg, flags, chkcond))
             {
                 case TS_NO:
@@ -2038,7 +2029,7 @@ ts_match_vq(PG_FUNCTION_ARGS)
     chkval.operand = GETOPERAND(query);
     result = TS_execute(GETQUERY(query),
                         &chkval,
-                        TS_EXEC_CALC_NOT,
+                        TS_EXEC_EMPTY,
                         checkcondition_str);

     PG_FREE_IF_COPY(val, 0);
diff --git a/src/include/tsearch/ts_utils.h b/src/include/tsearch/ts_utils.h
index 609b0c7..9ff9fcf 100644
--- a/src/include/tsearch/ts_utils.h
+++ b/src/include/tsearch/ts_utils.h
@@ -183,11 +183,6 @@ typedef TSTernaryValue (*TSExecuteCallback) (void *arg, QueryOperand *val,
  */
 #define TS_EXEC_EMPTY            (0x00)
 /*
- * If TS_EXEC_CALC_NOT is not set, then NOT expressions are automatically
- * evaluated to be true.  Useful in cases where NOT isn't important (ranking).
- */
-#define TS_EXEC_CALC_NOT        (0x01)
-/*
  * If TS_EXEC_PHRASE_NO_POS is set, allow OP_PHRASE to be executed lossily
  * in the absence of position information: a true result indicates that the
  * phrase might be present.  Without this flag, OP_PHRASE always returns

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Erik Rijkers
Дата:
Сообщение: Re: Add A Glossary
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Potentially misleading name of libpq pass phrase hook