Re: The documentation for READ COMMITTED may be incomplete or wrong

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: The documentation for READ COMMITTED may be incomplete or wrong
Дата
Msg-id 4111741.1684454234@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: The documentation for READ COMMITTED may be incomplete or wrong  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: The documentation for READ COMMITTED may be incomplete or wrong  (Aleksander Alekseev <aleksander@timescale.com>)
Re: The documentation for READ COMMITTED may be incomplete or wrong  (Amit Langote <amitlangote09@gmail.com>)
Список pgsql-hackers
I wrote:
> Debian Code Search doesn't know of any outside code touching
> relsubs_done, so I think we are safe in dropping that code in
> ExecScanReScan.  It seems quite pointless anyway considering
> that up to now, EvalPlanQualBegin has always zeroed the whole
> array.

Oh, belay that.  What I'd forgotten is that it's possible that
the target relation is on the inside of a nestloop, meaning that
we might need to fetch the EPQ substitute tuple more than once.
So there are three possible states: blocked (never return a
tuple), ready to return a tuple, and done returning a tuple
for this scan.  ExecScanReScan needs to reset "done" to "ready",
but not touch the "blocked" state.  The attached v2 mechanizes
that using two bool arrays.

What I'm thinking about doing to back-patch this is to replace
one of the pointer fields in EPQState with a pointer to a
subsidiary palloc'd structure, where we can put the new fields
along with the cannibalized old one.  We've done something
similar before, and it seems a lot safer than having basically
different logic in v16 than earlier branches.

            regards, tom lane

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0186be452c..c76fdf59ec 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2469,7 +2469,7 @@ ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist)
  *    relation - table containing tuple
  *    rti - rangetable index of table containing tuple
  *    inputslot - tuple for processing - this can be the slot from
- *        EvalPlanQualSlot(), for the increased efficiency.
+ *        EvalPlanQualSlot() for this rel, for increased efficiency.
  *
  * This tests whether the tuple in inputslot still matches the relevant
  * quals. For that result to be useful, typically the input tuple has to be
@@ -2503,6 +2503,14 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
     if (testslot != inputslot)
         ExecCopySlot(testslot, inputslot);

+    /*
+     * Mark that an EPQ tuple is available for this relation.  (If there is
+     * more than one result relation, the others remain marked as having no
+     * tuple available.)
+     */
+    epqstate->relsubs_done[rti - 1] = false;
+    epqstate->relsubs_blocked[rti - 1] = false;
+
     /*
      * Run the EPQ query.  We assume it will return at most one tuple.
      */
@@ -2519,11 +2527,12 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
         ExecMaterializeSlot(slot);

     /*
-     * Clear out the test tuple.  This is needed in case the EPQ query is
-     * re-used to test a tuple for a different relation.  (Not clear that can
-     * really happen, but let's be safe.)
+     * Clear out the test tuple, and mark that no tuple is available here.
+     * This is needed in case the EPQ state is re-used to test a tuple for a
+     * different target relation.
      */
     ExecClearTuple(testslot);
+    epqstate->relsubs_blocked[rti - 1] = true;

     return slot;
 }
@@ -2532,18 +2541,26 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
  * EvalPlanQualInit -- initialize during creation of a plan state node
  * that might need to invoke EPQ processing.
  *
+ * If the caller intends to use EvalPlanQual(), resultRelations should be
+ * a list of RT indexes of potential target relations for EvalPlanQual(),
+ * and we will arrange that the other listed relations don't return any
+ * tuple during an EvalPlanQual() call.  Otherwise resultRelations
+ * should be NIL.
+ *
  * Note: subplan/auxrowmarks can be NULL/NIL if they will be set later
  * with EvalPlanQualSetPlan.
  */
 void
 EvalPlanQualInit(EPQState *epqstate, EState *parentestate,
-                 Plan *subplan, List *auxrowmarks, int epqParam)
+                 Plan *subplan, List *auxrowmarks,
+                 int epqParam, List *resultRelations)
 {
     Index        rtsize = parentestate->es_range_table_size;

     /* initialize data not changing over EPQState's lifetime */
     epqstate->parentestate = parentestate;
     epqstate->epqParam = epqParam;
+    epqstate->resultRelations = resultRelations;

     /*
      * Allocate space to reference a slot for each potential rti - do so now
@@ -2566,6 +2583,7 @@ EvalPlanQualInit(EPQState *epqstate, EState *parentestate,
     epqstate->recheckplanstate = NULL;
     epqstate->relsubs_rowmark = NULL;
     epqstate->relsubs_done = NULL;
+    epqstate->relsubs_blocked = NULL;
 }

 /*
@@ -2763,7 +2781,13 @@ EvalPlanQualBegin(EPQState *epqstate)
         Index        rtsize = parentestate->es_range_table_size;
         PlanState  *rcplanstate = epqstate->recheckplanstate;

-        MemSet(epqstate->relsubs_done, 0, rtsize * sizeof(bool));
+        /*
+         * Reset the relsubs_done[] flags to equal relsubs_blocked[], so that
+         * the EPQ run will never attempt to fetch tuples from blocked target
+         * relations.
+         */
+        memcpy(epqstate->relsubs_done, epqstate->relsubs_blocked,
+               rtsize * sizeof(bool));

         /* Recopy current values of parent parameters */
         if (parentestate->es_plannedstmt->paramExecTypes != NIL)
@@ -2931,10 +2955,22 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
     }

     /*
-     * Initialize per-relation EPQ tuple states to not-fetched.
+     * Initialize per-relation EPQ tuple states.  Result relations, if any,
+     * get marked as blocked; others as not-fetched.
      */
-    epqstate->relsubs_done = (bool *)
-        palloc0(rtsize * sizeof(bool));
+    epqstate->relsubs_done = palloc_array(bool, rtsize);
+    epqstate->relsubs_blocked = palloc0_array(bool, rtsize);
+
+    foreach(l, epqstate->resultRelations)
+    {
+        int            rtindex = lfirst_int(l);
+
+        Assert(rtindex > 0 && rtindex <= rtsize);
+        epqstate->relsubs_blocked[rtindex - 1] = true;
+    }
+
+    memcpy(epqstate->relsubs_done, epqstate->relsubs_blocked,
+           rtsize * sizeof(bool));

     /*
      * Initialize the private state information for all the nodes in the part
@@ -3010,4 +3046,5 @@ EvalPlanQualEnd(EPQState *epqstate)
     epqstate->recheckplanstate = NULL;
     epqstate->relsubs_rowmark = NULL;
     epqstate->relsubs_done = NULL;
+    epqstate->relsubs_blocked = NULL;
 }
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index cf1871b0f5..a47c8f5f71 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -69,13 +69,12 @@ ExecScanFetch(ScanState *node,
         else if (epqstate->relsubs_done[scanrelid - 1])
         {
             /*
-             * Return empty slot, as we already performed an EPQ substitution
-             * for this relation.
+             * Return empty slot, as either there is no EPQ tuple for this rel
+             * or we already returned it.
              */

             TupleTableSlot *slot = node->ss_ScanTupleSlot;

-            /* Return empty slot, as we already returned a tuple */
             return ExecClearTuple(slot);
         }
         else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
@@ -88,7 +87,7 @@ ExecScanFetch(ScanState *node,

             Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);

-            /* Mark to remember that we shouldn't return more */
+            /* Mark to remember that we shouldn't return it again */
             epqstate->relsubs_done[scanrelid - 1] = true;

             /* Return empty slot if we haven't got a test tuple */
@@ -306,14 +305,18 @@ ExecScanReScan(ScanState *node)
      */
     ExecClearTuple(node->ss_ScanTupleSlot);

-    /* Rescan EvalPlanQual tuple if we're inside an EvalPlanQual recheck */
+    /*
+     * Rescan EvalPlanQual tuple(s) if we're inside an EvalPlanQual recheck.
+     * But don't lose the "blocked" status of blocked target relations.
+     */
     if (estate->es_epq_active != NULL)
     {
         EPQState   *epqstate = estate->es_epq_active;
         Index        scanrelid = ((Scan *) node->ps.plan)->scanrelid;

         if (scanrelid > 0)
-            epqstate->relsubs_done[scanrelid - 1] = false;
+            epqstate->relsubs_done[scanrelid - 1] =
+                epqstate->relsubs_blocked[scanrelid - 1];
         else
         {
             Bitmapset  *relids;
@@ -335,7 +338,8 @@ ExecScanReScan(ScanState *node)
             while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
             {
                 Assert(rtindex > 0);
-                epqstate->relsubs_done[rtindex - 1] = false;
+                epqstate->relsubs_done[rtindex - 1] =
+                    epqstate->relsubs_blocked[rtindex - 1];
             }
         }
     }
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 407414fc0c..e459971d32 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -108,7 +108,6 @@ lnext:
                 /* this child is inactive right now */
                 erm->ermActive = false;
                 ItemPointerSetInvalid(&(erm->curCtid));
-                ExecClearTuple(markSlot);
                 continue;
             }
         }
@@ -370,7 +369,7 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)

     /* Now we have the info needed to set up EPQ state */
     EvalPlanQualInit(&lrstate->lr_epqstate, estate,
-                     outerPlan, epq_arowmarks, node->epqParam);
+                     outerPlan, epq_arowmarks, node->epqParam, NIL);

     return lrstate;
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index dc1a2ec551..7f5002527f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3985,7 +3985,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
     }

     /* set up epqstate with dummy subplan data for the moment */
-    EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
+    EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
+                     node->epqParam, node->resultRelations);
     mtstate->fireBSTriggers = true;

     /*
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 879309b316..4b67098814 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2674,7 +2674,7 @@ apply_handle_update_internal(ApplyExecutionData *edata,
     bool        found;
     MemoryContext oldctx;

-    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
+    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
     ExecOpenIndices(relinfo, false);

     found = FindReplTupleInLocalRel(estate, localrel,
@@ -2827,7 +2827,7 @@ apply_handle_delete_internal(ApplyExecutionData *edata,
     TupleTableSlot *localslot;
     bool        found;

-    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
+    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
     ExecOpenIndices(relinfo, false);

     found = FindReplTupleInLocalRel(estate, localrel, remoterel, localindexoid,
@@ -3054,7 +3054,7 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
                      */
                     EPQState    epqstate;

-                    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
+                    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
                     ExecOpenIndices(partrelinfo, false);

                     EvalPlanQualSetSlot(&epqstate, remoteslot_part);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index f9e6bf3d4a..ac02247947 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -233,7 +233,8 @@ extern ExecAuxRowMark *ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist);
 extern TupleTableSlot *EvalPlanQual(EPQState *epqstate, Relation relation,
                                     Index rti, TupleTableSlot *inputslot);
 extern void EvalPlanQualInit(EPQState *epqstate, EState *parentestate,
-                             Plan *subplan, List *auxrowmarks, int epqParam);
+                             Plan *subplan, List *auxrowmarks,
+                             int epqParam, List *resultRelations);
 extern void EvalPlanQualSetPlan(EPQState *epqstate,
                                 Plan *subplan, List *auxrowmarks);
 extern TupleTableSlot *EvalPlanQualSlot(EPQState *epqstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61b3517906..68ddcd6794 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1169,15 +1169,16 @@ typedef struct PlanState
  */
 typedef struct EPQState
 {
-    /* Initialized at EvalPlanQualInit() time: */
-
+    /* These are initialized by EvalPlanQualInit() and do not change later: */
     EState       *parentestate;    /* main query's EState */
     int            epqParam;        /* ID of Param to force scan node re-eval */
+    List       *resultRelations;    /* integer list of RT indexes, or NIL */

     /*
-     * Tuples to be substituted by scan nodes. They need to set up, before
-     * calling EvalPlanQual()/EvalPlanQualNext(), into the slot returned by
-     * EvalPlanQualSlot(scanrelid). The array is indexed by scanrelid - 1.
+     * relsubs_slot[scanrelid - 1] holds the EPQ test tuple to be returned by
+     * the scan node for the scanrelid'th RT index, in place of performing an
+     * actual table scan.  Callers should use EvalPlanQualSlot() to fetch
+     * these slots.
      */
     List       *tuple_table;    /* tuple table for relsubs_slot */
     TupleTableSlot **relsubs_slot;
@@ -1211,11 +1212,17 @@ typedef struct EPQState
     ExecAuxRowMark **relsubs_rowmark;

     /*
-     * True if a relation's EPQ tuple has been fetched for relation, indexed
-     * by scanrelid - 1.
+     * relsubs_done[scanrelid - 1] is true if there is no EPQ tuple for this
+     * target relation or it has already been fetched in the current EPQ run.
      */
     bool       *relsubs_done;

+    /*
+     * relsubs_blocked[scanrelid - 1] is true if there is no EPQ tuple for
+     * this target relation.
+     */
+    bool       *relsubs_blocked;
+
     PlanState  *recheckplanstate;    /* EPQ specific exec nodes, for ->plan */
 } EPQState;

diff --git a/src/test/isolation/expected/eval-plan-qual.out b/src/test/isolation/expected/eval-plan-qual.out
index 6af262ec5d..f2a50f46a3 100644
--- a/src/test/isolation/expected/eval-plan-qual.out
+++ b/src/test/isolation/expected/eval-plan-qual.out
@@ -842,6 +842,90 @@ step c1: COMMIT;
 step writep3b: <... completed>
 step c2: COMMIT;

+starting permutation: writep4a writep4b c1 c2 readp
+step writep4a: UPDATE p SET c = 4 WHERE c = 0;
+step writep4b: UPDATE p SET b = -4 WHERE c = 0; <waiting ...>
+step c1: COMMIT;
+step writep4b: <... completed>
+step c2: COMMIT;
+step readp: SELECT tableoid::regclass, ctid, * FROM p;
+tableoid|ctid  |a|b|c
+--------+------+-+-+-
+c1      |(0,2) |0|0|1
+c1      |(0,3) |0|0|2
+c1      |(0,5) |0|1|1
+c1      |(0,6) |0|1|2
+c1      |(0,8) |0|2|1
+c1      |(0,9) |0|2|2
+c1      |(0,11)|0|0|4
+c1      |(0,12)|0|1|4
+c1      |(0,13)|0|2|4
+c1      |(0,14)|0|3|4
+c2      |(0,2) |1|0|1
+c2      |(0,3) |1|0|2
+c2      |(0,5) |1|1|1
+c2      |(0,6) |1|1|2
+c2      |(0,8) |1|2|1
+c2      |(0,9) |1|2|2
+c2      |(0,11)|1|0|4
+c2      |(0,12)|1|1|4
+c2      |(0,13)|1|2|4
+c2      |(0,14)|1|3|4
+c3      |(0,2) |2|0|1
+c3      |(0,3) |2|0|2
+c3      |(0,5) |2|1|1
+c3      |(0,6) |2|1|2
+c3      |(0,8) |2|2|1
+c3      |(0,9) |2|2|2
+c3      |(0,11)|2|0|4
+c3      |(0,12)|2|1|4
+c3      |(0,13)|2|2|4
+c3      |(0,14)|2|3|4
+(30 rows)
+
+
+starting permutation: writep4a deletep4 c1 c2 readp
+step writep4a: UPDATE p SET c = 4 WHERE c = 0;
+step deletep4: DELETE FROM p WHERE c = 0; <waiting ...>
+step c1: COMMIT;
+step deletep4: <... completed>
+step c2: COMMIT;
+step readp: SELECT tableoid::regclass, ctid, * FROM p;
+tableoid|ctid  |a|b|c
+--------+------+-+-+-
+c1      |(0,2) |0|0|1
+c1      |(0,3) |0|0|2
+c1      |(0,5) |0|1|1
+c1      |(0,6) |0|1|2
+c1      |(0,8) |0|2|1
+c1      |(0,9) |0|2|2
+c1      |(0,11)|0|0|4
+c1      |(0,12)|0|1|4
+c1      |(0,13)|0|2|4
+c1      |(0,14)|0|3|4
+c2      |(0,2) |1|0|1
+c2      |(0,3) |1|0|2
+c2      |(0,5) |1|1|1
+c2      |(0,6) |1|1|2
+c2      |(0,8) |1|2|1
+c2      |(0,9) |1|2|2
+c2      |(0,11)|1|0|4
+c2      |(0,12)|1|1|4
+c2      |(0,13)|1|2|4
+c2      |(0,14)|1|3|4
+c3      |(0,2) |2|0|1
+c3      |(0,3) |2|0|2
+c3      |(0,5) |2|1|1
+c3      |(0,6) |2|1|2
+c3      |(0,8) |2|2|1
+c3      |(0,9) |2|2|2
+c3      |(0,11)|2|0|4
+c3      |(0,12)|2|1|4
+c3      |(0,13)|2|2|4
+c3      |(0,14)|2|3|4
+(30 rows)
+
+
 starting permutation: wx2 partiallock c2 c1 read
 step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
 balance
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 768f7098b9..cb56578208 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -102,11 +102,15 @@ step upsert1    {
 # when the first updated tuple was in a non-first child table.
 # writep2/returningp1 tests a memory allocation issue
 # writep3a/writep3b tests updates touching more than one table
+# writep4a/writep4b tests a case where matches in another table confused EPQ
+# writep4a/deletep4 tests the same case in the DELETE path

+step readp        { SELECT tableoid::regclass, ctid, * FROM p; }
 step readp1        { SELECT tableoid::regclass, ctid, * FROM p WHERE b IN (0, 1) AND c = 0 FOR UPDATE; }
 step writep1    { UPDATE p SET b = -1 WHERE a = 1 AND b = 1 AND c = 0; }
 step writep2    { UPDATE p SET b = -b WHERE a = 1 AND c = 0; }
 step writep3a    { UPDATE p SET b = -b WHERE c = 0; }
+step writep4a    { UPDATE p SET c = 4 WHERE c = 0; }
 step c1        { COMMIT; }
 step r1        { ROLLBACK; }

@@ -210,6 +214,8 @@ step returningp1 {
       SELECT * FROM u;
 }
 step writep3b    { UPDATE p SET b = -b WHERE c = 0; }
+step writep4b    { UPDATE p SET b = -4 WHERE c = 0; }
+step deletep4    { DELETE FROM p WHERE c = 0; }
 step readforss    {
     SELECT ta.id AS ta_id, ta.value AS ta_value,
         (SELECT ROW(tb.id, tb.value)
@@ -347,6 +353,8 @@ permutation upsert1 upsert2 c1 c2 read
 permutation readp1 writep1 readp2 c1 c2
 permutation writep2 returningp1 c1 c2
 permutation writep3a writep3b c1 c2
+permutation writep4a writep4b c1 c2 readp
+permutation writep4a deletep4 c1 c2 readp
 permutation wx2 partiallock c2 c1 read
 permutation wx2 lockwithvalues c2 c1 read
 permutation wx2_ext partiallock_ext c2 c1 read_ext

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: Adding SHOW CREATE TABLE
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Missing update of all_hasnulls in BRIN opclasses