Re: The documentation for READ COMMITTED may be incomplete or wrong

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: The documentation for READ COMMITTED may be incomplete or wrong
Дата
Msg-id 4024583.1684448782@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: The documentation for READ COMMITTED may be incomplete or wrong  (Nathan Bossart <nathandbossart@gmail.com>)
Ответы Re: The documentation for READ COMMITTED may be incomplete or wrong  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Nathan Bossart <nathandbossart@gmail.com> writes:
> On Thu, May 18, 2023 at 04:03:36PM -0400, Tom Lane wrote:
>> What we need to do, I think, is set epqstate->relsubs_done[] for
>> all target relations except the one we are stuffing a tuple into.

> This seems generally reasonable to me.

Here's a draft patch for this.  I think it's OK for HEAD, but we are
going to need some different details for the back branches, because
adding a field to struct EPQState would create an ABI break.  Our
standard trick of shoving the field to the end in the back branches
won't help, because struct EPQState is actually embedded in
struct ModifyTableState, meaning that all the subsequent fields
therein will move.  Maybe we can get away with that, but I bet not.

I think what the back branches will have to do is reinitialize the
relsubs_done array every time through EvalPlanQual(), which is a bit sad
but probably doesn't amount to anything compared to the startup overhead
of the sub-executor.

Debian Code Search doesn't know of any outside code touching
relsubs_done, so I think we are safe in dropping that code in
ExecScanReScan.  It seems quite pointless anyway considering
that up to now, EvalPlanQualBegin has always zeroed the whole
array.

            regards, tom lane

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0186be452c..c518daa7d0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -2469,7 +2469,7 @@ ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist)
  *    relation - table containing tuple
  *    rti - rangetable index of table containing tuple
  *    inputslot - tuple for processing - this can be the slot from
- *        EvalPlanQualSlot(), for the increased efficiency.
+ *        EvalPlanQualSlot() for this rel, for increased efficiency.
  *
  * This tests whether the tuple in inputslot still matches the relevant
  * quals. For that result to be useful, typically the input tuple has to be
@@ -2503,6 +2503,13 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
     if (testslot != inputslot)
         ExecCopySlot(testslot, inputslot);

+    /*
+     * Mark that an EPQ tuple is available for this relation.  (If there is
+     * more than one result relation, the others remain marked as having no
+     * tuple available.)
+     */
+    epqstate->relsubs_done[rti - 1] = false;
+
     /*
      * Run the EPQ query.  We assume it will return at most one tuple.
      */
@@ -2519,11 +2526,12 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
         ExecMaterializeSlot(slot);

     /*
-     * Clear out the test tuple.  This is needed in case the EPQ query is
-     * re-used to test a tuple for a different relation.  (Not clear that can
-     * really happen, but let's be safe.)
+     * Clear out the test tuple, and mark that no tuple is available here.
+     * This is needed in case the EPQ state is re-used to test a tuple for a
+     * different target relation.
      */
     ExecClearTuple(testslot);
+    epqstate->relsubs_done[rti - 1] = true;

     return slot;
 }
@@ -2532,18 +2540,26 @@ EvalPlanQual(EPQState *epqstate, Relation relation,
  * EvalPlanQualInit -- initialize during creation of a plan state node
  * that might need to invoke EPQ processing.
  *
+ * If the caller intends to use EvalPlanQual(), resultRelations should be
+ * a list of RT indexes of potential target relations for EvalPlanQual(),
+ * and we will arrange that the other listed relations don't return any
+ * tuple during an EvalPlanQual() call.  Otherwise resultRelations
+ * should be NIL.
+ *
  * Note: subplan/auxrowmarks can be NULL/NIL if they will be set later
  * with EvalPlanQualSetPlan.
  */
 void
 EvalPlanQualInit(EPQState *epqstate, EState *parentestate,
-                 Plan *subplan, List *auxrowmarks, int epqParam)
+                 Plan *subplan, List *auxrowmarks,
+                 int epqParam, List *resultRelations)
 {
     Index        rtsize = parentestate->es_range_table_size;

     /* initialize data not changing over EPQState's lifetime */
     epqstate->parentestate = parentestate;
     epqstate->epqParam = epqParam;
+    epqstate->resultRelations = resultRelations;

     /*
      * Allocate space to reference a slot for each potential rti - do so now
@@ -2760,10 +2776,20 @@ EvalPlanQualBegin(EPQState *epqstate)
         /*
          * We already have a suitable child EPQ tree, so just reset it.
          */
-        Index        rtsize = parentestate->es_range_table_size;
         PlanState  *rcplanstate = epqstate->recheckplanstate;

-        MemSet(epqstate->relsubs_done, 0, rtsize * sizeof(bool));
+        /*
+         * We reset the relsubs_done[] flags only when the caller of
+         * EvalPlanQualInit provided no result relations.  Otherwise, we keep
+         * the current state (which should have all result relations marked as
+         * "done", and others not).
+         */
+        if (epqstate->resultRelations == NIL)
+        {
+            Index        rtsize = parentestate->es_range_table_size;
+
+            memset(epqstate->relsubs_done, 0, rtsize * sizeof(bool));
+        }

         /* Recopy current values of parent parameters */
         if (parentestate->es_plannedstmt->paramExecTypes != NIL)
@@ -2931,11 +2957,20 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
     }

     /*
-     * Initialize per-relation EPQ tuple states to not-fetched.
+     * Initialize per-relation EPQ tuple states.  Result relations, if any,
+     * get marked as already-fetched; others as not-fetched.
      */
     epqstate->relsubs_done = (bool *)
         palloc0(rtsize * sizeof(bool));

+    foreach(l, epqstate->resultRelations)
+    {
+        int            rtindex = lfirst_int(l);
+
+        Assert(rtindex > 0 && rtindex <= rtsize);
+        epqstate->relsubs_done[rtindex - 1] = true;
+    }
+
     /*
      * Initialize the private state information for all the nodes in the part
      * of the plan tree we need to run.  This opens files, allocates storage
diff --git a/src/backend/executor/execScan.c b/src/backend/executor/execScan.c
index cf1871b0f5..ea8c5ab149 100644
--- a/src/backend/executor/execScan.c
+++ b/src/backend/executor/execScan.c
@@ -69,13 +69,12 @@ ExecScanFetch(ScanState *node,
         else if (epqstate->relsubs_done[scanrelid - 1])
         {
             /*
-             * Return empty slot, as we already performed an EPQ substitution
-             * for this relation.
+             * Return empty slot, as either there is no EPQ tuple for this rel
+             * or we already returned it.
              */

             TupleTableSlot *slot = node->ss_ScanTupleSlot;

-            /* Return empty slot, as we already returned a tuple */
             return ExecClearTuple(slot);
         }
         else if (epqstate->relsubs_slot[scanrelid - 1] != NULL)
@@ -88,7 +87,7 @@ ExecScanFetch(ScanState *node,

             Assert(epqstate->relsubs_rowmark[scanrelid - 1] == NULL);

-            /* Mark to remember that we shouldn't return more */
+            /* Mark to remember that we shouldn't return it again */
             epqstate->relsubs_done[scanrelid - 1] = true;

             /* Return empty slot if we haven't got a test tuple */
@@ -298,45 +297,9 @@ ExecAssignScanProjectionInfoWithVarno(ScanState *node, int varno)
 void
 ExecScanReScan(ScanState *node)
 {
-    EState       *estate = node->ps.state;
-
     /*
      * We must clear the scan tuple so that observers (e.g., execCurrent.c)
      * can tell that this plan node is not positioned on a tuple.
      */
     ExecClearTuple(node->ss_ScanTupleSlot);
-
-    /* Rescan EvalPlanQual tuple if we're inside an EvalPlanQual recheck */
-    if (estate->es_epq_active != NULL)
-    {
-        EPQState   *epqstate = estate->es_epq_active;
-        Index        scanrelid = ((Scan *) node->ps.plan)->scanrelid;
-
-        if (scanrelid > 0)
-            epqstate->relsubs_done[scanrelid - 1] = false;
-        else
-        {
-            Bitmapset  *relids;
-            int            rtindex = -1;
-
-            /*
-             * If an FDW or custom scan provider has replaced the join with a
-             * scan, there are multiple RTIs; reset the epqScanDone flag for
-             * all of them.
-             */
-            if (IsA(node->ps.plan, ForeignScan))
-                relids = ((ForeignScan *) node->ps.plan)->fs_base_relids;
-            else if (IsA(node->ps.plan, CustomScan))
-                relids = ((CustomScan *) node->ps.plan)->custom_relids;
-            else
-                elog(ERROR, "unexpected scan node: %d",
-                     (int) nodeTag(node->ps.plan));
-
-            while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
-            {
-                Assert(rtindex > 0);
-                epqstate->relsubs_done[rtindex - 1] = false;
-            }
-        }
-    }
 }
diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c
index 407414fc0c..e459971d32 100644
--- a/src/backend/executor/nodeLockRows.c
+++ b/src/backend/executor/nodeLockRows.c
@@ -108,7 +108,6 @@ lnext:
                 /* this child is inactive right now */
                 erm->ermActive = false;
                 ItemPointerSetInvalid(&(erm->curCtid));
-                ExecClearTuple(markSlot);
                 continue;
             }
         }
@@ -370,7 +369,7 @@ ExecInitLockRows(LockRows *node, EState *estate, int eflags)

     /* Now we have the info needed to set up EPQ state */
     EvalPlanQualInit(&lrstate->lr_epqstate, estate,
-                     outerPlan, epq_arowmarks, node->epqParam);
+                     outerPlan, epq_arowmarks, node->epqParam, NIL);

     return lrstate;
 }
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index dc1a2ec551..7f5002527f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3985,7 +3985,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
     }

     /* set up epqstate with dummy subplan data for the moment */
-    EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL, node->epqParam);
+    EvalPlanQualInit(&mtstate->mt_epqstate, estate, NULL, NIL,
+                     node->epqParam, node->resultRelations);
     mtstate->fireBSTriggers = true;

     /*
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logical/worker.c
index 879309b316..4b67098814 100644
--- a/src/backend/replication/logical/worker.c
+++ b/src/backend/replication/logical/worker.c
@@ -2674,7 +2674,7 @@ apply_handle_update_internal(ApplyExecutionData *edata,
     bool        found;
     MemoryContext oldctx;

-    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
+    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
     ExecOpenIndices(relinfo, false);

     found = FindReplTupleInLocalRel(estate, localrel,
@@ -2827,7 +2827,7 @@ apply_handle_delete_internal(ApplyExecutionData *edata,
     TupleTableSlot *localslot;
     bool        found;

-    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
+    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
     ExecOpenIndices(relinfo, false);

     found = FindReplTupleInLocalRel(estate, localrel, remoterel, localindexoid,
@@ -3054,7 +3054,7 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
                      */
                     EPQState    epqstate;

-                    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
+                    EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1, NIL);
                     ExecOpenIndices(partrelinfo, false);

                     EvalPlanQualSetSlot(&epqstate, remoteslot_part);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index f9e6bf3d4a..ac02247947 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -233,7 +233,8 @@ extern ExecAuxRowMark *ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist);
 extern TupleTableSlot *EvalPlanQual(EPQState *epqstate, Relation relation,
                                     Index rti, TupleTableSlot *inputslot);
 extern void EvalPlanQualInit(EPQState *epqstate, EState *parentestate,
-                             Plan *subplan, List *auxrowmarks, int epqParam);
+                             Plan *subplan, List *auxrowmarks,
+                             int epqParam, List *resultRelations);
 extern void EvalPlanQualSetPlan(EPQState *epqstate,
                                 Plan *subplan, List *auxrowmarks);
 extern TupleTableSlot *EvalPlanQualSlot(EPQState *epqstate,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 61b3517906..155186ac1f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1169,15 +1169,16 @@ typedef struct PlanState
  */
 typedef struct EPQState
 {
-    /* Initialized at EvalPlanQualInit() time: */
-
+    /* These are initialized by EvalPlanQualInit() and do not change later: */
     EState       *parentestate;    /* main query's EState */
     int            epqParam;        /* ID of Param to force scan node re-eval */
+    List       *resultRelations;    /* integer list of RT indexes, or NIL */

     /*
-     * Tuples to be substituted by scan nodes. They need to set up, before
-     * calling EvalPlanQual()/EvalPlanQualNext(), into the slot returned by
-     * EvalPlanQualSlot(scanrelid). The array is indexed by scanrelid - 1.
+     * relsubs_slot[scanrelid - 1] holds the EPQ test tuple to be returned by
+     * the scan node for the scanrelid'th RT index, in place of performing an
+     * actual table scan.  Callers should use EvalPlanQualSlot() to fetch
+     * these slots.
      */
     List       *tuple_table;    /* tuple table for relsubs_slot */
     TupleTableSlot **relsubs_slot;
@@ -1211,8 +1212,8 @@ typedef struct EPQState
     ExecAuxRowMark **relsubs_rowmark;

     /*
-     * True if a relation's EPQ tuple has been fetched for relation, indexed
-     * by scanrelid - 1.
+     * relsubs_done[scanrelid - 1] is true if there is no EPQ tuple for this
+     * target relation or it has already been fetched in the current EPQ run.
      */
     bool       *relsubs_done;

diff --git a/src/test/isolation/expected/eval-plan-qual.out b/src/test/isolation/expected/eval-plan-qual.out
index 6af262ec5d..f2a50f46a3 100644
--- a/src/test/isolation/expected/eval-plan-qual.out
+++ b/src/test/isolation/expected/eval-plan-qual.out
@@ -842,6 +842,90 @@ step c1: COMMIT;
 step writep3b: <... completed>
 step c2: COMMIT;

+starting permutation: writep4a writep4b c1 c2 readp
+step writep4a: UPDATE p SET c = 4 WHERE c = 0;
+step writep4b: UPDATE p SET b = -4 WHERE c = 0; <waiting ...>
+step c1: COMMIT;
+step writep4b: <... completed>
+step c2: COMMIT;
+step readp: SELECT tableoid::regclass, ctid, * FROM p;
+tableoid|ctid  |a|b|c
+--------+------+-+-+-
+c1      |(0,2) |0|0|1
+c1      |(0,3) |0|0|2
+c1      |(0,5) |0|1|1
+c1      |(0,6) |0|1|2
+c1      |(0,8) |0|2|1
+c1      |(0,9) |0|2|2
+c1      |(0,11)|0|0|4
+c1      |(0,12)|0|1|4
+c1      |(0,13)|0|2|4
+c1      |(0,14)|0|3|4
+c2      |(0,2) |1|0|1
+c2      |(0,3) |1|0|2
+c2      |(0,5) |1|1|1
+c2      |(0,6) |1|1|2
+c2      |(0,8) |1|2|1
+c2      |(0,9) |1|2|2
+c2      |(0,11)|1|0|4
+c2      |(0,12)|1|1|4
+c2      |(0,13)|1|2|4
+c2      |(0,14)|1|3|4
+c3      |(0,2) |2|0|1
+c3      |(0,3) |2|0|2
+c3      |(0,5) |2|1|1
+c3      |(0,6) |2|1|2
+c3      |(0,8) |2|2|1
+c3      |(0,9) |2|2|2
+c3      |(0,11)|2|0|4
+c3      |(0,12)|2|1|4
+c3      |(0,13)|2|2|4
+c3      |(0,14)|2|3|4
+(30 rows)
+
+
+starting permutation: writep4a deletep4 c1 c2 readp
+step writep4a: UPDATE p SET c = 4 WHERE c = 0;
+step deletep4: DELETE FROM p WHERE c = 0; <waiting ...>
+step c1: COMMIT;
+step deletep4: <... completed>
+step c2: COMMIT;
+step readp: SELECT tableoid::regclass, ctid, * FROM p;
+tableoid|ctid  |a|b|c
+--------+------+-+-+-
+c1      |(0,2) |0|0|1
+c1      |(0,3) |0|0|2
+c1      |(0,5) |0|1|1
+c1      |(0,6) |0|1|2
+c1      |(0,8) |0|2|1
+c1      |(0,9) |0|2|2
+c1      |(0,11)|0|0|4
+c1      |(0,12)|0|1|4
+c1      |(0,13)|0|2|4
+c1      |(0,14)|0|3|4
+c2      |(0,2) |1|0|1
+c2      |(0,3) |1|0|2
+c2      |(0,5) |1|1|1
+c2      |(0,6) |1|1|2
+c2      |(0,8) |1|2|1
+c2      |(0,9) |1|2|2
+c2      |(0,11)|1|0|4
+c2      |(0,12)|1|1|4
+c2      |(0,13)|1|2|4
+c2      |(0,14)|1|3|4
+c3      |(0,2) |2|0|1
+c3      |(0,3) |2|0|2
+c3      |(0,5) |2|1|1
+c3      |(0,6) |2|1|2
+c3      |(0,8) |2|2|1
+c3      |(0,9) |2|2|2
+c3      |(0,11)|2|0|4
+c3      |(0,12)|2|1|4
+c3      |(0,13)|2|2|4
+c3      |(0,14)|2|3|4
+(30 rows)
+
+
 starting permutation: wx2 partiallock c2 c1 read
 step wx2: UPDATE accounts SET balance = balance + 450 WHERE accountid = 'checking' RETURNING balance;
 balance
diff --git a/src/test/isolation/specs/eval-plan-qual.spec b/src/test/isolation/specs/eval-plan-qual.spec
index 768f7098b9..cb56578208 100644
--- a/src/test/isolation/specs/eval-plan-qual.spec
+++ b/src/test/isolation/specs/eval-plan-qual.spec
@@ -102,11 +102,15 @@ step upsert1    {
 # when the first updated tuple was in a non-first child table.
 # writep2/returningp1 tests a memory allocation issue
 # writep3a/writep3b tests updates touching more than one table
+# writep4a/writep4b tests a case where matches in another table confused EPQ
+# writep4a/deletep4 tests the same case in the DELETE path

+step readp        { SELECT tableoid::regclass, ctid, * FROM p; }
 step readp1        { SELECT tableoid::regclass, ctid, * FROM p WHERE b IN (0, 1) AND c = 0 FOR UPDATE; }
 step writep1    { UPDATE p SET b = -1 WHERE a = 1 AND b = 1 AND c = 0; }
 step writep2    { UPDATE p SET b = -b WHERE a = 1 AND c = 0; }
 step writep3a    { UPDATE p SET b = -b WHERE c = 0; }
+step writep4a    { UPDATE p SET c = 4 WHERE c = 0; }
 step c1        { COMMIT; }
 step r1        { ROLLBACK; }

@@ -210,6 +214,8 @@ step returningp1 {
       SELECT * FROM u;
 }
 step writep3b    { UPDATE p SET b = -b WHERE c = 0; }
+step writep4b    { UPDATE p SET b = -4 WHERE c = 0; }
+step deletep4    { DELETE FROM p WHERE c = 0; }
 step readforss    {
     SELECT ta.id AS ta_id, ta.value AS ta_value,
         (SELECT ROW(tb.id, tb.value)
@@ -347,6 +353,8 @@ permutation upsert1 upsert2 c1 c2 read
 permutation readp1 writep1 readp2 c1 c2
 permutation writep2 returningp1 c1 c2
 permutation writep3a writep3b c1 c2
+permutation writep4a writep4b c1 c2 readp
+permutation writep4a deletep4 c1 c2 readp
 permutation wx2 partiallock c2 c1 read
 permutation wx2 lockwithvalues c2 c1 read
 permutation wx2_ext partiallock_ext c2 c1 read_ext

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: PG 16 draft release notes ready
Следующее
От: Melanie Plageman
Дата:
Сообщение: Re: Memory leak from ExecutorState context?