Re: [BUGS] BUG #14808: V10-beta4, backend abort

From	Tom Lane
Subject	Re: [BUGS] BUG #14808: V10-beta4, backend abort
Msg-id	630.1505425555@sss.pgh.pa.us
In reply to	Re: [BUGS] BUG #14808: V10-beta4, backend abort  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: [BUGS] BUG #14808: V10-beta4, backend abort  (Thomas Munro <thomas.munro@enterprisedb.com>)
List	pgsql-bugs
Attached is a draft patch for this.

Thomas Munro <thomas.munro@enterprisedb.com> writes:
> 1.  Merging the transition tables when there are multiple wCTEs
> referencing the same table.  Here's one idea:  Rename
> MakeTransitionCaptureState() to GetTransitionCaptureState() and use a
> hash table keyed by table OID in
> afterTriggers.transition_capture_states[afterTriggers.query_depth] to
> find the TCS for the given TriggerDesc or create it if not found, so
> that all wCTEs find the same TransitionCaptureState object.  Then all
> existing callers continue to do what they're doing now, but they'll be
> sharing TCSs appropriately with other things in the plan.  Note that
> TransitionCaptureState already holds tuplestores for each operation
> (INSERT, UPDATE, DELETE) so the OID of the table alone is a suitable
> key for the hash table (assuming we are ignoring the column-list part
> of the spec as you suggested).

It seems unsafe to merge the TCS objects themselves, because the callers
assume that they can munge the tcs_map and tcs_original_insert_tuple
fields freely without regard for any other callers.  So as I have it,
we still have a TCS for each caller, but the TCSes point at tuplestores
that can be shared across multiple callers for the same event type.
The tuplestores themselves are managed by the AfterTrigger data
structures.  Also, because the TCS structs are just throwaway per-caller
data, it's uncool to reference them in the trigger event lists.
So I replaced ats_transition_capture with two pointers to the actual
tuplestores.  That bloats AfterTriggerSharedData a bit but I think it's
okay; we don't expect a lot of those structs in a normal query.

I chose to make the persistent state (AfterTriggersTableData) independent
for each operation type.  We could have done that differently perhaps, but
it seemed more complicated and less likely to match the spec's semantics.

The INSERT ON CONFLICT UPDATE mess is handled by creating two separate
TCSes with two different underlying AfterTriggersTableData structs.
The insertion tuplestore sees only the inserted tuples, while the update
tuplestores see only the updated pre-existing tuples.  That adds a little
code to nodeModifyTable but it seems conceptually much cleaner.

> 2.  Hiding the fact that we implement fk CASCADE using another level
> of queries.   Perhaps we could arrange for
> afterTriggers.transition_capture_states[afterTriggers.query_depth] to
> point to the same hash table as query_depth - 1, so that the effects
> of statements at this implementation-internal level appear to the user
> as part of the level below?

That already happens, because query_depth doesn't increment for an FK
enforcement query --- we never call AfterTriggerBegin/EndQuery for it.

> 3.  Merging the invocation after statement firing so that if you
> updated the same table directly and also via a wCTE and also
> indirectly via fk ON DELETE/UPDATE trigger then you still only get one
> invocation of the after statement trigger.  Not sure exactly how...

What I did here was to use the AfterTriggersTableData structs to hold
a flag saying we'd already queued statement triggers for this rel and
cmdType.  There's probably more than one way to do that, but this seemed
convenient.

One thing I don't like too much about that is that it means there are
cases where the single statement trigger firing would occur before some
AFTER ROW trigger firings.  Not sure if we promise anything about the
ordering in the docs.  It looks quite expensive/complicated to try to
make it always happen afterwards, though, and it might well be totally
impossible if triggers cause more table updates to occur.

Because MakeTransitionCaptureState now depends on the trigger query
level being active, I had to relocate the AfterTriggerBeginQuery calls
to occur before it.

In passing, I refactored the AfterTriggers data structures a bit so
that we don't need to do as many palloc calls to manage them.  Instead
of several independent arrays there's now one array of structs.

            regards, tom lane

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfa3f05..c6fa445 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** CopyFrom(CopyState cstate)
*** 2432,2443 ****
      /* Triggers might need a slot as well */
      estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);

      /*
       * If there are any triggers with transition tables on the named relation,
       * we need to be prepared to capture transition tuples.
       */
      cstate->transition_capture =
!         MakeTransitionCaptureState(cstate->rel->trigdesc);

      /*
       * If the named relation is a partitioned table, initialize state for
--- 2432,2448 ----
      /* Triggers might need a slot as well */
      estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);

+     /* Prepare to catch AFTER triggers. */
+     AfterTriggerBeginQuery();
+
      /*
       * If there are any triggers with transition tables on the named relation,
       * we need to be prepared to capture transition tuples.
       */
      cstate->transition_capture =
!         MakeTransitionCaptureState(cstate->rel->trigdesc,
!                                    RelationGetRelid(cstate->rel),
!                                    CMD_INSERT);

      /*
       * If the named relation is a partitioned table, initialize state for
*************** CopyFrom(CopyState cstate)
*** 2513,2521 ****
          bufferedTuples = palloc(MAX_BUFFERED_TUPLES * sizeof(HeapTuple));
      }

-     /* Prepare to catch AFTER triggers. */
-     AfterTriggerBeginQuery();
-
      /*
       * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
       * should do this for COPY, since it's not really an "INSERT" statement as
--- 2518,2523 ----
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 269c9e1..ee2ff04 100644
*** a/src/backend/commands/trigger.c
--- b/src/backend/commands/trigger.c
*************** CreateTrigger(CreateTrigStmt *stmt, cons
*** 234,239 ****
--- 234,244 ----
                              RelationGetRelationName(rel)),
                       errdetail("Foreign tables cannot have TRUNCATE triggers.")));

+         /*
+          * We disallow constraint triggers to protect the assumption that
+          * triggers on FKs can't be deferred.  See notes with AfterTriggers
+          * data structures, below.
+          */
          if (stmt->isconstraint)
              ereport(ERROR,
                      (errcode(ERRCODE_WRONG_OBJECT_TYPE),
*************** CreateTrigger(CreateTrigStmt *stmt, cons
*** 418,423 ****
--- 423,448 ----
                          (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                           errmsg("transition tables cannot be specified for triggers with more than one event")));

+             /*
+              * We currently don't allow column-specific triggers with
+              * transition tables.  Per spec, that seems to require
+              * accumulating separate transition tables for each combination of
+              * columns, which is a lot of work for a rather marginal feature.
+              */
+             if (stmt->columns != NIL)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                          errmsg("transition tables cannot be specified for triggers with column lists")));
+
+             /*
+              * We disallow constraint triggers with transition tables, to
+              * protect the assumption that such triggers can't be deferred.
+              * See notes with AfterTriggers data structures, below.
+              *
+              * Currently this is enforced by the grammar, so just Assert here.
+              */
+             Assert(!stmt->isconstraint);
+
              if (tt->isNew)
              {
                  if (!(TRIGGER_FOR_INSERT(tgtype) ||
*************** FindTriggerIncompatibleWithInheritance(T
*** 2086,2181 ****
  }

  /*
-  * Make a TransitionCaptureState object from a given TriggerDesc.  The
-  * resulting object holds the flags which control whether transition tuples
-  * are collected when tables are modified, and the tuplestores themselves.
-  * Note that we copy the flags from a parent table into this struct (rather
-  * than using each relation's TriggerDesc directly) so that we can use it to
-  * control the collection of transition tuples from child tables.
-  *
-  * If there are no triggers with transition tables configured for 'trigdesc',
-  * then return NULL.
-  *
-  * The resulting object can be passed to the ExecAR* functions.  The caller
-  * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
-  * with child tables.
-  */
- TransitionCaptureState *
- MakeTransitionCaptureState(TriggerDesc *trigdesc)
- {
-     TransitionCaptureState *state = NULL;
-
-     if (trigdesc != NULL &&
-         (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table ||
-          trigdesc->trig_update_new_table || trigdesc->trig_insert_new_table))
-     {
-         MemoryContext oldcxt;
-         ResourceOwner saveResourceOwner;
-
-         /*
-          * Normally DestroyTransitionCaptureState should be called after
-          * executing all AFTER triggers for the current statement.
-          *
-          * To handle error cleanup, TransitionCaptureState and the tuplestores
-          * it contains will live in the current [sub]transaction's memory
-          * context.  Likewise for the current resource owner, because we also
-          * want to clean up temporary files spilled to disk by the tuplestore
-          * in that scenario.  This scope is sufficient, because AFTER triggers
-          * with transition tables cannot be deferred (only constraint triggers
-          * can be deferred, and constraint triggers cannot have transition
-          * tables).  The AFTER trigger queue may contain pointers to this
-          * TransitionCaptureState, but any such entries will be processed or
-          * discarded before the end of the current [sub]transaction.
-          *
-          * If a future release allows deferred triggers with transition
-          * tables, we'll need to reconsider the scope of the
-          * TransitionCaptureState object.
-          */
-         oldcxt = MemoryContextSwitchTo(CurTransactionContext);
-         saveResourceOwner = CurrentResourceOwner;
-
-         state = (TransitionCaptureState *)
-             palloc0(sizeof(TransitionCaptureState));
-         state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
-         state->tcs_update_old_table = trigdesc->trig_update_old_table;
-         state->tcs_update_new_table = trigdesc->trig_update_new_table;
-         state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
-         PG_TRY();
-         {
-             CurrentResourceOwner = CurTransactionResourceOwner;
-             if (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table)
-                 state->tcs_old_tuplestore = tuplestore_begin_heap(false, false, work_mem);
-             if (trigdesc->trig_insert_new_table)
-                 state->tcs_insert_tuplestore = tuplestore_begin_heap(false, false, work_mem);
-             if (trigdesc->trig_update_new_table)
-                 state->tcs_update_tuplestore = tuplestore_begin_heap(false, false, work_mem);
-         }
-         PG_CATCH();
-         {
-             CurrentResourceOwner = saveResourceOwner;
-             PG_RE_THROW();
-         }
-         PG_END_TRY();
-         CurrentResourceOwner = saveResourceOwner;
-         MemoryContextSwitchTo(oldcxt);
-     }
-
-     return state;
- }
-
- void
- DestroyTransitionCaptureState(TransitionCaptureState *tcs)
- {
-     if (tcs->tcs_insert_tuplestore != NULL)
-         tuplestore_end(tcs->tcs_insert_tuplestore);
-     if (tcs->tcs_update_tuplestore != NULL)
-         tuplestore_end(tcs->tcs_update_tuplestore);
-     if (tcs->tcs_old_tuplestore != NULL)
-         tuplestore_end(tcs->tcs_old_tuplestore);
-     pfree(tcs);
- }
-
- /*
   * Call a trigger function.
   *
   *        trigdata: trigger descriptor.
--- 2111,2116 ----
*************** TriggerEnabled(EState *estate, ResultRel
*** 3338,3346 ****
   * during the current transaction tree.  (BEFORE triggers are fired
   * immediately so we don't need any persistent state about them.)  The struct
   * and most of its subsidiary data are kept in TopTransactionContext; however
!  * the individual event records are kept in a separate sub-context.  This is
!  * done mainly so that it's easy to tell from a memory context dump how much
!  * space is being eaten by trigger events.
   *
   * Because the list of pending events can grow large, we go to some
   * considerable effort to minimize per-event memory consumption.  The event
--- 3273,3283 ----
   * during the current transaction tree.  (BEFORE triggers are fired
   * immediately so we don't need any persistent state about them.)  The struct
   * and most of its subsidiary data are kept in TopTransactionContext; however
!  * some data that can be discarded sooner appears in the CurTransactionContext
!  * of the relevant subtransaction.  Also, the individual event records are
!  * kept in a separate sub-context of TopTransactionContext.  This is done
!  * mainly so that it's easy to tell from a memory context dump how much space
!  * is being eaten by trigger events.
   *
   * Because the list of pending events can grow large, we go to some
   * considerable effort to minimize per-event memory consumption.  The event
*************** typedef SetConstraintStateData *SetConst
*** 3400,3405 ****
--- 3337,3349 ----
   * tuple(s).  This permits storing tuples once regardless of the number of
   * row-level triggers on a foreign table.
   *
+  * Note that we need triggers on foreign tables to be fired in exactly the
+  * order they were queued, so that the tuples come out of the tuplestore in
+  * the right order.  To ensure that, we forbid deferrable (constraint)
+  * triggers on foreign tables.  This also ensures that such triggers do not
+  * get deferred into outer trigger query levels, meaning that it's okay to
+  * destroy the tuplestore at the end of the query level.
+  *
   * Statement-level triggers always bear AFTER_TRIGGER_1CTID, though they
   * require no ctid field.  We lack the flag bit space to neatly represent that
   * distinct case, and it seems unlikely to be worth much trouble.
*************** typedef struct AfterTriggerSharedData
*** 3433,3439 ****
      Oid            ats_tgoid;        /* the trigger's ID */
      Oid            ats_relid;        /* the relation it's on */
      CommandId    ats_firing_id;    /* ID for firing cycle */
!     TransitionCaptureState *ats_transition_capture;
  } AfterTriggerSharedData;

  typedef struct AfterTriggerEventData *AfterTriggerEvent;
--- 3377,3384 ----
      Oid            ats_tgoid;        /* the trigger's ID */
      Oid            ats_relid;        /* the relation it's on */
      CommandId    ats_firing_id;    /* ID for firing cycle */
!     Tuplestorestate *ats_old_tuplestore;    /* possible OLD transition table */
!     Tuplestorestate *ats_new_tuplestore;    /* possible NEW transition table */
  } AfterTriggerSharedData;

  typedef struct AfterTriggerEventData *AfterTriggerEvent;
*************** typedef struct AfterTriggerEventList
*** 3529,3588 ****
   * query_depth is the current depth of nested AfterTriggerBeginQuery calls
   * (-1 when the stack is empty).
   *
!  * query_stack[query_depth] is a list of AFTER trigger events queued by the
!  * current query (and the query_stack entries below it are lists of trigger
!  * events queued by calling queries).  None of these are valid until the
!  * matching AfterTriggerEndQuery call occurs.  At that point we fire
!  * immediate-mode triggers, and append any deferred events to the main events
!  * list.
   *
!  * fdw_tuplestores[query_depth] is a tuplestore containing the foreign tuples
!  * needed for the current query.
   *
!  * maxquerydepth is just the allocated length of query_stack and the
!  * tuplestores.
   *
!  * state_stack is a stack of pointers to saved copies of the SET CONSTRAINTS
!  * state data; each subtransaction level that modifies that state first
   * saves a copy, which we use to restore the state if we abort.
   *
!  * events_stack is a stack of copies of the events head/tail pointers,
   * which we use to restore those values during subtransaction abort.
   *
!  * depth_stack is a stack of copies of subtransaction-start-time query_depth,
   * which we similarly use to clean up at subtransaction abort.
   *
!  * firing_stack is a stack of copies of subtransaction-start-time
!  * firing_counter.  We use this to recognize which deferred triggers were
!  * fired (or marked for firing) within an aborted subtransaction.
   *
   * We use GetCurrentTransactionNestLevel() to determine the correct array
!  * index in these stacks.  maxtransdepth is the number of allocated entries in
!  * each stack.  (By not keeping our own stack pointer, we can avoid trouble
   * in cases where errors during subxact abort cause multiple invocations
   * of AfterTriggerEndSubXact() at the same nesting depth.)
   */
  typedef struct AfterTriggersData
  {
      CommandId    firing_counter; /* next firing ID to assign */
      SetConstraintState state;    /* the active S C state */
      AfterTriggerEventList events;    /* deferred-event list */
-     int            query_depth;    /* current query list index */
-     AfterTriggerEventList *query_stack; /* events pending from each query */
-     Tuplestorestate **fdw_tuplestores;    /* foreign tuples for one row from
-                                          * each query */
-     int            maxquerydepth;    /* allocated len of above array */
      MemoryContext event_cxt;    /* memory context for events, if any */

!     /* these fields are just for resetting at subtrans abort: */

!     SetConstraintState *state_stack;    /* stacked S C states */
!     AfterTriggerEventList *events_stack;    /* stacked list pointers */
!     int           *depth_stack;    /* stacked query_depths */
!     CommandId  *firing_stack;    /* stacked firing_counters */
!     int            maxtransdepth;    /* allocated len of above arrays */
  } AfterTriggersData;

  static AfterTriggersData afterTriggers;

  static void AfterTriggerExecute(AfterTriggerEvent event,
--- 3474,3579 ----
   * query_depth is the current depth of nested AfterTriggerBeginQuery calls
   * (-1 when the stack is empty).
   *
!  * query_stack[query_depth] is the per-query-level data, including these fields:
   *
!  * events is a list of AFTER trigger events queued by the current query.
!  * None of these are valid until the matching AfterTriggerEndQuery call
!  * occurs.  At that point we fire immediate-mode triggers, and append any
!  * deferred events to the main events list.
   *
!  * fdw_tuplestore is a tuplestore containing the foreign-table tuples
!  * needed by events queued by the current query.  (Note: we use just one
!  * tuplestore even though more than one foreign table might be involved.
!  * This is okay because tuplestores don't really care what's in the tuples
!  * they store; but it's possible that someday it'd break.)
   *
!  * tables is a List of AfterTriggersTableData structs for target tables
!  * of the current query (see below).
!  *
!  * maxquerydepth is just the allocated length of query_stack.
!  *
!  * trans_stack holds per-subtransaction data, including these fields:
!  *
!  * state is NULL or a pointer to a saved copy of the SET CONSTRAINTS
!  * state data.  Each subtransaction level that modifies that state first
   * saves a copy, which we use to restore the state if we abort.
   *
!  * events is a copy of the events head/tail pointers,
   * which we use to restore those values during subtransaction abort.
   *
!  * query_depth is the subtransaction-start-time value of query_depth,
   * which we similarly use to clean up at subtransaction abort.
   *
!  * firing_counter is the subtransaction-start-time value of firing_counter.
!  * We use this to recognize which deferred triggers were fired (or marked
!  * for firing) within an aborted subtransaction.
   *
   * We use GetCurrentTransactionNestLevel() to determine the correct array
!  * index in trans_stack.  maxtransdepth is the number of allocated entries in
!  * trans_stack.  (By not keeping our own stack pointer, we can avoid trouble
   * in cases where errors during subxact abort cause multiple invocations
   * of AfterTriggerEndSubXact() at the same nesting depth.)
+  *
+  * We create an AfterTriggersTableData struct for each target table of the
+  * current query, and each operation mode (INSERT/UPDATE/DELETE), that has
+  * either transition tables or AFTER STATEMENT triggers.  This is used to
+  * hold the relevant transition tables, as well as a flag showing whether
+  * we already queued the AFTER STATEMENT triggers.  We need the flag so
+  * that cases like multiple FK enforcement sub-queries targeting the same
+  * table don't fire such triggers more than once.  These structs, along with
+  * the transition table tuplestores, live in the (sub)transaction's
+  * CurTransactionContext.  That's sufficient lifespan because we don't allow
+  * transition tables to be used by deferrable triggers, so they only need
+  * to survive until AfterTriggerEndQuery.
   */
+ typedef struct AfterTriggersQueryData AfterTriggersQueryData;
+ typedef struct AfterTriggersTransData AfterTriggersTransData;
+ typedef struct AfterTriggersTableData AfterTriggersTableData;
+
  typedef struct AfterTriggersData
  {
      CommandId    firing_counter; /* next firing ID to assign */
      SetConstraintState state;    /* the active S C state */
      AfterTriggerEventList events;    /* deferred-event list */
      MemoryContext event_cxt;    /* memory context for events, if any */

!     /* per-query-level data: */
!     AfterTriggersQueryData *query_stack;    /* array of structs shown above */
!     int            query_depth;    /* current index in above array */
!     int            maxquerydepth;    /* allocated len of above array */

!     /* per-subtransaction-level data: */
!     AfterTriggersTransData *trans_stack;    /* array of structs shown above */
!     int            maxtransdepth;    /* allocated len of above array */
  } AfterTriggersData;

+ struct AfterTriggersQueryData
+ {
+     AfterTriggerEventList events;    /* events pending from this query */
+     Tuplestorestate *fdw_tuplestore;    /* foreign tuples for said events */
+     List       *tables;            /* list of AfterTriggersTableData */
+ };
+
+ struct AfterTriggersTransData
+ {
+     /* these fields are just for resetting at subtrans abort: */
+     SetConstraintState state;    /* saved S C state, or NULL if not yet saved */
+     AfterTriggerEventList events;    /* saved list pointer */
+     int            query_depth;    /* saved query_depth */
+     CommandId    firing_counter; /* saved firing_counter */
+ };
+
+ struct AfterTriggersTableData
+ {
+     /* relid + cmdType form the lookup key for these structs: */
+     Oid            relid;            /* target table's OID */
+     CmdType        cmdType;        /* event type, CMD_INSERT/UPDATE/DELETE */
+     bool        closed;            /* true when no longer OK to add tuples */
+     bool        stmt_trig_done; /* did we already queue stmt-level triggers? */
+     Tuplestorestate *old_tuplestore;    /* "old" transition table, if any */
+     Tuplestorestate *new_tuplestore;    /* "new" transition table, if any */
+ };
+
  static AfterTriggersData afterTriggers;

  static void AfterTriggerExecute(AfterTriggerEvent event,
*************** static void AfterTriggerExecute(AfterTri
*** 3591,3598 ****
                      Instrumentation *instr,
                      MemoryContext per_tuple_context,
                      TupleTableSlot *trig_tuple_slot1,
!                     TupleTableSlot *trig_tuple_slot2,
!                     TransitionCaptureState *transition_capture);
  static SetConstraintState SetConstraintStateCreate(int numalloc);
  static SetConstraintState SetConstraintStateCopy(SetConstraintState state);
  static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
--- 3582,3591 ----
                      Instrumentation *instr,
                      MemoryContext per_tuple_context,
                      TupleTableSlot *trig_tuple_slot1,
!                     TupleTableSlot *trig_tuple_slot2);
! static AfterTriggersTableData *GetAfterTriggersTableData(Oid relid,
!                           CmdType cmdType);
! static void AfterTriggerFreeQuery(AfterTriggersQueryData *qs);
  static SetConstraintState SetConstraintStateCreate(int numalloc);
  static SetConstraintState SetConstraintStateCopy(SetConstraintState state);
  static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
*************** static SetConstraintState SetConstraintS
*** 3600,3628 ****


  /*
!  * Gets a current query transition tuplestore and initializes it if necessary.
   */
  static Tuplestorestate *
! GetTriggerTransitionTuplestore(Tuplestorestate **tss)
  {
      Tuplestorestate *ret;

!     ret = tss[afterTriggers.query_depth];
      if (ret == NULL)
      {
          MemoryContext oldcxt;
          ResourceOwner saveResourceOwner;

          /*
!          * Make the tuplestore valid until end of transaction.  This is the
!          * allocation lifespan of the associated events list, but we really
           * only need it until AfterTriggerEndQuery().
           */
!         oldcxt = MemoryContextSwitchTo(TopTransactionContext);
          saveResourceOwner = CurrentResourceOwner;
          PG_TRY();
          {
!             CurrentResourceOwner = TopTransactionResourceOwner;
              ret = tuplestore_begin_heap(false, false, work_mem);
          }
          PG_CATCH();
--- 3593,3621 ----


  /*
!  * Get the FDW tuplestore for the current trigger query level, creating it
!  * if necessary.
   */
  static Tuplestorestate *
! GetCurrentFDWTuplestore(void)
  {
      Tuplestorestate *ret;

!     ret = afterTriggers.query_stack[afterTriggers.query_depth].fdw_tuplestore;
      if (ret == NULL)
      {
          MemoryContext oldcxt;
          ResourceOwner saveResourceOwner;

          /*
!          * Make the tuplestore valid until end of subtransaction.  We really
           * only need it until AfterTriggerEndQuery().
           */
!         oldcxt = MemoryContextSwitchTo(CurTransactionContext);
          saveResourceOwner = CurrentResourceOwner;
          PG_TRY();
          {
!             CurrentResourceOwner = CurTransactionResourceOwner;
              ret = tuplestore_begin_heap(false, false, work_mem);
          }
          PG_CATCH();
*************** GetTriggerTransitionTuplestore(Tuplestor
*** 3634,3640 ****
          CurrentResourceOwner = saveResourceOwner;
          MemoryContextSwitchTo(oldcxt);

!         tss[afterTriggers.query_depth] = ret;
      }

      return ret;
--- 3627,3633 ----
          CurrentResourceOwner = saveResourceOwner;
          MemoryContextSwitchTo(oldcxt);

!         afterTriggers.query_stack[afterTriggers.query_depth].fdw_tuplestore = ret;
      }

      return ret;
*************** afterTriggerAddEvent(AfterTriggerEventLi
*** 3780,3786 ****
          if (newshared->ats_tgoid == evtshared->ats_tgoid &&
              newshared->ats_relid == evtshared->ats_relid &&
              newshared->ats_event == evtshared->ats_event &&
!             newshared->ats_transition_capture == evtshared->ats_transition_capture &&
              newshared->ats_firing_id == 0)
              break;
      }
--- 3773,3780 ----
          if (newshared->ats_tgoid == evtshared->ats_tgoid &&
              newshared->ats_relid == evtshared->ats_relid &&
              newshared->ats_event == evtshared->ats_event &&
!             newshared->ats_old_tuplestore == evtshared->ats_old_tuplestore &&
!             newshared->ats_new_tuplestore == evtshared->ats_new_tuplestore &&
              newshared->ats_firing_id == 0)
              break;
      }
*************** AfterTriggerExecute(AfterTriggerEvent ev
*** 3892,3899 ****
                      FmgrInfo *finfo, Instrumentation *instr,
                      MemoryContext per_tuple_context,
                      TupleTableSlot *trig_tuple_slot1,
!                     TupleTableSlot *trig_tuple_slot2,
!                     TransitionCaptureState *transition_capture)
  {
      AfterTriggerShared evtshared = GetTriggerSharedData(event);
      Oid            tgoid = evtshared->ats_tgoid;
--- 3886,3892 ----
                      FmgrInfo *finfo, Instrumentation *instr,
                      MemoryContext per_tuple_context,
                      TupleTableSlot *trig_tuple_slot1,
!                     TupleTableSlot *trig_tuple_slot2)
  {
      AfterTriggerShared evtshared = GetTriggerSharedData(event);
      Oid            tgoid = evtshared->ats_tgoid;
*************** AfterTriggerExecute(AfterTriggerEvent ev
*** 3934,3942 ****
      {
          case AFTER_TRIGGER_FDW_FETCH:
              {
!                 Tuplestorestate *fdw_tuplestore =
!                 GetTriggerTransitionTuplestore
!                 (afterTriggers.fdw_tuplestores);

                  if (!tuplestore_gettupleslot(fdw_tuplestore, true, false,
                                               trig_tuple_slot1))
--- 3927,3933 ----
      {
          case AFTER_TRIGGER_FDW_FETCH:
              {
!                 Tuplestorestate *fdw_tuplestore = GetCurrentFDWTuplestore();

                  if (!tuplestore_gettupleslot(fdw_tuplestore, true, false,
                                               trig_tuple_slot1))
*************** AfterTriggerExecute(AfterTriggerEvent ev
*** 4008,4043 ****
      /*
       * Set up the tuplestore information.
       */
!     LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL;
!     if (transition_capture != NULL)
!     {
!         if (LocTriggerData.tg_trigger->tgoldtable)
!             LocTriggerData.tg_oldtable = transition_capture->tcs_old_tuplestore;
!         if (LocTriggerData.tg_trigger->tgnewtable)
!         {
!             /*
!              * Currently a trigger with transition tables may only be defined
!              * for a single event type (here AFTER INSERT or AFTER UPDATE, but
!              * not AFTER INSERT OR ...).
!              */
!             Assert((TRIGGER_FOR_INSERT(LocTriggerData.tg_trigger->tgtype) != 0) ^
!                    (TRIGGER_FOR_UPDATE(LocTriggerData.tg_trigger->tgtype) != 0));

!             /*
!              * Show either the insert or update new tuple images, depending on
!              * which event type the trigger was registered for.  A single
!              * statement may have produced both in the case of INSERT ... ON
!              * CONFLICT ... DO UPDATE, and in that case the event determines
!              * which tuplestore the trigger sees as the NEW TABLE.
!              */
!             if (TRIGGER_FOR_INSERT(LocTriggerData.tg_trigger->tgtype))
!                 LocTriggerData.tg_newtable =
!                     transition_capture->tcs_insert_tuplestore;
!             else
!                 LocTriggerData.tg_newtable =
!                     transition_capture->tcs_update_tuplestore;
!         }
!     }

      /*
       * Setup the remaining trigger information
--- 3999,4013 ----
      /*
       * Set up the tuplestore information.
       */
!     if (LocTriggerData.tg_trigger->tgoldtable)
!         LocTriggerData.tg_oldtable = evtshared->ats_old_tuplestore;
!     else
!         LocTriggerData.tg_oldtable = NULL;

!     if (LocTriggerData.tg_trigger->tgnewtable)
!         LocTriggerData.tg_newtable = evtshared->ats_new_tuplestore;
!     else
!         LocTriggerData.tg_newtable = NULL;

      /*
       * Setup the remaining trigger information
*************** afterTriggerInvokeEvents(AfterTriggerEve
*** 4245,4252 ****
                   * won't try to re-fire it.
                   */
                  AfterTriggerExecute(event, rel, trigdesc, finfo, instr,
!                                     per_tuple_context, slot1, slot2,
!                                     evtshared->ats_transition_capture);

                  /*
                   * Mark the event as done.
--- 4215,4221 ----
                   * won't try to re-fire it.
                   */
                  AfterTriggerExecute(event, rel, trigdesc, finfo, instr,
!                                     per_tuple_context, slot1, slot2);

                  /*
                   * Mark the event as done.
*************** afterTriggerInvokeEvents(AfterTriggerEve
*** 4296,4301 ****
--- 4265,4433 ----
  }


+ /*
+  * GetAfterTriggersTableData
+  *
+  * Find or create an AfterTriggersTableData struct for the specified
+  * trigger event (relation + operation type).  Ignore existing structs
+  * marked "closed"; we don't want to put any additional tuples into them,
+  * nor change their triggers-fired flags.
+  *
+  * Note: the AfterTriggersTableData list is allocated in the current
+  * (sub)transaction's CurTransactionContext.  This is OK because
+  * we don't need it to live past AfterTriggerEndQuery.
+  */
+ static AfterTriggersTableData *
+ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
+ {
+     AfterTriggersTableData *table;
+     AfterTriggersQueryData *qs;
+     MemoryContext oldcxt;
+     ListCell   *lc;
+
+     /* Caller should have ensured query_depth is OK. */
+     Assert(afterTriggers.query_depth >= 0 &&
+            afterTriggers.query_depth < afterTriggers.maxquerydepth);
+     qs = &afterTriggers.query_stack[afterTriggers.query_depth];
+
+     foreach(lc, qs->tables)
+     {
+         table = (AfterTriggersTableData *) lfirst(lc);
+         if (table->relid == relid && table->cmdType == cmdType &&
+             !table->closed)
+             return table;
+     }
+
+     oldcxt = MemoryContextSwitchTo(CurTransactionContext);
+
+     table = (AfterTriggersTableData *) palloc0(sizeof(AfterTriggersTableData));
+     table->relid = relid;
+     table->cmdType = cmdType;
+     qs->tables = lappend(qs->tables, table);
+
+     MemoryContextSwitchTo(oldcxt);
+
+     return table;
+ }
+
+
+ /*
+  * MakeTransitionCaptureState
+  *
+  * Make a TransitionCaptureState object for the given TriggerDesc, target
+  * relation, and operation type.  The TCS object holds all the state needed
+  * to decide whether to capture tuples in transition tables.
+  *
+  * If there are no triggers in 'trigdesc' that request relevant transition
+  * tables, then return NULL.
+  *
+  * The resulting object can be passed to the ExecAR* functions.  The caller
+  * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
+  * with child tables.
+  *
+  * Note that we copy the flags from a parent table into this struct (rather
+  * than subsequently using the relation's TriggerDesc directly) so that we can
+  * use it to control collection of transition tuples from child tables.
+  *
+  * Per SQL spec, all operations of the same kind (INSERT/UPDATE/DELETE)
+  * on the same table during one query should share one transition table.
+  * Therefore, the Tuplestores are owned by an AfterTriggersTableData struct
+  * looked up using the table OID + CmdType, and are merely referenced by
+  * the TransitionCaptureState objects we hand out to callers.
+  */
+ TransitionCaptureState *
+ MakeTransitionCaptureState(TriggerDesc *trigdesc, Oid relid, CmdType cmdType)
+ {
+     TransitionCaptureState *state;
+     bool        need_old,
+                 need_new;
+     AfterTriggersTableData *table;
+     MemoryContext oldcxt;
+     ResourceOwner saveResourceOwner;
+
+     if (trigdesc == NULL)
+         return NULL;
+
+     /* Detect which table(s) we need. */
+     switch (cmdType)
+     {
+         case CMD_INSERT:
+             need_old = false;
+             need_new = trigdesc->trig_insert_new_table;
+             break;
+         case CMD_UPDATE:
+             need_old = trigdesc->trig_update_old_table;
+             need_new = trigdesc->trig_update_new_table;
+             break;
+         case CMD_DELETE:
+             need_old = trigdesc->trig_delete_old_table;
+             need_new = false;
+             break;
+         default:
+             elog(ERROR, "unexpected CmdType: %d", (int) cmdType);
+             need_old = need_new = false;    /* keep compiler quiet */
+             break;
+     }
+     if (!need_old && !need_new)
+         return NULL;
+
+     /* Check state, like AfterTriggerSaveEvent. */
+     if (afterTriggers.query_depth < 0)
+         elog(ERROR, "MakeTransitionCaptureState() called outside of query");
+
+     /* Be sure we have enough space to record events at this query depth. */
+     if (afterTriggers.query_depth >= afterTriggers.maxquerydepth)
+         AfterTriggerEnlargeQueryState();
+
+     /*
+      * Find or create an AfterTriggersTableData struct to hold the
+      * tuplestore(s).  If there's a matching struct but it's marked closed,
+      * ignore it; we need a newer one.
+      *
+      * Note: the AfterTriggersTableData list, as well as the tuplestores, are
+      * allocated in the current (sub)transaction's CurTransactionContext, and
+      * the tuplestores are managed by the (sub)transaction's resource owner.
+      * This is sufficient lifespan because we do not allow triggers using
+      * transition tables to be deferrable; they will be fired during
+      * AfterTriggerEndQuery, after which it's okay to delete the data.
+      */
+     table = GetAfterTriggersTableData(relid, cmdType);
+
+     /* Now create required tuplestore(s), if we don't have them already. */
+     oldcxt = MemoryContextSwitchTo(CurTransactionContext);
+     saveResourceOwner = CurrentResourceOwner;
+     PG_TRY();
+     {
+         CurrentResourceOwner = CurTransactionResourceOwner;
+         if (need_old && table->old_tuplestore == NULL)
+             table->old_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+         if (need_new && table->new_tuplestore == NULL)
+             table->new_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+     }
+     PG_CATCH();
+     {
+         CurrentResourceOwner = saveResourceOwner;
+         PG_RE_THROW();
+     }
+     PG_END_TRY();
+     CurrentResourceOwner = saveResourceOwner;
+     MemoryContextSwitchTo(oldcxt);
+
+     /* Now build the TransitionCaptureState struct, in caller's context */
+     state = (TransitionCaptureState *) palloc0(sizeof(TransitionCaptureState));
+     state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
+     state->tcs_update_old_table = trigdesc->trig_update_old_table;
+     state->tcs_update_new_table = trigdesc->trig_update_new_table;
+     state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
+     if (need_old)
+         state->tcs_old_tuplestore = table->old_tuplestore;
+     if (need_new)
+         state->tcs_new_tuplestore = table->new_tuplestore;
+
+     return state;
+ }
+
+
  /* ----------
   * AfterTriggerBeginXact()
   *
*************** AfterTriggerBeginXact(void)
*** 4319,4332 ****
       */
      Assert(afterTriggers.state == NULL);
      Assert(afterTriggers.query_stack == NULL);
-     Assert(afterTriggers.fdw_tuplestores == NULL);
      Assert(afterTriggers.maxquerydepth == 0);
      Assert(afterTriggers.event_cxt == NULL);
      Assert(afterTriggers.events.head == NULL);
!     Assert(afterTriggers.state_stack == NULL);
!     Assert(afterTriggers.events_stack == NULL);
!     Assert(afterTriggers.depth_stack == NULL);
!     Assert(afterTriggers.firing_stack == NULL);
      Assert(afterTriggers.maxtransdepth == 0);
  }

--- 4451,4460 ----
       */
      Assert(afterTriggers.state == NULL);
      Assert(afterTriggers.query_stack == NULL);
      Assert(afterTriggers.maxquerydepth == 0);
      Assert(afterTriggers.event_cxt == NULL);
      Assert(afterTriggers.events.head == NULL);
!     Assert(afterTriggers.trans_stack == NULL);
      Assert(afterTriggers.maxtransdepth == 0);
  }

*************** AfterTriggerBeginQuery(void)
*** 4362,4370 ****
  void
  AfterTriggerEndQuery(EState *estate)
  {
-     AfterTriggerEventList *events;
-     Tuplestorestate *fdw_tuplestore;
-
      /* Must be inside a query, too */
      Assert(afterTriggers.query_depth >= 0);

--- 4490,4495 ----
*************** AfterTriggerEndQuery(EState *estate)
*** 4393,4430 ****
       * will instead fire any triggers in a dedicated query level.  Foreign key
       * enforcement triggers do add to the current query level, thanks to their
       * passing fire_triggers = false to SPI_execute_snapshot().  Other
!      * C-language triggers might do likewise.  Be careful here: firing a
!      * trigger could result in query_stack being repalloc'd, so we can't save
!      * its address across afterTriggerInvokeEvents calls.
       *
       * If we find no firable events, we don't have to increment
       * firing_counter.
       */
      for (;;)
      {
!         events = &afterTriggers.query_stack[afterTriggers.query_depth];
!         if (afterTriggerMarkEvents(events, &afterTriggers.events, true))
          {
              CommandId    firing_id = afterTriggers.firing_counter++;

              /* OK to delete the immediate events after processing them */
!             if (afterTriggerInvokeEvents(events, firing_id, estate, true))
                  break;            /* all fired */
          }
          else
              break;
      }

!     /* Release query-local storage for events, including tuplestore if any */
!     fdw_tuplestore = afterTriggers.fdw_tuplestores[afterTriggers.query_depth];
!     if (fdw_tuplestore)
      {
!         tuplestore_end(fdw_tuplestore);
!         afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
      }
-     afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);

!     afterTriggers.query_depth--;
  }


--- 4518,4616 ----
       * will instead fire any triggers in a dedicated query level.  Foreign key
       * enforcement triggers do add to the current query level, thanks to their
       * passing fire_triggers = false to SPI_execute_snapshot().  Other
!      * C-language triggers might do likewise.
       *
       * If we find no firable events, we don't have to increment
       * firing_counter.
       */
      for (;;)
      {
!         AfterTriggersQueryData *qs;
!         ListCell   *lc;
!
!         /*
!          * Firing a trigger could result in query_stack being repalloc'd, so
!          * we must recalculate qs after each afterTriggerInvokeEvents call.
!          */
!         qs = &afterTriggers.query_stack[afterTriggers.query_depth];
!
!         /*
!          * Before each firing cycle, mark all existing transition tables
!          * "closed", so that their contents can't change anymore.  If someone
!          * causes more updates, they'll go into new transition tables.
!          */
!         foreach(lc, qs->tables)
!         {
!             AfterTriggersTableData *table = (AfterTriggersTableData *) lfirst(lc);
!
!             table->closed = true;
!         }
!
!         if (afterTriggerMarkEvents(&qs->events, &afterTriggers.events, true))
          {
              CommandId    firing_id = afterTriggers.firing_counter++;

              /* OK to delete the immediate events after processing them */
!             if (afterTriggerInvokeEvents(&qs->events, firing_id, estate, true))
                  break;            /* all fired */
          }
          else
              break;
      }

!     /* Release query-level-local storage, including tuplestores if any */
!     AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
!
!     afterTriggers.query_depth--;
! }
!
!
! /*
!  * AfterTriggerFreeQuery
!  *    Release subsidiary storage for a trigger query level.
!  *    This includes closing down tuplestores.
!  *    Note: it's important for this to be safe if interrupted by an error
!  *    and then called again for the same query level.
!  */
! static void
! AfterTriggerFreeQuery(AfterTriggersQueryData *qs)
! {
!     Tuplestorestate *ts;
!     List       *tables;
!     ListCell   *lc;
!
!     /* Drop the trigger events */
!     afterTriggerFreeEventList(&qs->events);
!
!     /* Drop FDW tuplestore if any */
!     ts = qs->fdw_tuplestore;
!     qs->fdw_tuplestore = NULL;
!     if (ts)
!         tuplestore_end(ts);
!
!     /* Release per-table subsidiary storage */
!     tables = qs->tables;
!     foreach(lc, tables)
      {
!         AfterTriggersTableData *table = (AfterTriggersTableData *) lfirst(lc);
!
!         ts = table->old_tuplestore;
!         table->old_tuplestore = NULL;
!         if (ts)
!             tuplestore_end(ts);
!         ts = table->new_tuplestore;
!         table->new_tuplestore = NULL;
!         if (ts)
!             tuplestore_end(ts);
      }

!     /*
!      * Now free the AfterTriggersTableData structs and list cells.  Reset list
!      * pointer first; if list_free_deep somehow gets an error, better to leak
!      * that storage than have an infinite loop.
!      */
!     qs->tables = NIL;
!     list_free_deep(tables);
  }


*************** AfterTriggerEndXact(bool isCommit)
*** 4521,4530 ****
       * large, we let the eventual reset of TopTransactionContext free the
       * memory instead of doing it here.
       */
!     afterTriggers.state_stack = NULL;
!     afterTriggers.events_stack = NULL;
!     afterTriggers.depth_stack = NULL;
!     afterTriggers.firing_stack = NULL;
      afterTriggers.maxtransdepth = 0;


--- 4707,4713 ----
       * large, we let the eventual reset of TopTransactionContext free the
       * memory instead of doing it here.
       */
!     afterTriggers.trans_stack = NULL;
      afterTriggers.maxtransdepth = 0;


*************** AfterTriggerEndXact(bool isCommit)
*** 4534,4540 ****
       * memory here.
       */
      afterTriggers.query_stack = NULL;
-     afterTriggers.fdw_tuplestores = NULL;
      afterTriggers.maxquerydepth = 0;
      afterTriggers.state = NULL;

--- 4717,4722 ----
*************** AfterTriggerBeginSubXact(void)
*** 4553,4600 ****
      int            my_level = GetCurrentTransactionNestLevel();

      /*
!      * Allocate more space in the stacks if needed.  (Note: because the
       * minimum nest level of a subtransaction is 2, we waste the first couple
!      * entries of each array; not worth the notational effort to avoid it.)
       */
      while (my_level >= afterTriggers.maxtransdepth)
      {
          if (afterTriggers.maxtransdepth == 0)
          {
!             MemoryContext old_cxt;
!
!             old_cxt = MemoryContextSwitchTo(TopTransactionContext);
!
! #define DEFTRIG_INITALLOC 8
!             afterTriggers.state_stack = (SetConstraintState *)
!                 palloc(DEFTRIG_INITALLOC * sizeof(SetConstraintState));
!             afterTriggers.events_stack = (AfterTriggerEventList *)
!                 palloc(DEFTRIG_INITALLOC * sizeof(AfterTriggerEventList));
!             afterTriggers.depth_stack = (int *)
!                 palloc(DEFTRIG_INITALLOC * sizeof(int));
!             afterTriggers.firing_stack = (CommandId *)
!                 palloc(DEFTRIG_INITALLOC * sizeof(CommandId));
!             afterTriggers.maxtransdepth = DEFTRIG_INITALLOC;
!
!             MemoryContextSwitchTo(old_cxt);
          }
          else
          {
!             /* repalloc will keep the stacks in the same context */
              int            new_alloc = afterTriggers.maxtransdepth * 2;

!             afterTriggers.state_stack = (SetConstraintState *)
!                 repalloc(afterTriggers.state_stack,
!                          new_alloc * sizeof(SetConstraintState));
!             afterTriggers.events_stack = (AfterTriggerEventList *)
!                 repalloc(afterTriggers.events_stack,
!                          new_alloc * sizeof(AfterTriggerEventList));
!             afterTriggers.depth_stack = (int *)
!                 repalloc(afterTriggers.depth_stack,
!                          new_alloc * sizeof(int));
!             afterTriggers.firing_stack = (CommandId *)
!                 repalloc(afterTriggers.firing_stack,
!                          new_alloc * sizeof(CommandId));
              afterTriggers.maxtransdepth = new_alloc;
          }
      }
--- 4735,4762 ----
      int            my_level = GetCurrentTransactionNestLevel();

      /*
!      * Allocate more space in the trans_stack if needed.  (Note: because the
       * minimum nest level of a subtransaction is 2, we waste the first couple
!      * entries of the array; not worth the notational effort to avoid it.)
       */
      while (my_level >= afterTriggers.maxtransdepth)
      {
          if (afterTriggers.maxtransdepth == 0)
          {
!             /* Arbitrarily initialize for max of 8 subtransaction levels */
!             afterTriggers.trans_stack = (AfterTriggersTransData *)
!                 MemoryContextAlloc(TopTransactionContext,
!                                    8 * sizeof(AfterTriggersTransData));
!             afterTriggers.maxtransdepth = 8;
          }
          else
          {
!             /* repalloc will keep the stack in the same context */
              int            new_alloc = afterTriggers.maxtransdepth * 2;

!             afterTriggers.trans_stack = (AfterTriggersTransData *)
!                 repalloc(afterTriggers.trans_stack,
!                          new_alloc * sizeof(AfterTriggersTransData));
              afterTriggers.maxtransdepth = new_alloc;
          }
      }
*************** AfterTriggerBeginSubXact(void)
*** 4604,4613 ****
       * is not saved until/unless changed.  Likewise, we don't make a
       * per-subtransaction event context until needed.
       */
!     afterTriggers.state_stack[my_level] = NULL;
!     afterTriggers.events_stack[my_level] = afterTriggers.events;
!     afterTriggers.depth_stack[my_level] = afterTriggers.query_depth;
!     afterTriggers.firing_stack[my_level] = afterTriggers.firing_counter;
  }

  /*
--- 4766,4775 ----
       * is not saved until/unless changed.  Likewise, we don't make a
       * per-subtransaction event context until needed.
       */
!     afterTriggers.trans_stack[my_level].state = NULL;
!     afterTriggers.trans_stack[my_level].events = afterTriggers.events;
!     afterTriggers.trans_stack[my_level].query_depth = afterTriggers.query_depth;
!     afterTriggers.trans_stack[my_level].firing_counter = afterTriggers.firing_counter;
  }

  /*
*************** AfterTriggerEndSubXact(bool isCommit)
*** 4631,4700 ****
      {
          Assert(my_level < afterTriggers.maxtransdepth);
          /* If we saved a prior state, we don't need it anymore */
!         state = afterTriggers.state_stack[my_level];
          if (state != NULL)
              pfree(state);
          /* this avoids double pfree if error later: */
!         afterTriggers.state_stack[my_level] = NULL;
          Assert(afterTriggers.query_depth ==
!                afterTriggers.depth_stack[my_level]);
      }
      else
      {
          /*
           * Aborting.  It is possible subxact start failed before calling
           * AfterTriggerBeginSubXact, in which case we mustn't risk touching
!          * stack levels that aren't there.
           */
          if (my_level >= afterTriggers.maxtransdepth)
              return;

          /*
!          * Release any event lists from queries being aborted, and restore
           * query_depth to its pre-subxact value.  This assumes that a
           * subtransaction will not add events to query levels started in a
           * earlier transaction state.
           */
!         while (afterTriggers.query_depth > afterTriggers.depth_stack[my_level])
          {
              if (afterTriggers.query_depth < afterTriggers.maxquerydepth)
!             {
!                 Tuplestorestate *ts;
!
!                 ts = afterTriggers.fdw_tuplestores[afterTriggers.query_depth];
!                 if (ts)
!                 {
!                     tuplestore_end(ts);
!                     afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
!                 }
!
!                 afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
!             }
!
              afterTriggers.query_depth--;
          }
          Assert(afterTriggers.query_depth ==
!                afterTriggers.depth_stack[my_level]);

          /*
           * Restore the global deferred-event list to its former length,
           * discarding any events queued by the subxact.
           */
          afterTriggerRestoreEventList(&afterTriggers.events,
!                                      &afterTriggers.events_stack[my_level]);

          /*
           * Restore the trigger state.  If the saved state is NULL, then this
           * subxact didn't save it, so it doesn't need restoring.
           */
!         state = afterTriggers.state_stack[my_level];
          if (state != NULL)
          {
              pfree(afterTriggers.state);
              afterTriggers.state = state;
          }
          /* this avoids double pfree if error later: */
!         afterTriggers.state_stack[my_level] = NULL;

          /*
           * Scan for any remaining deferred events that were marked DONE or IN
--- 4793,4850 ----
      {
          Assert(my_level < afterTriggers.maxtransdepth);
          /* If we saved a prior state, we don't need it anymore */
!         state = afterTriggers.trans_stack[my_level].state;
          if (state != NULL)
              pfree(state);
          /* this avoids double pfree if error later: */
!         afterTriggers.trans_stack[my_level].state = NULL;
          Assert(afterTriggers.query_depth ==
!                afterTriggers.trans_stack[my_level].query_depth);
      }
      else
      {
          /*
           * Aborting.  It is possible subxact start failed before calling
           * AfterTriggerBeginSubXact, in which case we mustn't risk touching
!          * trans_stack levels that aren't there.
           */
          if (my_level >= afterTriggers.maxtransdepth)
              return;

          /*
!          * Release query-level storage for queries being aborted, and restore
           * query_depth to its pre-subxact value.  This assumes that a
           * subtransaction will not add events to query levels started in a
           * earlier transaction state.
           */
!         while (afterTriggers.query_depth > afterTriggers.trans_stack[my_level].query_depth)
          {
              if (afterTriggers.query_depth < afterTriggers.maxquerydepth)
!                 AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
              afterTriggers.query_depth--;
          }
          Assert(afterTriggers.query_depth ==
!                afterTriggers.trans_stack[my_level].query_depth);

          /*
           * Restore the global deferred-event list to its former length,
           * discarding any events queued by the subxact.
           */
          afterTriggerRestoreEventList(&afterTriggers.events,
!                                      &afterTriggers.trans_stack[my_level].events);

          /*
           * Restore the trigger state.  If the saved state is NULL, then this
           * subxact didn't save it, so it doesn't need restoring.
           */
!         state = afterTriggers.trans_stack[my_level].state;
          if (state != NULL)
          {
              pfree(afterTriggers.state);
              afterTriggers.state = state;
          }
          /* this avoids double pfree if error later: */
!         afterTriggers.trans_stack[my_level].state = NULL;

          /*
           * Scan for any remaining deferred events that were marked DONE or IN
*************** AfterTriggerEndSubXact(bool isCommit)
*** 4704,4710 ****
           * (This essentially assumes that the current subxact includes all
           * subxacts started after it.)
           */
!         subxact_firing_id = afterTriggers.firing_stack[my_level];
          for_each_event_chunk(event, chunk, afterTriggers.events)
          {
              AfterTriggerShared evtshared = GetTriggerSharedData(event);
--- 4854,4860 ----
           * (This essentially assumes that the current subxact includes all
           * subxacts started after it.)
           */
!         subxact_firing_id = afterTriggers.trans_stack[my_level].firing_counter;
          for_each_event_chunk(event, chunk, afterTriggers.events)
          {
              AfterTriggerShared evtshared = GetTriggerSharedData(event);
*************** AfterTriggerEnlargeQueryState(void)
*** 4740,4751 ****
      {
          int            new_alloc = Max(afterTriggers.query_depth + 1, 8);

!         afterTriggers.query_stack = (AfterTriggerEventList *)
              MemoryContextAlloc(TopTransactionContext,
!                                new_alloc * sizeof(AfterTriggerEventList));
!         afterTriggers.fdw_tuplestores = (Tuplestorestate **)
!             MemoryContextAllocZero(TopTransactionContext,
!                                    new_alloc * sizeof(Tuplestorestate *));
          afterTriggers.maxquerydepth = new_alloc;
      }
      else
--- 4890,4898 ----
      {
          int            new_alloc = Max(afterTriggers.query_depth + 1, 8);

!         afterTriggers.query_stack = (AfterTriggersQueryData *)
              MemoryContextAlloc(TopTransactionContext,
!                                new_alloc * sizeof(AfterTriggersQueryData));
          afterTriggers.maxquerydepth = new_alloc;
      }
      else
*************** AfterTriggerEnlargeQueryState(void)
*** 4755,4781 ****
          int            new_alloc = Max(afterTriggers.query_depth + 1,
                                      old_alloc * 2);

!         afterTriggers.query_stack = (AfterTriggerEventList *)
              repalloc(afterTriggers.query_stack,
!                      new_alloc * sizeof(AfterTriggerEventList));
!         afterTriggers.fdw_tuplestores = (Tuplestorestate **)
!             repalloc(afterTriggers.fdw_tuplestores,
!                      new_alloc * sizeof(Tuplestorestate *));
!         /* Clear newly-allocated slots for subsequent lazy initialization. */
!         memset(afterTriggers.fdw_tuplestores + old_alloc,
!                0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
          afterTriggers.maxquerydepth = new_alloc;
      }

!     /* Initialize new query lists to empty */
      while (init_depth < afterTriggers.maxquerydepth)
      {
!         AfterTriggerEventList *events;

!         events = &afterTriggers.query_stack[init_depth];
!         events->head = NULL;
!         events->tail = NULL;
!         events->tailfree = NULL;

          ++init_depth;
      }
--- 4902,4923 ----
          int            new_alloc = Max(afterTriggers.query_depth + 1,
                                      old_alloc * 2);

!         afterTriggers.query_stack = (AfterTriggersQueryData *)
              repalloc(afterTriggers.query_stack,
!                      new_alloc * sizeof(AfterTriggersQueryData));
          afterTriggers.maxquerydepth = new_alloc;
      }

!     /* Initialize new array entries to empty */
      while (init_depth < afterTriggers.maxquerydepth)
      {
!         AfterTriggersQueryData *qs = &afterTriggers.query_stack[init_depth];

!         qs->events.head = NULL;
!         qs->events.tail = NULL;
!         qs->events.tailfree = NULL;
!         qs->fdw_tuplestore = NULL;
!         qs->tables = NIL;

          ++init_depth;
      }
*************** AfterTriggerSetState(ConstraintsSetStmt
*** 4873,4881 ****
       * save it so it can be restored if the subtransaction aborts.
       */
      if (my_level > 1 &&
!         afterTriggers.state_stack[my_level] == NULL)
      {
!         afterTriggers.state_stack[my_level] =
              SetConstraintStateCopy(afterTriggers.state);
      }

--- 5015,5023 ----
       * save it so it can be restored if the subtransaction aborts.
       */
      if (my_level > 1 &&
!         afterTriggers.trans_stack[my_level].state == NULL)
      {
!         afterTriggers.trans_stack[my_level].state =
              SetConstraintStateCopy(afterTriggers.state);
      }

*************** AfterTriggerPendingOnRel(Oid relid)
*** 5184,5190 ****
       */
      for (depth = 0; depth <= afterTriggers.query_depth && depth < afterTriggers.maxquerydepth; depth++)
      {
!         for_each_event_chunk(event, chunk, afterTriggers.query_stack[depth])
          {
              AfterTriggerShared evtshared = GetTriggerSharedData(event);

--- 5326,5332 ----
       */
      for (depth = 0; depth <= afterTriggers.query_depth && depth < afterTriggers.maxquerydepth; depth++)
      {
!         for_each_event_chunk(event, chunk, afterTriggers.query_stack[depth].events)
          {
              AfterTriggerShared evtshared = GetTriggerSharedData(event);

*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5229,5235 ****
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
      AfterTriggerEventData new_event;
      AfterTriggerSharedData new_shared;
!     char        relkind = relinfo->ri_RelationDesc->rd_rel->relkind;
      int            tgtype_event;
      int            tgtype_level;
      int            i;
--- 5371,5378 ----
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
      AfterTriggerEventData new_event;
      AfterTriggerSharedData new_shared;
!     char        relkind = rel->rd_rel->relkind;
!     AfterTriggersTableData *table;
      int            tgtype_event;
      int            tgtype_level;
      int            i;
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5284,5293 ****
              Tuplestorestate *new_tuplestore;

              Assert(newtup != NULL);
!             if (event == TRIGGER_EVENT_INSERT)
!                 new_tuplestore = transition_capture->tcs_insert_tuplestore;
!             else
!                 new_tuplestore = transition_capture->tcs_update_tuplestore;

              if (original_insert_tuple != NULL)
                  tuplestore_puttuple(new_tuplestore, original_insert_tuple);
--- 5427,5433 ----
              Tuplestorestate *new_tuplestore;

              Assert(newtup != NULL);
!             new_tuplestore = transition_capture->tcs_new_tuplestore;

              if (original_insert_tuple != NULL)
                  tuplestore_puttuple(new_tuplestore, original_insert_tuple);
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5316,5321 ****
--- 5456,5464 ----
       * The event code will be used both as a bitmask and an array offset, so
       * validation is important to make sure we don't walk off the edge of our
       * arrays.
+      *
+      * Also, if we're considering statement-level triggers, check whether we
+      * already queued them for this event set, and return if so.
       */
      switch (event)
      {
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5334,5339 ****
--- 5477,5487 ----
                  Assert(newtup == NULL);
                  ItemPointerSetInvalid(&(new_event.ate_ctid1));
                  ItemPointerSetInvalid(&(new_event.ate_ctid2));
+                 table = GetAfterTriggersTableData(RelationGetRelid(rel),
+                                                   CMD_INSERT);
+                 if (table->stmt_trig_done)
+                     return;
+                 table->stmt_trig_done = true;
              }
              break;
          case TRIGGER_EVENT_DELETE:
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5351,5356 ****
--- 5499,5509 ----
                  Assert(newtup == NULL);
                  ItemPointerSetInvalid(&(new_event.ate_ctid1));
                  ItemPointerSetInvalid(&(new_event.ate_ctid2));
+                 table = GetAfterTriggersTableData(RelationGetRelid(rel),
+                                                   CMD_DELETE);
+                 if (table->stmt_trig_done)
+                     return;
+                 table->stmt_trig_done = true;
              }
              break;
          case TRIGGER_EVENT_UPDATE:
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5368,5373 ****
--- 5521,5531 ----
                  Assert(newtup == NULL);
                  ItemPointerSetInvalid(&(new_event.ate_ctid1));
                  ItemPointerSetInvalid(&(new_event.ate_ctid2));
+                 table = GetAfterTriggersTableData(RelationGetRelid(rel),
+                                                   CMD_UPDATE);
+                 if (table->stmt_trig_done)
+                     return;
+                 table->stmt_trig_done = true;
              }
              break;
          case TRIGGER_EVENT_TRUNCATE:
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5407,5415 ****
          {
              if (fdw_tuplestore == NULL)
              {
!                 fdw_tuplestore =
!                     GetTriggerTransitionTuplestore
!                     (afterTriggers.fdw_tuplestores);
                  new_event.ate_flags = AFTER_TRIGGER_FDW_FETCH;
              }
              else
--- 5565,5571 ----
          {
              if (fdw_tuplestore == NULL)
              {
!                 fdw_tuplestore = GetCurrentFDWTuplestore();
                  new_event.ate_flags = AFTER_TRIGGER_FDW_FETCH;
              }
              else
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5474,5484 ****
          new_shared.ats_tgoid = trigger->tgoid;
          new_shared.ats_relid = RelationGetRelid(rel);
          new_shared.ats_firing_id = 0;
!         /* deferrable triggers cannot access transition data */
!         new_shared.ats_transition_capture =
!             trigger->tgdeferrable ? NULL : transition_capture;

!         afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth],
                               &new_event, &new_shared);
      }

--- 5630,5648 ----
          new_shared.ats_tgoid = trigger->tgoid;
          new_shared.ats_relid = RelationGetRelid(rel);
          new_shared.ats_firing_id = 0;
!         /* deferrable triggers cannot access transition tables */
!         if (trigger->tgdeferrable || transition_capture == NULL)
!         {
!             new_shared.ats_old_tuplestore = NULL;
!             new_shared.ats_new_tuplestore = NULL;
!         }
!         else
!         {
!             new_shared.ats_old_tuplestore = transition_capture->tcs_old_tuplestore;
!             new_shared.ats_new_tuplestore = transition_capture->tcs_new_tuplestore;
!         }

!         afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth].events,
                               &new_event, &new_shared);
      }
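(The `stmt_trig_done` checks in the hunks above are what keep statement-level AFTER triggers from firing more than once per table and command, even when several wCTEs target the same table. A minimal standalone sketch of that dedup pattern — hypothetical names, and a linked list standing in for trigger.c's real hash table keyed by table OID:)

```c
#include <stdbool.h>
#include <stdlib.h>

typedef enum { CMD_INSERT, CMD_UPDATE, CMD_DELETE } CmdType;

/* Per-(table OID, command) bookkeeping, loosely analogous to
 * AfterTriggersTableData in the patch. */
typedef struct TableData
{
    unsigned    relid;
    CmdType     cmd;
    bool        stmt_trig_done;
    struct TableData *next;
} TableData;

static TableData *table_list;

/* Find-or-create, playing the role of GetAfterTriggersTableData. */
static TableData *
get_table_data(unsigned relid, CmdType cmd)
{
    TableData  *t;

    for (t = table_list; t != NULL; t = t->next)
        if (t->relid == relid && t->cmd == cmd)
            return t;
    t = calloc(1, sizeof(TableData));
    t->relid = relid;
    t->cmd = cmd;
    t->next = table_list;
    table_list = t;
    return t;
}

/* True only the first time statement-level triggers should be queued for
 * this (table, command) in the current query; later calls (e.g. from other
 * wCTEs hitting the same table) return false and the caller bails out. */
static bool
queue_stmt_triggers_once(unsigned relid, CmdType cmd)
{
    TableData  *t = get_table_data(relid, cmd);

    if (t->stmt_trig_done)
        return false;
    t->stmt_trig_done = true;
    return true;
}
```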

diff --git a/src/backend/executor/README b/src/backend/executor/README
index a004506..b3e74aa 100644
*** a/src/backend/executor/README
--- b/src/backend/executor/README
*************** This is a sketch of control flow for ful
*** 241,251 ****
          CreateExecutorState
              creates per-query context
          switch to per-query context to run ExecInitNode
          ExecInitNode --- recursively scans plan tree
              CreateExprContext
                  creates per-tuple context
              ExecInitExpr
-         AfterTriggerBeginQuery

      ExecutorRun
          ExecProcNode --- recursively called in per-query context
--- 241,251 ----
          CreateExecutorState
              creates per-query context
          switch to per-query context to run ExecInitNode
+         AfterTriggerBeginQuery
          ExecInitNode --- recursively scans plan tree
              CreateExprContext
                  creates per-tuple context
              ExecInitExpr

      ExecutorRun
          ExecProcNode --- recursively called in per-query context
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4b594d4..6255cc6 100644
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
*************** standard_ExecutorStart(QueryDesc *queryD
*** 252,268 ****
      estate->es_instrument = queryDesc->instrument_options;

      /*
-      * Initialize the plan state tree
-      */
-     InitPlan(queryDesc, eflags);
-
-     /*
       * Set up an AFTER-trigger statement context, unless told not to, or
       * unless it's EXPLAIN-only mode (when ExecutorFinish won't be called).
       */
      if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
          AfterTriggerBeginQuery();

      MemoryContextSwitchTo(oldcontext);
  }

--- 252,268 ----
      estate->es_instrument = queryDesc->instrument_options;

      /*
       * Set up an AFTER-trigger statement context, unless told not to, or
       * unless it's EXPLAIN-only mode (when ExecutorFinish won't be called).
       */
      if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
          AfterTriggerBeginQuery();

+     /*
+      * Initialize the plan state tree
+      */
+     InitPlan(queryDesc, eflags);
+
      MemoryContextSwitchTo(oldcontext);
  }

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 49586a3..845c409 100644
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
*************** ExecInsert(ModifyTableState *mtstate,
*** 343,348 ****
--- 343,351 ----
                  mtstate->mt_transition_capture->tcs_map = NULL;
              }
          }
+         if (mtstate->mt_oc_transition_capture != NULL)
+             mtstate->mt_oc_transition_capture->tcs_map =
+                 mtstate->mt_transition_tupconv_maps[leaf_part_index];

          /*
           * We might need to convert from the parent rowtype to the partition
*************** lreplace:;
*** 1158,1163 ****
--- 1161,1168 ----
      /* AFTER ROW UPDATE Triggers */
      ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple, tuple,
                           recheckIndexes,
+                          mtstate->operation == CMD_INSERT ?
+                          mtstate->mt_oc_transition_capture :
                           mtstate->mt_transition_capture);

      list_free(recheckIndexes);
*************** fireASTriggers(ModifyTableState *node)
*** 1444,1450 ****
              if (node->mt_onconflict == ONCONFLICT_UPDATE)
                  ExecASUpdateTriggers(node->ps.state,
                                       resultRelInfo,
!                                      node->mt_transition_capture);
              ExecASInsertTriggers(node->ps.state, resultRelInfo,
                                   node->mt_transition_capture);
              break;
--- 1449,1455 ----
              if (node->mt_onconflict == ONCONFLICT_UPDATE)
                  ExecASUpdateTriggers(node->ps.state,
                                       resultRelInfo,
!                                      node->mt_oc_transition_capture);
              ExecASInsertTriggers(node->ps.state, resultRelInfo,
                                   node->mt_transition_capture);
              break;
*************** ExecSetupTransitionCaptureState(ModifyTa
*** 1474,1487 ****

      /* Check for transition tables on the directly targeted relation. */
      mtstate->mt_transition_capture =
!         MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc);

      /*
       * If we found that we need to collect transition tuples then we may also
       * need tuple conversion maps for any children that have TupleDescs that
!      * aren't compatible with the tuplestores.
       */
!     if (mtstate->mt_transition_capture != NULL)
      {
          ResultRelInfo *resultRelInfos;
          int            numResultRelInfos;
--- 1479,1502 ----

      /* Check for transition tables on the directly targeted relation. */
      mtstate->mt_transition_capture =
!         MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
!                                    RelationGetRelid(targetRelInfo->ri_RelationDesc),
!                                    mtstate->operation);
!     if (mtstate->operation == CMD_INSERT &&
!         mtstate->mt_onconflict == ONCONFLICT_UPDATE)
!         mtstate->mt_oc_transition_capture =
!             MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
!                                        RelationGetRelid(targetRelInfo->ri_RelationDesc),
!                                        CMD_UPDATE);

      /*
       * If we found that we need to collect transition tuples then we may also
       * need tuple conversion maps for any children that have TupleDescs that
!      * aren't compatible with the tuplestores.  (We can share these maps
!      * between the regular and ON CONFLICT cases.)
       */
!     if (mtstate->mt_transition_capture != NULL ||
!         mtstate->mt_oc_transition_capture != NULL)
      {
          ResultRelInfo *resultRelInfos;
          int            numResultRelInfos;
*************** ExecSetupTransitionCaptureState(ModifyTa
*** 1522,1531 ****
          /*
           * Install the conversion map for the first plan for UPDATE and DELETE
           * operations.  It will be advanced each time we switch to the next
!          * plan.  (INSERT operations set it every time.)
           */
!         mtstate->mt_transition_capture->tcs_map =
!             mtstate->mt_transition_tupconv_maps[0];
      }
  }

--- 1537,1548 ----
          /*
           * Install the conversion map for the first plan for UPDATE and DELETE
           * operations.  It will be advanced each time we switch to the next
!          * plan.  (INSERT operations set it every time, so we need not update
!          * mtstate->mt_oc_transition_capture here.)
           */
!         if (mtstate->mt_transition_capture)
!             mtstate->mt_transition_capture->tcs_map =
!                 mtstate->mt_transition_tupconv_maps[0];
      }
  }

*************** ExecModifyTable(PlanState *pstate)
*** 1629,1641 ****
                  estate->es_result_relation_info = resultRelInfo;
                  EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
                                      node->mt_arowmarks[node->mt_whichplan]);
                  if (node->mt_transition_capture != NULL)
                  {
-                     /* Prepare to convert transition tuples from this child. */
                      Assert(node->mt_transition_tupconv_maps != NULL);
                      node->mt_transition_capture->tcs_map =
                          node->mt_transition_tupconv_maps[node->mt_whichplan];
                  }
                  continue;
              }
              else
--- 1646,1664 ----
                  estate->es_result_relation_info = resultRelInfo;
                  EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
                                      node->mt_arowmarks[node->mt_whichplan]);
+                 /* Prepare to convert transition tuples from this child. */
                  if (node->mt_transition_capture != NULL)
                  {
                      Assert(node->mt_transition_tupconv_maps != NULL);
                      node->mt_transition_capture->tcs_map =
                          node->mt_transition_tupconv_maps[node->mt_whichplan];
                  }
+                 if (node->mt_oc_transition_capture != NULL)
+                 {
+                     Assert(node->mt_transition_tupconv_maps != NULL);
+                     node->mt_oc_transition_capture->tcs_map =
+                         node->mt_transition_tupconv_maps[node->mt_whichplan];
+                 }
                  continue;
              }
              else
*************** ExecInitModifyTable(ModifyTable *node, E
*** 1934,1941 ****
          mtstate->mt_partition_tuple_slot = partition_tuple_slot;
      }

!     /* Build state for collecting transition tuples */
!     ExecSetupTransitionCaptureState(mtstate, estate);

      /*
       * Initialize any WITH CHECK OPTION constraints if needed.
--- 1957,1968 ----
          mtstate->mt_partition_tuple_slot = partition_tuple_slot;
      }

!     /*
!      * Build state for collecting transition tuples.  This requires having a
!      * valid trigger query context, so skip it in explain-only mode.
!      */
!     if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
!         ExecSetupTransitionCaptureState(mtstate, estate);

      /*
       * Initialize any WITH CHECK OPTION constraints if needed.
*************** ExecEndModifyTable(ModifyTableState *nod
*** 2319,2334 ****
      int            i;

      /*
-      * Free transition tables, unless this query is being run in
-      * EXEC_FLAG_SKIP_TRIGGERS mode, which means that it may have queued AFTER
-      * triggers that won't be run till later.  In that case we'll just leak
-      * the transition tables till end of (sub)transaction.
-      */
-     if (node->mt_transition_capture != NULL &&
-         !(node->ps.state->es_top_eflags & EXEC_FLAG_SKIP_TRIGGERS))
-         DestroyTransitionCaptureState(node->mt_transition_capture);
-
-     /*
       * Allow any FDWs to shut down
       */
      for (i = 0; i < node->mt_nplans; i++)
--- 2346,2351 ----
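(The ExecModifyTable and ExecInsert hunks above install the same per-child tuple conversion map into both capture states, since the maps are shared between the regular and ON CONFLICT cases. A toy sketch of that routing, with simplified stand-in structs — field names mirror the patch, everything else is hypothetical:)

```c
#include <stddef.h>

typedef struct TupleConversionMap TupleConversionMap;   /* opaque here */

/* Per-caller capture state; the caller owns tcs_map. */
typedef struct TransitionCaptureState
{
    TupleConversionMap *tcs_map;
} TransitionCaptureState;

typedef struct ModifyState
{
    TransitionCaptureState *mt_transition_capture;      /* primary operation */
    TransitionCaptureState *mt_oc_transition_capture;   /* ON CONFLICT UPDATE */
    TupleConversionMap **mt_transition_tupconv_maps;    /* one per subplan */
} ModifyState;

/* When advancing to subplan `whichplan`, install that child's conversion
 * map into *both* capture states, if present, as the patched
 * ExecModifyTable does. */
static void
install_child_map(ModifyState *mt, int whichplan)
{
    TupleConversionMap *map = mt->mt_transition_tupconv_maps[whichplan];

    if (mt->mt_transition_capture != NULL)
        mt->mt_transition_capture->tcs_map = map;
    if (mt->mt_oc_transition_capture != NULL)
        mt->mt_oc_transition_capture->tcs_map = map;
}
```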
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index aeb363f..c6a5684 100644
*** a/src/include/commands/trigger.h
--- b/src/include/commands/trigger.h
*************** typedef struct TriggerData
*** 43,49 ****

  /*
   * The state for capturing old and new tuples into transition tables for a
!  * single ModifyTable node.
   */
  typedef struct TransitionCaptureState
  {
--- 43,54 ----

  /*
   * The state for capturing old and new tuples into transition tables for a
!  * single ModifyTable node (or other operation source, e.g. copy.c).
!  *
!  * This is per-caller to avoid conflicts in setting tcs_map or
!  * tcs_original_insert_tuple.  Note, however, that the pointed-to
!  * tuplestores may be shared across multiple callers; they are managed
!  * by trigger.c.
   */
  typedef struct TransitionCaptureState
  {
*************** typedef struct TransitionCaptureState
*** 60,66 ****
       * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
       * new and old tuples from a child table's format to the format of the
       * relation named in a query so that it is compatible with the transition
!      * tuplestores.
       */
      TupleConversionMap *tcs_map;

--- 65,71 ----
       * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
       * new and old tuples from a child table's format to the format of the
       * relation named in a query so that it is compatible with the transition
!      * tuplestores.  The caller must store the conversion map here if so.
       */
      TupleConversionMap *tcs_map;

*************** typedef struct TransitionCaptureState
*** 74,90 ****
      HeapTuple    tcs_original_insert_tuple;

      /*
!      * The tuplestores backing the transition tables.  We use separate
!      * tuplestores for INSERT and UPDATE, because INSERT ... ON CONFLICT ...
!      * DO UPDATE causes INSERT and UPDATE triggers to fire and needs a way to
!      * keep track of the new tuple images resulting from the two cases
!      * separately.  We only need a single old image tuplestore, because there
!      * is no statement that can both update and delete at the same time.
       */
!     Tuplestorestate *tcs_old_tuplestore;    /* for DELETE and UPDATE old
!                                              * images */
!     Tuplestorestate *tcs_insert_tuplestore; /* for INSERT new images */
!     Tuplestorestate *tcs_update_tuplestore; /* for UPDATE new images */
  } TransitionCaptureState;

  /*
--- 79,91 ----
      HeapTuple    tcs_original_insert_tuple;

      /*
!      * The tuplestore(s) into which to insert tuples.  Either may be NULL if
!      * not needed for the operation type.
       */
!     Tuplestorestate *tcs_old_tuplestore;    /* for DELETE and UPDATE-old
!                                              * tuples */
!     Tuplestorestate *tcs_new_tuplestore;    /* for INSERT and UPDATE-new
!                                              * tuples */
  } TransitionCaptureState;

  /*
*************** extern void RelationBuildTriggers(Relati
*** 174,181 ****
  extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc);

  extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc);
! extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc);
! extern void DestroyTransitionCaptureState(TransitionCaptureState *tcs);

  extern void FreeTriggerDesc(TriggerDesc *trigdesc);

--- 175,183 ----
  extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc);

  extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc);
!
! extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc,
!                            Oid relid, CmdType cmdType);

  extern void FreeTriggerDesc(TriggerDesc *trigdesc);
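(The trigger.h changes above capture the core design: each caller keeps its own TransitionCaptureState, so tcs_map and tcs_original_insert_tuple never conflict, while the old/new tuplestores behind it are shared per table and owned by trigger.c. A self-contained sketch of that sharing, under assumed names — a counting struct stands in for a real tuplestore and a list for the registry:)

```c
#include <stdlib.h>

/* Toy tuplestore: just counts appended tuples. */
typedef struct Tuplestore { int ntuples; } Tuplestore;

/* Per-caller capture state, pointing at possibly-shared tuplestores. */
typedef struct CaptureState
{
    Tuplestore *tcs_old_tuplestore;     /* DELETE and UPDATE old images */
    Tuplestore *tcs_new_tuplestore;     /* INSERT and UPDATE new images */
} CaptureState;

/* Shared-store registry keyed by table OID (trigger.c owns the real ones,
 * tied to the AfterTrigger query level). */
typedef struct SharedStores
{
    unsigned    relid;
    Tuplestore  old_store;
    Tuplestore  new_store;
    struct SharedStores *next;
} SharedStores;

static SharedStores *shared_list;

static SharedStores *
get_shared_stores(unsigned relid)
{
    SharedStores *s;

    for (s = shared_list; s != NULL; s = s->next)
        if (s->relid == relid)
            return s;
    s = calloc(1, sizeof(SharedStores));
    s->relid = relid;
    s->next = shared_list;
    shared_list = s;
    return s;
}

/* Each caller gets a fresh CaptureState, but two callers targeting the
 * same table end up writing into the same tuplestores. */
static CaptureState *
make_capture_state(unsigned relid)
{
    SharedStores *s = get_shared_stores(relid);
    CaptureState *cs = calloc(1, sizeof(CaptureState));

    cs->tcs_old_tuplestore = &s->old_store;
    cs->tcs_new_tuplestore = &s->new_store;
    return cs;
}
```

(This is why the wCTE case in the new regression test reports a single merged "new table": both the wCTE's and the outer INSERT's capture states feed one shared new-tuple store.)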

diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 90a60ab..c6d3021 100644
*** a/src/include/nodes/execnodes.h
--- b/src/include/nodes/execnodes.h
*************** typedef struct ModifyTableState
*** 983,989 ****
      /* Per partition tuple conversion map */
      TupleTableSlot *mt_partition_tuple_slot;
      struct TransitionCaptureState *mt_transition_capture;
!     /* controls transition table population */
      TupleConversionMap **mt_transition_tupconv_maps;
      /* Per plan/partition tuple conversion */
  } ModifyTableState;
--- 983,991 ----
      /* Per partition tuple conversion map */
      TupleTableSlot *mt_partition_tuple_slot;
      struct TransitionCaptureState *mt_transition_capture;
!     /* controls transition table population for specified operation */
!     struct TransitionCaptureState *mt_oc_transition_capture;
!     /* controls transition table population for INSERT...ON CONFLICT UPDATE */
      TupleConversionMap **mt_transition_tupconv_maps;
      /* Per plan/partition tuple conversion */
  } ModifyTableState;
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 620fac1..0987be5 100644
*** a/src/test/regress/expected/triggers.out
--- b/src/test/regress/expected/triggers.out
*************** with wcte as (insert into table1 values
*** 2217,2222 ****
--- 2217,2239 ----
    insert into table2 values ('hello world');
  NOTICE:  trigger = table2_trig, new table = ("hello world")
  NOTICE:  trigger = table1_trig, new table = (42)
+ with wcte as (insert into table1 values (43))
+   insert into table1 values (44);
+ NOTICE:  trigger = table1_trig, new table = (43), (44)
+ select * from table1;
+  a
+ ----
+  42
+  44
+  43
+ (3 rows)
+
+ select * from table2;
+       a
+ -------------
+  hello world
+ (1 row)
+
  drop table table1;
  drop table table2;
  --
*************** create trigger my_table_multievent_trig
*** 2256,2261 ****
--- 2273,2286 ----
    after insert or update on my_table referencing new table as new_table
    for each statement execute procedure dump_insert();
  ERROR:  transition tables cannot be specified for triggers with more than one event
+ --
+ -- Verify that you can't create a trigger with transition tables with
+ -- a column list.
+ --
+ create trigger my_table_col_update_trig
+   after update of b on my_table referencing new table as new_table
+   for each statement execute procedure dump_insert();
+ ERROR:  transition tables cannot be specified for triggers with column lists
  drop table my_table;
  --
  -- Test firing of triggers with transition tables by foreign key cascades
*************** select * from trig_table;
*** 2299,2306 ****
  (6 rows)

  delete from refd_table where length(b) = 3;
! NOTICE:  trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b")
! NOTICE:  trigger = trig_table_delete_trig, old table = (11,"one a"), (11,"one b")
  select * from trig_table;
   a |    b
  ---+---------
--- 2324,2330 ----
  (6 rows)

  delete from refd_table where length(b) = 3;
! NOTICE:  trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b"), (11,"one a"), (11,"one b")
  select * from trig_table;
   a |    b
  ---+---------
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index c6deb56..10eee76 100644
*** a/src/test/regress/sql/triggers.sql
--- b/src/test/regress/sql/triggers.sql
*************** create trigger table2_trig
*** 1729,1734 ****
--- 1729,1740 ----
  with wcte as (insert into table1 values (42))
    insert into table2 values ('hello world');

+ with wcte as (insert into table1 values (43))
+   insert into table1 values (44);
+
+ select * from table1;
+ select * from table2;
+
  drop table table1;
  drop table table2;

*************** create trigger my_table_multievent_trig
*** 1769,1774 ****
--- 1775,1789 ----
    after insert or update on my_table referencing new table as new_table
    for each statement execute procedure dump_insert();

+ --
+ -- Verify that you can't create a trigger with transition tables with
+ -- a column list.
+ --
+
+ create trigger my_table_col_update_trig
+   after update of b on my_table referencing new table as new_table
+   for each statement execute procedure dump_insert();
+
  drop table my_table;

  --

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
