Re: [BUGS] BUG #14808: V10-beta4, backend abort

From: Tom Lane
Subject: Re: [BUGS] BUG #14808: V10-beta4, backend abort
Msg-id: 32587.1505503595@sss.pgh.pa.us
In response to: Re: [BUGS] BUG #14808: V10-beta4, backend abort  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [BUGS] BUG #14808: V10-beta4, backend abort  (Thomas Munro <thomas.munro@enterprisedb.com>)
List: pgsql-bugs
I wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> What is going on here?
>> ...
>> insert into foo values (1, null), (2, 1);
>> 
>> postgres=# delete from foo where a = 1;
>> NOTICE:  trigger = foo_r_trig, old table = (1,), depth = 1
>> NOTICE:  trigger = foo_s_trig, old table = (1,), depth = 1
>> NOTICE:  trigger = foo_r_trig, old table = (2,1), depth = 1
>> NOTICE:  trigger = foo_s_trig, old table = (2,1), depth = 1
>> NOTICE:  trigger = foo_s_trig, old table = <NULL>, depth = 1
>> DELETE 1

> OK, now that I'm a little more awake, I looked at this, and I think it's
> actually an instance of the "closed transition table" behavior you were
> asking about.

Attached is an updated patch that incorporates the ideas you suggested.
I added an extended version of this example, which has an additional
level of FK cascade happening:

insert into self_ref values (1, null), (2, 1), (3, 2);
delete from self_ref where a = 1;
NOTICE:  trigger = self_ref_r_trig, old table = (1,), (2,1)
NOTICE:  trigger = self_ref_r_trig, old table = (1,), (2,1)
NOTICE:  trigger = self_ref_s_trig, old table = (1,), (2,1)
NOTICE:  trigger = self_ref_r_trig, old table = (3,2)
NOTICE:  trigger = self_ref_s_trig, old table = (3,2)
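
(For readers following along: the table and trigger definitions behind this
example aren't shown above.  A minimal reconstruction that would produce this
kind of NOTICE output looks roughly like the following; the function body and
exact trigger definitions are inferred from the output format and may differ
from the actual regression test in the patch.)

```sql
-- Hypothetical reconstruction of the self_ref example setup.
-- The table is self-referential with ON DELETE CASCADE, so deleting a
-- parent row queues RI enforcement actions that delete child rows in
-- between the user's AFTER triggers.
create table self_ref (
    a int primary key,
    b int references self_ref (a) on delete cascade
);

-- Trigger function that reports which trigger fired and the contents of
-- its OLD TABLE transition table.
create or replace function dump_old_table() returns trigger
language plpgsql as
$$
declare
    rows text;
begin
    select string_agg(t::text, ', ') into rows from old_table t;
    raise notice 'trigger = %, old table = %', TG_NAME, rows;
    return null;
end;
$$;

-- Both a row-level and a statement-level AFTER DELETE trigger, each
-- declared with REFERENCING OLD TABLE so they see the transition table:
create trigger self_ref_r_trig
    after delete on self_ref
    referencing old table as old_table
    for each row execute procedure dump_old_table();

create trigger self_ref_s_trig
    after delete on self_ref
    referencing old table as old_table
    for each statement execute procedure dump_old_table();
```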

What happens here is that the outer delete queues AR triggers for
RI enforcement and self_ref_r_trig, plus an AS trigger for
self_ref_s_trig.  Then the RI enforcement trigger deletes (2,1)
and queues AR+AS triggers for that.  At this point the initial
transition table is still open, so (2,1) goes into that table,
and we look back and cancel the previous queuing of self_ref_s_trig.
Now it's time for the first firing of self_ref_r_trig, and so now
we mark the transition table closed.  Then we skip the cancelled
self_ref_s_trig call, and then it's time for the second RI enforcement
trigger to fire, which deletes (3,2) but has to put it into a new
transition table.  Again we queue AR+AS triggers, but this time we
can't cancel the preceding AS call.  Then we fire self_ref_r_trig
again (for the (2,1) row), and then fire self_ref_s_trig; both of
them see the same transition table the first self_ref_r_trig call
did.  Now it's time for the third RI enforcement trigger; it finds
nothing to delete, so it adds nothing to the second transition table,
but it does queue an AS trigger call (canceling the one added by the
second RI trigger).  Finally we have the AR call queued by the second
RI trigger, and then the AS call queued by the third RI trigger,
both looking at the second transition table.

This is pretty messy but I think it's the best we can do as long as
RI actions are intermixed with other AFTER ROW triggers.  Maybe with
Kevin's ideas about converting RI actions to be statement-level,
we could arrange for all three deletions to show up in one transition
table ... but I don't know how we cause that to happen before the
user's AFTER ROW triggers run.  In any case, nothing will look odd
unless you have AR triggers using transition tables, which seems like
a niche use case in the first place.

I also realized that we could undo the bloat I added to
AfterTriggerSharedData by storing a pointer to the AfterTriggersTableData
rather than the tuplestores themselves.

I feel that this is probably committable, except that I still need
to look for documentation changes.

            regards, tom lane

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfa3f05..c6fa445 100644
*** a/src/backend/commands/copy.c
--- b/src/backend/commands/copy.c
*************** CopyFrom(CopyState cstate)
*** 2432,2443 ****
      /* Triggers might need a slot as well */
      estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);

      /*
       * If there are any triggers with transition tables on the named relation,
       * we need to be prepared to capture transition tuples.
       */
      cstate->transition_capture =
!         MakeTransitionCaptureState(cstate->rel->trigdesc);

      /*
       * If the named relation is a partitioned table, initialize state for
--- 2432,2448 ----
      /* Triggers might need a slot as well */
      estate->es_trig_tuple_slot = ExecInitExtraTupleSlot(estate);

+     /* Prepare to catch AFTER triggers. */
+     AfterTriggerBeginQuery();
+
      /*
       * If there are any triggers with transition tables on the named relation,
       * we need to be prepared to capture transition tuples.
       */
      cstate->transition_capture =
!         MakeTransitionCaptureState(cstate->rel->trigdesc,
!                                    RelationGetRelid(cstate->rel),
!                                    CMD_INSERT);

      /*
       * If the named relation is a partitioned table, initialize state for
*************** CopyFrom(CopyState cstate)
*** 2513,2521 ****
          bufferedTuples = palloc(MAX_BUFFERED_TUPLES * sizeof(HeapTuple));
      }

-     /* Prepare to catch AFTER triggers. */
-     AfterTriggerBeginQuery();
-
      /*
       * Check BEFORE STATEMENT insertion triggers. It's debatable whether we
       * should do this for COPY, since it's not really an "INSERT" statement as
--- 2518,2523 ----
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index 269c9e1..dfe1348 100644
*** a/src/backend/commands/trigger.c
--- b/src/backend/commands/trigger.c
*************** CreateTrigger(CreateTrigStmt *stmt, cons
*** 234,239 ****
--- 234,244 ----
                              RelationGetRelationName(rel)),
                       errdetail("Foreign tables cannot have TRUNCATE triggers.")));

+         /*
+          * We disallow constraint triggers to protect the assumption that
+          * triggers on FKs can't be deferred.  See notes with AfterTriggers
+          * data structures, below.
+          */
          if (stmt->isconstraint)
              ereport(ERROR,
                      (errcode(ERRCODE_WRONG_OBJECT_TYPE),
*************** CreateTrigger(CreateTrigStmt *stmt, cons
*** 418,423 ****
--- 423,448 ----
                          (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                           errmsg("transition tables cannot be specified for triggers with more than one event")));

+             /*
+              * We currently don't allow column-specific triggers with
+              * transition tables.  Per spec, that seems to require
+              * accumulating separate transition tables for each combination of
+              * columns, which is a lot of work for a rather marginal feature.
+              */
+             if (stmt->columns != NIL)
+                 ereport(ERROR,
+                         (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                          errmsg("transition tables cannot be specified for triggers with column lists")));
+
+             /*
+              * We disallow constraint triggers with transition tables, to
+              * protect the assumption that such triggers can't be deferred.
+              * See notes with AfterTriggers data structures, below.
+              *
+              * Currently this is enforced by the grammar, so just Assert here.
+              */
+             Assert(!stmt->isconstraint);
+
              if (tt->isNew)
              {
                  if (!(TRIGGER_FOR_INSERT(tgtype) ||
*************** FindTriggerIncompatibleWithInheritance(T
*** 2086,2181 ****
  }

  /*
-  * Make a TransitionCaptureState object from a given TriggerDesc.  The
-  * resulting object holds the flags which control whether transition tuples
-  * are collected when tables are modified, and the tuplestores themselves.
-  * Note that we copy the flags from a parent table into this struct (rather
-  * than using each relation's TriggerDesc directly) so that we can use it to
-  * control the collection of transition tuples from child tables.
-  *
-  * If there are no triggers with transition tables configured for 'trigdesc',
-  * then return NULL.
-  *
-  * The resulting object can be passed to the ExecAR* functions.  The caller
-  * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
-  * with child tables.
-  */
- TransitionCaptureState *
- MakeTransitionCaptureState(TriggerDesc *trigdesc)
- {
-     TransitionCaptureState *state = NULL;
-
-     if (trigdesc != NULL &&
-         (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table ||
-          trigdesc->trig_update_new_table || trigdesc->trig_insert_new_table))
-     {
-         MemoryContext oldcxt;
-         ResourceOwner saveResourceOwner;
-
-         /*
-          * Normally DestroyTransitionCaptureState should be called after
-          * executing all AFTER triggers for the current statement.
-          *
-          * To handle error cleanup, TransitionCaptureState and the tuplestores
-          * it contains will live in the current [sub]transaction's memory
-          * context.  Likewise for the current resource owner, because we also
-          * want to clean up temporary files spilled to disk by the tuplestore
-          * in that scenario.  This scope is sufficient, because AFTER triggers
-          * with transition tables cannot be deferred (only constraint triggers
-          * can be deferred, and constraint triggers cannot have transition
-          * tables).  The AFTER trigger queue may contain pointers to this
-          * TransitionCaptureState, but any such entries will be processed or
-          * discarded before the end of the current [sub]transaction.
-          *
-          * If a future release allows deferred triggers with transition
-          * tables, we'll need to reconsider the scope of the
-          * TransitionCaptureState object.
-          */
-         oldcxt = MemoryContextSwitchTo(CurTransactionContext);
-         saveResourceOwner = CurrentResourceOwner;
-
-         state = (TransitionCaptureState *)
-             palloc0(sizeof(TransitionCaptureState));
-         state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
-         state->tcs_update_old_table = trigdesc->trig_update_old_table;
-         state->tcs_update_new_table = trigdesc->trig_update_new_table;
-         state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
-         PG_TRY();
-         {
-             CurrentResourceOwner = CurTransactionResourceOwner;
-             if (trigdesc->trig_delete_old_table || trigdesc->trig_update_old_table)
-                 state->tcs_old_tuplestore = tuplestore_begin_heap(false, false, work_mem);
-             if (trigdesc->trig_insert_new_table)
-                 state->tcs_insert_tuplestore = tuplestore_begin_heap(false, false, work_mem);
-             if (trigdesc->trig_update_new_table)
-                 state->tcs_update_tuplestore = tuplestore_begin_heap(false, false, work_mem);
-         }
-         PG_CATCH();
-         {
-             CurrentResourceOwner = saveResourceOwner;
-             PG_RE_THROW();
-         }
-         PG_END_TRY();
-         CurrentResourceOwner = saveResourceOwner;
-         MemoryContextSwitchTo(oldcxt);
-     }
-
-     return state;
- }
-
- void
- DestroyTransitionCaptureState(TransitionCaptureState *tcs)
- {
-     if (tcs->tcs_insert_tuplestore != NULL)
-         tuplestore_end(tcs->tcs_insert_tuplestore);
-     if (tcs->tcs_update_tuplestore != NULL)
-         tuplestore_end(tcs->tcs_update_tuplestore);
-     if (tcs->tcs_old_tuplestore != NULL)
-         tuplestore_end(tcs->tcs_old_tuplestore);
-     pfree(tcs);
- }
-
- /*
   * Call a trigger function.
   *
   *        trigdata: trigger descriptor.
--- 2111,2116 ----
*************** TriggerEnabled(EState *estate, ResultRel
*** 3338,3346 ****
   * during the current transaction tree.  (BEFORE triggers are fired
   * immediately so we don't need any persistent state about them.)  The struct
   * and most of its subsidiary data are kept in TopTransactionContext; however
!  * the individual event records are kept in a separate sub-context.  This is
!  * done mainly so that it's easy to tell from a memory context dump how much
!  * space is being eaten by trigger events.
   *
   * Because the list of pending events can grow large, we go to some
   * considerable effort to minimize per-event memory consumption.  The event
--- 3273,3283 ----
   * during the current transaction tree.  (BEFORE triggers are fired
   * immediately so we don't need any persistent state about them.)  The struct
   * and most of its subsidiary data are kept in TopTransactionContext; however
!  * some data that can be discarded sooner appears in the CurTransactionContext
!  * of the relevant subtransaction.  Also, the individual event records are
!  * kept in a separate sub-context of TopTransactionContext.  This is done
!  * mainly so that it's easy to tell from a memory context dump how much space
!  * is being eaten by trigger events.
   *
   * Because the list of pending events can grow large, we go to some
   * considerable effort to minimize per-event memory consumption.  The event
*************** typedef SetConstraintStateData *SetConst
*** 3400,3405 ****
--- 3337,3349 ----
   * tuple(s).  This permits storing tuples once regardless of the number of
   * row-level triggers on a foreign table.
   *
+  * Note that we need triggers on foreign tables to be fired in exactly the
+  * order they were queued, so that the tuples come out of the tuplestore in
+  * the right order.  To ensure that, we forbid deferrable (constraint)
+  * triggers on foreign tables.  This also ensures that such triggers do not
+  * get deferred into outer trigger query levels, meaning that it's okay to
+  * destroy the tuplestore at the end of the query level.
+  *
   * Statement-level triggers always bear AFTER_TRIGGER_1CTID, though they
   * require no ctid field.  We lack the flag bit space to neatly represent that
   * distinct case, and it seems unlikely to be worth much trouble.
*************** typedef struct AfterTriggerSharedData
*** 3433,3439 ****
      Oid            ats_tgoid;        /* the trigger's ID */
      Oid            ats_relid;        /* the relation it's on */
      CommandId    ats_firing_id;    /* ID for firing cycle */
!     TransitionCaptureState *ats_transition_capture;
  } AfterTriggerSharedData;

  typedef struct AfterTriggerEventData *AfterTriggerEvent;
--- 3377,3383 ----
      Oid            ats_tgoid;        /* the trigger's ID */
      Oid            ats_relid;        /* the relation it's on */
      CommandId    ats_firing_id;    /* ID for firing cycle */
!     struct AfterTriggersTableData *ats_table;    /* transition table access */
  } AfterTriggerSharedData;

  typedef struct AfterTriggerEventData *AfterTriggerEvent;
*************** typedef struct AfterTriggerEventList
*** 3505,3510 ****
--- 3449,3462 ----
  #define for_each_event_chunk(eptr, cptr, evtlist) \
      for_each_chunk(cptr, evtlist) for_each_event(eptr, cptr)

+ /* Macros for iterating from a start point that might not be list start */
+ #define for_each_chunk_from(cptr) \
+     for (; cptr != NULL; cptr = cptr->next)
+ #define for_each_event_from(eptr, cptr) \
+     for (; \
+          (char *) eptr < (cptr)->freeptr; \
+          eptr = (AfterTriggerEvent) (((char *) eptr) + SizeofTriggerEvent(eptr)))
+

  /*
   * All per-transaction data for the AFTER TRIGGERS module.
*************** typedef struct AfterTriggerEventList
*** 3529,3588 ****
   * query_depth is the current depth of nested AfterTriggerBeginQuery calls
   * (-1 when the stack is empty).
   *
!  * query_stack[query_depth] is a list of AFTER trigger events queued by the
!  * current query (and the query_stack entries below it are lists of trigger
!  * events queued by calling queries).  None of these are valid until the
!  * matching AfterTriggerEndQuery call occurs.  At that point we fire
!  * immediate-mode triggers, and append any deferred events to the main events
!  * list.
   *
!  * fdw_tuplestores[query_depth] is a tuplestore containing the foreign tuples
!  * needed for the current query.
   *
!  * maxquerydepth is just the allocated length of query_stack and the
!  * tuplestores.
   *
!  * state_stack is a stack of pointers to saved copies of the SET CONSTRAINTS
!  * state data; each subtransaction level that modifies that state first
   * saves a copy, which we use to restore the state if we abort.
   *
!  * events_stack is a stack of copies of the events head/tail pointers,
   * which we use to restore those values during subtransaction abort.
   *
!  * depth_stack is a stack of copies of subtransaction-start-time query_depth,
   * which we similarly use to clean up at subtransaction abort.
   *
!  * firing_stack is a stack of copies of subtransaction-start-time
!  * firing_counter.  We use this to recognize which deferred triggers were
!  * fired (or marked for firing) within an aborted subtransaction.
   *
   * We use GetCurrentTransactionNestLevel() to determine the correct array
!  * index in these stacks.  maxtransdepth is the number of allocated entries in
!  * each stack.  (By not keeping our own stack pointer, we can avoid trouble
   * in cases where errors during subxact abort cause multiple invocations
   * of AfterTriggerEndSubXact() at the same nesting depth.)
   */
  typedef struct AfterTriggersData
  {
      CommandId    firing_counter; /* next firing ID to assign */
      SetConstraintState state;    /* the active S C state */
      AfterTriggerEventList events;    /* deferred-event list */
-     int            query_depth;    /* current query list index */
-     AfterTriggerEventList *query_stack; /* events pending from each query */
-     Tuplestorestate **fdw_tuplestores;    /* foreign tuples for one row from
-                                          * each query */
-     int            maxquerydepth;    /* allocated len of above array */
      MemoryContext event_cxt;    /* memory context for events, if any */

!     /* these fields are just for resetting at subtrans abort: */

!     SetConstraintState *state_stack;    /* stacked S C states */
!     AfterTriggerEventList *events_stack;    /* stacked list pointers */
!     int           *depth_stack;    /* stacked query_depths */
!     CommandId  *firing_stack;    /* stacked firing_counters */
!     int            maxtransdepth;    /* allocated len of above arrays */
  } AfterTriggersData;

  static AfterTriggersData afterTriggers;

  static void AfterTriggerExecute(AfterTriggerEvent event,
--- 3481,3587 ----
   * query_depth is the current depth of nested AfterTriggerBeginQuery calls
   * (-1 when the stack is empty).
   *
!  * query_stack[query_depth] is the per-query-level data, including these fields:
   *
!  * events is a list of AFTER trigger events queued by the current query.
!  * None of these are valid until the matching AfterTriggerEndQuery call
!  * occurs.  At that point we fire immediate-mode triggers, and append any
!  * deferred events to the main events list.
   *
!  * fdw_tuplestore is a tuplestore containing the foreign-table tuples
!  * needed by events queued by the current query.  (Note: we use just one
!  * tuplestore even though more than one foreign table might be involved.
!  * This is okay because tuplestores don't really care what's in the tuples
!  * they store; but it's possible that someday it'd break.)
   *
!  * tables is a List of AfterTriggersTableData structs for target tables
!  * of the current query (see below).
!  *
!  * maxquerydepth is just the allocated length of query_stack.
!  *
!  * trans_stack holds per-subtransaction data, including these fields:
!  *
!  * state is NULL or a pointer to a saved copy of the SET CONSTRAINTS
!  * state data.  Each subtransaction level that modifies that state first
   * saves a copy, which we use to restore the state if we abort.
   *
!  * events is a copy of the events head/tail pointers,
   * which we use to restore those values during subtransaction abort.
   *
!  * query_depth is the subtransaction-start-time value of query_depth,
   * which we similarly use to clean up at subtransaction abort.
   *
!  * firing_counter is the subtransaction-start-time value of firing_counter.
!  * We use this to recognize which deferred triggers were fired (or marked
!  * for firing) within an aborted subtransaction.
   *
   * We use GetCurrentTransactionNestLevel() to determine the correct array
!  * index in trans_stack.  maxtransdepth is the number of allocated entries in
!  * trans_stack.  (By not keeping our own stack pointer, we can avoid trouble
   * in cases where errors during subxact abort cause multiple invocations
   * of AfterTriggerEndSubXact() at the same nesting depth.)
+  *
+  * We create an AfterTriggersTableData struct for each target table of the
+  * current query, and each operation mode (INSERT/UPDATE/DELETE), that has
+  * either transition tables or AFTER STATEMENT triggers.  This is used to
+  * hold the relevant transition tables, as well as info tracking whether
+  * we already queued the AFTER STATEMENT triggers.  (We use that info to
+  * prevent, as much as possible, firing the same AFTER STATEMENT trigger
+  * more than once per statement.)  These structs, along with the transition
+  * table tuplestores, live in the (sub)transaction's CurTransactionContext.
+  * That's sufficient lifespan because we don't allow transition tables to be
+  * used by deferrable triggers, so they only need to survive until
+  * AfterTriggerEndQuery.
   */
+ typedef struct AfterTriggersQueryData AfterTriggersQueryData;
+ typedef struct AfterTriggersTransData AfterTriggersTransData;
+ typedef struct AfterTriggersTableData AfterTriggersTableData;
+
  typedef struct AfterTriggersData
  {
      CommandId    firing_counter; /* next firing ID to assign */
      SetConstraintState state;    /* the active S C state */
      AfterTriggerEventList events;    /* deferred-event list */
      MemoryContext event_cxt;    /* memory context for events, if any */

!     /* per-query-level data: */
!     AfterTriggersQueryData *query_stack;    /* array of structs shown above */
!     int            query_depth;    /* current index in above array */
!     int            maxquerydepth;    /* allocated len of above array */

!     /* per-subtransaction-level data: */
!     AfterTriggersTransData *trans_stack;    /* array of structs shown above */
!     int            maxtransdepth;    /* allocated len of above array */
  } AfterTriggersData;

+ struct AfterTriggersQueryData
+ {
+     AfterTriggerEventList events;    /* events pending from this query */
+     Tuplestorestate *fdw_tuplestore;    /* foreign tuples for said events */
+     List       *tables;            /* list of AfterTriggersTableData */
+ };
+
+ struct AfterTriggersTransData
+ {
+     /* these fields are just for resetting at subtrans abort: */
+     SetConstraintState state;    /* saved S C state, or NULL if not yet saved */
+     AfterTriggerEventList events;    /* saved list pointer */
+     int            query_depth;    /* saved query_depth */
+     CommandId    firing_counter; /* saved firing_counter */
+ };
+
+ struct AfterTriggersTableData
+ {
+     /* relid + cmdType form the lookup key for these structs: */
+     Oid            relid;            /* target table's OID */
+     CmdType        cmdType;        /* event type, CMD_INSERT/UPDATE/DELETE */
+     bool        closed;            /* true when no longer OK to add tuples */
+     bool        stmt_trig_done; /* did we already queue stmt-level triggers? */
+     AfterTriggerEventList stmt_trig_events; /* if so, saved list pointer */
+     Tuplestorestate *old_tuplestore;    /* "old" transition table, if any */
+     Tuplestorestate *new_tuplestore;    /* "new" transition table, if any */
+ };
+
  static AfterTriggersData afterTriggers;

  static void AfterTriggerExecute(AfterTriggerEvent event,
*************** static void AfterTriggerExecute(AfterTri
*** 3591,3628 ****
                      Instrumentation *instr,
                      MemoryContext per_tuple_context,
                      TupleTableSlot *trig_tuple_slot1,
!                     TupleTableSlot *trig_tuple_slot2,
!                     TransitionCaptureState *transition_capture);
  static SetConstraintState SetConstraintStateCreate(int numalloc);
  static SetConstraintState SetConstraintStateCopy(SetConstraintState state);
  static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
                            Oid tgoid, bool tgisdeferred);


  /*
!  * Gets a current query transition tuplestore and initializes it if necessary.
   */
  static Tuplestorestate *
! GetTriggerTransitionTuplestore(Tuplestorestate **tss)
  {
      Tuplestorestate *ret;

!     ret = tss[afterTriggers.query_depth];
      if (ret == NULL)
      {
          MemoryContext oldcxt;
          ResourceOwner saveResourceOwner;

          /*
!          * Make the tuplestore valid until end of transaction.  This is the
!          * allocation lifespan of the associated events list, but we really
           * only need it until AfterTriggerEndQuery().
           */
!         oldcxt = MemoryContextSwitchTo(TopTransactionContext);
          saveResourceOwner = CurrentResourceOwner;
          PG_TRY();
          {
!             CurrentResourceOwner = TopTransactionResourceOwner;
              ret = tuplestore_begin_heap(false, false, work_mem);
          }
          PG_CATCH();
--- 3590,3630 ----
                      Instrumentation *instr,
                      MemoryContext per_tuple_context,
                      TupleTableSlot *trig_tuple_slot1,
!                     TupleTableSlot *trig_tuple_slot2);
! static AfterTriggersTableData *GetAfterTriggersTableData(Oid relid,
!                           CmdType cmdType);
! static void AfterTriggerFreeQuery(AfterTriggersQueryData *qs);
  static SetConstraintState SetConstraintStateCreate(int numalloc);
  static SetConstraintState SetConstraintStateCopy(SetConstraintState state);
  static SetConstraintState SetConstraintStateAddItem(SetConstraintState state,
                            Oid tgoid, bool tgisdeferred);
+ static void cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent);


  /*
!  * Get the FDW tuplestore for the current trigger query level, creating it
!  * if necessary.
   */
  static Tuplestorestate *
! GetCurrentFDWTuplestore(void)
  {
      Tuplestorestate *ret;

!     ret = afterTriggers.query_stack[afterTriggers.query_depth].fdw_tuplestore;
      if (ret == NULL)
      {
          MemoryContext oldcxt;
          ResourceOwner saveResourceOwner;

          /*
!          * Make the tuplestore valid until end of subtransaction.  We really
           * only need it until AfterTriggerEndQuery().
           */
!         oldcxt = MemoryContextSwitchTo(CurTransactionContext);
          saveResourceOwner = CurrentResourceOwner;
          PG_TRY();
          {
!             CurrentResourceOwner = CurTransactionResourceOwner;
              ret = tuplestore_begin_heap(false, false, work_mem);
          }
          PG_CATCH();
*************** GetTriggerTransitionTuplestore(Tuplestor
*** 3634,3640 ****
          CurrentResourceOwner = saveResourceOwner;
          MemoryContextSwitchTo(oldcxt);

!         tss[afterTriggers.query_depth] = ret;
      }

      return ret;
--- 3636,3642 ----
          CurrentResourceOwner = saveResourceOwner;
          MemoryContextSwitchTo(oldcxt);

!         afterTriggers.query_stack[afterTriggers.query_depth].fdw_tuplestore = ret;
      }

      return ret;
*************** afterTriggerAddEvent(AfterTriggerEventLi
*** 3780,3786 ****
          if (newshared->ats_tgoid == evtshared->ats_tgoid &&
              newshared->ats_relid == evtshared->ats_relid &&
              newshared->ats_event == evtshared->ats_event &&
!             newshared->ats_transition_capture == evtshared->ats_transition_capture &&
              newshared->ats_firing_id == 0)
              break;
      }
--- 3782,3788 ----
          if (newshared->ats_tgoid == evtshared->ats_tgoid &&
              newshared->ats_relid == evtshared->ats_relid &&
              newshared->ats_event == evtshared->ats_event &&
!             newshared->ats_table == evtshared->ats_table &&
              newshared->ats_firing_id == 0)
              break;
      }
*************** AfterTriggerExecute(AfterTriggerEvent ev
*** 3892,3899 ****
                      FmgrInfo *finfo, Instrumentation *instr,
                      MemoryContext per_tuple_context,
                      TupleTableSlot *trig_tuple_slot1,
!                     TupleTableSlot *trig_tuple_slot2,
!                     TransitionCaptureState *transition_capture)
  {
      AfterTriggerShared evtshared = GetTriggerSharedData(event);
      Oid            tgoid = evtshared->ats_tgoid;
--- 3894,3900 ----
                      FmgrInfo *finfo, Instrumentation *instr,
                      MemoryContext per_tuple_context,
                      TupleTableSlot *trig_tuple_slot1,
!                     TupleTableSlot *trig_tuple_slot2)
  {
      AfterTriggerShared evtshared = GetTriggerSharedData(event);
      Oid            tgoid = evtshared->ats_tgoid;
*************** AfterTriggerExecute(AfterTriggerEvent ev
*** 3934,3942 ****
      {
          case AFTER_TRIGGER_FDW_FETCH:
              {
!                 Tuplestorestate *fdw_tuplestore =
!                 GetTriggerTransitionTuplestore
!                 (afterTriggers.fdw_tuplestores);

                  if (!tuplestore_gettupleslot(fdw_tuplestore, true, false,
                                               trig_tuple_slot1))
--- 3935,3941 ----
      {
          case AFTER_TRIGGER_FDW_FETCH:
              {
!                 Tuplestorestate *fdw_tuplestore = GetCurrentFDWTuplestore();

                  if (!tuplestore_gettupleslot(fdw_tuplestore, true, false,
                                               trig_tuple_slot1))
*************** AfterTriggerExecute(AfterTriggerEvent ev
*** 4006,4041 ****
      }

      /*
!      * Set up the tuplestore information.
       */
      LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL;
!     if (transition_capture != NULL)
      {
          if (LocTriggerData.tg_trigger->tgoldtable)
-             LocTriggerData.tg_oldtable = transition_capture->tcs_old_tuplestore;
-         if (LocTriggerData.tg_trigger->tgnewtable)
          {
!             /*
!              * Currently a trigger with transition tables may only be defined
!              * for a single event type (here AFTER INSERT or AFTER UPDATE, but
!              * not AFTER INSERT OR ...).
!              */
!             Assert((TRIGGER_FOR_INSERT(LocTriggerData.tg_trigger->tgtype) != 0) ^
!                    (TRIGGER_FOR_UPDATE(LocTriggerData.tg_trigger->tgtype) != 0));

!             /*
!              * Show either the insert or update new tuple images, depending on
!              * which event type the trigger was registered for.  A single
!              * statement may have produced both in the case of INSERT ... ON
!              * CONFLICT ... DO UPDATE, and in that case the event determines
!              * which tuplestore the trigger sees as the NEW TABLE.
!              */
!             if (TRIGGER_FOR_INSERT(LocTriggerData.tg_trigger->tgtype))
!                 LocTriggerData.tg_newtable =
!                     transition_capture->tcs_insert_tuplestore;
!             else
!                 LocTriggerData.tg_newtable =
!                     transition_capture->tcs_update_tuplestore;
          }
      }

--- 4005,4029 ----
      }

      /*
!      * Set up the tuplestore information to let the trigger have access to
!      * transition tables.  When we first make a transition table available to
!      * a trigger, mark it "closed" so that it cannot change anymore.  If any
!      * additional events of the same type get queued in the current trigger
!      * query level, they'll go into new transition tables.
       */
      LocTriggerData.tg_oldtable = LocTriggerData.tg_newtable = NULL;
!     if (evtshared->ats_table)
      {
          if (LocTriggerData.tg_trigger->tgoldtable)
          {
!             LocTriggerData.tg_oldtable = evtshared->ats_table->old_tuplestore;
!             evtshared->ats_table->closed = true;
!         }

!         if (LocTriggerData.tg_trigger->tgnewtable)
!         {
!             LocTriggerData.tg_newtable = evtshared->ats_table->new_tuplestore;
!             evtshared->ats_table->closed = true;
          }
      }

*************** afterTriggerInvokeEvents(AfterTriggerEve
*** 4245,4252 ****
                   * won't try to re-fire it.
                   */
                  AfterTriggerExecute(event, rel, trigdesc, finfo, instr,
!                                     per_tuple_context, slot1, slot2,
!                                     evtshared->ats_transition_capture);

                  /*
                   * Mark the event as done.
--- 4233,4239 ----
                   * won't try to re-fire it.
                   */
                  AfterTriggerExecute(event, rel, trigdesc, finfo, instr,
!                                     per_tuple_context, slot1, slot2);

                  /*
                   * Mark the event as done.
*************** afterTriggerInvokeEvents(AfterTriggerEve
*** 4296,4301 ****
--- 4283,4448 ----
  }


+ /*
+  * GetAfterTriggersTableData
+  *
+  * Find or create an AfterTriggersTableData struct for the specified
+  * trigger event (relation + operation type).  Ignore existing structs
+  * marked "closed"; we don't want to put any additional tuples into them,
+  * nor change their stmt-triggers-fired state.
+  *
+  * Note: the AfterTriggersTableData list is allocated in the current
+  * (sub)transaction's CurTransactionContext.  This is OK because
+  * we don't need it to live past AfterTriggerEndQuery.
+  */
+ static AfterTriggersTableData *
+ GetAfterTriggersTableData(Oid relid, CmdType cmdType)
+ {
+     AfterTriggersTableData *table;
+     AfterTriggersQueryData *qs;
+     MemoryContext oldcxt;
+     ListCell   *lc;
+
+     /* Caller should have ensured query_depth is OK. */
+     Assert(afterTriggers.query_depth >= 0 &&
+            afterTriggers.query_depth < afterTriggers.maxquerydepth);
+     qs = &afterTriggers.query_stack[afterTriggers.query_depth];
+
+     foreach(lc, qs->tables)
+     {
+         table = (AfterTriggersTableData *) lfirst(lc);
+         if (table->relid == relid && table->cmdType == cmdType &&
+             !table->closed)
+             return table;
+     }
+
+     oldcxt = MemoryContextSwitchTo(CurTransactionContext);
+
+     table = (AfterTriggersTableData *) palloc0(sizeof(AfterTriggersTableData));
+     table->relid = relid;
+     table->cmdType = cmdType;
+     qs->tables = lappend(qs->tables, table);
+
+     MemoryContextSwitchTo(oldcxt);
+
+     return table;
+ }
+
+
+ /*
+  * MakeTransitionCaptureState
+  *
+  * Make a TransitionCaptureState object for the given TriggerDesc, target
+  * relation, and operation type.  The TCS object holds all the state needed
+  * to decide whether to capture tuples in transition tables.
+  *
+  * If there are no triggers in 'trigdesc' that request relevant transition
+  * tables, then return NULL.
+  *
+  * The resulting object can be passed to the ExecAR* functions.  The caller
+  * should set tcs_map or tcs_original_insert_tuple as appropriate when dealing
+  * with child tables.
+  *
+  * Note that we copy the flags from a parent table into this struct (rather
+  * than subsequently using the relation's TriggerDesc directly) so that we can
+  * use it to control collection of transition tuples from child tables.
+  *
+  * Per SQL spec, all operations of the same kind (INSERT/UPDATE/DELETE)
+  * on the same table during one query should share one transition table.
+  * Therefore, the Tuplestores are owned by an AfterTriggersTableData struct
+  * looked up using the table OID + CmdType, and are merely referenced by
+  * the TransitionCaptureState objects we hand out to callers.
+  */
+ TransitionCaptureState *
+ MakeTransitionCaptureState(TriggerDesc *trigdesc, Oid relid, CmdType cmdType)
+ {
+     TransitionCaptureState *state;
+     bool        need_old,
+                 need_new;
+     AfterTriggersTableData *table;
+     MemoryContext oldcxt;
+     ResourceOwner saveResourceOwner;
+
+     if (trigdesc == NULL)
+         return NULL;
+
+     /* Detect which table(s) we need. */
+     switch (cmdType)
+     {
+         case CMD_INSERT:
+             need_old = false;
+             need_new = trigdesc->trig_insert_new_table;
+             break;
+         case CMD_UPDATE:
+             need_old = trigdesc->trig_update_old_table;
+             need_new = trigdesc->trig_update_new_table;
+             break;
+         case CMD_DELETE:
+             need_old = trigdesc->trig_delete_old_table;
+             need_new = false;
+             break;
+         default:
+             elog(ERROR, "unexpected CmdType: %d", (int) cmdType);
+             need_old = need_new = false;    /* keep compiler quiet */
+             break;
+     }
+     if (!need_old && !need_new)
+         return NULL;
+
+     /* Check state, like AfterTriggerSaveEvent. */
+     if (afterTriggers.query_depth < 0)
+         elog(ERROR, "MakeTransitionCaptureState() called outside of query");
+
+     /* Be sure we have enough space to record events at this query depth. */
+     if (afterTriggers.query_depth >= afterTriggers.maxquerydepth)
+         AfterTriggerEnlargeQueryState();
+
+     /*
+      * Find or create an AfterTriggersTableData struct to hold the
+      * tuplestore(s).  If there's a matching struct but it's marked closed,
+      * ignore it; we need a newer one.
+      *
+      * Note: the AfterTriggersTableData list, as well as the tuplestores, are
+      * allocated in the current (sub)transaction's CurTransactionContext, and
+      * the tuplestores are managed by the (sub)transaction's resource owner.
+      * This is sufficient lifespan because we do not allow triggers using
+      * transition tables to be deferrable; they will be fired during
+      * AfterTriggerEndQuery, after which it's okay to delete the data.
+      */
+     table = GetAfterTriggersTableData(relid, cmdType);
+
+     /* Now create required tuplestore(s), if we don't have them already. */
+     oldcxt = MemoryContextSwitchTo(CurTransactionContext);
+     saveResourceOwner = CurrentResourceOwner;
+     PG_TRY();
+     {
+         CurrentResourceOwner = CurTransactionResourceOwner;
+         if (need_old && table->old_tuplestore == NULL)
+             table->old_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+         if (need_new && table->new_tuplestore == NULL)
+             table->new_tuplestore = tuplestore_begin_heap(false, false, work_mem);
+     }
+     PG_CATCH();
+     {
+         CurrentResourceOwner = saveResourceOwner;
+         PG_RE_THROW();
+     }
+     PG_END_TRY();
+     CurrentResourceOwner = saveResourceOwner;
+     MemoryContextSwitchTo(oldcxt);
+
+     /* Now build the TransitionCaptureState struct, in caller's context */
+     state = (TransitionCaptureState *) palloc0(sizeof(TransitionCaptureState));
+     state->tcs_delete_old_table = trigdesc->trig_delete_old_table;
+     state->tcs_update_old_table = trigdesc->trig_update_old_table;
+     state->tcs_update_new_table = trigdesc->trig_update_new_table;
+     state->tcs_insert_new_table = trigdesc->trig_insert_new_table;
+     state->tcs_private = table;
+
+     return state;
+ }
+
+
  /* ----------
   * AfterTriggerBeginXact()
   *
*************** AfterTriggerBeginXact(void)
*** 4319,4332 ****
       */
      Assert(afterTriggers.state == NULL);
      Assert(afterTriggers.query_stack == NULL);
-     Assert(afterTriggers.fdw_tuplestores == NULL);
      Assert(afterTriggers.maxquerydepth == 0);
      Assert(afterTriggers.event_cxt == NULL);
      Assert(afterTriggers.events.head == NULL);
!     Assert(afterTriggers.state_stack == NULL);
!     Assert(afterTriggers.events_stack == NULL);
!     Assert(afterTriggers.depth_stack == NULL);
!     Assert(afterTriggers.firing_stack == NULL);
      Assert(afterTriggers.maxtransdepth == 0);
  }

--- 4466,4475 ----
       */
      Assert(afterTriggers.state == NULL);
      Assert(afterTriggers.query_stack == NULL);
      Assert(afterTriggers.maxquerydepth == 0);
      Assert(afterTriggers.event_cxt == NULL);
      Assert(afterTriggers.events.head == NULL);
!     Assert(afterTriggers.trans_stack == NULL);
      Assert(afterTriggers.maxtransdepth == 0);
  }

*************** AfterTriggerBeginQuery(void)
*** 4362,4370 ****
  void
  AfterTriggerEndQuery(EState *estate)
  {
-     AfterTriggerEventList *events;
-     Tuplestorestate *fdw_tuplestore;
-
      /* Must be inside a query, too */
      Assert(afterTriggers.query_depth >= 0);

--- 4505,4510 ----
*************** AfterTriggerEndQuery(EState *estate)
*** 4393,4430 ****
       * will instead fire any triggers in a dedicated query level.  Foreign key
       * enforcement triggers do add to the current query level, thanks to their
       * passing fire_triggers = false to SPI_execute_snapshot().  Other
!      * C-language triggers might do likewise.  Be careful here: firing a
!      * trigger could result in query_stack being repalloc'd, so we can't save
!      * its address across afterTriggerInvokeEvents calls.
       *
       * If we find no firable events, we don't have to increment
       * firing_counter.
       */
      for (;;)
      {
!         events = &afterTriggers.query_stack[afterTriggers.query_depth];
!         if (afterTriggerMarkEvents(events, &afterTriggers.events, true))
          {
              CommandId    firing_id = afterTriggers.firing_counter++;

              /* OK to delete the immediate events after processing them */
!             if (afterTriggerInvokeEvents(events, firing_id, estate, true))
                  break;            /* all fired */
          }
          else
              break;
      }

!     /* Release query-local storage for events, including tuplestore if any */
!     fdw_tuplestore = afterTriggers.fdw_tuplestores[afterTriggers.query_depth];
!     if (fdw_tuplestore)
      {
!         tuplestore_end(fdw_tuplestore);
!         afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
      }
-     afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);

!     afterTriggers.query_depth--;
  }


--- 4533,4618 ----
       * will instead fire any triggers in a dedicated query level.  Foreign key
       * enforcement triggers do add to the current query level, thanks to their
       * passing fire_triggers = false to SPI_execute_snapshot().  Other
!      * C-language triggers might do likewise.
       *
       * If we find no firable events, we don't have to increment
       * firing_counter.
       */
      for (;;)
      {
!         AfterTriggersQueryData *qs;
!
!         /*
!          * Firing a trigger could result in query_stack being repalloc'd, so
!          * we must recalculate qs after each afterTriggerInvokeEvents call.
!          */
!         qs = &afterTriggers.query_stack[afterTriggers.query_depth];
!
!         if (afterTriggerMarkEvents(&qs->events, &afterTriggers.events, true))
          {
              CommandId    firing_id = afterTriggers.firing_counter++;

              /* OK to delete the immediate events after processing them */
!             if (afterTriggerInvokeEvents(&qs->events, firing_id, estate, true))
                  break;            /* all fired */
          }
          else
              break;
      }

!     /* Release query-level-local storage, including tuplestores if any */
!     AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
!
!     afterTriggers.query_depth--;
! }
!
!
! /*
!  * AfterTriggerFreeQuery
!  *    Release subsidiary storage for a trigger query level.
!  *    This includes closing down tuplestores.
!  *    Note: it's important for this to be safe if interrupted by an error
!  *    and then called again for the same query level.
!  */
! static void
! AfterTriggerFreeQuery(AfterTriggersQueryData *qs)
! {
!     Tuplestorestate *ts;
!     List       *tables;
!     ListCell   *lc;
!
!     /* Drop the trigger events */
!     afterTriggerFreeEventList(&qs->events);
!
!     /* Drop FDW tuplestore if any */
!     ts = qs->fdw_tuplestore;
!     qs->fdw_tuplestore = NULL;
!     if (ts)
!         tuplestore_end(ts);
!
!     /* Release per-table subsidiary storage */
!     tables = qs->tables;
!     foreach(lc, tables)
      {
!         AfterTriggersTableData *table = (AfterTriggersTableData *) lfirst(lc);
!
!         ts = table->old_tuplestore;
!         table->old_tuplestore = NULL;
!         if (ts)
!             tuplestore_end(ts);
!         ts = table->new_tuplestore;
!         table->new_tuplestore = NULL;
!         if (ts)
!             tuplestore_end(ts);
      }

!     /*
!      * Now free the AfterTriggersTableData structs and list cells.  Reset list
!      * pointer first; if list_free_deep somehow gets an error, better to leak
!      * that storage than have an infinite loop.
!      */
!     qs->tables = NIL;
!     list_free_deep(tables);
  }


*************** AfterTriggerEndXact(bool isCommit)
*** 4521,4530 ****
       * large, we let the eventual reset of TopTransactionContext free the
       * memory instead of doing it here.
       */
!     afterTriggers.state_stack = NULL;
!     afterTriggers.events_stack = NULL;
!     afterTriggers.depth_stack = NULL;
!     afterTriggers.firing_stack = NULL;
      afterTriggers.maxtransdepth = 0;


--- 4709,4715 ----
       * large, we let the eventual reset of TopTransactionContext free the
       * memory instead of doing it here.
       */
!     afterTriggers.trans_stack = NULL;
      afterTriggers.maxtransdepth = 0;


*************** AfterTriggerEndXact(bool isCommit)
*** 4534,4540 ****
       * memory here.
       */
      afterTriggers.query_stack = NULL;
-     afterTriggers.fdw_tuplestores = NULL;
      afterTriggers.maxquerydepth = 0;
      afterTriggers.state = NULL;

--- 4719,4724 ----
*************** AfterTriggerBeginSubXact(void)
*** 4553,4600 ****
      int            my_level = GetCurrentTransactionNestLevel();

      /*
!      * Allocate more space in the stacks if needed.  (Note: because the
       * minimum nest level of a subtransaction is 2, we waste the first couple
!      * entries of each array; not worth the notational effort to avoid it.)
       */
      while (my_level >= afterTriggers.maxtransdepth)
      {
          if (afterTriggers.maxtransdepth == 0)
          {
!             MemoryContext old_cxt;
!
!             old_cxt = MemoryContextSwitchTo(TopTransactionContext);
!
! #define DEFTRIG_INITALLOC 8
!             afterTriggers.state_stack = (SetConstraintState *)
!                 palloc(DEFTRIG_INITALLOC * sizeof(SetConstraintState));
!             afterTriggers.events_stack = (AfterTriggerEventList *)
!                 palloc(DEFTRIG_INITALLOC * sizeof(AfterTriggerEventList));
!             afterTriggers.depth_stack = (int *)
!                 palloc(DEFTRIG_INITALLOC * sizeof(int));
!             afterTriggers.firing_stack = (CommandId *)
!                 palloc(DEFTRIG_INITALLOC * sizeof(CommandId));
!             afterTriggers.maxtransdepth = DEFTRIG_INITALLOC;
!
!             MemoryContextSwitchTo(old_cxt);
          }
          else
          {
!             /* repalloc will keep the stacks in the same context */
              int            new_alloc = afterTriggers.maxtransdepth * 2;

!             afterTriggers.state_stack = (SetConstraintState *)
!                 repalloc(afterTriggers.state_stack,
!                          new_alloc * sizeof(SetConstraintState));
!             afterTriggers.events_stack = (AfterTriggerEventList *)
!                 repalloc(afterTriggers.events_stack,
!                          new_alloc * sizeof(AfterTriggerEventList));
!             afterTriggers.depth_stack = (int *)
!                 repalloc(afterTriggers.depth_stack,
!                          new_alloc * sizeof(int));
!             afterTriggers.firing_stack = (CommandId *)
!                 repalloc(afterTriggers.firing_stack,
!                          new_alloc * sizeof(CommandId));
              afterTriggers.maxtransdepth = new_alloc;
          }
      }
--- 4737,4764 ----
      int            my_level = GetCurrentTransactionNestLevel();

      /*
!      * Allocate more space in the trans_stack if needed.  (Note: because the
       * minimum nest level of a subtransaction is 2, we waste the first couple
!      * entries of the array; not worth the notational effort to avoid it.)
       */
      while (my_level >= afterTriggers.maxtransdepth)
      {
          if (afterTriggers.maxtransdepth == 0)
          {
!             /* Arbitrarily initialize for max of 8 subtransaction levels */
!             afterTriggers.trans_stack = (AfterTriggersTransData *)
!                 MemoryContextAlloc(TopTransactionContext,
!                                    8 * sizeof(AfterTriggersTransData));
!             afterTriggers.maxtransdepth = 8;
          }
          else
          {
!             /* repalloc will keep the stack in the same context */
              int            new_alloc = afterTriggers.maxtransdepth * 2;

!             afterTriggers.trans_stack = (AfterTriggersTransData *)
!                 repalloc(afterTriggers.trans_stack,
!                          new_alloc * sizeof(AfterTriggersTransData));
              afterTriggers.maxtransdepth = new_alloc;
          }
      }
*************** AfterTriggerBeginSubXact(void)
*** 4604,4613 ****
       * is not saved until/unless changed.  Likewise, we don't make a
       * per-subtransaction event context until needed.
       */
!     afterTriggers.state_stack[my_level] = NULL;
!     afterTriggers.events_stack[my_level] = afterTriggers.events;
!     afterTriggers.depth_stack[my_level] = afterTriggers.query_depth;
!     afterTriggers.firing_stack[my_level] = afterTriggers.firing_counter;
  }

  /*
--- 4768,4777 ----
       * is not saved until/unless changed.  Likewise, we don't make a
       * per-subtransaction event context until needed.
       */
!     afterTriggers.trans_stack[my_level].state = NULL;
!     afterTriggers.trans_stack[my_level].events = afterTriggers.events;
!     afterTriggers.trans_stack[my_level].query_depth = afterTriggers.query_depth;
!     afterTriggers.trans_stack[my_level].firing_counter = afterTriggers.firing_counter;
  }

  /*
*************** AfterTriggerEndSubXact(bool isCommit)
*** 4631,4700 ****
      {
          Assert(my_level < afterTriggers.maxtransdepth);
          /* If we saved a prior state, we don't need it anymore */
!         state = afterTriggers.state_stack[my_level];
          if (state != NULL)
              pfree(state);
          /* this avoids double pfree if error later: */
!         afterTriggers.state_stack[my_level] = NULL;
          Assert(afterTriggers.query_depth ==
!                afterTriggers.depth_stack[my_level]);
      }
      else
      {
          /*
           * Aborting.  It is possible subxact start failed before calling
           * AfterTriggerBeginSubXact, in which case we mustn't risk touching
!          * stack levels that aren't there.
           */
          if (my_level >= afterTriggers.maxtransdepth)
              return;

          /*
!          * Release any event lists from queries being aborted, and restore
           * query_depth to its pre-subxact value.  This assumes that a
         * subtransaction will not add events to query levels started in an
         * earlier transaction state.
           */
!         while (afterTriggers.query_depth > afterTriggers.depth_stack[my_level])
          {
              if (afterTriggers.query_depth < afterTriggers.maxquerydepth)
!             {
!                 Tuplestorestate *ts;
!
!                 ts = afterTriggers.fdw_tuplestores[afterTriggers.query_depth];
!                 if (ts)
!                 {
!                     tuplestore_end(ts);
!                     afterTriggers.fdw_tuplestores[afterTriggers.query_depth] = NULL;
!                 }
!
!                 afterTriggerFreeEventList(&afterTriggers.query_stack[afterTriggers.query_depth]);
!             }
!
              afterTriggers.query_depth--;
          }
          Assert(afterTriggers.query_depth ==
!                afterTriggers.depth_stack[my_level]);

          /*
           * Restore the global deferred-event list to its former length,
           * discarding any events queued by the subxact.
           */
          afterTriggerRestoreEventList(&afterTriggers.events,
!                                      &afterTriggers.events_stack[my_level]);

          /*
           * Restore the trigger state.  If the saved state is NULL, then this
           * subxact didn't save it, so it doesn't need restoring.
           */
!         state = afterTriggers.state_stack[my_level];
          if (state != NULL)
          {
              pfree(afterTriggers.state);
              afterTriggers.state = state;
          }
          /* this avoids double pfree if error later: */
!         afterTriggers.state_stack[my_level] = NULL;

          /*
           * Scan for any remaining deferred events that were marked DONE or IN
--- 4795,4852 ----
      {
          Assert(my_level < afterTriggers.maxtransdepth);
          /* If we saved a prior state, we don't need it anymore */
!         state = afterTriggers.trans_stack[my_level].state;
          if (state != NULL)
              pfree(state);
          /* this avoids double pfree if error later: */
!         afterTriggers.trans_stack[my_level].state = NULL;
          Assert(afterTriggers.query_depth ==
!                afterTriggers.trans_stack[my_level].query_depth);
      }
      else
      {
          /*
           * Aborting.  It is possible subxact start failed before calling
           * AfterTriggerBeginSubXact, in which case we mustn't risk touching
!          * trans_stack levels that aren't there.
           */
          if (my_level >= afterTriggers.maxtransdepth)
              return;

          /*
!          * Release query-level storage for queries being aborted, and restore
           * query_depth to its pre-subxact value.  This assumes that a
         * subtransaction will not add events to query levels started in an
         * earlier transaction state.
           */
!         while (afterTriggers.query_depth > afterTriggers.trans_stack[my_level].query_depth)
          {
              if (afterTriggers.query_depth < afterTriggers.maxquerydepth)
!                 AfterTriggerFreeQuery(&afterTriggers.query_stack[afterTriggers.query_depth]);
              afterTriggers.query_depth--;
          }
          Assert(afterTriggers.query_depth ==
!                afterTriggers.trans_stack[my_level].query_depth);

          /*
           * Restore the global deferred-event list to its former length,
           * discarding any events queued by the subxact.
           */
          afterTriggerRestoreEventList(&afterTriggers.events,
!                                      &afterTriggers.trans_stack[my_level].events);

          /*
           * Restore the trigger state.  If the saved state is NULL, then this
           * subxact didn't save it, so it doesn't need restoring.
           */
!         state = afterTriggers.trans_stack[my_level].state;
          if (state != NULL)
          {
              pfree(afterTriggers.state);
              afterTriggers.state = state;
          }
          /* this avoids double pfree if error later: */
!         afterTriggers.trans_stack[my_level].state = NULL;

          /*
           * Scan for any remaining deferred events that were marked DONE or IN
*************** AfterTriggerEndSubXact(bool isCommit)
*** 4704,4710 ****
           * (This essentially assumes that the current subxact includes all
           * subxacts started after it.)
           */
!         subxact_firing_id = afterTriggers.firing_stack[my_level];
          for_each_event_chunk(event, chunk, afterTriggers.events)
          {
              AfterTriggerShared evtshared = GetTriggerSharedData(event);
--- 4856,4862 ----
           * (This essentially assumes that the current subxact includes all
           * subxacts started after it.)
           */
!         subxact_firing_id = afterTriggers.trans_stack[my_level].firing_counter;
          for_each_event_chunk(event, chunk, afterTriggers.events)
          {
              AfterTriggerShared evtshared = GetTriggerSharedData(event);
*************** AfterTriggerEnlargeQueryState(void)
*** 4740,4751 ****
      {
          int            new_alloc = Max(afterTriggers.query_depth + 1, 8);

!         afterTriggers.query_stack = (AfterTriggerEventList *)
              MemoryContextAlloc(TopTransactionContext,
!                                new_alloc * sizeof(AfterTriggerEventList));
!         afterTriggers.fdw_tuplestores = (Tuplestorestate **)
!             MemoryContextAllocZero(TopTransactionContext,
!                                    new_alloc * sizeof(Tuplestorestate *));
          afterTriggers.maxquerydepth = new_alloc;
      }
      else
--- 4892,4900 ----
      {
          int            new_alloc = Max(afterTriggers.query_depth + 1, 8);

!         afterTriggers.query_stack = (AfterTriggersQueryData *)
              MemoryContextAlloc(TopTransactionContext,
!                                new_alloc * sizeof(AfterTriggersQueryData));
          afterTriggers.maxquerydepth = new_alloc;
      }
      else
*************** AfterTriggerEnlargeQueryState(void)
*** 4755,4781 ****
          int            new_alloc = Max(afterTriggers.query_depth + 1,
                                      old_alloc * 2);

!         afterTriggers.query_stack = (AfterTriggerEventList *)
              repalloc(afterTriggers.query_stack,
!                      new_alloc * sizeof(AfterTriggerEventList));
!         afterTriggers.fdw_tuplestores = (Tuplestorestate **)
!             repalloc(afterTriggers.fdw_tuplestores,
!                      new_alloc * sizeof(Tuplestorestate *));
!         /* Clear newly-allocated slots for subsequent lazy initialization. */
!         memset(afterTriggers.fdw_tuplestores + old_alloc,
!                0, (new_alloc - old_alloc) * sizeof(Tuplestorestate *));
          afterTriggers.maxquerydepth = new_alloc;
      }

!     /* Initialize new query lists to empty */
      while (init_depth < afterTriggers.maxquerydepth)
      {
!         AfterTriggerEventList *events;

!         events = &afterTriggers.query_stack[init_depth];
!         events->head = NULL;
!         events->tail = NULL;
!         events->tailfree = NULL;

          ++init_depth;
      }
--- 4904,4925 ----
          int            new_alloc = Max(afterTriggers.query_depth + 1,
                                      old_alloc * 2);

!         afterTriggers.query_stack = (AfterTriggersQueryData *)
              repalloc(afterTriggers.query_stack,
!                      new_alloc * sizeof(AfterTriggersQueryData));
          afterTriggers.maxquerydepth = new_alloc;
      }

!     /* Initialize new array entries to empty */
      while (init_depth < afterTriggers.maxquerydepth)
      {
!         AfterTriggersQueryData *qs = &afterTriggers.query_stack[init_depth];

!         qs->events.head = NULL;
!         qs->events.tail = NULL;
!         qs->events.tailfree = NULL;
!         qs->fdw_tuplestore = NULL;
!         qs->tables = NIL;

          ++init_depth;
      }
*************** AfterTriggerSetState(ConstraintsSetStmt
*** 4873,4881 ****
       * save it so it can be restored if the subtransaction aborts.
       */
      if (my_level > 1 &&
!         afterTriggers.state_stack[my_level] == NULL)
      {
!         afterTriggers.state_stack[my_level] =
              SetConstraintStateCopy(afterTriggers.state);
      }

--- 5017,5025 ----
       * save it so it can be restored if the subtransaction aborts.
       */
      if (my_level > 1 &&
!         afterTriggers.trans_stack[my_level].state == NULL)
      {
!         afterTriggers.trans_stack[my_level].state =
              SetConstraintStateCopy(afterTriggers.state);
      }

*************** AfterTriggerPendingOnRel(Oid relid)
*** 5184,5190 ****
       */
      for (depth = 0; depth <= afterTriggers.query_depth && depth < afterTriggers.maxquerydepth; depth++)
      {
!         for_each_event_chunk(event, chunk, afterTriggers.query_stack[depth])
          {
              AfterTriggerShared evtshared = GetTriggerSharedData(event);

--- 5328,5334 ----
       */
      for (depth = 0; depth <= afterTriggers.query_depth && depth < afterTriggers.maxquerydepth; depth++)
      {
!         for_each_event_chunk(event, chunk, afterTriggers.query_stack[depth].events)
          {
              AfterTriggerShared evtshared = GetTriggerSharedData(event);

*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5229,5235 ****
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
      AfterTriggerEventData new_event;
      AfterTriggerSharedData new_shared;
!     char        relkind = relinfo->ri_RelationDesc->rd_rel->relkind;
      int            tgtype_event;
      int            tgtype_level;
      int            i;
--- 5373,5379 ----
      TriggerDesc *trigdesc = relinfo->ri_TrigDesc;
      AfterTriggerEventData new_event;
      AfterTriggerSharedData new_shared;
!     char        relkind = rel->rd_rel->relkind;
      int            tgtype_event;
      int            tgtype_level;
      int            i;
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5266,5272 ****
              Tuplestorestate *old_tuplestore;

              Assert(oldtup != NULL);
!             old_tuplestore = transition_capture->tcs_old_tuplestore;

              if (map != NULL)
              {
--- 5410,5416 ----
              Tuplestorestate *old_tuplestore;

              Assert(oldtup != NULL);
!             old_tuplestore = transition_capture->tcs_private->old_tuplestore;

              if (map != NULL)
              {
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5284,5293 ****
              Tuplestorestate *new_tuplestore;

              Assert(newtup != NULL);
!             if (event == TRIGGER_EVENT_INSERT)
!                 new_tuplestore = transition_capture->tcs_insert_tuplestore;
!             else
!                 new_tuplestore = transition_capture->tcs_update_tuplestore;

              if (original_insert_tuple != NULL)
                  tuplestore_puttuple(new_tuplestore, original_insert_tuple);
--- 5428,5434 ----
              Tuplestorestate *new_tuplestore;

              Assert(newtup != NULL);
!             new_tuplestore = transition_capture->tcs_private->new_tuplestore;

              if (original_insert_tuple != NULL)
                  tuplestore_puttuple(new_tuplestore, original_insert_tuple);
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5316,5321 ****
--- 5457,5467 ----
       * The event code will be used both as a bitmask and an array offset, so
       * validation is important to make sure we don't walk off the edge of our
       * arrays.
+      *
+      * Also, if we're considering statement-level triggers, check whether we
+      * already queued a set of them for this event, and cancel the prior set
+      * if so.  This preserves the behavior that statement-level triggers fire
+      * just once per statement and fire after row-level triggers.
       */
      switch (event)
      {
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5334,5339 ****
--- 5480,5487 ----
                  Assert(newtup == NULL);
                  ItemPointerSetInvalid(&(new_event.ate_ctid1));
                  ItemPointerSetInvalid(&(new_event.ate_ctid2));
+                 cancel_prior_stmt_triggers(RelationGetRelid(rel),
+                                            CMD_INSERT, event);
              }
              break;
          case TRIGGER_EVENT_DELETE:
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5351,5356 ****
--- 5499,5506 ----
                  Assert(newtup == NULL);
                  ItemPointerSetInvalid(&(new_event.ate_ctid1));
                  ItemPointerSetInvalid(&(new_event.ate_ctid2));
+                 cancel_prior_stmt_triggers(RelationGetRelid(rel),
+                                            CMD_DELETE, event);
              }
              break;
          case TRIGGER_EVENT_UPDATE:
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5368,5373 ****
--- 5518,5525 ----
                  Assert(newtup == NULL);
                  ItemPointerSetInvalid(&(new_event.ate_ctid1));
                  ItemPointerSetInvalid(&(new_event.ate_ctid2));
+                 cancel_prior_stmt_triggers(RelationGetRelid(rel),
+                                            CMD_UPDATE, event);
              }
              break;
          case TRIGGER_EVENT_TRUNCATE:
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5407,5415 ****
          {
              if (fdw_tuplestore == NULL)
              {
!                 fdw_tuplestore =
!                     GetTriggerTransitionTuplestore
!                     (afterTriggers.fdw_tuplestores);
                  new_event.ate_flags = AFTER_TRIGGER_FDW_FETCH;
              }
              else
--- 5559,5565 ----
          {
              if (fdw_tuplestore == NULL)
              {
!                 fdw_tuplestore = GetCurrentFDWTuplestore();
                  new_event.ate_flags = AFTER_TRIGGER_FDW_FETCH;
              }
              else
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5465,5470 ****
--- 5615,5622 ----

          /*
           * Fill in event structure and add it to the current query's queue.
+          * Note we set ats_table to NULL whenever this trigger doesn't use
+          * transition tables, to improve sharability of the shared event data.
           */
          new_shared.ats_event =
              (event & TRIGGER_EVENT_OPMASK) |
*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5474,5484 ****
          new_shared.ats_tgoid = trigger->tgoid;
          new_shared.ats_relid = RelationGetRelid(rel);
          new_shared.ats_firing_id = 0;
!         /* deferrable triggers cannot access transition data */
!         new_shared.ats_transition_capture =
!             trigger->tgdeferrable ? NULL : transition_capture;

!         afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth],
                               &new_event, &new_shared);
      }

--- 5626,5638 ----
          new_shared.ats_tgoid = trigger->tgoid;
          new_shared.ats_relid = RelationGetRelid(rel);
          new_shared.ats_firing_id = 0;
!         if ((trigger->tgoldtable || trigger->tgnewtable) &&
!             transition_capture != NULL)
!             new_shared.ats_table = transition_capture->tcs_private;
!         else
!             new_shared.ats_table = NULL;

!         afterTriggerAddEvent(&afterTriggers.query_stack[afterTriggers.query_depth].events,
                               &new_event, &new_shared);
      }

*************** AfterTriggerSaveEvent(EState *estate, Re
*** 5496,5501 ****
--- 5650,5749 ----
      }
  }

+ /*
+  * If we previously queued a set of AFTER STATEMENT triggers for the given
+  * relation + operation, and they've not been fired yet, cancel them.  The
+  * caller will queue a fresh set that's after any row-level triggers that may
+  * have been queued by the current sub-statement, preserving (as much as
+  * possible) the property that AFTER ROW triggers fire before AFTER STATEMENT
+  * triggers, and that the latter only fire once.  This deals with the
+  * situation where several FK enforcement triggers sequentially queue triggers
+  * for the same table into the same trigger query level.  We can't fully
+  * prevent odd behavior though: if there are AFTER ROW triggers taking
+  * transition tables, we don't want to change the transition tables once the
+  * first such trigger has seen them.  In such a case, any additional events
+  * will result in creating new transition tables and allowing new firings of
+  * statement triggers.
+  *
+  * This also saves the current event list location so that a later invocation
+  * of this function can cheaply find the triggers we're about to queue and
+  * cancel them.
+  */
+ static void
+ cancel_prior_stmt_triggers(Oid relid, CmdType cmdType, int tgevent)
+ {
+     AfterTriggersTableData *table;
+     AfterTriggersQueryData *qs = &afterTriggers.query_stack[afterTriggers.query_depth];
+
+     /*
+      * We keep this state in the AfterTriggersTableData that also holds
+      * transition tables for the relation + operation.  In this way, if we are
+      * forced to make a new set of transition tables because more tuples get
+      * entered after we've already fired triggers, we will allow a new set of
+      * statement triggers to get queued without canceling the old ones.
+      */
+     table = GetAfterTriggersTableData(relid, cmdType);
+
+     if (table->stmt_trig_done)
+     {
+         /*
+          * We want to start scanning from the tail location that existed just
+          * before we inserted any statement triggers.  But the events list
+          * might've been entirely empty then, in which case scan from the
+          * current head.
+          */
+         AfterTriggerEvent event;
+         AfterTriggerEventChunk *chunk;
+
+         if (table->stmt_trig_events.tail)
+         {
+             chunk = table->stmt_trig_events.tail;
+             event = (AfterTriggerEvent) table->stmt_trig_events.tailfree;
+         }
+         else
+         {
+             chunk = qs->events.head;
+             event = NULL;
+         }
+
+         for_each_chunk_from(chunk)
+         {
+             if (event == NULL)
+                 event = (AfterTriggerEvent) CHUNK_DATA_START(chunk);
+             for_each_event_from(event, chunk)
+             {
+                 AfterTriggerShared evtshared = GetTriggerSharedData(event);
+
+                 /*
+                  * Exit loop when we reach events that aren't AS triggers for
+                  * the target relation.
+                  */
+                 if (evtshared->ats_relid != relid)
+                     goto done;
+                 if ((evtshared->ats_event & TRIGGER_EVENT_OPMASK) != tgevent)
+                     goto done;
+                 if (!TRIGGER_FIRED_FOR_STATEMENT(evtshared->ats_event))
+                     goto done;
+                 if (!TRIGGER_FIRED_AFTER(evtshared->ats_event))
+                     goto done;
+                 /* OK, mark it DONE */
+                 event->ate_flags &= ~AFTER_TRIGGER_IN_PROGRESS;
+                 event->ate_flags |= AFTER_TRIGGER_DONE;
+             }
+             /* signal we must reinitialize event ptr for next chunk */
+             event = NULL;
+         }
+     }
+ done:
+
+     /* In any case, save current insertion point for next time */
+     table->stmt_trig_done = true;
+     table->stmt_trig_events = qs->events;
+ }
+
+ /*
+  * SQL function pg_trigger_depth()
+  */
  Datum
  pg_trigger_depth(PG_FUNCTION_ARGS)
  {
diff --git a/src/backend/executor/README b/src/backend/executor/README
index a004506..b3e74aa 100644
*** a/src/backend/executor/README
--- b/src/backend/executor/README
*************** This is a sketch of control flow for ful
*** 241,251 ****
          CreateExecutorState
              creates per-query context
          switch to per-query context to run ExecInitNode
          ExecInitNode --- recursively scans plan tree
              CreateExprContext
                  creates per-tuple context
              ExecInitExpr
-         AfterTriggerBeginQuery

      ExecutorRun
          ExecProcNode --- recursively called in per-query context
--- 241,251 ----
          CreateExecutorState
              creates per-query context
          switch to per-query context to run ExecInitNode
+         AfterTriggerBeginQuery
          ExecInitNode --- recursively scans plan tree
              CreateExprContext
                  creates per-tuple context
              ExecInitExpr

      ExecutorRun
          ExecProcNode --- recursively called in per-query context
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4b594d4..6255cc6 100644
*** a/src/backend/executor/execMain.c
--- b/src/backend/executor/execMain.c
*************** standard_ExecutorStart(QueryDesc *queryD
*** 252,268 ****
      estate->es_instrument = queryDesc->instrument_options;

      /*
-      * Initialize the plan state tree
-      */
-     InitPlan(queryDesc, eflags);
-
-     /*
       * Set up an AFTER-trigger statement context, unless told not to, or
       * unless it's EXPLAIN-only mode (when ExecutorFinish won't be called).
       */
      if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
          AfterTriggerBeginQuery();

      MemoryContextSwitchTo(oldcontext);
  }

--- 252,268 ----
      estate->es_instrument = queryDesc->instrument_options;

      /*
       * Set up an AFTER-trigger statement context, unless told not to, or
       * unless it's EXPLAIN-only mode (when ExecutorFinish won't be called).
       */
      if (!(eflags & (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
          AfterTriggerBeginQuery();

+     /*
+      * Initialize the plan state tree
+      */
+     InitPlan(queryDesc, eflags);
+
      MemoryContextSwitchTo(oldcontext);
  }

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 49586a3..845c409 100644
*** a/src/backend/executor/nodeModifyTable.c
--- b/src/backend/executor/nodeModifyTable.c
*************** ExecInsert(ModifyTableState *mtstate,
*** 343,348 ****
--- 343,351 ----
                  mtstate->mt_transition_capture->tcs_map = NULL;
              }
          }
+         if (mtstate->mt_oc_transition_capture != NULL)
+             mtstate->mt_oc_transition_capture->tcs_map =
+                 mtstate->mt_transition_tupconv_maps[leaf_part_index];

          /*
           * We might need to convert from the parent rowtype to the partition
*************** lreplace:;
*** 1158,1163 ****
--- 1161,1168 ----
      /* AFTER ROW UPDATE Triggers */
      ExecARUpdateTriggers(estate, resultRelInfo, tupleid, oldtuple, tuple,
                           recheckIndexes,
+                          mtstate->operation == CMD_INSERT ?
+                          mtstate->mt_oc_transition_capture :
                           mtstate->mt_transition_capture);

      list_free(recheckIndexes);
*************** fireASTriggers(ModifyTableState *node)
*** 1444,1450 ****
              if (node->mt_onconflict == ONCONFLICT_UPDATE)
                  ExecASUpdateTriggers(node->ps.state,
                                       resultRelInfo,
!                                      node->mt_transition_capture);
              ExecASInsertTriggers(node->ps.state, resultRelInfo,
                                   node->mt_transition_capture);
              break;
--- 1449,1455 ----
              if (node->mt_onconflict == ONCONFLICT_UPDATE)
                  ExecASUpdateTriggers(node->ps.state,
                                       resultRelInfo,
!                                      node->mt_oc_transition_capture);
              ExecASInsertTriggers(node->ps.state, resultRelInfo,
                                   node->mt_transition_capture);
              break;
*************** ExecSetupTransitionCaptureState(ModifyTa
*** 1474,1487 ****

      /* Check for transition tables on the directly targeted relation. */
      mtstate->mt_transition_capture =
!         MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc);

      /*
       * If we found that we need to collect transition tuples then we may also
       * need tuple conversion maps for any children that have TupleDescs that
!      * aren't compatible with the tuplestores.
       */
!     if (mtstate->mt_transition_capture != NULL)
      {
          ResultRelInfo *resultRelInfos;
          int            numResultRelInfos;
--- 1479,1502 ----

      /* Check for transition tables on the directly targeted relation. */
      mtstate->mt_transition_capture =
!         MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
!                                    RelationGetRelid(targetRelInfo->ri_RelationDesc),
!                                    mtstate->operation);
!     if (mtstate->operation == CMD_INSERT &&
!         mtstate->mt_onconflict == ONCONFLICT_UPDATE)
!         mtstate->mt_oc_transition_capture =
!             MakeTransitionCaptureState(targetRelInfo->ri_TrigDesc,
!                                        RelationGetRelid(targetRelInfo->ri_RelationDesc),
!                                        CMD_UPDATE);

      /*
       * If we found that we need to collect transition tuples then we may also
       * need tuple conversion maps for any children that have TupleDescs that
!      * aren't compatible with the tuplestores.  (We can share these maps
!      * between the regular and ON CONFLICT cases.)
       */
!     if (mtstate->mt_transition_capture != NULL ||
!         mtstate->mt_oc_transition_capture != NULL)
      {
          ResultRelInfo *resultRelInfos;
          int            numResultRelInfos;
*************** ExecSetupTransitionCaptureState(ModifyTa
*** 1522,1531 ****
          /*
           * Install the conversion map for the first plan for UPDATE and DELETE
           * operations.  It will be advanced each time we switch to the next
!          * plan.  (INSERT operations set it every time.)
           */
!         mtstate->mt_transition_capture->tcs_map =
!             mtstate->mt_transition_tupconv_maps[0];
      }
  }

--- 1537,1548 ----
          /*
           * Install the conversion map for the first plan for UPDATE and DELETE
           * operations.  It will be advanced each time we switch to the next
!          * plan.  (INSERT operations set it every time, so we need not update
!          * mtstate->mt_oc_transition_capture here.)
           */
!         if (mtstate->mt_transition_capture)
!             mtstate->mt_transition_capture->tcs_map =
!                 mtstate->mt_transition_tupconv_maps[0];
      }
  }

*************** ExecModifyTable(PlanState *pstate)
*** 1629,1641 ****
                  estate->es_result_relation_info = resultRelInfo;
                  EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
                                      node->mt_arowmarks[node->mt_whichplan]);
                  if (node->mt_transition_capture != NULL)
                  {
-                     /* Prepare to convert transition tuples from this child. */
                      Assert(node->mt_transition_tupconv_maps != NULL);
                      node->mt_transition_capture->tcs_map =
                          node->mt_transition_tupconv_maps[node->mt_whichplan];
                  }
                  continue;
              }
              else
--- 1646,1664 ----
                  estate->es_result_relation_info = resultRelInfo;
                  EvalPlanQualSetPlan(&node->mt_epqstate, subplanstate->plan,
                                      node->mt_arowmarks[node->mt_whichplan]);
+                 /* Prepare to convert transition tuples from this child. */
                  if (node->mt_transition_capture != NULL)
                  {
                      Assert(node->mt_transition_tupconv_maps != NULL);
                      node->mt_transition_capture->tcs_map =
                          node->mt_transition_tupconv_maps[node->mt_whichplan];
                  }
+                 if (node->mt_oc_transition_capture != NULL)
+                 {
+                     Assert(node->mt_transition_tupconv_maps != NULL);
+                     node->mt_oc_transition_capture->tcs_map =
+                         node->mt_transition_tupconv_maps[node->mt_whichplan];
+                 }
                  continue;
              }
              else
*************** ExecInitModifyTable(ModifyTable *node, E
*** 1934,1941 ****
          mtstate->mt_partition_tuple_slot = partition_tuple_slot;
      }

!     /* Build state for collecting transition tuples */
!     ExecSetupTransitionCaptureState(mtstate, estate);

      /*
       * Initialize any WITH CHECK OPTION constraints if needed.
--- 1957,1968 ----
          mtstate->mt_partition_tuple_slot = partition_tuple_slot;
      }

!     /*
!      * Build state for collecting transition tuples.  This requires having a
!      * valid trigger query context, so skip it in explain-only mode.
!      */
!     if (!(eflags & EXEC_FLAG_EXPLAIN_ONLY))
!         ExecSetupTransitionCaptureState(mtstate, estate);

      /*
       * Initialize any WITH CHECK OPTION constraints if needed.
*************** ExecEndModifyTable(ModifyTableState *nod
*** 2319,2334 ****
      int            i;

      /*
-      * Free transition tables, unless this query is being run in
-      * EXEC_FLAG_SKIP_TRIGGERS mode, which means that it may have queued AFTER
-      * triggers that won't be run till later.  In that case we'll just leak
-      * the transition tables till end of (sub)transaction.
-      */
-     if (node->mt_transition_capture != NULL &&
-         !(node->ps.state->es_top_eflags & EXEC_FLAG_SKIP_TRIGGERS))
-         DestroyTransitionCaptureState(node->mt_transition_capture);
-
-     /*
       * Allow any FDWs to shut down
       */
      for (i = 0; i < node->mt_nplans; i++)
--- 2346,2351 ----
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index aeb363f..adbcfa1 100644
*** a/src/include/commands/trigger.h
--- b/src/include/commands/trigger.h
*************** typedef struct TriggerData
*** 43,55 ****

  /*
   * The state for capturing old and new tuples into transition tables for a
!  * single ModifyTable node.
   */
  typedef struct TransitionCaptureState
  {
      /*
       * Is there at least one trigger specifying each transition relation on
       * the relation explicitly named in the DML statement or COPY command?
       */
      bool        tcs_delete_old_table;
      bool        tcs_update_old_table;
--- 43,63 ----

  /*
   * The state for capturing old and new tuples into transition tables for a
!  * single ModifyTable node (or other operation source, e.g. copy.c).
!  *
!  * This is per-caller to avoid conflicts in setting tcs_map or
!  * tcs_original_insert_tuple.  Note, however, that the pointed-to
!  * private data may be shared across multiple callers.
   */
+ struct AfterTriggersTableData;    /* private in trigger.c */
+
  typedef struct TransitionCaptureState
  {
      /*
       * Is there at least one trigger specifying each transition relation on
       * the relation explicitly named in the DML statement or COPY command?
+      * Note: in current usage, these flags could be part of the private state,
+      * but it seems possibly useful to let callers see them.
       */
      bool        tcs_delete_old_table;
      bool        tcs_update_old_table;
*************** typedef struct TransitionCaptureState
*** 60,66 ****
       * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
       * new and old tuples from a child table's format to the format of the
       * relation named in a query so that it is compatible with the transition
!      * tuplestores.
       */
      TupleConversionMap *tcs_map;

--- 68,74 ----
       * For UPDATE and DELETE, AfterTriggerSaveEvent may need to convert the
       * new and old tuples from a child table's format to the format of the
       * relation named in a query so that it is compatible with the transition
!      * tuplestores.  The caller must store the conversion map here if so.
       */
      TupleConversionMap *tcs_map;

*************** typedef struct TransitionCaptureState
*** 74,90 ****
      HeapTuple    tcs_original_insert_tuple;

      /*
!      * The tuplestores backing the transition tables.  We use separate
!      * tuplestores for INSERT and UPDATE, because INSERT ... ON CONFLICT ...
!      * DO UPDATE causes INSERT and UPDATE triggers to fire and needs a way to
!      * keep track of the new tuple images resulting from the two cases
!      * separately.  We only need a single old image tuplestore, because there
!      * is no statement that can both update and delete at the same time.
       */
!     Tuplestorestate *tcs_old_tuplestore;    /* for DELETE and UPDATE old
!                                              * images */
!     Tuplestorestate *tcs_insert_tuplestore; /* for INSERT new images */
!     Tuplestorestate *tcs_update_tuplestore; /* for UPDATE new images */
  } TransitionCaptureState;

  /*
--- 82,90 ----
      HeapTuple    tcs_original_insert_tuple;

      /*
!      * Private data including the tuplestore(s) into which to insert tuples.
       */
!     struct AfterTriggersTableData *tcs_private;
  } TransitionCaptureState;

  /*
*************** extern void RelationBuildTriggers(Relati
*** 174,181 ****
  extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc);

  extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc);
! extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc);
! extern void DestroyTransitionCaptureState(TransitionCaptureState *tcs);

  extern void FreeTriggerDesc(TriggerDesc *trigdesc);

--- 174,182 ----
  extern TriggerDesc *CopyTriggerDesc(TriggerDesc *trigdesc);

  extern const char *FindTriggerIncompatibleWithInheritance(TriggerDesc *trigdesc);
!
! extern TransitionCaptureState *MakeTransitionCaptureState(TriggerDesc *trigdesc,
!                            Oid relid, CmdType cmdType);

  extern void FreeTriggerDesc(TriggerDesc *trigdesc);

diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 90a60ab..c6d3021 100644
*** a/src/include/nodes/execnodes.h
--- b/src/include/nodes/execnodes.h
*************** typedef struct ModifyTableState
*** 983,989 ****
      /* Per partition tuple conversion map */
      TupleTableSlot *mt_partition_tuple_slot;
      struct TransitionCaptureState *mt_transition_capture;
!     /* controls transition table population */
      TupleConversionMap **mt_transition_tupconv_maps;
      /* Per plan/partition tuple conversion */
  } ModifyTableState;
--- 983,991 ----
      /* Per partition tuple conversion map */
      TupleTableSlot *mt_partition_tuple_slot;
      struct TransitionCaptureState *mt_transition_capture;
!     /* controls transition table population for specified operation */
!     struct TransitionCaptureState *mt_oc_transition_capture;
!     /* controls transition table population for INSERT...ON CONFLICT UPDATE */
      TupleConversionMap **mt_transition_tupconv_maps;
      /* Per plan/partition tuple conversion */
  } ModifyTableState;
diff --git a/src/test/regress/expected/triggers.out b/src/test/regress/expected/triggers.out
index 620fac1..5b6007a 100644
*** a/src/test/regress/expected/triggers.out
--- b/src/test/regress/expected/triggers.out
*************** with wcte as (insert into table1 values
*** 2217,2222 ****
--- 2217,2239 ----
    insert into table2 values ('hello world');
  NOTICE:  trigger = table2_trig, new table = ("hello world")
  NOTICE:  trigger = table1_trig, new table = (42)
+ with wcte as (insert into table1 values (43))
+   insert into table1 values (44);
+ NOTICE:  trigger = table1_trig, new table = (43), (44)
+ select * from table1;
+  a
+ ----
+  42
+  44
+  43
+ (3 rows)
+
+ select * from table2;
+       a
+ -------------
+  hello world
+ (1 row)
+
  drop table table1;
  drop table table2;
  --
*************** create trigger my_table_multievent_trig
*** 2256,2261 ****
--- 2273,2286 ----
    after insert or update on my_table referencing new table as new_table
    for each statement execute procedure dump_insert();
  ERROR:  transition tables cannot be specified for triggers with more than one event
+ --
+ -- Verify that you can't create a trigger with transition tables with
+ -- a column list.
+ --
+ create trigger my_table_col_update_trig
+   after update of b on my_table referencing new table as new_table
+   for each statement execute procedure dump_insert();
+ ERROR:  transition tables cannot be specified for triggers with column lists
  drop table my_table;
  --
  -- Test firing of triggers with transition tables by foreign key cascades
*************** select * from trig_table;
*** 2299,2306 ****
  (6 rows)

  delete from refd_table where length(b) = 3;
! NOTICE:  trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b")
! NOTICE:  trigger = trig_table_delete_trig, old table = (11,"one a"), (11,"one b")
  select * from trig_table;
   a |    b
  ---+---------
--- 2324,2330 ----
  (6 rows)

  delete from refd_table where length(b) = 3;
! NOTICE:  trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b"), (11,"one a"), (11,"one b")
  select * from trig_table;
   a |    b
  ---+---------
*************** select * from trig_table;
*** 2309,2314 ****
--- 2333,2357 ----
  (2 rows)

  drop table refd_table, trig_table;
+ --
+ -- self-referential FKs are even more fun
+ --
+ create table self_ref (a int primary key,
+                        b int references self_ref(a) on delete cascade);
+ create trigger self_ref_r_trig
+   after delete on self_ref referencing old table as old_table
+   for each row execute procedure dump_delete();
+ create trigger self_ref_s_trig
+   after delete on self_ref referencing old table as old_table
+   for each statement execute procedure dump_delete();
+ insert into self_ref values (1, null), (2, 1), (3, 2);
+ delete from self_ref where a = 1;
+ NOTICE:  trigger = self_ref_r_trig, old table = (1,), (2,1)
+ NOTICE:  trigger = self_ref_r_trig, old table = (1,), (2,1)
+ NOTICE:  trigger = self_ref_s_trig, old table = (1,), (2,1)
+ NOTICE:  trigger = self_ref_r_trig, old table = (3,2)
+ NOTICE:  trigger = self_ref_s_trig, old table = (3,2)
+ drop table self_ref;
  -- cleanup
  drop function dump_insert();
  drop function dump_update();
diff --git a/src/test/regress/sql/triggers.sql b/src/test/regress/sql/triggers.sql
index c6deb56..6c7e1c0 100644
*** a/src/test/regress/sql/triggers.sql
--- b/src/test/regress/sql/triggers.sql
*************** create trigger table2_trig
*** 1729,1734 ****
--- 1729,1740 ----
  with wcte as (insert into table1 values (42))
    insert into table2 values ('hello world');

+ with wcte as (insert into table1 values (43))
+   insert into table1 values (44);
+
+ select * from table1;
+ select * from table2;
+
  drop table table1;
  drop table table2;

*************** create trigger my_table_multievent_trig
*** 1769,1774 ****
--- 1775,1789 ----
    after insert or update on my_table referencing new table as new_table
    for each statement execute procedure dump_insert();

+ --
+ -- Verify that you can't create a trigger with transition tables with
+ -- a column list.
+ --
+
+ create trigger my_table_col_update_trig
+   after update of b on my_table referencing new table as new_table
+   for each statement execute procedure dump_insert();
+
  drop table my_table;

  --
*************** select * from trig_table;
*** 1812,1817 ****
--- 1827,1852 ----

  drop table refd_table, trig_table;

+ --
+ -- self-referential FKs are even more fun
+ --
+
+ create table self_ref (a int primary key,
+                        b int references self_ref(a) on delete cascade);
+
+ create trigger self_ref_r_trig
+   after delete on self_ref referencing old table as old_table
+   for each row execute procedure dump_delete();
+ create trigger self_ref_s_trig
+   after delete on self_ref referencing old table as old_table
+   for each statement execute procedure dump_delete();
+
+ insert into self_ref values (1, null), (2, 1), (3, 2);
+
+ delete from self_ref where a = 1;
+
+ drop table self_ref;
+
  -- cleanup
  drop function dump_insert();
  drop function dump_update();
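
To make the queue-rescan idea in cancel_prior_stmt_triggers() concrete, here is a miniature, hypothetical Python model of the behavior (EventQueue, add_stmt_event, etc. are illustrative names, not PostgreSQL APIs): each sub-statement that would queue AFTER STATEMENT triggers for a given relation + operation first marks the previously queued, not-yet-fired set as done, then remembers the insertion point so the next cancellation can scan cheaply from there instead of from the queue head.

```python
# Miniature sketch of the patch's cancel_prior_stmt_triggers() idea: when
# several sub-statements (e.g. FK cascade steps) each queue an AFTER
# STATEMENT trigger for the same table + operation, only the last-queued
# set should fire, and it should fire after the row-level events.
# Illustrative model only; not PostgreSQL code.

IN_PROGRESS, DONE = "in-progress", "done"

class EventQueue:
    def __init__(self):
        self.events = []    # queued trigger events, in firing order
        self.stmt_tail = {} # (relid, op) -> saved insertion point

    def add_row_event(self, relid, op):
        self.events.append({"kind": "row", "relid": relid, "op": op,
                            "status": IN_PROGRESS})

    def add_stmt_event(self, relid, op):
        key = (relid, op)
        # Cancel statement-level events queued for this key since the last
        # saved tail position (cf. cancel_prior_stmt_triggers); marking
        # them DONE makes the firing pass skip them.
        start = self.stmt_tail.get(key, 0)
        for ev in self.events[start:]:
            if (ev["kind"] == "stmt" and ev["relid"] == relid
                    and ev["op"] == op and ev["status"] == IN_PROGRESS):
                ev["status"] = DONE
        # Save the current insertion point for next time, then queue a
        # fresh statement-level event at the tail.
        self.stmt_tail[key] = len(self.events)
        self.events.append({"kind": "stmt", "relid": relid, "op": op,
                            "status": IN_PROGRESS})

    def fire(self):
        # Return (kind, relid) for events still live, in queue order.
        return [(ev["kind"], ev["relid"]) for ev in self.events
                if ev["status"] == IN_PROGRESS]

# Outer DELETE queues its AR and AS triggers; the FK cascade step then
# queues another AR trigger and re-queues the AS trigger, cancelling the
# earlier one so the statement trigger fires once, after the row triggers.
q = EventQueue()
q.add_row_event("self_ref", "delete")   # outer DELETE, row-level
q.add_stmt_event("self_ref", "delete")  # outer DELETE, statement-level
q.add_row_event("self_ref", "delete")   # cascade delete, row-level
q.add_stmt_event("self_ref", "delete")  # cascade cancels + re-queues
```

The net firing order in this toy model is row, row, stmt, which mirrors the intended property stated in the patch comment: statement-level triggers fire just once per statement and after the row-level triggers.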
