Re: [HACKERS] MERGE SQL Statement for PG11

Поиск
Список
Период
Сортировка
От Pavan Deolasee
Тема Re: [HACKERS] MERGE SQL Statement for PG11
Дата
Msg-id CABOikdMrWyEvttnzzG-naGAHt+00d=ZuNcGB8tnEB_fKEG+NvQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] MERGE SQL Statement for PG11  (Andres Freund <andres@anarazel.de>)
Ответы Re: [HACKERS] MERGE SQL Statement for PG11  (Simon Riggs <simon@2ndquadrant.com>)
Re: [HACKERS] MERGE SQL Statement for PG11  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers


On Tue, Apr 3, 2018 at 7:48 AM, Andres Freund <andres@anarazel.de> wrote:


@@ -3828,8 +3846,14 @@ struct AfterTriggersTableData
    bool        before_trig_done;   /* did we already queue BS triggers? */
    bool        after_trig_done;    /* did we already queue AS triggers? */
    AfterTriggerEventList after_trig_events;    /* if so, saved list pointer */
-   Tuplestorestate *old_tuplestore;    /* "old" transition table, if any */
-   Tuplestorestate *new_tuplestore;    /* "new" transition table, if any */
+   /* "old" transition table for UPDATE, if any */
+   Tuplestorestate *old_upd_tuplestore;
+   /* "new" transition table for UPDATE, if any */
+   Tuplestorestate *new_upd_tuplestore;
+   /* "old" transition table for DELETE, if any */
+   Tuplestorestate *old_del_tuplestore;
+   /* "new" transition table INSERT, if any */
+   Tuplestorestate *new_ins_tuplestore;
 };

A comment somewhere why we have all of these (presumably because they
can now happen in the context of a single statement) would be good.


Done.
 

@@ -5744,12 +5796,28 @@ AfterTriggerSaveEvent(EState *estate, ResultRelInfo *relinfo,
                 newtup == NULL));

        if (oldtup != NULL &&
-           ((event == TRIGGER_EVENT_DELETE && delete_old_table) ||
-            (event == TRIGGER_EVENT_UPDATE && update_old_table)))
+           (event == TRIGGER_EVENT_DELETE && delete_old_table))
        {
            Tuplestorestate *old_tuplestore;

-           old_tuplestore = transition_capture->tcs_private->old_tuplestore;
+           old_tuplestore = transition_capture->tcs_private->old_del_tuplestore;
+
+           if (map != NULL)
+           {
+               HeapTuple   converted = do_convert_tuple(oldtup, map);
+
+               tuplestore_puttuple(old_tuplestore, converted);
+               pfree(converted);
+           }
+           else
+               tuplestore_puttuple(old_tuplestore, oldtup);
+       }

Very similar code is now repeated four times. Could you abstract this
into a separate function?


Ok. I gave it a try, please check. It's definitely a lot lesser code.
 


--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -37,6 +37,16 @@ the plan tree returns the computed tuples to be updated, plus a "junk"
 one.  For DELETE, the plan tree need only deliver a CTID column, and the
 ModifyTable node visits each of those rows and marks the row deleted.

+MERGE runs one generic plan that returns candidate target rows. Each row
+consists of a super-row that contains all the columns needed by any of the
+individual actions, plus a CTID and a TABLEOID junk columns. The CTID column is

"plus a CTID and a TABLEOID junk columns" sounds a bit awkward?

Changed.



+required to know if a matching target row was found or not and the TABLEOID
+column is needed to find the underlying target partition, in case when the
+target table is a partition table. If the CTID column is set we attempt to
+activate WHEN MATCHED actions, or if it is NULL then we will attempt to

"if it is NULL then" sounds wrong.

Made some adjustments.
 

+activate WHEN NOT MATCHED actions. Once we know which action is activated we
+form the final result row and apply only those changes.
+
 XXX a great deal more documentation needs to be written here...

Are we using tableoids in a similar way in other places?


AFAIK, no. 
 


+/*
+ * Given OID of the partition leaf, return the index of the leaf in the
+ * partition hierarchy.
+ */
+int
+ExecFindPartitionByOid(PartitionTupleRouting *proute, Oid partoid)
+{
+   int i;
+
+   for (i = 0; i < proute->num_partitions; i++)
+   {
+       if (proute->partition_oids[i] == partoid)
+           break;
+   }
+
+   Assert(i < proute->num_partitions);
+   return i;
+}

Shouldn't we at least warn in a comment that this is O(N)? And document
that it does weird stuff if the OID isn't found?

Yeah, added a comment. Also added a ereport(ERROR) if we do not find the partition. There was already an Assert, but may be ERROR is better. 
 

Perhaps just introduce a PARTOID syscache?


Probably as a separate patch. Anything more than a handful partitions is anyways known to be too slow and I doubt this code will add anything material impact to that.
 


diff --git a/src/backend/executor/nodeMerge.c b/src/backend/executor/nodeMerge.c
new file mode 100644
index 00000000000..0e0d0795d4d
--- /dev/null
+++ b/src/backend/executor/nodeMerge.c
@@ -0,0 +1,575 @@
+/*-------------------------------------------------------------------------
+ *
+ * nodeMerge.c
+ *   routines to handle Merge nodes relating to the MERGE command

Isn't this file misnamed and it should be execMerge.c?  The rest of the
node*.c files are for nodes that are invoked via execProcnode /
ExecProcNode().  This isn't an actual executor node.

Makes sense. Done. (Now that the patch is committed, I don't know if there would be a rethink about changing file names. May be not, just raising that concern)
 

Might also be worthwhile to move the merge related init stuff here?


Done.
 


What's the reasoning behind making this be an anomaluous type of
executor node?


Didn't quite get that. I think naming of the file was bad (fixed now), but I think it's a good idea to move the new code in a new file, from maintainability as well as coverage perspective. If you've something very different in mind, can you explain in more details?
 


FWIW, I'd re-order this file so this routine is above
ExecMergeMatched(), ExecMergeNotMatched(), easier to understand.

Done.
 


+   /*
+    * If there are not WHEN MATCHED actions, we are done.
+    */
+   if (mergeMatchedActionStates == NIL)
+       return true;

Maybe I'm confused, but why is mergeMatchedActionStates attached to
per-partition info? How can this differ in number between partitions,
requiring us to re-check it below fetching the partition info above?


Because each partition may have a columns in different order, dropped attributes etc. So we need to give treatment to the quals/targetlists. See ON CONFLICT DO UPDATE for similar code.
 

+       /*
+        * Check for any concurrent update/delete operation which may have
+        * prevented our update/delete. We also check for situations where we
+        * might be trying to update/delete the same tuple twice.
+        */
+       if ((action->commandType == CMD_UPDATE && !tuple_updated) ||
+           (action->commandType == CMD_DELETE && !tuple_deleted))
+
+       {
+           switch (hufd.result)
+           {
+               case HeapTupleMayBeUpdated:
+                   break;
+               case HeapTupleInvisible:
+
+                   /*
+                    * This state should never be reached since the underlying
+                    * JOIN runs with a MVCC snapshot and should only return
+                    * rows visible to us.
+                    */

Given EPQ, that reasoning isn't correct. I think this should still be
unreachable, just not for the reason described here.


Agree. Updated the comment, but please check if it's satisfactory or you would like to say something more/different.
 


It seems fairly bad architecturally to me that we now have
EvalPlanQual() loops in both this routine *and*
ExecUpdate()/ExecDelete().


This was done after review by Peter and I think I like the new way too. Also keeps the regular UPDATE/DELETE code paths least changed and let Merge handle concurrency issues specific to it.
 

+
+/*
+ * Execute the first qualifying NOT MATCHED action.
+ */
+static void
+ExecMergeNotMatched(ModifyTableState *mtstate, EState *estate,
+                   TupleTableSlot *slot)
+{
+   PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
+   ExprContext *econtext = mtstate->ps.ps_ExprContext;
+   List       *mergeNotMatchedActionStates = NIL;
+   ResultRelInfo *resultRelInfo;
+   ListCell   *l;
+   TupleTableSlot  *myslot;
+
+   /*
+    * We are dealing with NOT MATCHED tuple. Since for MERGE, partition tree

*the partition tree

Fixed.
 


+    * is not expanded for the result relation, we continue to work with the
+    * currently active result relation, which should be of the root of the
+    * partition tree.
+    */
+   resultRelInfo = mtstate->resultRelInfo;

"should be"? "is", I hope?  Given that it's referencing
mtstate->resultRelInfo which is only set in one place...


Yeah, fixed.
 


+/* flags for mt_merge_subcommands */
+#define MERGE_INSERT   0x01
+#define MERGE_UPDATE   0x02
+#define MERGE_DELETE   0x04

Hm, it seems a bit weird to define these file-local, given there's
another file implementing a good chunk of merge.

Ok. Moved them execMerge.h, which made sense after I moved the initialisation related code to execMerge.c

 

+   /*
+    * Initialize hufdp. Since the caller is only interested in the failure
+    * status, initialize with the state that is used to indicate successful
+    * operation.
+    */
+   if (hufdp)
+       hufdp->result = HeapTupleMayBeUpdated;
+

This signals for me that the addition of the result field to HUFD wasn't
architecturally the right thing. HUFD is supposed to be supposed to be
returned by heap_update(), reusing and setting it from other places
seems like a layering violation to me.


I am not sure I agree. Sure we can keep adding more parameters to ExecUpdate/ExecDelete and such routines, but I thought passing back all information via a single struct makes more sense.
 


diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 69dd327f0c9..cd540a0df5b 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -851,6 +851,60 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
                        fix_scan_list(root, splan->exclRelTlist, rtoffset);
                }

+               /*
+                * The MERGE produces the target rows by performing a right

s/the MERGE/the MERGE statement/?


Fixed.
 

+                * join between the target relation and the source relation
+                * (which could be a plain relation or a subquery). The INSERT
+                * and UPDATE actions of the MERGE requires access to the

same.


Fixed.
 


+opt_and_condition:

Renaming this to be a bit more merge specific seems like a good idea.

Renamed to opt_merge_when_and_condition
 

+
+   /*
+    * Add a whole-row-Var entry to support references to "source.*".
+    */
+   var = makeWholeRowVar(rt_rte, rt_rtindex, 0, false);
+   te = makeTargetEntry((Expr *) var, list_length(*mergeSourceTargetList) + 1,
+                        NULL, true);

Why can't this properly dealt with by transformWholeRowRef() etc?


I just followed ON CONFLICT style. May be there is a better way, but not clear how.
 


+                   if (selectStmt && (selectStmt->valuesLists == NIL ||
+                                      selectStmt->sortClause != NIL ||
+                                      selectStmt->limitOffset != NULL ||
+                                      selectStmt->limitCount != NULL ||
+                                      selectStmt->lockingClause != NIL ||
+                                      selectStmt->withClause != NULL))
+                       ereport(ERROR,
+                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                errmsg("SELECT not allowed in MERGE INSERT statement")));

+                   if (selectStmt && list_length(selectStmt->valuesLists) > 1)
+                       ereport(ERROR,
+                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                errmsg("Multiple VALUES clauses not allowed in MERGE INSERT statement")));

Shouldn't this include an error position?


I will work on this as a separate follow-up patch. I tried adding location to MergeAction, but that alone is probably not sufficient. So it was turning out a slightly bigger patch than I anticipated.
 

Why is this, and not building a proper executor node for merge that
knows how to get the tuples, the right approach?  We did a rough
equivalent for matview updates, and I think it turned out to be a pretty
bad plan.


I am still not sure why that would be any better. Can you explain in detail what exactly you've in mind and how's that significantly better than what we have today?
 


+   /*
+    * XXX MERGE is unsupported in various cases
+    */
+   if (!(pstate->p_target_relation->rd_rel->relkind == RELKIND_RELATION ||
+         pstate->p_target_relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
+       ereport(ERROR,
+               (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                errmsg("MERGE is not supported for this relation type")));

Shouldn't this report what relation type the error talking about? It's
not going to necessarily be obvious to the user.  Also, errposition etc
would be good.


Hmm, ok. There is getRelationTypeDescription() but otherwise I am surprised that I couldn't find a ready way to get string representation of relation kind. Am I missing something? Of course, we can write for the purpose, but wanted to ensure we are not duplicating something already available.
 


+                       /*
+                        * Do basic expression transformation (same as a ROW()
+                        * expr, but allow SetToDefault at top level)
+                        */
+                       exprList = transformExpressionList(pstate,
+                                                          (List *) linitial(valuesLists),
+                                                          EXPR_KIND_VALUES_SINGLE,
+                                                          true);
+
+                       /* Prepare row for assignment to target table */
+                       exprList = transformInsertRow(pstate, exprList,
+                                                     istmt->cols,
+                                                     icolumns, attrnos,
+                                                     false);
+                   }

Can't we handle this with a littlebit less code duplication?

Hmm, yeah, that will be good. But given Tom's suggestions on the other thread, I would like to postpone any refactoring here.
 

 typedef struct HeapUpdateFailureData
 {
+   HTSU_Result result;
    ItemPointerData ctid;
    TransactionId xmax;
    CommandId   cmax;
+   LockTupleMode   lockmode;
 } HeapUpdateFailureData;


These new fields seem not really relateto HUFD, but rather just fields
the merge code should maintain?

As I said, we can possibly track those separately. But they all arise as a result of update/delete/lock failure. So I would prefer to keep them along with other fields such as ctid and xmax/cmax. But if you or others insist, I can move them and pass in/out of various function calls where we need to maintain those.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: [HACKERS] GUC for cleanup indexes threshold.
Следующее
От: Etsuro Fujita
Дата:
Сообщение: Re: [HACKERS] Add support for tuple routing to foreign partitions