RE: Parallel INSERT (INTO ... SELECT ...)

Поиск
Список
Период
Сортировка
От tsunakawa.takay@fujitsu.com
Тема RE: Parallel INSERT (INTO ... SELECT ...)
Дата
Msg-id TYAPR01MB29903F0BEB8D41D287697612FEBE0@TYAPR01MB2990.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Ответы Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Список pgsql-hackers
Hello Greg-san,


Second group of comments (I'll reply to (1) - (4) later):


(5)
@@ -790,7 +790,8 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
... 
-    if (plannedstmt->commandType != CMD_SELECT || plannedstmt->hasModifyingCTE)
+    if ((plannedstmt->commandType != CMD_SELECT &&
+         !IsModifySupportedInParallelMode(plannedstmt->commandType)) || plannedstmt->hasModifyingCTE)
         PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
 }

Now that we're trying to allow parallel writes (INSERT), we should:

* use ExecCheckXactReadOnly() solely for checking read-only transactions, as the function name represents.  That is,
movethe call to PreventCommandIfParallelMode() up to standard_ExecutorStart().
 

* Update the comment  above the call to ExecCheckXactReadOnly().


(6)
@@ -764,6 +777,22 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
...
+    else
+    {
+        pei->processed_count = NULL;
+    }

The braces can be deleted.


(7)
@@ -1400,6 +1439,16 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
                                          true);
     queryDesc = ExecParallelGetQueryDesc(toc, receiver, instrument_options);
 
+    Assert(queryDesc->operation == CMD_SELECT || IsModifySupportedInParallelMode(queryDesc->operation));
+    if (IsModifySupportedInParallelMode(queryDesc->operation))
+    {
+        /*
+         * Record that the CurrentCommandId is used, at the start of the
+         * parallel operation.
+         */
+        SetCurrentCommandIdUsedForWorker();
+    }
+
     /* Setting debug_query_string for individual workers */
     debug_query_string = queryDesc->sourceText;

@@ -765,12 +779,16 @@ GetCurrentCommandId(bool used)
     if (used)
     {
         /*
-         * Forbid setting currentCommandIdUsed in a parallel worker, because
-         * we have no provision for communicating this back to the leader.  We
-         * could relax this restriction when currentCommandIdUsed was already
-         * true at the start of the parallel operation.
+         * If in a parallel worker, only allow setting currentCommandIdUsed if
+         * currentCommandIdUsed was already true at the start of the parallel
+         * operation (by way of SetCurrentCommandIdUsed()), otherwise forbid
+         * setting currentCommandIdUsed because we have no provision for
+         * communicating this back to the leader. Once currentCommandIdUsed is
+         * set, the commandId used by leader and workers can't be changed,
+         * because CommandCounterIncrement() then prevents any attempted
+         * increment of the current commandId.
          */
-        Assert(!IsParallelWorker());
+        Assert(!(IsParallelWorker() && !currentCommandIdUsed));
         currentCommandIdUsed = true;
     }
     return currentCommandId;

What happens without these changes?  If this kind of change is really necessary, it seems more natural to pass
currentCommandIdUsedtogether with currentCommandId through SerializeTransactionState() and
StartParallelWorkerTransaction(),instead of the above changes.
 

As an aside, SetCurrentCommandIdUsed() in the comment should be SetCurrentCommandIdUsedForWorker().


(8)
+        /*
+         * If the trigger type is RI_TRIGGER_FK, this indicates a FK exists in
+         * the relation, and this would result in creation of new CommandIds
+         * on insert/update/delete and this isn't supported in a parallel
+         * worker (but is safe in the parallel leader).
+         */
+        trigtype = RI_FKey_trigger_type(trigger->tgfoid);
+        if (trigtype == RI_TRIGGER_FK)
+        {
+            if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
+                return true;
+        }

Here, RI_TRIGGER_FK should instead be RI_TRIGGER_PK, because RI_TRIGGER_FK triggers do not generate command IDs.  See
RI_FKey_check()which is called in RI_TRIGGER_FK case.  In there, ri_PerformCheck() is called with the detectNewRows
argumentset to false, which causes CommandCounterIncrement() to not be called.
 

Plus, tables that have RI_TRIGGER_PK should allow parallel INSERT in a parallel-safe manner, because those triggers
onlyfire for UPDATE and DELETE.  So, for the future parallel UPDATE/DELETE support, the above check should be performed
inUPDATE and DELETE cases.
 

(In a data warehouse, fact tables, which store large amounts of historical data, typically have foreign keys to smaller
dimensiontables.  Thus, it's important to allow parallel INSERTs on tables with foreign keys.)
 


Regards
Takayuki Tsunakawa


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mark Rofail
Дата:
Сообщение: Re: [HACKERS] GSoC 2017: Foreign Key Arrays
Следующее
От: Masahiro Ikeda
Дата:
Сообщение: Re: About to add WAL write/fsync statistics to pg_stat_wal view