RE: Parallel INSERT (INTO ... SELECT ...)

Поиск
Список
Период
Сортировка
От tsunakawa.takay@fujitsu.com
Тема RE: Parallel INSERT (INTO ... SELECT ...)
Дата
Msg-id TYAPR01MB2990F7A8F316883ED57609C4FEA00@TYAPR01MB2990.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Ответы Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Re: Parallel INSERT (INTO ... SELECT ...)  (Greg Nancarrow <gregn4422@gmail.com>)
Список pgsql-hackers
Hello Greg-san,


Initially, some miner comments:


(1)
-     * (Note that we do allow CREATE TABLE AS, SELECT INTO, and CREATE
-     * MATERIALIZED VIEW to use parallel plans, but as of now, only the leader
-     * backend writes into a completely new table.  In the future, we can
-     * extend it to allow workers to write into the table.  However, to allow
-     * parallel updates and deletes, we have to solve other problems,
-     * especially around combo CIDs.)
+     * (Note that we do allow CREATE TABLE AS, INSERT INTO...SELECT, SELECT
+     * INTO, and CREATE MATERIALIZED VIEW to use parallel plans. However, as
+     * of now, only the leader backend writes into a completely new table. In

This can read "In INSERT INTO...SELECT case, like other existing cases, only the leader backend writes into a
completelynew table."  The reality is that workers as well as the leader can write into an empty or non-empty table in
parallel,isn't it?
 


(2)
 /*
  * RELATION_IS_LOCAL
- *        If a rel is either temp or newly created in the current transaction,
- *        it can be assumed to be accessible only to the current backend.
- *        This is typically used to decide that we can skip acquiring locks.
+ *        If a rel is temp, it can be assumed to be accessible only to the
+ *        current backend. This is typically used to decide that we can
+ *        skip acquiring locks.
  *
  * Beware of multiple eval of argument
  */
 #define RELATION_IS_LOCAL(relation) \
-    ((relation)->rd_islocaltemp || \
-     (relation)->rd_createSubid != InvalidSubTransactionId)
+    ((relation)->rd_islocaltemp)

How is this correct?  At least, this change would cause a transaction that creates a new relation acquire an
unnecessarylock.  I'm not sure if that overhead is worth worrying about (perhaps not, I guess).  But can we still check
>rd_createSubidin non-parallel mode?  If we adopt the above change, the comments at call sites need modification - "new
ortemp relation" becomes "temp relations".
 


(3)
@@ -173,9 +175,11 @@ ExecSerializePlan(Plan *plan, EState *estate)
...
-    pstmt->commandType = CMD_SELECT;
+    Assert(estate->es_plannedstmt->commandType == CMD_SELECT ||
+           IsModifySupportedInParallelMode(estate->es_plannedstmt->commandType));
+    pstmt->commandType = IsA(plan, ModifyTable) ? castNode(ModifyTable, plan)->operation : CMD_SELECT;

The last line can just be as follows, according to the Assert():

+    pstmt->commandType = estate->es_plannedstmt->commandType);


(4)
@@ -1527,7 +1528,9 @@ ExecutePlan(EState *estate,
     estate->es_use_parallel_mode = use_parallel_mode;
     if (use_parallel_mode)
     {
-        PrepareParallelMode(estate->es_plannedstmt->commandType);
+        bool        isParallelModifyLeader = IsA(planstate, GatherState) && IsA(outerPlanState(planstate),
ModifyTableState);
+
+        PrepareParallelMode(estate->es_plannedstmt->commandType, isParallelModifyLeader);
         EnterParallelMode();
     }

@@ -1021,12 +1039,25 @@ IsInParallelMode(void)
  * Prepare for entering parallel mode, based on command-type.
  */
 void
-PrepareParallelMode(CmdType commandType)
+PrepareParallelMode(CmdType commandType, bool isParallelModifyLeader)
 {
     if (IsModifySupportedInParallelMode(commandType))
     {
         Assert(!IsInParallelMode());
 
+        if (isParallelModifyLeader)
+        {
+            /*
+             * Set currentCommandIdUsed to true, to ensure that the current
+             * CommandId (which will be used by the parallel workers) won't
+             * change during this parallel operation, as starting new
+             * commands in parallel-mode is not currently supported.
+             * See related comments in GetCurrentCommandId and
+             * CommandCounterIncrement.
+             */
+            (void) GetCurrentCommandId(true);
+        }

I think we can eliminate the second argument of PrepareParallelMode() and the new code in ExecutePlan().
PrepareParallelMode()can use !IsParallelWorker() in the if condition, because the caller is either a would-be parallel
leaderor a parallel worker.
 

BTW, why do we want to add PrepareParallelMode() separately from EnterParallelMode()?  Someone who will read other call
sitesof EnterParallelMode() (index build, VACUUM) may be worried that PrepareParallelMode() call is missing there.  Can
wejust add an argument to EnterParallelMode()?  Other call sites can use CMD_UNKNOWN or CMD_UTILITY, if we want to use
CMD_XX.


Regards
Takayuki Tsunakawa


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Joel Jacobson"
Дата:
Сообщение: Re: Add primary keys to system catalogs
Следующее
От: Dean Rasheed
Дата:
Сообщение: Re: PoC/WIP: Extended statistics on expressions