RE: [POC] Fast COPY FROM command for the table with foreign partitions

Поиск
Список
Период
Сортировка
От tsunakawa.takay@fujitsu.com
Тема RE: [POC] Fast COPY FROM command for the table with foreign partitions
Дата
Msg-id TYAPR01MB29902ABAE555355D9B90CC84FE1E0@TYAPR01MB2990.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: [POC] Fast COPY FROM command for the table with foreign partitions  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Ответы Re: [POC] Fast COPY FROM command for the table with foreign partitions  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Список pgsql-hackers
Hello Andrey-san,


Thank you for challenging an interesting feature.  Below are my review comments.


(1)
-    /* for use by copy.c when performing multi-inserts */
+    /*
+     * The following fields are currently only relevant to copy.c.
+     *
+     * True if okay to use multi-insert on this relation
+     */
+    bool ri_usesMultiInsert;
+
+    /* Buffer allocated to this relation when using multi-insert mode */
     struct CopyMultiInsertBuffer *ri_CopyMultiInsertBuffer;
 } ResultRelInfo;

It's better to place the new bool member next to an existing bool member, so that the structure doesn't get larger.


(2)
+    Assert(rri->ri_usesMultiInsert == false);

As the above assertion represents, I'm afraid the semantics of ExecRelationAllowsMultiInsert() and
ResultRelInfo->ri_usesMultiInsertare unclear.  In CopyFrom(), ri_usesMultiInsert is set by also considering the
COPY-specificconditions:
 

+    if (!cstate->volatile_defexprs &&
+        !contain_volatile_functions(cstate->whereClause) &&
+        ExecRelationAllowsMultiInsert(target_resultRelInfo, NULL))
+        target_resultRelInfo->ri_usesMultiInsert = true;

On the other hand, in below ExecInitPartitionInfo(), ri_usesMultiInsert is set purely based on the relation's
characteristics.

+    leaf_part_rri->ri_usesMultiInsert =
+        ExecRelationAllowsMultiInsert(leaf_part_rri, rootResultRelInfo);

In addition to these differences, I think it's a bit confusing that the function itself doesn't record the check result
inri_usesMultiInsert.
 

It's probably easy to understand to not add ri_usesMultiInsert, and the function just encapsulates the check logic
basedsolely on the relation characteristics and returns the result.  So, the argument is just one ResultRelInfo.  The
caller(e.g. COPY) combines the function result with other specific conditions.
 


(3)
+typedef void (*BeginForeignCopy_function) (ModifyTableState *mtstate,
+                                             ResultRelInfo *rinfo);
+
+typedef void (*EndForeignCopy_function) (EState *estate,
+                                           ResultRelInfo *rinfo);
+
+typedef void (*ExecForeignCopy_function) (ResultRelInfo *rinfo,
+                                                       TupleTableSlot **slots,
+                                                       int nslots);

To align with other function groups, it's better to place the functions in order of Begin, Exec, and End.


(4)
+    /* COPY a bulk of tuples into a foreign relation */
+    BeginForeignCopy_function BeginForeignCopy;
+    EndForeignCopy_function EndForeignCopy;
+    ExecForeignCopy_function ExecForeignCopy;

To align with the other functions' comment, the comment should be:
    /* Support functions for COPY */


(5)
+<programlisting>
+TupleTableSlot *
+ExecForeignCopy(ResultRelInfo *rinfo,
+                  TupleTableSlot **slots,
+                  int nslots);
+</programlisting>
+
+     Copy a bulk of tuples into the foreign table.
+     <literal>estate</literal> is global execution state for the query.

The return type is void.


(6)
+     <literal>nslots</literal> cis a number of tuples in the <literal>slots</literal>

cis -> is


(7)
+    <para>
+     If the <function>ExecForeignCopy</function> pointer is set to
+     <literal>NULL</literal>, attempts to insert into the foreign table will fail
+     with an error message.
+    </para>

"attempts to insert into" should be "attempts to run COPY on", because it's used for COPY.
Furthermore, if ExecForeignCopy is NULL, COPY should use ExecForeignInsert() instead, right?  Otherwise, existing FDWs
wouldbecome unable to be used for COPY.
 


(8)
+    bool        pipe = (filename == NULL) && (data_dest_cb == NULL);

The above pipe in BeginCopyTo() is changed to not match pipe in DoCopyTo(), which only refers to filename.  Should pipe
inDoCopyTo() also be changed?  If no, the use of the same variable name for different conditions is confusing.
 


(9)
-     * partitions will be done later.
+-     * partitions will be done later.

This is an unintended addition of '-'?


(10)
-    if (resultRelInfo->ri_FdwRoutine != NULL &&
-        resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-        resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
-                                                         resultRelInfo);
+    if (target_resultRelInfo->ri_FdwRoutine != NULL)
+    {
+        if (target_resultRelInfo->ri_usesMultiInsert)
+            target_resultRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate,
+                                                                  resultRelInfo);
+        else if (target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+            target_resultRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,
+                                                                    resultRelInfo);
+    }

BeginForeignCopy() should be called if it's defined, because BeginForeignCopy() is an optional function.


(11) 
+        oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
+
+        table_multi_insert(resultRelInfo->ri_RelationDesc,
+                           slots,

The extra empty line seems unintended.


(12)
@@ -585,7 +583,8 @@ CopySendEndOfRow(CopyState cstate)
             (void) pq_putmessage('d', fe_msgbuf->data, fe_msgbuf->len);
             break;
         case COPY_CALLBACK:
-            Assert(false);        /* Not yet supported. */
+            CopySendChar(cstate, '\n');
+            cstate->data_dest_cb(fe_msgbuf->data, fe_msgbuf->len);

As in the COPY_FILENAME case, shouldn't the line terminator be sent only in text format, and be changed to \r\n on
Windows? I'm asking this as I'm probably a bit confused about in what situation COPY_CALLBACK could be used.  I thought
thebinary format and \r\n line terminator could be necessary depending on the FDW implementation.
 


(13)
@@ -1001,9 +1001,13 @@ ExecInitRoutingInfo(ModifyTableState *mtstate,
      * If the partition is a foreign table, let the FDW init itself for
      * routing tuples to the partition.
      */
-    if (partRelInfo->ri_FdwRoutine != NULL &&
-        partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
-        partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+    if (partRelInfo->ri_FdwRoutine != NULL)
+    {
+        if (partRelInfo->ri_usesMultiInsert)
+            partRelInfo->ri_FdwRoutine->BeginForeignCopy(mtstate, partRelInfo);
+        else if (partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
+            partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
+    }

BeginForeignCopy() should be called only if it's defined, because BeginForeignCopy() is an optional function.


(14)
@@ -1205,10 +1209,18 @@ ExecCleanupTupleRouting(ModifyTableState *mtstate,
         ResultRelInfo *resultRelInfo = proute->partitions[i];
 
         /* Allow any FDWs to shut down */
-        if (resultRelInfo->ri_FdwRoutine != NULL &&
-            resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
-            resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
-                                                           resultRelInfo);
+        if (resultRelInfo->ri_FdwRoutine != NULL)
+        {
+            if (resultRelInfo->ri_usesMultiInsert)
+            {
+                Assert(resultRelInfo->ri_FdwRoutine->EndForeignCopy != NULL);
+                resultRelInfo->ri_FdwRoutine->EndForeignCopy(mtstate->ps.state,
+                                                               resultRelInfo);
+            }
+            else if (resultRelInfo->ri_FdwRoutine->EndForeignInsert != NULL)
+                resultRelInfo->ri_FdwRoutine->EndForeignInsert(mtstate->ps.state,
+                                                               resultRelInfo);
+        }

EndForeignCopy() is an optional function, isn't it?  That is, it's called if it's defined.


(15)
+static void
+pgfdw_copy_dest_cb(void *buf, int len)
+{
+    PGconn *conn = copy_fmstate->conn;
+
+    if (PQputCopyData(conn, (char *) buf, len) <= 0)
+    {
+        PGresult *res = PQgetResult(conn);
+
+        pgfdw_report_error(ERROR, res, conn, true, copy_fmstate->query);
+    }
+}

The following page says "Use PQerrorMessage to retrieve details if the return value is -1."  So, it's correct to not
usePGresult here and pass NULL as the second argument to pgfdw_report_error().
 

https://www.postgresql.org/docs/devel/libpq-copy.html


(16)
+        for (i = 0; i < nslots; i++)
+            CopyOneRowTo(fmstate->cstate, slots[i]);
+
+        status = true;
+    }

I'm afraid it's not intuitive what "status is true" means.  I think copy_data_sent or copy_send_success would be better
forthe variable name.
 


(17)
+        if (PQputCopyEnd(conn, status ? NULL : _("canceled by server")) <= 0 ||
+            PQflush(conn))
+            ereport(ERROR,
+                    (errmsg("error returned by PQputCopyEnd: %s",
+                            PQerrorMessage(conn))));

As the places that call PQsendQuery(), it seems preferrable to call pgfdw_report_error() here too.


Regards
Takayuki Tsunakawa


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Tang, Haiying"
Дата:
Сообщение: RE: Use of "long" in incremental sort code
Следующее
От: Ian Lawrence Barwick
Дата:
Сообщение: [patch] ENUM errdetail should mention bytes, not chars